Learned Indices and Software 2.0
ML Research
Last week’s newsletter talked a lot about transformer explainability. This week, there is an interesting new paper that builds on top of LRP (Layer-wise Relevance Propagation), and its code is freely available. It produces better results than LRP, and after tuning the parameters, the attention-based object highlights look much more focused. Beyond explainability, this approach could also be useful for image segmentation/detection in addition to classification tasks.
I recently came across this paper on learned indices and how they are more efficient than “traditional indices”. Its code is also available. Instead of building a data structure before seeing the problem/data, one can learn what the data structure should look like after seeing the problem/data. I would love to see more applications in this area, as they could bring large improvements to the “Software 2.0” domain.
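To make the idea concrete, here is a minimal, hypothetical sketch of a learned index (not the paper’s implementation): fit a linear model approximating the CDF of the sorted keys, predict a position, and correct the prediction with a bounded local search. All names and data below are made up for illustration.

```python
import bisect

def fit_linear_index(keys):
    """Least-squares fit of position ~ a*key + b over sorted keys."""
    n = len(keys)
    mean_k = sum(keys) / n
    mean_p = (n - 1) / 2
    var = sum((k - mean_k) ** 2 for k in keys)
    cov = sum((k - mean_k) * (i - mean_p) for i, k in enumerate(keys))
    a = cov / var if var else 0.0
    b = mean_p - a * mean_k
    # Track the worst prediction error so lookups know how far to search.
    err = max(abs(i - (a * k + b)) for i, k in enumerate(keys))
    return a, b, int(err) + 1

def lookup(keys, model, key):
    a, b, err = model
    guess = int(a * key + b)
    lo = max(0, guess - err)
    hi = min(len(keys), guess + err + 1)
    # Binary-search only inside the error bound around the prediction.
    i = bisect.bisect_left(keys, key, lo, hi)
    return i if i < len(keys) and keys[i] == key else -1

keys = list(range(0, 1000, 3))           # sorted, roughly linear distribution
model = fit_linear_index(keys)
print(lookup(keys, model, 999))          # 333
print(lookup(keys, model, 500))          # -1 (not present)
```

Because the model is fit to this particular key distribution, lookups touch only a small window around the prediction; that is exactly the “learn the data structure from the data” premise above.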
There are multiple reasons why there is a large gap:
Solutions to data problems should depend on the data distribution. However, most traditional software does not take this simple premise into account. Even now, we do not have a single database that adapts to the data it stores.
The systems are not flexible and adaptable enough. As customer behavior changes over time, the system itself does not change/adapt based on that behavior. One simple example: if you have a system optimized for the read path (database lookups) and behavior shifts so that writes make up a higher percentage of traffic, the overall efficiency of the system decreases. However, since the system is built on this preconceived notion, it does not adapt, which creates inefficiency.
Articles
Matthias Bal wrote about how the attention mechanism can be viewed as an implicit mixture of energy-based models. He introduces RBMs (Restricted Boltzmann Machines) and Hopfield networks to explain how energy-based models are developed, and concludes by showing how they can be used to derive different types of attention mechanisms. This reminded me of this paper and this nice blog post.
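The core connection can be sketched in a few lines: the modern (continuous) Hopfield update retrieves a stored pattern as a softmax-weighted average of all patterns, which is exactly attention with keys and values equal to the stored patterns. This is a hypothetical toy illustration, not code from the post; the vectors are made up.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def hopfield_retrieve(query, patterns, beta=8.0):
    """One Hopfield update step: softmax(beta * q.P^T) P, i.e. attention
    with keys = values = stored patterns."""
    weights = softmax([beta * sum(q * p for q, p in zip(query, pat))
                       for pat in patterns])
    dim = len(query)
    return [sum(w * pat[i] for w, pat in zip(weights, patterns))
            for i in range(dim)]

patterns = [[1.0, 0.0], [0.0, 1.0]]      # stored memories
noisy = [0.9, 0.2]                        # corrupted version of the first one
print(hopfield_retrieve(noisy, patterns)) # close to [1.0, 0.0]
```

With a large inverse temperature `beta`, one update step snaps the noisy query back to the nearest stored pattern; with `beta = 1` and separate key/value projections, the same formula is standard dot-product attention.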
The Gradient wrote a lengthy post on BERT and the Lottery Ticket Hypothesis. For those who do not know BERT, this is a good introduction; a good introduction to the Lottery Ticket Hypothesis is here. The post mainly outlines the paper, which concludes that the hypothesis holds for BERT. It further discusses magnitude-based pruning to show that good subnetworks do exist, and that even after pruning these subnetworks can reach the full network’s potential (as if it had not been pruned at all). For structured pruning, however, the hypothesis does not hold, as the damage is not recoverable even after retraining/fine-tuning.
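For readers unfamiliar with magnitude-based pruning, here is a minimal, hypothetical sketch of the unstructured variant discussed above: zero out the fraction of weights with the smallest absolute value. The function name and weights are made up for illustration.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-|w| fraction set to 0."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest weight.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

w = [0.9, -0.05, 0.4, 0.01, -0.7, 0.002]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.4, 0.0, -0.7, 0.0]
```

In lottery-ticket experiments this is applied per layer or globally, and the surviving weights are then rewound/retrained; structured pruning instead removes whole rows, heads, or channels, which is where the hypothesis breaks down for BERT.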
Recently, there was a paper showing that large models can “memorize” certain information in the dataset they were trained on. This is problematic, as the memorized information can be extracted/exploited if the training dataset is not properly anonymized. BAIR wrote a follow-up article on this paper.
If you want to find out how PyTorch JIT works, I recently came across two excellent blog posts: 1, 2. Both go deep into PyTorch JIT internals and give a very good overview of how it works. If you are trying to productionize PyTorch models, it is very useful to understand how JIT works.
OpenAI published a very interesting blog post this week where they trained a GPT-3-like model for image generation. This builds on top of their previous work. The main premise of the model is that you give it free-form text describing the image you want, and the model outputs an image. The generated images are very interesting and plausible, even for prompts that do not sound “realistic”. They also published another blog post on CLIP, which they use to rank the images generated by DALL-E. CLIP is itself very interesting: it uses zero-shot learning, similar to this paper, to predict text for images the model has not seen previously. The model is also available here.
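The ranking step can be sketched simply: once images and text live in a shared embedding space, candidates are ordered by cosine similarity to the query embedding. This is a hypothetical toy sketch of that idea, not CLIP’s actual code; the embedding vectors and image names are made up.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_images(text_emb, image_embs):
    """Return image ids sorted by cosine similarity to the text embedding."""
    scores = {name: cosine(text_emb, emb) for name, emb in image_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)

text = [0.9, 0.1, 0.0]                   # embedding of the query text
images = {
    "img_a": [0.8, 0.2, 0.1],            # similar to the query
    "img_b": [0.0, 1.0, 0.0],            # unrelated
    "img_c": [0.5, 0.5, 0.5],            # somewhere in between
}
print(rank_images(text, images))         # ['img_a', 'img_c', 'img_b']
```

In the DALL-E pipeline this kind of scoring is what lets CLIP pick the best few images out of many generated candidates without ever being trained on those specific images.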
Both generative models and zero-shot models are very interesting from a search perspective. They not only enable the discovery of various images, but also actually “search” among a number of good candidates. I especially like that it can return a number of good results even for search queries like “an illustration of a baby daikon radish in a tutu walking a dog”.
For general search engines, instead of finding the exact document, what if the search engine itself generates the document and then brings it to you?
For recommendation engines, rather than being limited to the entities you have, why not combine them? For example, if you have multiple videos on a topic but not exactly the video the user is looking for, why not combine/mix those videos to present to the user?
Libraries
Dalex is a new explainability/interpretability library written in Python. It has a pretty good “fairness” module for responsible use cases as well.
Datasette is a library for publishing/consuming data easily.
Maia Chess is a chess bot that you can play against (it is pretty strong). Code is available.
Optuna supports a variety of libraries out of the box for hyperparameter optimization. I suggest checking out the PyTorch example if you want to get your hands dirty.
Deformable DETR is an excellent end-to-end object detector.
Notebooks
Reformer notebook is great if you want to train a reformer in PyTorch.
Residual Networks notebook is excellent if you want to understand how residual networks can be implemented in Python.