There was no newsletter last week; I was busy attending ICLR, sorry about that. Next weekend I will put together an ICLR-specific newsletter with an overview of the talks I watched.
Without further ado, we have a pretty packed agenda, so let’s dive in right away!
Articles
Facebook published a post on DINO, a new self-supervised learning technique that beats many state-of-the-art models across a variety of tasks. Self-supervised models famously require no labels, and Facebook is doubling down on the idea that self-supervision is the future, especially in the image/video domain where accurately labeling large amounts of images and videos is a hard and complicated task.
Tanel wrote a good article on some of the pitfalls you should pay attention to when using the PyTorch DataLoader together with NumPy. It is a very easy thing to miss, and it makes a real difference in your training loops.
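If I remember the article correctly, the core issue is that forked DataLoader workers all inherit the same NumPy random state, so "random" augmentations repeat across workers. Here is a minimal sketch of the problem and the usual fix (the dataset and names are mine, not the article's):

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class RandomAugmentDataset(Dataset):
    """Toy dataset whose __getitem__ draws from NumPy's global RNG."""
    def __len__(self):
        return 8

    def __getitem__(self, idx):
        # With num_workers > 0, forked workers share the same NumPy state,
        # so these "random" draws can repeat across workers.
        return np.random.randint(0, 1000)

def seed_worker(worker_id):
    # Give each worker its own NumPy seed derived from PyTorch's per-worker seed.
    np.random.seed(torch.initial_seed() % 2**32)

loader = DataLoader(RandomAugmentDataset(), batch_size=2, num_workers=2,
                    worker_init_fn=seed_worker)

for batch in loader:
    print(batch)
```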
Dropbox talks about how they built image search for all of the photos we upload! They compute vector embeddings of query words and extract keywords from images to enable keyword-based search.
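To make the idea concrete, here is a toy sketch of ranking images by combining tag confidences with query/tag embedding similarity; the vectors, tags, and scoring below are entirely made up and much simpler than Dropbox's actual system:

```python
import numpy as np

# Hypothetical word embeddings; a real system uses learned word vectors.
word_vecs = {
    "beach":  np.array([0.9, 0.1, 0.0]),
    "ocean":  np.array([0.8, 0.2, 0.1]),
    "dog":    np.array([0.0, 0.9, 0.3]),
    "sunset": np.array([0.7, 0.0, 0.6]),
}

# Keywords extracted from each image by a classifier, with confidence scores.
image_tags = {
    "IMG_001.jpg": {"beach": 0.8, "sunset": 0.6},
    "IMG_002.jpg": {"dog": 0.9},
}

def embed(word):
    v = word_vecs[word]
    return v / np.linalg.norm(v)

def score(query, image):
    # Weight each tag's embedding similarity to the query by its confidence.
    q = embed(query)
    return sum(conf * float(q @ embed(tag))
               for tag, conf in image_tags[image].items())

query = "ocean"
ranked = sorted(image_tags, key=lambda img: score(query, img), reverse=True)
print(ranked)  # images most relevant to "ocean" first
```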
HuggingFace wrote about how to scale a pre-trained BERT model and make it efficient on CPU (reduce latency and increase throughput). It is an excellent read if you have a large-scale model and want to tune its performance characteristics.
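The post covers several knobs; one widely used CPU trick in this space is dynamic int8 quantization of the Linear layers. A minimal sketch (this is a generic recipe, not necessarily the exact steps from the post):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Quantize Linear layers to int8 on the fly; typically cuts CPU latency.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    logits = quantized(**inputs).logits
print(logits.shape)
```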
HuggingFace published and released a new library called Accelerate. It makes it easy to write PyTorch code in a hardware-agnostic manner, which in turn makes for concise training loops.
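Based on Accelerate's documented API, a training loop looks roughly like this; the same code runs on CPU, single GPU, multi-GPU, or TPU:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # detects the available hardware setup

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
dataset = TensorDataset(torch.randn(64, 10), torch.randn(64, 1))
loader = DataLoader(dataset, batch_size=8)

# Accelerate wraps model, optimizer and dataloader for the current hardware.
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```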
SensiML writes about how to build a machine learning application with TensorFlow Lite for Microcontrollers.
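I have not used SensiML myself, but the TensorFlow side of such a pipeline usually boils down to converting a trained Keras model into a TFLite flatbuffer that can then be embedded in firmware; a minimal sketch with a stand-in model:

```python
import tensorflow as tf

# Tiny stand-in model; a real TinyML model would be trained on sensor data.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(8, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])

# Convert to TensorFlow Lite with default optimizations (e.g. quantization);
# the resulting flatbuffer is what TFLite Micro runs on the device.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```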
Facebook wrote about a dataset and how it is used in a fairer setting for object recognition with the SEER model.
Google has an excellent “laboratory” that uses a phonetics-based machine learning model to break down English words based on how they are spelled. The model is also available. Be careful, it is a total time sink!
Papers
The Stanford MML research group published a paper on metadata normalization. The main idea is to reduce the effect of the feature distribution on the learning procedure (for example, when one class is underrepresented) by extending the idea behind batch normalization.
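As I understand the core idea, the metadata-explained component of the features is regressed out so the network learns from the residual. A minimal NumPy sketch of that residualization step (the paper's actual layer handles batching and train/inference statistics differently):

```python
import numpy as np

# Hypothetical batch: N samples, D-dim features, M-dim metadata
# (e.g. site, scanner, or an over/under-represented group indicator).
N, D, M = 32, 16, 3
features = np.random.randn(N, D)
metadata = np.random.randn(N, M)

# Fit a linear model metadata -> features and keep only the residual,
# i.e. the part of the features the metadata cannot explain.
beta, *_ = np.linalg.lstsq(metadata, features, rcond=None)
normalized = features - metadata @ beta

print(normalized.shape)  # same shape, metadata-correlated component removed
```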
Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy argues that pruning is good, but after pruning a neural network one should not look only at evaluation on the test dataset; other metrics, such as performance on out-of-distribution inputs, matter too. The paper concludes that for more complicated inference tasks, the evaluation set alone does not give an accurate reading of how much pruning actually degraded the model. The code is also available in PyTorch.
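A minimal sketch of the workflow in PyTorch: magnitude-prune a model, then evaluate it on more than just the in-distribution test set. The toy model and crude "shift" below are mine, only meant to illustrate the evaluation point:

```python
import torch
import torch.nn.utils.prune as prune

# Toy model; the paper studies much larger networks.
model = torch.nn.Sequential(
    torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 2)
)

# Magnitude-prune 50% of the weights in every Linear layer.
for module in model.modules():
    if isinstance(module, torch.nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)
        prune.remove(module, "weight")  # make the pruning permanent

def accuracy(m, x, y):
    return (m(x).argmax(dim=1) == y).float().mean().item()

# Don't stop at in-distribution test accuracy; also probe shifted inputs.
x_test, y_test = torch.randn(128, 20), torch.randint(0, 2, (128,))
x_shift = x_test + 0.5 * torch.randn_like(x_test)  # crude distribution shift
print("test acc:", accuracy(model, x_test, y_test))
print("shifted acc:", accuracy(model, x_shift, y_test))
```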
How to Train BERT on an Academic Budget talks about various ways to train large-scale models on much more modest hardware, rather than in a fully fledged setup, by employing a number of techniques (parameter optimization and masking) to reduce the hardware requirements.
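The paper's exact recipe is its own, but two common budget-saving ingredients in this kind of work are mixed precision and gradient accumulation, which let you fit large effective batch sizes on a small GPU. A minimal sketch (assumes a CUDA GPU is available):

```python
import torch

model = torch.nn.Linear(128, 2).cuda()  # stand-in for a much larger model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 8  # simulate a large batch on a small GPU

optimizer.zero_grad()
for step in range(64):
    x = torch.randn(16, 128, device="cuda")
    y = torch.randint(0, 2, (16,), device="cuda")
    with torch.cuda.amp.autocast():  # mixed precision forward pass
        loss = torch.nn.functional.cross_entropy(model(x), y) / accum_steps
    scaler.scale(loss).backward()
    if (step + 1) % accum_steps == 0:  # update only every accum_steps batches
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
```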
Google and Berkeley published a paper, Carbon Emissions and Large Scale Neural Network Training, in which they suggest that the carbon emissions and environmental impact of large-scale models should also be part of the benchmarks we use to evaluate and compare models. This is encouraging news and a sign that the field is maturing: it no longer cares only about model accuracy but considers a variety of other aspects of these models.
Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI from Google talks about how important data is in machine learning pipelines and how it should be treated as a first-class citizen when we build systems. The so-called data janitorial work is very important, and this paper gives very good examples drawn from first-hand industry experience.
Libraries
Flyte is a library/framework for building workflows for data and machine learning jobs. If you are using Luigi or any other job dependency library to manage scheduled jobs, you may want to check it out.
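Based on flytekit's documented decorators, a workflow is just typed Python tasks wired together; the steps below are stand-ins, but they show the shape of it:

```python
from flytekit import task, workflow

@task
def preprocess(rows: int) -> int:
    # Stand-in for a data preparation step.
    return rows * 2

@task
def train(rows: int) -> str:
    # Stand-in for a training step that depends on preprocess.
    return f"model trained on {rows} rows"

@workflow
def pipeline(rows: int = 100) -> str:
    return train(rows=preprocess(rows=rows))

if __name__ == "__main__":
    # Workflows can be executed locally like plain Python before deploying.
    print(pipeline(rows=10))
```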
BentoML is a library that makes it easy to deploy and serve a variety of machine learning models from different libraries such as TensorFlow, PyTorch, and Scikit-Learn.
FrugalML is a framework that leverages APIs provided by other libraries/cloud providers to accomplish tasks in the cheapest possible way (on a budget). If you want to learn more about how the system works, there is also an accompanying paper.
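The gist, as I read it, is a learned cascade: try a cheap API first and only fall back to an expensive one when the cheap prediction is not confident enough. The sketch below only illustrates that idea; `cheap_api`, `expensive_api`, and the fixed threshold are hypothetical, not FrugalML's actual interface (FrugalML learns the policy from data):

```python
def cheap_api(image):
    # Hypothetical low-cost provider response.
    return {"label": "cat", "confidence": 0.62, "cost": 0.001}

def expensive_api(image):
    # Hypothetical high-cost, high-accuracy provider response.
    return {"label": "cat", "confidence": 0.97, "cost": 0.01}

def classify(image, threshold=0.8):
    result = cheap_api(image)
    if result["confidence"] >= threshold:
        return result  # cheap answer was confident enough
    return expensive_api(image)  # pay more only when necessary

print(classify("photo.jpg"))
```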
SeqIO is a library from Google that makes sequence modeling and sequence-based data processing easier. It is written for TensorFlow, but it can be used with PyTorch and JAX as well.
Videos
Stanford published a new class called Deep Learning with Graphs, and all of the videos for the class were published as well.
Stanford has also published an NLP class; it starts with a simple introduction to NLP and then builds on the basic concepts to cover modern techniques such as sequence models and state-of-the-art models in the NLP domain.
Melanie Weber talks about manifold-based learning for deep learning models in this video.
Stanford held a compression workshop in February and its videos just became available! The workshop is not only about deep learning or machine learning applications; it covers a much wider variety of applications such as databases, data compression, and multimedia.
Carole-Jean Wu talks about how to build deep learning recommenders; the talk provides both a high-level overview and some details on how to build a ranking system with deep learning.