Articles
ParlAI wrote about two new approaches that can make Transformers more efficient in this post. The first introduces Hash Layers in a mixture-of-experts setting, where each word is assigned to a fixed expert through a hash function. The second is staircase/ladder attention, which reuses the same layer multiple times while moving forward in time by keeping an internal state. They show that both of these approaches make training the models considerably more efficient.
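To make the hash-routing idea concrete, here is a minimal sketch of it, not ParlAI's implementation; the class name, shapes, and the random-table hash are illustrative assumptions. The key point is that the token-to-expert assignment is frozen at initialization, so no routing parameters are learned and no load-balancing loss is needed:

```python
import torch
import torch.nn as nn

class HashRoutedFFN(nn.Module):
    """Sketch of hash-based expert routing (hypothetical, not ParlAI's code)."""

    def __init__(self, vocab_size, d_model, d_hidden, num_experts):
        super().__init__()
        self.num_experts = num_experts
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.ReLU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )
        # Fixed random hash: token id -> expert index, frozen at init.
        self.register_buffer("hash_table", torch.randint(0, num_experts, (vocab_size,)))

    def forward(self, hidden, token_ids):
        # hidden: (batch, seq, d_model); token_ids: (batch, seq)
        expert_ids = self.hash_table[token_ids]
        out = torch.zeros_like(hidden)
        for e in range(self.num_experts):
            mask = expert_ids == e  # tokens hashed to expert e
            if mask.any():
                out[mask] = self.experts[e](hidden[mask])
        return out
```

Because the table is frozen, the assignment is deterministic and cheap to compute at both training and inference time.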
Twitter wrote about designing locally and computationally efficient graph neural networks. They propose a new architecture called Graph Substructure Networks (GSNs). GSNs do substructure counting and can incorporate more expressive substructures, which makes them more powerful in terms of representational capacity than plain message passing. The paper has much more detail on how GSNs are constructed and how they work under the hood.
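A toy version of the substructure-counting idea, assuming networkx and using triangle counts as the substructure (the paper supports much richer substructures than this):

```python
import networkx as nx
import numpy as np

def add_triangle_counts(graph: nx.Graph, node_features: np.ndarray) -> np.ndarray:
    """Append per-node triangle counts as extra structural features.

    A minimal sketch of the GSN idea: precompute substructure counts and
    feed them to the message-passing layers alongside the raw features.
    """
    counts = nx.triangles(graph)  # {node: number of triangles containing it}
    count_col = np.array([counts[n] for n in graph.nodes()], dtype=np.float32)
    return np.concatenate([node_features, count_col[:, None]], axis=1)
```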
Michael Galkin wrote a great post covering all of the knowledge graph papers at ICLR this year.
Aran Komatsuzaki wrote a post announcing the release of a 6B-parameter JAX-based Transformer, which is also available here.
This post reviews a number of papers published in 2020 on neural network quantization, covering everything from codebook creation for CNNs to embedding quantization.
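For intuition, codebook quantization can be as simple as clustering the weights and storing indices into a small codebook. A rough sketch under that assumption (this is illustrative, not any specific paper's scheme):

```python
import numpy as np
from sklearn.cluster import KMeans

def codebook_quantize(weights: np.ndarray, num_codes: int = 256):
    """Toy codebook quantization: cluster weights with k-means, then store
    only the codebook plus one small code index per weight."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=num_codes, n_init=4, random_state=0).fit(flat)
    codebook = km.cluster_centers_.ravel()   # num_codes float values
    codes = km.labels_.astype(np.uint8)      # 1 byte per weight for <= 256 codes
    return codebook, codes.reshape(weights.shape)

def dequantize(codebook: np.ndarray, codes: np.ndarray) -> np.ndarray:
    # Reconstruct approximate weights by looking indices up in the codebook.
    return codebook[codes]
```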
Google earlier open-sourced TRILL, and they are following up with another post on FRILL, a model distilled from TRILL specifically for on-device use. The paper goes into detail on the distillation approach, how they adopted QAT (Quantization-Aware Training), and a number of other points. They open-sourced the model in TensorFlow, and there is also a pre-trained model available on TFHub.
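Loading the pre-trained model should look roughly like the following; the module handle, input shape, and output key here are assumptions based on the TRILL family, so check the TFHub model page for the exact interface:

```python
import tensorflow as tf
import tensorflow_hub as hub

# Handle assumed from the TRILL/FRILL family on TFHub; verify on the model page.
frill = hub.load("https://tfhub.dev/google/nonsemantic-speech-benchmark/frill/1")

# FRILL consumes 16 kHz mono audio; the (batch, samples) shape is an assumption.
audio = tf.random.uniform((1, 16000), minval=-1.0, maxval=1.0)
outputs = frill(audio)
print(outputs["embedding"].shape)  # "embedding" key assumed from TRILL's interface
```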
Papers
A graph placement methodology for fast chip design describes a reinforcement learning approach to chip floorplanning: a policy network built from a graph encoder and deconvolution layers learns to place circuit components on the chip canvas, and the method was used to design the next generation of Google's AI accelerators.
Facebook wrote about how they can "brush" new text into images through self-supervised methods, matching the style of the text already written in the image. This differs from simply overwriting the text in the image with other text; the method emulates the original style and "blends" the replacement right back into the image.
As mentioned above, an excellent use case would be the AR/VR domain, where this method could replace the text a user sees with a translation into a language the user understands. The paper is here and the data is also open-sourced. However, due to concerns about enabling fake image creation, the code was not open-sourced.
Tabular Data: Deep Learning is Not All You Need argues that tree ensemble methods (specifically XGBoost) should still be preferred over deep learning methods for various classification/regression tasks on tabular data, including in the AutoML domain.
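As a point of reference, a plain XGBoost baseline of the kind the paper finds hard to beat looks like this (the dataset and hyperparameters here are illustrative, not taken from the paper):

```python
import xgboost as xgb
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# A small tabular classification task as a stand-in example.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)
print(accuracy_score(y_test, model.predict(X_test)))
```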
Libraries
Pystiche is a neural style transfer library written in PyTorch.
TextBrewer is a model distillation toolkit that mainly targets NLP models.
ECIR 2021 Tutorial is not a library per se, but a GitHub repo with a number of tutorials in the e-commerce search space.
Awesome Model Quantization collects a number of libraries and code resources for model quantization.
SmartSim is an orchestration infrastructure library for deploying machine learning applications (in either PyTorch or TensorFlow) on HPC (High-Performance Computing) systems. There is a seminar and a paper on the system design and how the library is being used for research.
Distrax is a probability library written in JAX which covers most of the functionality that the TensorFlow Probability library provides.
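A quick taste of the API, which mirrors TensorFlow Probability's distribution interface:

```python
import jax
import distrax

key = jax.random.PRNGKey(0)
dist = distrax.Normal(loc=0.0, scale=1.0)

# Sampling is explicit about randomness via the JAX PRNG key.
samples = dist.sample(seed=key, sample_shape=(5,))
log_probs = dist.log_prob(samples)
print(samples, log_probs)
```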
Classes
NYU Data Science published a new introductory class on the mathematical concepts behind widely used data science/machine learning techniques here.
They cover a variety of techniques such as sparse regression, Fourier/frequency-domain analysis, Wiener filters, and PCA. Both slides and videos are available, and all of the videos are also collected in a playlist here.
TensorFlow announced a number of classes in this post. This is in collaboration with deeplearning.ai and mainly targets an audience that wants to learn about MLOps.
Videos
MICES has a playlist that targets e-commerce search. If you are working in the e-commerce and/or search domain, it is worth your time.
MLSys Seminar from Stanford released another good video:
In this seminar, Karan goes through some of the characteristics of malleable ML systems (flexible, easy to incorporate changes into). It is a pretty interesting session; one example he gives, expanding country mentions into their full team-name equivalents (e.g., Germany -> German soccer team) via regex replacement to generate labels, is a nifty labeling trick.
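A hypothetical reconstruction of that trick in a few lines (the mapping and function names are mine, not from the talk):

```python
import re

# Illustrative mapping of country mentions to full team names for labeling.
TEAMS = {
    "Germany": "German national soccer team",
    "Brazil": "Brazilian national soccer team",
}
pattern = re.compile(r"\b(" + "|".join(TEAMS) + r")\b")

def expand_team_mentions(text: str) -> str:
    """Replace each country mention with its full team name via regex."""
    return pattern.sub(lambda m: TEAMS[m.group(1)], text)

print(expand_team_mentions("Germany beat Brazil in the final."))
```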
Conceptual Understanding of Deep Learning Workshop tries to shed some light on the motivations of deep learning and how it might be connected to biology and learning itself. It has quite a few interesting talks if you are interested: