Neural-Based Decompilers
Articles
Facebook introduced a neural-based decompiler using GraphSage; the code is also available. By feeding in the low-level AST representation of the code together with the high-level language, the model learns to encode the relationship between the two. After training, the decompiler can target any high-level language by translating out of the shared low-level representation.
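As a rough illustration of the encoder side (my sketch, not Facebook's code; the toy AST, feature sizes, and weights are all made up), one GraphSAGE-style aggregation step over an AST looks like this:

    import torch
    import torch.nn as nn

    # Toy AST: node 0 is a statement, nodes 1 and 2 are its child expressions.
    neighbors = {0: [1, 2], 1: [0], 2: [0]}
    h = torch.randn(3, 16)        # one 16-dim feature vector per AST node

    W = nn.Linear(2 * 16, 16)     # GraphSAGE concatenates self + neighborhood

    def sage_step(h, neighbors):
        out = []
        for v in range(h.size(0)):
            neigh = torch.stack([h[u] for u in neighbors[v]]).mean(dim=0)
            out.append(torch.relu(W(torch.cat([h[v], neigh]))))
        return torch.stack(out)

    h = sage_step(h, neighbors)   # stack a few of these to encode the AST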
What is more interesting to me is transitive language learning. Wouldn't it be great to write code in Python, have the decompiler produce the low-level AST, and then go from there to another high-level language like Swift/Objective-C? By doing so, you get the productivity of Python, and for mobile deployments you can still reuse the same logic through Swift if someone wants to ship it on mobile.
Google published an interesting article on textual entailment. In the paper, they show how the TAPAS model architecture can be used for natural language inference, understanding the relationships between entities, to do open-domain question answering. The code is available in TensorFlow 2.0, and there is a good notebook accompanying it if you want to try it out. They also published models of various sizes produced through knowledge distillation.
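If you just want to poke at table entailment, here is a minimal sketch using the HuggingFace port of TAPAS rather than Google's TensorFlow code; I am assuming the TabFact-finetuned checkpoint name, so treat it as a sketch:

    import pandas as pd
    from transformers import TapasTokenizer, TapasForSequenceClassification

    # Assumed checkpoint; note TAPAS expects all table cells as strings.
    name = "google/tapas-base-finetuned-tabfact"
    tokenizer = TapasTokenizer.from_pretrained(name)
    model = TapasForSequenceClassification.from_pretrained(name)

    table = pd.DataFrame({"city": ["Paris", "Rome"],
                          "population": ["2.1M", "2.8M"]})
    inputs = tokenizer(table=table,
                       queries=["Rome is larger than Paris"],
                       padding="max_length", return_tensors="pt")
    pred = model(**inputs).logits.argmax(-1)  # 1 = entailed, 0 = refuted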
Demystifying GPT-3 is a nice article by Lambda Labs that talks about various technical details of GPT-3 in an approachable way. The "Role of Few Shots" section in particular is worth a read if you want intuition about the overall model training.
FairTorch writes about how responsible-AI principles can be applied in PyTorch. They open-sourced the package as well.
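I have not dug into FairTorch's exact API, but the core idea of treating a fairness criterion as a differentiable penalty can be sketched in plain PyTorch (my sketch, not FairTorch code):

    import torch

    def demographic_parity_gap(scores, group):
        """Absolute gap in mean predicted positive rate between two groups."""
        p = torch.sigmoid(scores)
        return (p[group == 0].mean() - p[group == 1].mean()).abs()

    # Training: loss = bce(scores, y) + lam * demographic_parity_gap(scores, g)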
Eric published a post on message passing in graph convolutional neural networks, with a number of implementations in Jax. It is worth your time if you are interested in message passing and how it can be implemented for graph convolutions through Jax.
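For a flavor of what this looks like, here is a minimal sketch of a single GCN message-passing layer in jax.numpy on a toy graph (random weights, not Eric's code):

    import jax.numpy as jnp
    from jax import random

    key = random.PRNGKey(0)
    A = jnp.array([[0, 1, 1],
                   [1, 0, 0],
                   [1, 0, 0]], dtype=jnp.float32)   # toy undirected graph
    A_hat = A + jnp.eye(3)                          # add self-loops
    d = A_hat.sum(axis=1)
    A_norm = A_hat / jnp.sqrt(jnp.outer(d, d))      # symmetric normalization

    H = random.normal(key, (3, 8))                  # node features
    W = random.normal(key, (8, 8))                  # layer weights
    H_next = jnp.maximum(A_norm @ H @ W, 0.0)       # aggregate messages + ReLU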
Distill published an amazing article with different types of visualizations of weights, activations, and attributions. It talks about how to think about the various components of neural networks and what they show, and it has a number of small-multiple visualizations that try to dissect an image into filter-like components. The other articles in the thread are also really good.
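In the same spirit, you can eyeball the first-layer filters of a pretrained torchvision model yourself; a quick sketch:

    import matplotlib.pyplot as plt
    from torchvision.models import resnet18

    w = resnet18(pretrained=True).conv1.weight.detach()  # (64, 3, 7, 7)
    w = (w - w.min()) / (w.max() - w.min())              # rescale to [0, 1]
    fig, axes = plt.subplots(8, 8, figsize=(8, 8))
    for ax, f in zip(axes.flat, w):
        ax.imshow(f.permute(1, 2, 0))                    # each filter as an RGB patch
        ax.axis("off")
    plt.show()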
Google published a nice article on measuring the influence of training samples on deep learning models. The influence method doubles as a nice outlier-detection method over the training set and can also be used as a clustering technique. The accompanying paper is very approachable.
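One common formulation of this idea scores a training example's influence on a test example as the learning-rate-scaled dot product of their loss gradients; a toy sketch, assuming a generic PyTorch model and loss:

    import torch

    def grad_vec(model, loss_fn, x, y):
        # Flatten the gradient of the loss on one example into a single vector.
        loss = loss_fn(model(x), y)
        g = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([gi.reshape(-1) for gi in g])

    def influence(model, loss_fn, x_tr, y_tr, x_te, y_te, lr=0.1):
        # Higher score = this training example pushed the test loss down more.
        return lr * (grad_vec(model, loss_fn, x_tr, y_tr)
                     @ grad_vec(model, loss_fn, x_te, y_te))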
Twitter demonstrates how powerful Temporal Graph Neural Networks are, and the accompanying paper gives more detail on the memory components and how they capture temporal information, in case you want to understand in depth how they work. I like that they separate the temporal information from the graph-based embeddings, then use this memory to apply updates at different points in time and compute various probabilities.
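The memory idea can be caricatured as a recurrent cell per node that digests interaction messages over time; here is a toy sketch (my simplification, not Twitter's implementation):

    import torch
    import torch.nn as nn

    dim = 32
    memory = torch.zeros(100, dim)      # one memory slot per node
    cell = nn.GRUCell(input_size=dim, hidden_size=dim)

    def interact(src, dst, msg):
        # Update both endpoints' memories with the event message.
        with torch.no_grad():
            memory[src] = cell(msg.unsqueeze(0), memory[src].unsqueeze(0))[0]
            memory[dst] = cell(msg.unsqueeze(0), memory[dst].unsqueeze(0))[0]

    interact(3, 7, torch.randn(dim))    # e.g., user 3 interacts with user 7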
Papers
Tensor Train Compression for Deep Learning Recommendation Model Embeddings is an interesting paper by Facebook that makes use of Tensor-Train decomposition to compress embeddings. Tensor-Train decomposition can be considered a low-rank approximation of the embedding matrices, and the paper shows a very good application of this compression technique in the large-scale recommendation space. The code is available and is a drop-in replacement for the EmbeddingBag class in PyTorch.
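To make the low-rank intuition concrete, here is a toy two-core Tensor-Train embedding lookup (my sketch with made-up sizes, not the paper's code):

    import torch

    # Factor a (10_000 x 64) embedding table: vocab 10_000 = 100*100, dim 64 = 8*8.
    v1, v2, d1, d2, rank = 100, 100, 8, 8, 16
    G1 = torch.randn(v1, d1, rank) * 0.02    # first TT core
    G2 = torch.randn(rank, v2, d2) * 0.02    # second TT core

    def tt_embed(idx):
        i1, i2 = idx // v2, idx % v2
        # (d1, rank) @ (rank, d2) -> (d1, d2), flattened to the 64-dim embedding
        return (G1[i1] @ G2[:, i2]).reshape(-1)

    row = tt_embed(torch.tensor(1234))
    # Parameters: 100*8*16 + 16*100*8 = 25_600 vs 640_000 for the dense table.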
Also, if you are interested in working in this domain in the NYC area, please shoot me an email; I am actively hiring engineers with a background in machine learning and systems engineering!
A Visual Tour of Bias Mitigation Techniques for Word Representations is a mouthful of a workshop title, but it teaches methods for removing bias from the word representations of natural language models. The code from the workshop is also available.
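One classic technique from this literature, hard debiasing, projects the component along an estimated bias direction out of each word vector; a toy sketch with stand-in embeddings:

    import numpy as np

    def debias(word_vec, bias_dir):
        """Remove the component of a word vector along the bias direction."""
        b = bias_dir / np.linalg.norm(bias_dir)
        return word_vec - (word_vec @ b) * b

    # Bias direction estimated from a definitional pair, e.g. "he" - "she".
    he, she = np.random.randn(300), np.random.randn(300)  # stand-in embeddings
    engineer = np.random.randn(300)
    engineer_debiased = debias(engineer, he - she)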
Notebooks
This notebook shows how to visualize PyTorch JIT modules. It goes over how to walk the graph of JIT'ed modules so that you can get a better picture of the JIT'ed model.
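For reference, walking the graph of a JIT'ed module in plain PyTorch is only a few lines (a minimal sketch, not the notebook's code):

    import torch
    import torch.nn as nn

    scripted = torch.jit.script(nn.Sequential(nn.Linear(4, 4), nn.ReLU()))
    print(scripted.graph)                        # the TorchScript IR
    for node in scripted.inlined_graph.nodes():  # walk it node by node
        print(node.kind())                       # e.g. aten::linear, aten::relu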
HuggingFace’s notebook shows how to use HuggingFace in tandem with Weights & Biases. It shows how capable both APIs are for machine-learning-in-production use cases, and it walks through the full machine learning lifecycle.
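The integration itself is essentially one argument; a minimal sketch, assuming you already have a model and a train_ds dataset defined (and wandb installed):

    from transformers import Trainer, TrainingArguments

    # report_to="wandb" streams losses and eval metrics to a W&B run.
    args = TrainingArguments(output_dir="out",
                             report_to="wandb",
                             run_name="bert-finetune",
                             num_train_epochs=3)
    trainer = Trainer(model=model, args=args, train_dataset=train_ds)
    trainer.train()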
Libraries
Multihop Dense Retrieval is a retrieval method for answering open-domain questions; its paper is also available. It formulates multi-stage ranking as a search problem (under the hood, beam search is used to retrieve candidates) and then uses the top candidate to produce the answer.
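The iterative retrieve-then-reformulate loop can be sketched in a toy form (greedy instead of beam search, with a stand-in encoder; not the MDR code):

    import numpy as np

    def encode(text):                     # stand-in for a dense encoder
        rng = np.random.default_rng(abs(hash(text)) % 2**32)
        v = rng.standard_normal(128)
        return v / np.linalg.norm(v)

    passages = ["passage one ...", "passage two ...", "passage three ..."]
    index = np.stack([encode(p) for p in passages])   # dense passage index

    def multihop(question, hops=2):
        query, chain = question, []
        for _ in range(hops):
            scores = index @ encode(query)
            best = passages[int(scores.argmax())]     # greedy; MDR beam-searches
            chain.append(best)
            query = question + " " + best             # reformulate with evidence
        return chain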
Lightning Flash is a new high-level library on top of PyTorch Lightning. They also wrote a nice blog post on why this is needed on top of the Lightning library.
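Based on their README at the time (module paths and signatures may have changed since, so treat this as a sketch), fine-tuning an image classifier looks roughly like this:

    import flash
    from flash.image import ImageClassificationData, ImageClassifier

    # Assumes an ImageNet-style folder layout under data/train.
    datamodule = ImageClassificationData.from_folders(train_folder="data/train")
    model = ImageClassifier(backbone="resnet18",
                            num_classes=datamodule.num_classes)
    trainer = flash.Trainer(max_epochs=3)
    trainer.finetune(model, datamodule=datamodule, strategy="freeze")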
Seminars
Oxford has a Theoretical Foundations of Graph Neural Networks seminar on 17 February, which you can watch here. The original announcement is here.
Classes
MIT published a class on Deep Learning for Art, Aesthetics and Creativity. I particularly watched and enjoyed Painting with the Neurons of a GAN, which you can watch in the following video:
Robust Principal Component Analysis is a form of PCA that is resistant to noise and gross outliers in the data. The following video covers both the intuition and the theory.
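And if you want to play with it afterwards, here is a simplified sketch of the principal component pursuit iteration behind Robust PCA (thresholds chosen arbitrarily; a toy, not a tuned implementation):

    import numpy as np

    def shrink(X, tau):                  # soft-threshold entries toward zero
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    def rpca(M, n_iter=100, lam=None):
        """Split M into a low-rank part L plus a sparse outlier part S."""
        lam = lam or 1.0 / np.sqrt(max(M.shape))
        S = np.zeros_like(M)
        for _ in range(n_iter):
            U, sig, Vt = np.linalg.svd(M - S, full_matrices=False)
            L = U @ np.diag(shrink(sig, 1.0)) @ Vt   # singular-value thresholding
            S = shrink(M - L, lam)                   # absorb outliers into S
        return L, S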
NYU has a class on Responsible AI, and there are good slides on the course page.
Missing link in the ML infrastructure stack talks about machine learning in production and which component is missing from the full stack. The answer is the Evaluation Store, and the seminar is worth your time if you want to understand why evaluation is such a crucial part of machine learning in production and how Snorkel thinks about building an Evaluation Store.
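To make the idea concrete, here is a hypothetical sketch (entirely my own, not Snorkel's API) of what the write path of an evaluation store could look like: log per-slice metrics for every model version so regressions on critical slices surface before deploys.

    from dataclasses import dataclass

    @dataclass
    class EvalRecord:
        model_version: str
        dataset: str
        slice_name: str   # e.g. "country=US", "query_length<5"
        metric: str
        value: float

    store: list[EvalRecord] = []   # stand-in for a real backing database

    def log_eval(record: EvalRecord) -> None:
        store.append(record)

    log_eval(EvalRecord("v2.3", "prod-holdout", "country=US", "auc", 0.91))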
If you are interested in working in this domain at Facebook (NYC), please feel free to drop me a line!
Until next week, onwards!