Video Encoding through machine learning
BigBird for sparse attention in transformers and Ease.ml
Articles
Facebook wrote about how they do video encoding through machine learning. Video encoding is about more than pure video quality, and this post does an excellent job of capturing all of the other interaction points and what it means to provide a good video downloading/watching experience to the customer.
Pinterest wrote about how they used AutoML in their ad serving system. It explains the different layers of the model architecture and what they do. If you want to understand a good recommender system built on top of a deep-wide neural network, it is a good read; a small sketch of that style of architecture follows below.
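To make the "deep-wide" idea concrete, here is a minimal sketch of a wide-and-deep style model in PyTorch. The feature sizes and layer widths are made-up assumptions for illustration, not Pinterest's actual architecture.

```python
import torch
import torch.nn as nn

class WideAndDeep(nn.Module):
    """Minimal wide-and-deep sketch: a linear 'wide' path over cross features
    plus a small MLP 'deep' path over dense/embedded features."""
    def __init__(self, n_wide_features=100, n_deep_features=32, hidden=64):
        super().__init__()
        self.wide = nn.Linear(n_wide_features, 1)   # memorization path
        self.deep = nn.Sequential(                  # generalization path
            nn.Linear(n_deep_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, wide_x, deep_x):
        # Combine both paths and squash to a click/engagement probability.
        return torch.sigmoid(self.wide(wide_x) + self.deep(deep_x))

model = WideAndDeep()
p = model(torch.randn(4, 100), torch.randn(4, 32))  # batch of 4 toy examples
print(p.shape)  # torch.Size([4, 1])
```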
Google wrote about how to handle longer input sequences in transformers: they developed a mechanism called BigBird that uses sparse attention to build efficient transformers for longer input sequences. The blog post also has a variety of visualizations that help build intuition.
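As a rough illustration of the sparse attention pattern BigBird combines (a sliding window, a few global tokens, and random connections), here is a toy mask construction in numpy; the sizes are arbitrary and this is not Google's implementation.

```python
import numpy as np

def bigbird_style_mask(seq_len=16, window=1, n_global=2, n_random=2, seed=0):
    """Toy BigBird-style attention mask: 1 means query i may attend to key j.
    Combines a sliding window, a few global tokens, and random connections."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((seq_len, seq_len), dtype=int)
    for i in range(seq_len):
        lo, hi = max(0, i - window), min(seq_len, i + window + 1)
        mask[i, lo:hi] = 1                                           # local window
        mask[i, rng.choice(seq_len, n_random, replace=False)] = 1    # random keys
    mask[:, :n_global] = 1   # global tokens are visible to every query
    mask[:n_global, :] = 1   # and themselves attend to everything
    return mask

print(bigbird_style_mask())  # full attention would be all ones, i.e. O(n^2)
```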
Projects
EPFL has a new project called Ease.ml. It aims to provide a comprehensive solution for the model lifecycle through a set of loosely coupled components.
Ease.ml/ci provides the CI/CD (Continuous Integration/Continuous Deployment) mechanism within this larger framework.
Ease.ml/AutoML provides an automated way of building deep neural networks without any manual intervention.
Market tries to estimate whether the data given as input is sufficient to build a model with strong accuracy.
Datamagic enables data augmentation for a variety of NLP tasks.
Zip.ml tackles the database side of efficient training. There is also a white paper that explains this project in much more detail.
CPClean aims to remove noise from datasets and provides data cleansing functionality.
The paper goes into the different areas in much more detail.
Papers
Transformer visualization via dictionary learning: contextualized embedding as a linear superposition of transformer factors provides a visualization mechanism to understand what is happening inside transformers. If you are familiar with dictionary learning, the paper explains what a transformer learns in a dictionary learning setting: each contextualized embedding is treated as a linear superposition of learned transformer factors, much like dictionary learning on images produces visually plausible filters for classification tasks.
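As a rough sketch of that view, here is how one might sparse-code a matrix of hidden states with scikit-learn; the hidden states below are random stand-ins for real transformer activations, and the component count is an arbitrary choice, not the paper's setup.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Stand-in for contextualized embeddings: (num_tokens, hidden_dim).
# In the paper's setting these would come from a transformer layer.
hidden_states = np.random.randn(200, 64)

# Learn a dictionary of "transformer factors"; each embedding is then
# approximated as a sparse linear superposition of these factors.
dl = DictionaryLearning(n_components=32, transform_algorithm="lasso_lars",
                        transform_alpha=1.0, random_state=0)
codes = dl.fit_transform(hidden_states)   # sparse coefficients per token
factors = dl.components_                  # learned dictionary, shape (32, 64)

print(codes.shape, factors.shape)
print("avg non-zero coefficients per token:", (codes != 0).sum(1).mean())
```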
Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks shows a large number of labeling errors in the test sets of a variety of machine learning benchmarks; these errors can produce incorrect results and undermine the validity of some previously reported techniques.
A Distributional Approach to Controlled Text Generation proposes a method to specify “pointwise” or “distributional” constraints when training large language models. The code is also available here. The main problem the paper tackles is removing certain biases that occur in large-scale corpora. The presentation of the paper is also available here.
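To illustrate the difference between the two constraint types, here is a toy check (not the paper's actual training procedure, which goes through an energy-based target distribution): a pointwise constraint must hold for every generated sample, while a distributional constraint only fixes the expected value of a feature over the distribution. The feature values and target moment below are made up.

```python
import numpy as np

# Toy generated samples and a binary feature phi(x),
# e.g. "the text mentions a female scientist".
rng = np.random.default_rng(0)
phi = rng.binomial(1, 0.35, size=1000)   # stand-in feature values per sample

# Pointwise constraint: phi(x) = 1 for EVERY sample (e.g. "always on topic").
pointwise_satisfied = bool(np.all(phi == 1))

# Distributional constraint: only the moment matters, e.g. E[phi(x)] = 0.5
# to debias a corpus where the feature is under-represented.
target_moment = 0.5
distributional_gap = abs(phi.mean() - target_moment)

print(pointwise_satisfied, round(distributional_gap, 3))
```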
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation provides a neural encoder that operates at the character level, which removes the need for an explicit tokenizer to process the text.
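A minimal sketch of the tokenizer-free idea: operate directly on Unicode code points and map them into a fixed number of embedding buckets via hashing. The bucket count, embedding size, and modulo hash below are simplifying assumptions, not CANINE's actual configuration.

```python
import numpy as np

def codepoint_embeddings(text, n_buckets=1024, dim=8, seed=0):
    """Tokenizer-free input: one embedding per Unicode code point,
    hashed into a fixed-size table so any character is representable."""
    rng = np.random.default_rng(seed)
    table = rng.normal(size=(n_buckets, dim))        # hashed embedding table
    codepoints = [ord(ch) for ch in text]            # no tokenizer, no vocabulary
    buckets = [cp % n_buckets for cp in codepoints]  # toy hash: modulo bucketing
    return np.stack([table[b] for b in buckets])     # (len(text), dim)

emb = codepoint_embeddings("CANINE is tokenizer-free, även på svenska")
print(emb.shape)   # one vector per character, including non-ASCII ones
```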
Perceiver: General Perception with Iterative Attention tries to remove some of the inductive biases that various deep learning and transformer models bring to the table. The paper argues that transformer architectures carry many assumptions based on the input modality, which prevents us from building deep learning architectures that work in a “modality agnostic” manner. Even though the paper itself makes certain assumptions, such as positional encodings of the input, it is more agnostic than other transformer mechanisms across a variety of input datasets.
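The core trick is to let a small latent array cross-attend to a potentially huge input array, so the cost grows linearly with input size rather than quadratically. Here is a toy numpy version of one such cross-attention step; the single-head setup, random projections, and shapes are simplifying assumptions.

```python
import numpy as np

def cross_attention(latents, inputs, d=32, seed=0):
    """One Perceiver-style cross-attention step: queries come from the small
    latent array, keys/values from the large input array."""
    rng = np.random.default_rng(seed)
    wq = rng.normal(size=(latents.shape[1], d))
    wk = rng.normal(size=(inputs.shape[1], d))
    wv = rng.normal(size=(inputs.shape[1], d))
    q, k, v = latents @ wq, inputs @ wk, inputs @ wv
    scores = q @ k.T / np.sqrt(d)                     # (n_latents, n_inputs)
    attn = np.exp(scores - scores.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)           # softmax over the inputs
    return attn @ v                                   # (n_latents, d)

latents = np.random.randn(64, 16)      # small learned latent array
inputs = np.random.randn(50_000, 16)   # e.g. flattened pixels or audio samples
print(cross_attention(latents, inputs).shape)   # (64, 32): linear in input size
```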
Language-Agnostic Representation Learning of Source Code from Structure and Context proposes an attention-based transformer that learns representations of code regardless of the language the source code is written in. The code that accompanies the paper is here. The presentation for the paper is here. The ICLR 2021 poster is also here.
ICLR 2021 chose a number of outstanding papers. If you are attending the conference, drop me a note, too!
Classes
Harvard has an advanced topics in machine learning class. If you are interested in theory, this class would be an excellent fit; if you are looking for motivation, there is a blog post for that as well. I have not watched all of the lectures, but I enjoyed the NLP class by Sasha Rush.
Libraries
Torch dreams is a library for interpretability research that produces visually plausible generated images.
NaturalProofs is a library for theorem proving that extracts mathematical references and helps measure how sound a proof is.
Videos
Gedas Bertasius presented his work on video understanding through large language models. In the talk, Gedas presents an attention-based modeling technique that incorporates the short-term and long-term dependencies familiar from natural language processing.
The Booking.com team talks about the personalization work they do across a variety of areas: deep learning modeling, uplift modeling, multi-armed bandits, and user perception on booking.com, in this talk. It is a long, tutorial-like talk from which you can learn a lot about the domain (travel/hospitality) as well as how they think about providing a travel experience to different types of personalities. They have a nice blog post that outlines this talk here.
Tutorials
Sequence Aware Recommender Systems is a tutorial from 2018 that goes through a number of different recommender systems with an emphasis on sequences. The code is also available here. It is part of a book section, too. The presentation and the paper that accompany this tutorial are also available.
IR From Bag-of-words to BERT and Beyond through Practical Experiments is a tutorial for information retrieval use cases that span from bag-of-words models to recent developments such as BERT and transformer architectures. It features OpenNIR. Another excellent tutorial on ranking is also available here.
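On the bag-of-words end of that spectrum, a minimal sparse retrieval baseline looks like the sketch below (toy documents and query, scikit-learn TF-IDF rather than BM25 or the tutorial's OpenNIR pipelines); a neural model such as BERT would then rescore the top candidates.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy corpus; in the tutorial this would be a real IR collection.
docs = [
    "BERT based rerankers improve information retrieval quality",
    "bag of words models score documents by term overlap",
    "transformers use attention over token sequences",
]
query = ["how do bag of words retrieval models work"]

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)            # sparse TF-IDF document vectors
query_vec = vectorizer.transform(query)

scores = linear_kernel(query_vec, doc_vecs).ravel()  # similarity of query to each doc
ranking = scores.argsort()[::-1]                     # best-matching document first
print([(docs[i], round(float(scores[i]), 3)) for i in ranking])
```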