When you look at the sidewalk, do you want to see the usual noise? Some people thought this was a good idea:
Articles
Google released a new library called TF Decision Forests. The library is intended for users who want to use decision forests within the TensorFlow framework. This is a significant event, as Google is now pushing TF to be a general-purpose machine learning library rather than one focused solely on deep learning. This makes sense, as it standardizes and unifies the APIs. It also makes end users' lives much easier: an engineer can reuse the same serving/deployment infrastructure, since decision forests can now leverage the rest of the TF ecosystem. It is available here.
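To make the "decision forest" idea concrete, here is a minimal sketch of what such a model does under the hood: train several trees (here, trivial one-threshold stumps) on bootstrap samples and aggregate their votes. This is an illustration of the concept in plain Python, not the TF Decision Forests API; the dataset and helper names are hypothetical.

```python
import random
from collections import Counter

# Toy 1-D dataset: the label is 1 exactly when x > 0.5.
data = [(x / 100, int(x / 100 > 0.5)) for x in range(100)]

def train_stump(sample):
    """Pick the threshold that best separates the bootstrap sample."""
    best_t, best_acc = 0.0, 0.0
    for t in (i / 20 for i in range(21)):
        acc = sum((x > t) == bool(y) for x, y in sample) / len(sample)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def train_forest(data, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in data]  # bootstrap resampling
        forest.append(train_stump(sample))
    return forest

def predict(forest, x):
    votes = Counter(int(x > t) for t in forest)  # majority vote over trees
    return votes.most_common(1)[0][0]

forest = train_forest(data)
print(predict(forest, 0.9), predict(forest, 0.1))  # -> 1 0
```

The appeal of TF-DF is that this kind of model can sit behind the same Keras-style training and serving surface as a neural network.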
Facebook open-sourced a multi-language dataset to enable researchers to run evaluations across different languages. It covers 101 languages and is very comprehensive.
Google wrote about data cascades, following an article they published that I covered previously in this newsletter. This is a very important topic in ML systems, since the input to ML systems is data. Data problems cascade because most systems ingest data through a pipeline. As in most sequential systems, a component failure does not stay isolated within that component, but propagates to all of the downstream systems. Data cascades highlight this issue and why it is much worse for ML systems.
Papers
ByT5: Towards a token-free future with pre-trained byte-to-byte models extends the idea of training models in a preprocessing-free manner by operating directly on UTF-8 bytes instead of learned tokens. The code is available here, along with a number of pretrained models.
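The core trick is simple enough to sketch in a few lines: any string decomposes into UTF-8 bytes, so there is no vocabulary to learn and no out-of-vocabulary tokens. The offset of 3 reserved ids below mirrors ByT5's pad/eos/unk setup, but treat the exact ids as illustrative.

```python
# Byte-level "tokenization": text maps directly to UTF-8 bytes, so there
# is no vocabulary to build and no out-of-vocabulary tokens. The +3
# offset reserves ids for special tokens (e.g. 0=pad, 1=eos, 2=unk),
# mirroring ByT5's setup; the exact ids here are illustrative.
SPECIAL_TOKENS = 3

def encode(text: str) -> list:
    return [b + SPECIAL_TOKENS for b in text.encode("utf-8")]

def decode(ids: list) -> str:
    return bytes(i - SPECIAL_TOKENS for i in ids).decode("utf-8")

ids = encode("héllo")
print(ids)          # the accented character becomes two byte-level tokens
print(decode(ids))  # round-trips back to "héllo"
```

The cost is longer sequences (one token per byte), which is exactly the trade-off the paper studies.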
Barlow Twins: Self-Supervised Learning via Redundancy Reduction from Facebook proposes a self-supervised learning algorithm that reduces redundancy between the embeddings of two distorted views of an image in order to learn useful representations. The code is available here.
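"Redundancy reduction" here has a concrete form: push the cross-correlation matrix of the two views' embeddings toward the identity. A NumPy sketch of that objective, following my reading of the paper (not the released code):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective: make the cross-correlation matrix of two
    embedding batches close to the identity. z1, z2: (batch, dim)."""
    # Normalize each embedding dimension across the batch.
    z1 = (z1 - z1.mean(0)) / z1.std(0)
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    n = z1.shape[0]
    c = z1.T @ z2 / n                                    # (dim, dim)
    on_diag = ((1 - np.diag(c)) ** 2).sum()              # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(128, 16))
print(barlow_twins_loss(z, z))                            # identical views: near zero
print(barlow_twins_loss(z, rng.normal(size=(128, 16))))   # unrelated views: much larger
```

The diagonal term makes the two views agree; the off-diagonal term decorrelates embedding dimensions so they do not all encode the same thing.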
An Attention Free Transformer from Apple proposes alternative ways to get "attention"-like behavior in the transformer architecture, including a variant based on a convolution operation, to replace "vanilla attention". The architecture performs on par with vanilla attention while being more efficient. Attention may not be all we need after all. A nice notebook that explains how to implement this in PyTorch is here.
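The basic AFT-full operation is easy to state: each output is a sigmoid-gated weighted average of the values, where the weights come from the keys plus a learned pairwise position bias, with no query–key dot products. A NumPy sketch under that reading of the paper (names and shapes are mine, not Apple's code):

```python
import numpy as np

def aft_full(Q, K, V, w):
    """AFT-full sketch. Q, K, V: (T, d); w: (T, T) learned position biases.
    Weights depend only on keys and position biases, never on query-key
    dot products; queries enter only through an elementwise sigmoid gate."""
    weights = np.exp(K[None, :, :] + w[:, :, None])  # (T, T, d)
    num = (weights * V[None, :, :]).sum(axis=1)      # weighted values, (T, d)
    den = weights.sum(axis=1)                        # normalizer, (T, d)
    gate = 1.0 / (1.0 + np.exp(-Q))                  # sigmoid(Q)
    return gate * num / den

rng = np.random.default_rng(0)
T, d = 5, 8
Y = aft_full(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
             rng.normal(size=(T, d)), np.zeros((T, T)))
print(Y.shape)  # (5, 8)
```

Because nothing here requires a softmax over query-key scores, the per-step cost avoids the usual quadratic attention map in the model dimension.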
HATECHECK: Functional Tests for Hate Speech Detection Models comes up with 29 different functional tests to evaluate hate speech detection models. It is a useful evaluation suite for use cases in the domain of responsible AI.
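A "functional test" in this sense is a templated set of inputs with gold labels that probes one capability of a model. A toy sketch of the idea, with a hypothetical keyword baseline standing in for a real hate speech detector (the test names and cases below are mine, not HATECHECK's):

```python
# Each functional test probes one capability with labeled template cases.
# `classify` is a hypothetical stand-in for a real detector; this trivial
# keyword baseline predictably fails the negation test.
def classify(text: str) -> str:
    return "hateful" if "hate" in text.lower() else "non-hateful"

FUNCTIONAL_TESTS = {
    "direct_hate": [
        ("I hate all [GROUP].", "hateful"),
    ],
    "negated_hate": [
        ("I don't hate [GROUP] at all.", "non-hateful"),
    ],
}

def run_tests(model, tests):
    """Return the pass rate of the model on each functional test."""
    results = {}
    for name, cases in tests.items():
        passed = sum(model(text) == gold for text, gold in cases)
        results[name] = passed / len(cases)
    return results

print(run_tests(classify, FUNCTIONAL_TESTS))
```

Per-capability pass rates like these surface failure modes (e.g. negation) that a single aggregate accuracy number would hide, which is the paper's main argument.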
Libraries
Hammurabi is a rule engine that allows you to write complex rules for a given domain to extract/parse information. It has comprehensive documentation.
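If you have not used a rule engine before, the general pattern is an ordered set of (condition, action) pairs applied to a bag of facts. A minimal sketch of that pattern in plain Python; this illustrates the concept only and is not Hammurabi's actual API:

```python
# A minimal rule-engine sketch: ordered (condition, action) pairs applied
# to a dictionary of facts. Every rule whose condition matches fires and
# may add derived facts. Not Hammurabi's API; the rules are hypothetical.
RULES = [
    (lambda f: f.get("age", 0) >= 18, lambda f: f.update(status="adult")),
    (lambda f: f.get("age", 0) < 18,  lambda f: f.update(status="minor")),
    (lambda f: f.get("country") == "US", lambda f: f.update(currency="USD")),
]

def apply_rules(facts, rules):
    for condition, action in rules:
        if condition(facts):  # fire every rule whose condition matches
            action(facts)
    return facts

print(apply_rules({"age": 30, "country": "US"}, RULES))
```

The benefit over hand-rolled if/else chains is that domain rules become data: they can be listed, audited, and extended independently of the engine.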
Haystack allows you to build search applications over natural language queries, leveraging deep learning. It is opinionated, but works as an out-of-the-box library.
Videos
At Google I/O, Google introduced TF Decision Forests, as mentioned above; there is a short introduction video you may want to check out to see how the library works.
Classes
Berkeley has a Deep Reinforcement Learning class that covers a number of neural-network-based reinforcement learning techniques. They also have an excellent deep learning class available here if you want to brush up on some deep learning basics.
There is a nice short class on reproducibility in deep learning here. It can be considered a software engineering class for data scientists. If you have ever asked yourself how to reproduce the results after a modeling effort is done, this class is for you.
Full Stack Deep Learning published a new class for spring 2021. It includes a number of MLOps concepts and covers most of the ML life cycle.