BERT Busters
Human Interface Guideline, Recognizing people from images, SWE skills for researchers
Articles
BERT Busters: Outlier Dimensions that Disrupt Transformers is an article that investigates how robust Transformers are to pruning.
A number of recent papers showed that Transformers tolerate pruning with little loss in accuracy.
However, this article argues that for a small set of outlier parameters, removing those particular weights is likely to cause a large drop in predictive accuracy.
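The core idea can be illustrated with a toy sketch (not the paper's actual method): in a weight matrix where one dimension has outlier magnitude, zeroing a typical row barely changes the layer's output, while zeroing the outlier row changes it drastically. All names and sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "layer": a weight matrix with one hypothetical outlier dimension
# (a row with much larger magnitude than the rest).
W = rng.normal(0, 0.02, size=(8, 16))
W[3] *= 50.0  # the outlier row
x = rng.normal(size=16)

def output_change(W, pruned_rows, x):
    """L2 distance between the original and pruned layer outputs."""
    Wp = W.copy()
    Wp[list(pruned_rows)] = 0.0
    return np.linalg.norm(W @ x - Wp @ x)

# Pruning a typical row barely moves the output...
small = output_change(W, [0], x)
# ...while pruning the outlier row disrupts it.
large = output_change(W, [3], x)
print(small < large)
```

The gap between the two distances is what makes "robust to pruning" claims sensitive to which parameters are removed.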
Apple has a Human Interface Guidelines section on how to use ML in iOS applications.
It offers a number of guidelines on how to incorporate ML into applications. For inputs, it explicitly calls out different ways to get data into the application, how to think about feedback loops, and how to handle calibration and corrections through inputs.
For outputs, it calls out specifically how to do proper attribution, how to communicate confidence, and how to build the application in a way that accommodates the limitations of the model.
Lj Miranda wrote about his experience improving software engineering skills as a researcher. It is very relevant to MLOps, since MLOps is mostly about putting models into a production environment, and generally good advice for a researcher navigating this space is to simply learn those skills. This prevents throwing the model over the wall, and it also improves the process, as a researcher who understands the overall system can make better decisions (how to create a feedback mechanism, how to collect data, and so forth).
Recognizing People in Photos is an article from Apple on how they recognize people's faces in photos. In the above figure, they first show how faces are detected, and then how the output of the face embedding model is used to find other images in the same cluster.
The above figure shows the neural network architecture that produces the face embedding. It uses an AirFace-like deep learning architecture, which is well optimized for mobile devices and shows good predictive accuracy. One of the main challenges for models like this is running them efficiently on mobile devices without degrading model accuracy.
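The grouping step described above can be sketched as greedy clustering of embedding vectors by cosine similarity. This is a minimal illustration, not Apple's actual pipeline; the threshold and the toy 2-D "embeddings" are assumptions for the example.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_embeddings(embeddings, threshold=0.8):
    """Greedy clustering: assign each embedding to the first cluster
    whose exemplar is within the cosine-similarity threshold."""
    clusters = []  # list of (exemplar vector, member indices)
    for i, e in enumerate(embeddings):
        for exemplar, members in clusters:
            if cosine(e, exemplar) >= threshold:
                members.append(i)
                break
        else:
            clusters.append((e, [i]))
    return [members for _, members in clusters]

# Two synthetic "identities": vectors near two orthogonal directions.
a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
faces = [a, a + 0.05, b, b + 0.05, a - 0.03]
print(cluster_embeddings(faces))  # [[0, 1, 4], [2, 3]]
```

Production systems use far better clustering than this greedy pass, but the principle is the same: images whose embeddings are close belong to the same person.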
Papers
How Cute is Pikachu? Gathering and Ranking Pokémon Properties from Data with Pokémon Word Embeddings is a rather hilarious paper in what it tries to solve. It uses pre-trained models to extract various properties of Pokémon.
They use pre-trained word embedding models to define and extract properties of various Pokémon.
They found that pre-trained models, namely fastText and word2vec trained on a large corpus and then fine-tuned on a specific dataset, do not work well for these types of tasks.
They found that models trained from scratch on task-specific datasets are much better suited to their evaluation dataset than pre-trained models.
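The property-ranking idea can be sketched as scoring each name's vector against a property word's vector by cosine similarity. The tiny hand-made 2-D "embedding table" below is purely hypothetical, standing in for trained word vectors.

```python
import numpy as np

# Toy embedding table standing in for trained word vectors
# (the two dimensions are hypothetical: roughly "cuteness" and "menace").
vectors = {
    "pikachu":  np.array([0.9, 0.1]),
    "snorlax":  np.array([0.6, 0.3]),
    "gyarados": np.array([0.1, 0.9]),
    "cute":     np.array([1.0, 0.0]),
}

def rank_by_property(names, prop, vectors):
    """Rank names by cosine similarity to the property word's vector."""
    p = vectors[prop]
    def score(n):
        v = vectors[n]
        return float(v @ p / (np.linalg.norm(v) * np.linalg.norm(p)))
    return sorted(names, key=score, reverse=True)

print(rank_by_property(["snorlax", "gyarados", "pikachu"], "cute", vectors))
# ['pikachu', 'snorlax', 'gyarados']
```

With real embeddings the same scoring runs over fastText or word2vec vectors; the paper's finding is that which corpus those vectors come from matters more than the scoring itself.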
Knowledge Graphs 2021: A Data Odyssey talks about what has been accomplished and learned in knowledge graphs, and what remains to be done.
Input data quality is key. One needs to think through the data sources and make data quality a top priority before constructing the knowledge graph. The old adage "garbage in, garbage out" applies especially to knowledge graphs.
Knowledge graph creation is not yet a fully solved end-to-end pipeline that can simply leverage machine learning.
The precision versus recall tradeoff needs to be explicit and deserves a lot of thought and consideration.
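The precision/recall tradeoff above can be made concrete by scoring extracted triples against a hand-labeled gold set. This is a minimal sketch with invented example triples, not a standard KG evaluation harness.

```python
def triple_metrics(extracted, gold):
    """Precision/recall of extracted (subject, relation, object) triples
    against a hand-labeled gold set."""
    extracted, gold = set(extracted), set(gold)
    tp = len(extracted & gold)  # true positives
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

gold = {
    ("paris", "capital_of", "france"),
    ("berlin", "capital_of", "germany"),
}
extracted = {
    ("paris", "capital_of", "france"),
    ("paris", "capital_of", "germany"),  # an extraction error
}
print(triple_metrics(extracted, gold))  # (0.5, 0.5)
```

Tightening the extractor's confidence threshold typically raises precision and lowers recall; making that knob explicit is exactly the consideration the talk calls for.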
Libraries
Rubrix is a library that lets you iterate on and improve the datasets you use in your machine learning workflow. You can also use it to collect the predictions output by your models.
Bagua is a distributed training library, and it has excellent tutorials here. It has a good benchmark page here as well.
Mistral is a utility library for large-scale language modeling from Stanford. It combines and orchestrates several other libraries to make large-scale modeling easier and faster.
Notebooks
TensorFlow has a good notebook that shows how TF can use mixed precision when training on ImageNet with TPUs.
This notebook shows how to train a classifier in the image domain with PyTorch on top of a pre-trained model (ResNet).
If you are using PyTorch, the following tips might come in handy:
Datasets
Common Objects in 3D is a dataset with a large number of common objects annotated in 3D.
Shift15M is a dataset you can use to test how robust your ML model is to shifts in data distribution between the training and test sets.
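Before evaluating on a shifted benchmark like this, it helps to quantify the shift itself. A crude sketch (not Shift15M's methodology): compare per-feature test means to training means, in units of the training standard deviation. The synthetic features below are assumptions for the example.

```python
import numpy as np

def feature_shift(train, test):
    """Per-feature mean shift in units of the training std,
    a crude covariate-shift check between train and test sets."""
    mu, sigma = train.mean(axis=0), train.std(axis=0) + 1e-8
    return np.abs(test.mean(axis=0) - mu) / sigma

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(1000, 3))
test = rng.normal(0.0, 1.0, size=(1000, 3))
test[:, 2] += 2.0  # inject a shift into one feature

shift = feature_shift(train, test)
print(shift.argmax())  # the shifted feature (index 2) stands out
```

A large value here warns that test accuracy may not reflect training-distribution performance, which is exactly the failure mode Shift15M is built to expose.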