Graph Neural Networks, Neural Machine Translation
Pinterest's Analytics Stack, CleanLab, Ploomber and FlyingSquid
This is how most machine learning algorithms work: through inductive reasoning.
Articles
Pinterest wrote about their analytics stack, which is based on Druid, in three blog posts (1, 2, 3).
Distill wrote about Graph Neural Networks (GNNs) in this blog post. It first introduces graphs as a concept, then covers Graph Neural Networks, and then introduces the attention mechanism.
The second article goes into even more detail on the inductive biases built into GNNs, and on how to think about a number of linear algebra concepts through a graph-based lens.
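To make that linear-algebra view concrete, here is a minimal sketch (my own illustration, not code from the article): one round of message passing written as matrix products, where multiplying node features by the adjacency matrix sums each node's neighbors' features.

```python
import numpy as np

# A tiny graph: a 0-1-2 triangle plus a dangling node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# One scalar feature per node.
H = np.array([[1.0], [2.0], [3.0], [4.0]])

# One message-passing step: each node aggregates its neighbors' features
# (A @ H), then applies a learned linear map W and a ReLU nonlinearity.
W = np.array([[0.5]])
H_next = np.maximum(A @ H @ W, 0.0)
print(H_next.ravel())  # → [2.5 2.  3.5 1.5]
```

Stacking several such steps lets information flow along longer paths in the graph, which is exactly the repeated-matrix-multiplication view the article develops.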
LessWrong wrote about OpenAI's future plans. Emil Wallner gave a good TLDR below:
One thing I like about OpenAI is that they double down on what works, which helps them focus and nail one problem very well. This is in contrast to some other research companies that spread themselves too thin, chase every shiny object, and produce merely okay results.
- Lena Voita wrote about NMT (Neural Machine Translation) in this blog post. She describes the stages an NMT model goes through during training:
  - target-side language modeling,
  - learning how to use the source and approaching word-by-word translation,
  - refining translations, visible in increasingly complex reorderings, but almost invisible to standard metrics (e.g. BLEU).
FastForward Labs wrote about concept drift in their August article. They first define what concept drift is and then propose four different methods to detect it:
- Statistical test for change in feature space
- Statistical test for change in response variable
- Statistical test for change in margin density of response variable
- Detect change in margin density of response distribution using a learned threshold
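As a rough sketch of the first method (my own illustration, not FastForward Labs' code), a two-sample Kolmogorov-Smirnov statistic can compare a feature's distribution at training time against live data; a large statistic suggests the feature space has drifted:

```python
import numpy as np

def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    # between the two empirical CDFs over all observed values.
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5000)  # feature values at training time
same = rng.normal(0.0, 1.0, size=5000)       # live data, no drift
drifted = rng.normal(1.5, 1.0, size=5000)    # live data, mean has shifted

print(ks_statistic(reference, same))     # small: distributions agree
print(ks_statistic(reference, drifted))  # large: drift detected
```

In practice you would turn the statistic into a decision with a significance threshold (e.g. `scipy.stats.ks_2samp`'s p-value) and run the check per feature on a sliding window of live data.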
Christopher Olah wrote a blog post on the relationships between biology and deep learning.
Libraries
- Cleanlab is a library that finds mislabeled examples in noisy datasets so you can improve your training data. It works with NumPy arrays, so you can plug it into your workflow no matter which library you use to build machine learning models (scikit-learn, TensorFlow, PyTorch, etc.).
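A minimal sketch of the underlying idea (a simplified confident-learning heuristic of my own, not Cleanlab's actual API): flag examples whose model-predicted probability for their given label falls well below that class's average self-confidence.

```python
import numpy as np

def find_suspect_labels(labels, pred_probs):
    """Flag likely mislabeled examples: those whose predicted probability
    for their given label is far below the class's average self-confidence."""
    labels = np.asarray(labels)
    self_conf = pred_probs[np.arange(len(labels)), labels]
    # Per-class threshold: mean confidence of examples assigned to that class.
    thresholds = np.array([self_conf[labels == c].mean()
                           for c in range(pred_probs.shape[1])])
    return np.where(self_conf < thresholds[labels] * 0.5)[0]

# Toy data: 5 examples, 2 classes; example 3 looks mislabeled.
labels = np.array([0, 0, 1, 0, 1])
pred_probs = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.2, 0.8],
                       [0.1, 0.9],   # labeled 0 but the model is confident it's 1
                       [0.3, 0.7]])
print(find_suspect_labels(labels, pred_probs))  # → [3]
```

Cleanlab builds on this style of analysis with a more principled estimate of the joint distribution of noisy and true labels; see its documentation for the real API.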
- Human Learn is a library for building ML models by hand. For certain types of datasets, you may want to simply draw what the classifier's decision boundary should look like and have the model follow it. Yes, ML is often better suited for these tasks, but it can also be useful to create a crude baseline to start with. This library lets you draw the classifier yourself.
- Ploomber is a library that converts notebooks into modular Python scripts that you can use to build modular and composable pipelines.
- FlyingSquid is a library that lets you iterate on and build models trained on noisy data from multiple sources.
Videos
- Communicating Uncertainty in Machine Learning Systems is a good introductory video on thinking about and communicating probabilistic systems, and on how to present these systems to different audiences:
DeepMind released a number of classes/lectures on Reinforcement Learning, along with the lecture materials, here. They are mainly tailored toward people who do not know anything about RL.