Graph Neural Networks, Neural Machine Translation
Pinterest's Analytics Stack, CleanLab, Ploomber and FlyingSquid
This is how most machine learning algorithms work: through inductive reasoning.
Articles
Pinterest wrote about their analytics stack, which is based on Druid, in three blog posts (1, 2, 3).
Distill wrote about Graph Neural Networks (GNNs) in this blog post. It first introduces graphs as a concept, then covers Graph Neural Networks, and then introduces the attention mechanism.
The second article goes into even more detail on the inductive biases built into GNNs, and on how to think about a number of linear algebra concepts through a graph-based lens.
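To make that linear-algebra view concrete, here is a minimal sketch (my own illustration, not code from the article): one round of message passing written as matrix products, where multiplying node features by the adjacency matrix sums each node's neighbors' features.

```python
import numpy as np

# A tiny graph: a 0-1-2 triangle plus a dangling node 3 attached to node 2.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

# One scalar feature per node.
H = np.array([[1.0], [2.0], [3.0], [4.0]])

# One message-passing step: each node aggregates its neighbors' features
# (A @ H), then applies a learned linear map W and a ReLU nonlinearity.
W = np.array([[0.5]])
H_next = np.maximum(A @ H @ W, 0.0)
print(H_next.ravel())  # → [2.5 2.  3.5 1.5]
```

Stacking several such steps lets information flow along longer paths in the graph, which is exactly the repeated-matrix-multiplication view the article develops.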
LessWrong wrote about OpenAI's future plans. Emil Wallner gave a good TLDR below:
One thing I like about OpenAI is that they double down on what works, which helps them focus and nail one problem very well. This is in contrast to some other research companies that spread themselves too thin, chase every shiny object, and produce merely okay results.
- Lena Voita wrote about NMT (Neural Machine Translation) in this blog post. She describes the stages an NMT model goes through during training:
  - target-side language modeling,
  - learning how to use the source and approaching word-by-word translation,
  - refining translations, visible in increasingly complex reorderings, but almost invisible to standard metrics (e.g. BLEU).
FastForward Labs wrote about concept drift in their August article. They first define what concept drift is and then propose four different methods to detect it:
- Statistical test for change in feature space
- Statistical test for change in response variable
- Statistical test for change in margin density of response variable
- Detect change in margin density of response distribution using a learned threshold
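As a rough sketch of the first method (my own illustration, not FastForward Labs' code), a two-sample Kolmogorov-Smirnov statistic can compare a feature's distribution at training time against live data; a large statistic suggests the feature space has drifted:

```python
import numpy as np

def ks_statistic(a, b):
    # Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    # between the two empirical CDFs over all observed values.
    a, b = np.sort(a), np.sort(b)
    pooled = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, pooled, side="right") / len(a)
    cdf_b = np.searchsorted(b, pooled, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, size=5000)  # feature values at training time
same = rng.normal(0.0, 1.0, size=5000)       # live data, no drift
drifted = rng.normal(1.5, 1.0, size=5000)    # live data, mean has shifted

print(ks_statistic(reference, same))     # small: distributions agree
print(ks_statistic(reference, drifted))  # large: drift detected
```

In practice you would turn the statistic into a decision with a significance threshold (e.g. `scipy.stats.ks_2samp`'s p-value) and run the check per feature on a sliding window of live data.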
Christopher Olah wrote a blog post on the relationships between biology and deep learning.
Libraries
- Cleanlab is a library that finds mislabeled examples in noisy datasets so you can improve your training data. It works with NumPy arrays, so you can plug it into your workflow no matter which library you use to build machine learning models (scikit-learn, TensorFlow, PyTorch, etc.).
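A minimal sketch of the underlying idea (a simplified confident-learning heuristic of my own, not Cleanlab's actual API): flag examples whose model-predicted probability for their given label falls well below that class's average self-confidence.

```python
import numpy as np

def find_suspect_labels(labels, pred_probs):
    """Flag likely mislabeled examples: those whose predicted probability
    for their given label is far below the class's average self-confidence."""
    labels = np.asarray(labels)
    self_conf = pred_probs[np.arange(len(labels)), labels]
    # Per-class threshold: mean confidence of examples assigned to that class.
    thresholds = np.array([self_conf[labels == c].mean()
                           for c in range(pred_probs.shape[1])])
    return np.where(self_conf < thresholds[labels] * 0.5)[0]

# Toy data: 5 examples, 2 classes; example 3 looks mislabeled.
labels = np.array([0, 0, 1, 0, 1])
pred_probs = np.array([[0.9, 0.1],
                       [0.8, 0.2],
                       [0.2, 0.8],
                       [0.1, 0.9],   # labeled 0 but the model is confident it's 1
                       [0.3, 0.7]])
print(find_suspect_labels(labels, pred_probs))  # → [3]
```

Cleanlab builds on this style of analysis with a more principled estimate of the joint distribution of noisy and true labels; see its documentation for the real API.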
- Human Learn is a library for building ML models by hand. For certain types of datasets, you may want to simply draw what the classifier's decision boundary should look like and have the model follow it. Yes, ML is often better suited for these tasks, but it can also be useful to create a crude baseline to start with. This library lets you draw the classifier yourself.
- Ploomber is a library that converts notebooks into modular Python scripts that you can use to build modular and composable pipelines.
- FlyingSquid is a library that lets you iterate on and build models trained on noisy data from multiple sources.
Videos
- Communicating Uncertainty in Machine Learning Systems is a good introductory video on thinking about and communicating probabilistic systems, and on how to present these systems to different audiences:
DeepMind released a number of classes/lectures on Reinforcement Learning, along with the lecture materials, here. They are mainly tailored toward people who do not know anything about RL.