Articles
Sebastian Ruder writes about the most impactful research areas and ideas of 2020. He looks into 10 different research areas: model size, retrieval augmentation, few-shot learning, contrastive learning, evaluation that includes fairness and transparency rather than accuracy alone, bias in large language models and its impacts, multilinguality, transformers for images, machine learning for science, and reinforcement learning. It is a very well written article with a large number of references to a variety of research papers.
Pinterest wrote about how they built a near real-time embedding service to enable vector search. This builds on their earlier article about Manas, their near real-time search system. Both articles are great if you are interested in learning how large-scale search systems are built. They use Navigable Small World graphs as the main data structure behind both search mechanisms, which lets them index embeddings into the search index and retrieve them.
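To make the idea concrete, here is a minimal sketch of the greedy search that Navigable Small World graphs rely on: starting from an entry node, repeatedly hop to whichever neighbor is closest to the query until no neighbor improves. The graph layout, cosine distance, and function names are illustrative assumptions, not Pinterest's actual implementation.

```python
import math

def cosine_distance(a, b):
    # Illustrative metric; assumes nonzero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def greedy_search(graph, vectors, query, entry):
    """Greedy best-first walk over an NSW-style proximity graph.

    graph:   node id -> list of neighbor node ids
    vectors: node id -> embedding
    Returns the locally closest node to `query` and its distance.
    """
    current = entry
    current_dist = cosine_distance(vectors[current], query)
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = cosine_distance(vectors[neighbor], query)
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:          # no neighbor improves: local optimum
            return current, current_dist
        current, current_dist = best, best_dist
```

Real systems like HNSW layer several such graphs and keep a candidate beam instead of a single node, but the hop-to-the-closest-neighbor loop above is the core mechanic.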
Language models are knowledge graphs explains, through this research paper, how language models relate to knowledge graphs. It is a very interesting article and paper, as it draws connections between knowledge graphs, which encode information in a structured manner, and large language models.
I mentioned Jay Alammar’s post in the library section of last week’s newsletter (paid members only), and it generated a good amount of discussion around text generation and how visualization can be an important tool for these models with respect to responsibility and fairness. I want to expand on this a bit more:
I think Software 2.0 will develop its own solutions for how to “unit test” various types of models and its stack overall, but that is not going to happen overnight. As we understand “black box” neural networks better, we will start thinking about what we want to accomplish with Software 2.0. Should the only objective be the accuracy of the model? Or do we also want other kinds of measures, such as fairness, energy efficiency, and responsibility?
Models can impact thousands or millions of people, and we need to think hard about how to test and evaluate these systems. The basic principles of software engineering may not transfer easily to Software 2.0. Testing and evaluation systems will come later, but they will come. In software development, testing also arrived late; once people understood how important unit, regression, and component testing are for uncovering bugs that could cost money or human lives, companies started adopting them.
This is to say that visualization and understanding of these models will be an important part of how we think about testing and evaluating them. We need to break the models down to be able to test them properly. We cannot do component testing for models the way we do it for software, and we cannot do property testing (excellent article, by the way) either. The reason is very simple: if we could enumerate all possible inputs and outputs of the system, we would not need Software 2.0 in the first place! Software 1.0 is still much simpler, much more easily controlled, and easier to develop and test.
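To see why property testing fits Software 1.0 so well, here is a hand-rolled sketch (not from the linked article): for a sort function we can state invariants that must hold for any input, then hammer the function with random inputs. The function and parameter names here are my own illustrations.

```python
import random

def check_sort_properties(sort_fn, trials=200, seed=0):
    """Property-test a sort function on random inputs.

    Instead of enumerating every input/output pair, we assert
    invariants that must hold for *any* input: the result is
    ordered, and it is a permutation of the input.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        ys = sort_fn(xs)
        assert all(a <= b for a, b in zip(ys, ys[1:])), "not ordered"
        assert sorted(xs) == sorted(ys), "not a permutation"
    return True
```

For a neural network we usually cannot write down invariants this crisp over its input space, which is exactly the gap the paragraph above points at.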
How do we solve it? We need to better understand the small units and components that compose the larger neural network. This can be accomplished by reasoning about activation functions and understanding neural network units at a granular level of detail. That is why Ecco and similar libraries will not only give us excellent visualization tools, but will also pave the way to better testing of models as we understand their behavior in detail.
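As a toy illustration of the idea (this is my own sketch, not how Ecco is implemented), a network can be instrumented so every layer's activations are captured during the forward pass, making individual units inspectable after the fact:

```python
import math

class DenseLayer:
    """A minimal dense layer with tanh activation (illustrative only)."""
    def __init__(self, weights):
        self.weights = weights  # list of rows, one row per output unit

    def forward(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)))
                for row in self.weights]

class InstrumentedNet:
    """Runs layers in sequence and records every intermediate
    activation, so individual units can be examined, much like
    visualization libraries expose layer activations."""
    def __init__(self, layers):
        self.layers = layers
        self.activations = []  # filled on each forward pass

    def forward(self, x):
        self.activations = [list(x)]      # record the input too
        for layer in self.layers:
            x = layer.forward(x)
            self.activations.append(list(x))
        return x
```

In PyTorch the same effect is achieved with forward hooks on modules; the point is that once activations are observable per unit, you can start asserting properties about them.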
Building the data flywheel to prevent data shift explores how to think about data drift in production. It covers causes of data shift, such as selection bias, and then proposes a variety of ways to detect and respond to it, such as measuring the statistical distance of the observed instances and using novelty detection.
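One concrete statistical distance you can compute between a training sample and a production sample is the two-sample Kolmogorov-Smirnov statistic. This is a generic sketch of that idea, not code from the article:

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum gap
    between the empirical CDFs of the two samples. A large value
    for a monitored feature suggests the production distribution
    has drifted away from the training distribution."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_xs, v):
        # Fraction of sample points <= v.
        return bisect.bisect_right(sorted_xs, v) / len(sorted_xs)

    return max(abs(ecdf(a, v) - ecdf(b, v))
               for v in sorted(set(a) | set(b)))
```

In practice you would use something like `scipy.stats.ks_2samp`, which also returns a p-value, and alert when the statistic for a feature crosses a threshold.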
Salesforce wrote about how they built a data lake to power their machine learning platform. Their data platform is built on AWS along with open-source projects like Apache Iceberg.
Christopher Potts wrote about whether it is possible for language models to reach “language understanding”. The main premise of the post is to discuss a couple of directions on what understanding means and whether it can be achieved through language models. He makes various points by referring to the paper, which argues that language models cannot “understand” language the way humans do.
Mercado writes about how they built a feature store to decrease time to market for the features they develop.
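If the feature store concept is new to you, here is a toy point-in-time store (a hypothetical API of my own, not Mercado's actual system): features are versioned by timestamp, so training jobs can read a value "as of" a past event time while online serving reads the latest value, avoiding train/serve skew.

```python
import time

class FeatureStore:
    """Toy point-in-time feature store (illustrative sketch)."""

    def __init__(self):
        # (entity_id, feature_name) -> append-only list of (timestamp, value)
        self._data = {}

    def put(self, entity_id, feature, value, ts=None):
        ts = time.time() if ts is None else ts
        self._data.setdefault((entity_id, feature), []).append((ts, value))

    def get(self, entity_id, feature, as_of=None):
        """Latest value, or the latest value at or before `as_of`."""
        rows = self._data.get((entity_id, feature), [])
        if as_of is not None:
            rows = [r for r in rows if r[0] <= as_of]
        return max(rows)[1] if rows else None
```

A production system adds an offline store for bulk training reads and a low-latency online store, but the point-in-time `get` is the essential contract.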
Classes
Stanford has a new class on Machine Learning Systems Design. It covers the production use cases for machine learning and how to design such systems (from data and data processing to the machine learning components). The slides are also available; it might be worth keeping an eye on them over the following weeks.
UW has a nice Systems for ML class as well. I liked the TVM lecture and especially the Model Serving slides.
Libraries
SoftPool is a new pooling technique that can be used in CNNs. The main idea is to weight each activation in the pooling window by an exponential of its value, so the average pooling mechanism emphasizes stronger activations. The code is almost a single line in PyTorch if you want to take a close look. The paper is published and, in spirit, it is similar to LIP (Local Importance-based Pooling).
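For intuition, here is a sketch of the pooling step over a single flattened window in plain Python; the real implementation is a vectorized one-liner over tensors, so treat this as illustrative:

```python
import math

def softpool(window):
    """SoftPool over one pooling window: a softmax-weighted average,
    so larger activations contribute exponentially more than in
    plain average pooling."""
    weights = [math.exp(a) for a in window]
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, window)) / total
```

The result always lands between plain average pooling and max pooling for a non-constant window, which is the paper's selling point: it keeps gradient flow to every element while still favoring strong activations.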
HLS4ML is a machine learning library for FPGAs (Field Programmable Gate Arrays). It is based on Keras and has a great tutorial.
Notebooks
HyperLSTM gives an excellent overview of the HyperLSTM architecture and how it can be implemented in PyTorch.
Switch Transformer gives an overview of how it can be implemented in PyTorch.
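The core trick in the Switch Transformer is top-1 expert routing: a gate picks a single expert per token instead of mixing several. A minimal sketch for one token (my own illustrative names and shapes, not the notebook's code):

```python
import math

def switch_route(token, gate_weights, experts):
    """Top-1 routing as in the Switch Transformer: compute gate
    logits, pick the single most probable expert, and scale that
    expert's output by its gate probability (illustrative sketch).

    token:        input vector for one token
    gate_weights: one row of weights per expert
    experts:      list of callables, one per expert
    """
    logits = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    m = max(logits)                       # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = probs.index(max(probs))        # top-1: route to one expert only
    return [probs[best] * y for y in experts[best](token)], best
```

Routing each token to a single expert keeps the per-token compute constant no matter how many experts you add, which is what lets these models scale parameter counts so aggressively.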