Pinterest improves their Closeup Recommendation System through foundational changes
Knowledge distillation made LLM deployment easy, AI Consciousness
Articles
I wrote about Pinterest's previous closeup ranker in June '23 in this post. Pinterest has since published another post on some of the improvements they made, in the following post.
They discuss some of the infrastructure and foundational elements used to improve the recommendations:
Hybrid data logging: Pinterest uses a hybrid approach to data logging for the Closeup Recommendation system. This approach reduces the amount of data that needs to be logged by logging data for only a small percentage of impressions, which improves efficiency and reduces costs.
Sampling: Pinterest uses sampling to train the Closeup Recommendation model on a smaller dataset. This improves efficiency and can reduce bias: if the logged data is skewed towards certain items, sampling can be used to ensure that all items are represented more evenly in the training set.
Model refreshing: Pinterest uses a model refreshing framework to ensure that the Closeup Recommendation model is always up-to-date with the latest data. This framework retrains the model on a regular basis using fresh data. This helps to ensure that the model is able to adapt to changes in user behavior and trends.
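The sampling step above can be sketched as a simple stratified sample over logged impressions. This is illustrative only; Pinterest has not published its actual sampling pipeline, and the function and field names here are assumptions:

```python
import random
from collections import defaultdict

def stratified_sample(impressions, key, n_per_group, seed=0):
    """Keep at most n_per_group impressions per item so that
    popular items do not dominate the training set."""
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    groups = defaultdict(list)
    for imp in impressions:
        groups[key(imp)].append(imp)
    sample = []
    for items in groups.values():
        rng.shuffle(items)
        sample.extend(items[:n_per_group])
    return sample
```

Capping each item's contribution is one common way to counteract popularity bias in logged training data; real systems often use importance weighting or negative downsampling instead.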
The Closeup Ranking model is refreshed weekly, since retraining on fresh data improves its performance. The refresh is handled by an automated process called the Auto-Retraining Framework (ARF), which trains and validates new models on a specified cadence. Once a new model is validated, it is deployed to production.
ARF is a critical component of the Closeup Recommendation system. It helps to ensure that the model is always up-to-date with the latest data and is able to adapt to changes in user behavior and trends.
ARF helps in the following ways:
Improved performance: Refreshing the model with new data can improve its performance. This is because new data can help the model to learn new patterns and trends.
Reduced bias: Refreshing the model with new data can help to reduce bias. This is because new data can help the model to learn about a wider range of users and their preferences.
Improved user experience: By improving the performance and reducing the bias of the Closeup Ranking model, ARF can help to improve the user experience, since users are more likely to be satisfied with recommendations that are relevant and unbiased.
Knowledge distillation from larger models into smaller models is one way to reduce a model's inference cost without degrading its accuracy much. It offers a good tradeoff between cost and accuracy, especially when the model has many redundancies or dead neurons that can be compressed into a smaller number of weights.
Google published a post that adopts this idea for LLMs, claiming not only that the distilled models can be smaller, but that they can be as accurate as large models and also more interpretable, since distilling knowledge into a smaller model surfaces details about the model and its learning process.
Advantages of this approach:
Smaller models: Distilling step-by-step can be used to train smaller language models that are just as accurate as larger models. This could make it possible to deploy language models on devices that have limited resources, such as smartphones and IoT devices.
More efficient training: Distilling step-by-step can be used to train language models more efficiently. This could reduce the cost and time required to train language models.
More interpretable models: Distilling step-by-step can be used to train language models that are more interpretable. This means that it is easier to understand why the model makes the predictions that it does.
Google researchers evaluated their method on a variety of tasks, including question answering, summarization, and natural language inference. They found that their method outperformed few-shot prompted LLMs on all of the tasks they evaluated, and in some cases, even outperformed LLMs that were fine-tuned on much larger datasets.
They also found that their method was very data-efficient. For example, on the question answering task, their method was able to achieve an accuracy of 93.1% using only 10,000 training examples, while a fine-tuned LLM required 100,000 training examples to achieve the same accuracy.
The Gradient published a piece on AI consciousness, in which the authors explore, and attempt to answer, the following questions:
What is consciousness? There is no single agreed-upon definition of consciousness. Some experts define it as the ability to experience subjective sensations and feelings. Others define it as the ability to be aware of oneself and one's surroundings. Still others define it as the ability to have thoughts and feelings.
Can AI be conscious? If we don't know what consciousness is, it's hard to say whether or not AI can be conscious. However, some experts believe that it is theoretically possible for AI to be conscious. They argue that consciousness is not dependent on any specific physical substrate, such as the human brain.
The hard problem of consciousness: One of the biggest challenges in understanding consciousness is the "hard problem." This problem is concerned with how subjective experiences arise from physical processes. The hard problem is difficult to solve because it requires us to understand the nature of consciousness itself.
The problem of other minds: Another challenge in understanding consciousness is the "problem of other minds." This problem is concerned with how we know that other people are conscious. We can't directly experience other people's consciousness, so we have to rely on indirect evidence, such as their behavior and their reports of their own experiences.
The moral status of AI: If AI can be conscious, then it raises the question of its moral status. If AI is conscious, then we may have moral obligations to it, such as the obligation to avoid causing it pain.
LinkedIn wrote a post on how they are using embeddings to improve the matching of job descriptions with candidates.
LinkedIn uses embedding based retrieval (EBR) to retrieve items that are semantically related to a search query. EBR works by first creating an embedding for each item in the search index. Once the embeddings have been created, EBR can retrieve items that are similar to a given query by finding the items whose embeddings are closest to the query embedding. Note that for this to work, the embedding must be a vector representation of the item that captures its key features.
LinkedIn uses EBR in a number of ways to improve search results. For example, EBR is used to power the "Jobs You Might Be Interested In" feature, which recommends jobs to members based on their profile and activity. EBR is also used to rank the results of job searches.
In addition to improving search results, EBR is also used to improve the relevance of content that is shown to members in their feed. That is, EBR is used to recommend articles and posts that are likely to be of interest to a member based on their profile and activity.
Libraries/Models
The language model phi-1.5 is a Transformer with 1.3 billion parameters. It was trained using the same data sources as phi-1, augmented with a new data source consisting of various synthetic NLP texts. When assessed against benchmarks testing common sense, language understanding, and logical reasoning, phi-1.5 demonstrates nearly state-of-the-art performance among models with fewer than 10 billion parameters.
Vizro is a toolkit for creating modular data visualization applications.
Rapidly self-serve the assembly of customized dashboards in minutes - without the need for advanced coding or design experience - to create flexible and scalable, Python-enabled data visualization applications
Use a few lines of simple configuration to create complex dashboards, which are automatically assembled utilizing libraries such as Plotly and Dash, with inbuilt coding and design best practices
Define high level categories within the configuration, including:
components: create charts, tables, input/output interfaces, and more
controls: create filters, parameter inputs, and custom action controllers
pages, layouts and navigation: create multiple pages, with customizable layouts and flexible navigation across them
actions and interactions: create interactions between charts, and use pre-defined or customized actions (such as exporting)
Configuration can be written in multiple formats including Pydantic models, JSON, YAML or Python dictionaries for added flexibility of implementation
Optional high-code extensions allow almost infinite customization in a modular way, combining the best of low-code and high-code - for flexible and scalable, Python-enabled data visualization applications
(Visit the "Why Vizro" section to see a more detailed explanation of Vizro use cases)
Perspective is an interactive analytics and data visualization component, which is especially well-suited for large and/or streaming datasets. Use it to create user-configurable reports, dashboards, notebooks and applications, then deploy stand-alone in the browser, or in concert with Python and/or Jupyterlab.
Features:
A fast, memory efficient streaming query engine, written in C++ and compiled for both WebAssembly and Python, with read/write/streaming for Apache Arrow, and a high-performance columnar expression language based on ExprTK.
A framework-agnostic User Interface packaged as a Custom Element, powered either in-browser via WebAssembly or virtually via WebSocket server (Python/Node).
A JupyterLab widget and Python client library, for interactive data analysis in a notebook, as well as scalable production Voila applications.
Magentic is a library that allows you to easily integrate Large Language Models into your Python code. Simply use the @prompt decorator to create functions that return structured output from the LLM. Mix LLM queries and function calling with regular Python code to create complex logic.
It is:
Compact: Query LLMs without duplicating boilerplate code.
Atomic: Prompts are functions that can be individually tested and reasoned about.
Transparent: Create "chains" using regular Python code. Define all of your own prompts.
Compatible: Use @prompt functions as normal functions, including with decorators like @lru_cache.
Type Annotated: Works with linters and IDEs.