Pinterest's Closeup Recommendations Engine
Airbnb's Data Management Platform: Metis; vLLM, a fast and easy-to-use library for LLM inference and serving
Articles
Airbnb's data management platform has evolved over the past 6 years. The platform started with Dataportal, which aimed to "democratize data" at Airbnb by enabling data users to find trusted data. As data reliability and compliance regulations became important, Airbnb adopted Apache Atlas as its data lineage solution. This led to the development of Metis, which is a platform that enables anyone at Airbnb to search, discover, consume, and manage all the data and metadata in the company's offline warehouse. Metis has been serving critical roles across data compliance, data reliability, and data quality initiatives.
Airbnb published a blog post on the challenges of building Metis, such as the need to support a large number of data users, ensure data quality, and comply with regulatory requirements.
Pinterest wrote a blog post on closeup recommendations. And you might ask: what are closeup recommendations? Closeup recommendations (aka Related Pins) are a feed of recommended content (primarily Pins) served on any Pin closeup. Closeup recommendations generate the largest number of impressions among all recommendation surfaces at Pinterest and are uniquely critical for their users’ inspiration-to-realization journey.
Their model architecture includes an additional summarization layer in the form of an MLP after the ingestion of various features (after normalization/preprocessing).
Summarization layer: groups similar features together (e.g. user annotations from different sources such as search queries, boards, etc.) into a single feature by passing them through an MLP, representing each feature group in a lower-dimensional latent space
Transformer mixer: performs self-attention over groups of features
MMoE: combines the results of independent “experts” to produce predictions for each task
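The three stages above can be sketched end to end. This is a minimal NumPy illustration of the summarization-MLP → transformer-mixer → MMoE flow; all dimensions, layer sizes, and weights are made-up assumptions for demonstration, not Pinterest's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, w1, b1, w2, b2):
    """Two-layer MLP with ReLU, used for both summarization and experts."""
    return np.maximum(x @ w1 + b1, 0) @ w2 + b2

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Single-head self-attention over the feature groups (transformer mixer)."""
    d = x.shape[-1]
    return softmax(x @ x.T / np.sqrt(d)) @ x

# Illustrative sizes: 3 feature groups, 4 experts, 2 tasks (all assumptions).
raw_dim, latent_dim, n_groups, n_experts, n_tasks = 32, 8, 3, 4, 2

# Summarization layer: each feature group -> lower-dimensional latent vector.
groups = [rng.normal(size=raw_dim) for _ in range(n_groups)]
summ_params = [(rng.normal(size=(raw_dim, 16)) * 0.1, np.zeros(16),
                rng.normal(size=(16, latent_dim)) * 0.1, np.zeros(latent_dim))
               for _ in range(n_groups)]
latents = np.stack([mlp(g, *p) for g, p in zip(groups, summ_params)])

# Transformer mixer: self-attention across the summarized groups.
mixed = self_attention(latents).reshape(-1)

# MMoE: shared experts, with a per-task softmax gate over the expert outputs.
expert_params = [(rng.normal(size=(mixed.size, 16)) * 0.1, np.zeros(16),
                  rng.normal(size=(16, latent_dim)) * 0.1, np.zeros(latent_dim))
                 for _ in range(n_experts)]
expert_out = np.stack([mlp(mixed, *p) for p in expert_params])

gate_w = rng.normal(size=(n_tasks, mixed.size, n_experts)) * 0.1
head_w = rng.normal(size=(n_tasks, latent_dim)) * 0.1

predictions = []
for t in range(n_tasks):
    gate = softmax(mixed @ gate_w[t])      # per-task weights over experts
    combined = gate @ expert_out           # weighted mixture of expert outputs
    predictions.append(1 / (1 + np.exp(-combined @ head_w[t])))  # sigmoid head

print([round(float(p), 4) for p in predictions])
```

Each task head sees its own gated mixture of the shared experts, which is what lets MMoE share computation across objectives while still specializing per task.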
After the ranking layer predictions, they employ a blending layer where the order of Pin recommendations is determined. They introduce another ML model, which builds upon the multi-objective optimization framework and leverages the user and query Pin features to make real-time decisions on what to prioritize and how to balance between organic content, which optimizes for organic engagement, and shopping content, which optimizes for shopping conversions.
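A toy sketch of what such utility-based blending might look like: each candidate carries per-objective scores, and a context-dependent weight vector (in the post's setup, produced by an ML model from user and query-Pin features) collapses them into one utility used for ordering. The candidate names, objectives, and weights below are all hypothetical.

```python
def blend(candidates, weights):
    """Rank candidates by a weighted sum of their per-objective scores."""
    def utility(c):
        return sum(weights[obj] * score for obj, score in c["scores"].items())
    return sorted(candidates, key=utility, reverse=True)

# Hypothetical ranked candidates with organic-engagement and conversion scores.
candidates = [
    {"id": "organic_1",  "scores": {"engagement": 0.9, "conversion": 0.1}},
    {"id": "shopping_1", "scores": {"engagement": 0.3, "conversion": 0.8}},
    {"id": "organic_2",  "scores": {"engagement": 0.6, "conversion": 0.2}},
]

# A shopping-intent context weights conversion more heavily than engagement.
shopping_intent = {"engagement": 0.3, "conversion": 1.0}
ranked = blend(candidates, shopping_intent)
print([c["id"] for c in ranked])  # → ['shopping_1', 'organic_2', 'organic_1']
```

Swapping in an engagement-heavy weight vector for a browsing-intent user would reorder the same candidates, which is the point of making the trade-off a real-time, per-request decision.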
For future work, they are considering:
Adopting a richer and longer real-time user sequence signal
Improving GPU model serving performance
Iterating on the model architecture
Adopting the learned utility in other surfaces such as Homefeed
A reader sent me this blog post from Slack that explains the recommender stack and ML infra powering their recommendations. Slack developed a unified framework called the Recommend API to make it easier to build and deploy machine learning models for recommendation use cases. The Recommend API provides a consistent interface for accessing data, training models, and serving predictions. This has allowed Slack to rapidly prototype and productionize ML models across the product, and it is constantly being improved to support more use cases. They also describe the metrics and monitoring they rely on:
Reliability metrics: Prometheus metrics from the backend to track the number of requests and errors
Efficiency metrics: Prometheus metrics from the model serving service, such as throughput and latency, to make sure we are responding fast enough to all the requests
Online metrics: business metrics which we share with external stakeholders. Some of the most important metrics we track are the clickthrough rate (CTR) and ranking metrics such as discounted cumulative gain (DCG). Online metrics are frequently checked and monitored to make sure the model, plus the overall end-to-end process, is working properly in production
Offline metrics: metrics to compare various models during training time and decide which one we potentially want to experiment and productionize. We set aside the validation data, apart from the training data, so that we know the model can perform well on data it hasn’t seen yet. We track common classification and ranking metrics for both training and validation data
Feature stats: metrics to monitor feature distribution and feature importance, upon which we run anomaly detection to prevent distribution shift.
Libraries
Jumanji is helping pioneer a new wave of hardware-accelerated research and development in the field of RL. Jumanji's high-speed environments enable faster iteration and large-scale experimentation while simultaneously reducing complexity.
Klio is an ecosystem that allows you to process audio files – or any binary files – easily and at scale. Klio jobs are opinionated data pipelines in Python (streaming or batch) built upon Apache Beam and tuned for audio and binary file processing.
Klio was built by Spotify to run our large-scale audio intelligence systems and is used by teams of engineers and audio researchers to help develop and deploy next generation audio algorithms.
With OpenLLM, you can run inference with any open-source large-language models, deploy to the cloud or on-premises, and build powerful AI apps.
🚂 State-of-the-art LLMs: built-in support for a wide range of open-source LLMs and model runtimes, including StableLM, Falcon, Dolly, Flan-T5, ChatGLM, StarCoder, and more.
🔥 Flexible APIs: serve LLMs over RESTful API or gRPC with one command, query via WebUI, CLI, our Python/Javascript client, or any HTTP client.
⛓️ Freedom To Build: First-class support for LangChain, BentoML and Hugging Face that allows you to easily create your own AI apps by composing LLMs with other models and services.
Novel is a Notion-style WYSIWYG editor with AI-powered autocompletions.
If you have read this far, I highly recommend Marc Andreessen’s post on the new era of AI:
Marc Andreessen argues that artificial intelligence (AI) has the potential to solve some of the world's biggest problems, such as climate change, poverty, and disease. His main argument is that the benefits of AI outweigh the risks and that AI can make the world a better place, while still calling for a global effort to ensure that AI is used for good.
The Fast Segment Anything Model (FastSAM) is a CNN-based Segment Anything Model trained on only 2% of the SA-1B dataset published by the SAM authors. FastSAM achieves comparable performance to SAM at 50× higher run-time speed.
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
vLLM is a fast and easy-to-use library for LLM inference and serving.
vLLM is fast with:
State-of-the-art serving throughput
Efficient management of attention key and value memory with PagedAttention
Dynamic batching of incoming requests
Optimized CUDA kernels
vLLM is flexible and easy to use with:
Seamless integration with popular HuggingFace models
High-throughput serving with various decoding algorithms, including parallel sampling, beam search, and more
Tensor parallelism support for distributed inference
Streaming outputs
OpenAI-compatible API server
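The core idea behind PagedAttention is worth unpacking: instead of reserving one large contiguous KV-cache region per sequence, the cache is stored in fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks allocated on demand. The following is a toy conceptual sketch of that allocation scheme in pure Python, not vLLM's actual implementation; the class name, block size, and methods are all illustrative.

```python
BLOCK_SIZE = 4  # tokens per physical KV-cache block (illustrative)

class PagedKVCache:
    """Toy block allocator: each sequence owns a block table of physical
    block ids; blocks are grabbed on demand and returned when the sequence
    finishes, so memory is never reserved for tokens not yet generated."""

    def __init__(self, num_blocks):
        self.free_blocks = list(range(num_blocks))
        self.block_tables = {}   # seq_id -> list of physical block ids
        self.lengths = {}        # seq_id -> number of tokens cached so far

    def append_token(self, seq_id):
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % BLOCK_SIZE == 0:      # current block full: allocate a new one
            table.append(self.free_blocks.pop())
        self.lengths[seq_id] = length + 1

    def free(self, seq_id):
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_blocks=8)
for _ in range(6):                        # cache KV entries for 6 tokens
    cache.append_token("seq-a")
print(len(cache.block_tables["seq-a"]))   # 6 tokens fit in 2 blocks of size 4
```

Because per-sequence waste is bounded by one partially filled block, many more sequences fit in GPU memory at once, which is where the serving-throughput gains come from.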
The code is on GitHub.