Airbnb's engineering team discusses the challenges and solutions in adapting search ranking for map-based interfaces. This post is very interesting because the ranking approach, the algorithm, and the optimization function all have to be rethought, right down to how the user actually interacts with the application itself. Since the user interacts with items in no pre-determined, deterministic order, position bias and other considerations that are straightforward in list ranking (where the application can enforce a particular order for the items) also have to be reconsidered.
TLDR:
Map interfaces require rethinking traditional ranking assumptions
User attention on maps is more complex and requires multi-faceted modeling
Balancing high-probability listings with good spatial coverage is crucial
Tiered visual representation can effectively guide user attention to find the optimal listing for a given user
Continuous iteration and A/B testing are essential for optimizing map ranking
The main differences between a simple list ranking and map ranking boil down to attention, but there are many more dimensions, which I list below:
User Attention:
List Ranking
User attention decays from top to bottom
Click-through rates (CTR) decrease as rank position increases
Ranking strategy: Sort listings by booking probabilities
Map Ranking
User attention is scattered across the map
No clear decay of attention based on position
Traditional ranking strategy of sorting by booking probabilities is not applicable
Adapting Ranking for Maps
Uniform User Attention Model
Airbnb tested a hypothesis of uniform user attention across map pins:
Introduced a parameter α to control the ratio of the highest to the lowest booking probability among displayed pins (a minimal sketch of this restriction follows the results below)
Restricted the range of booking probabilities for map pins
Results:
Reduced average impressions and clicks to discovery
Significant improvement in bookings and quality of bookings (5-star ratings)
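As a rough illustration of the α restriction, here is a minimal sketch assuming each candidate listing already carries a model-scored booking probability; the function and field names are illustrative, not Airbnb's actual implementation.

```python
def restrict_pins_by_alpha(listings, alpha):
    """Keep only listings whose booking probability is within a factor of
    alpha of the best candidate, so the displayed pins have a bounded
    highest-to-lowest probability ratio (illustrative sketch)."""
    if not listings:
        return []
    best = max(listing["booking_prob"] for listing in listings)
    # Keep pins whose probability is at least best / alpha.
    return [l for l in listings if l["booking_prob"] >= best / alpha]

# With alpha = 4, the least likely displayed pin is at most 4x less
# likely to be booked than the most likely one.
pins = restrict_pins_by_alpha(
    [{"id": 1, "booking_prob": 0.20},
     {"id": 2, "booking_prob": 0.08},
     {"id": 3, "booking_prob": 0.04}],
    alpha=4,
)
```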
Tiered User Attention Model
To further refine the ranking approach, Airbnb implemented a two-tier system (a rough sketch of the tiering follows the results below):
Regular oval pins with price: Highest booking probabilities
Mini-pins without price: Lower booking probabilities
Observations:
Mini-pins had 8x lower CTR than regular pins
Particularly useful for desktop searches with fixed number of results
Results:
Improved uncancelled bookings
Reduced map moves, indicating less user effort
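To make the tiering concrete, here is a rough sketch that splits listings by their ranked booking probability; the split fraction and field names are assumptions, since the post does not spell out the exact criteria.

```python
def assign_pin_tiers(listings, regular_fraction=0.5):
    """Split listings into regular price pins and mini-pins by ranked
    booking probability (illustrative; the real split criteria are not
    published)."""
    ranked = sorted(listings, key=lambda l: l["booking_prob"], reverse=True)
    cutoff = max(1, int(len(ranked) * regular_fraction))
    regular_pins = ranked[:cutoff]   # oval pins showing the price
    mini_pins = ranked[cutoff:]      # mini-pins without the price
    return regular_pins, mini_pins
```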
Discounted User Attention Model
The final iteration involved a more sophisticated understanding of user attention distribution:
Analyzed CTR of map pins across different coordinates
Developed an algorithm to re-center the map based on the listings with the highest booking probabilities (one plausible way to do this is sketched after the results below)
Results:
0.27% improvement in uncancelled bookings
1.5% reduction in map moves
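One plausible way to implement the re-centering is a probability-weighted centroid of the top listings; this is only a guess at the mechanics (the post says the map is re-centered around the highest-probability listings, not how), and the names below are illustrative.

```python
def recenter_map(listings, top_k=10):
    """Re-center the map on the booking-probability-weighted centroid of
    the top-k listings (illustrative sketch, not Airbnb's algorithm)."""
    top = sorted(listings, key=lambda l: l["booking_prob"], reverse=True)[:top_k]
    total = sum(l["booking_prob"] for l in top)
    if not top or total == 0:
        return None
    lat = sum(l["lat"] * l["booking_prob"] for l in top) / total
    lng = sum(l["lng"] * l["booking_prob"] for l in top) / total
    return lat, lng
```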
Other Differences
Ranking Factors
List Ranking: Primarily based on booking probabilities
Map Ranking: Considers booking probabilities, but also incorporates spatial distribution and user attention patterns
Result Presentation
List Ranking: Sequential order
Map Ranking: Scattered pins with varying visual prominence
User Interaction
List Ranking: Scrolling through a linear list
Map Ranking: Exploring a two-dimensional space, zooming, and panning
Optimization Goals
List Ranking: Maximize visibility of high-probability bookings at the top
Map Ranking: Balance between showing high-probability bookings and providing good coverage of the area
Algorithmic Approaches
List Ranking: Straightforward sorting by booking probability
Map Ranking: Complex algorithms considering spatial distribution, re-centering, and tiered presentation
Scalability
List Ranking: Easily scalable to large result sets
Map Ranking: Challenges with representing a full range of available listings
Personalization
List Ranking: Can be personalized based on user preferences
Map Ranking: Personalization needs to consider both user preferences and spatial context
Visual Hierarchy
List Ranking: Achieved through position in the list
Map Ranking: Achieved through pin size, color, and placement
User Cognitive Load
List Ranking: Sequential processing of information
Map Ranking: Parallel processing of spatial information
Adaptation to Screen Size
List Ranking: Relatively consistent across devices
Map Ranking: Significant differences between mobile and desktop interfaces
Handling of Low-Probability Results
List Ranking: Typically pushed to later pages
Map Ranking: Represented as mini-pins or potentially omitted
Impact of Result Density
List Ranking: Consistent presentation regardless of result density
Map Ranking: Challenges in areas with high listing density
User Control
List Ranking: Limited to scrolling and page navigation
Map Ranking: Users can actively explore different areas and zoom levels
Performance Metrics
List Ranking: Focus on position-based CTR and conversion rates
Map Ranking: Incorporates metrics like map moves and pin interaction rates
After last week's post, there was a lot of positive feedback on Netflix's approach to building recommender systems, and here is an excellent post that goes into much more detail on long-term rewards and how Netflix thinks about them.
They focus on long-term satisfaction due to the following reasons:
User Experience: Providing members with content they genuinely enjoy over time enhances their overall experience with the platform.
Member Retention: Satisfied members are more likely to continue their subscriptions, reducing churn.
Business Performance: Improved long-term satisfaction can lead to increased viewer activity and, ultimately, better financial outcomes for Netflix.
Traditional recommender systems often optimize for short-term metrics like clicks or engagement, which may not fully capture long-term satisfaction. Netflix's approach seeks to bridge this gap by developing more sophisticated recommendation strategies.
Recommendations as Contextual Bandit
Netflix frames its recommendation problem as a contextual bandit (CB), a type of reinforcement learning setup, in which they define the following concepts:
Context: When a member visits the platform, it creates a context for the system.
Action: The system selects an action by deciding which recommendations to show.
Feedback: The member provides various types of feedback, both immediate and delayed.
This approach allows Netflix to define reward functions that reflect the quality of recommendations based on user feedback signals. The system then trains a contextual bandit policy on historical data to maximize the expected reward.
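As a simplified picture of that loop, here is a minimal epsilon-greedy contextual-bandit sketch: a context comes in, the policy picks an action, and the resulting (context, action, reward) tuple is logged for training. The scoring model and parameter names are placeholders, not Netflix's actual system.

```python
import random

def select_recommendation(context, candidates, score_model, epsilon=0.1):
    """Epsilon-greedy contextual bandit policy: mostly exploit the model's
    best-scored candidate, occasionally explore (sketch only)."""
    if random.random() < epsilon:
        return random.choice(candidates)                            # explore
    return max(candidates, key=lambda c: score_model(context, c))   # exploit

# Each impression then yields a (context, action, reward) training example,
# where the reward is the proxy reward computed from member feedback.
```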
Challenges with Retention as a Direct Reward
Netflix identifies several drawbacks to using retention directly as a reward:
Noise: Retention can be influenced by external factors unrelated to the service quality.
Low Sensitivity: It only captures extreme cases where members are on the verge of canceling.
Attribution Difficulty: Cancellations might result from a series of poor recommendations rather than a single event.
Slow Measurement: Retention data is only available on a monthly basis per account.
These challenges make optimizing directly for retention impractical, leading Netflix to explore alternative approaches.
Proxy Rewards
To address the limitations of using retention directly, Netflix employs proxy reward functions. These functions aim to align closely with long-term member satisfaction while being sensitive to individual recommendations.
The proxy reward is defined as a function of the member's interactions with the recommended item (roughly, proxy_reward = f(user interactions)), where those interactions can include various actions such as playing, completing, or rating content.
Click-Through Rate (CTR)
CTR serves as a simple baseline proxy reward, where the reward is 1 if the member clicked on the recommendation and 0 otherwise.
While CTR is a common and strong baseline, Netflix recognizes that over-optimizing for it can lead to promoting "clickbaity" items that may harm long-term satisfaction.
Beyond CTR
To better align with long-term satisfaction, Netflix considers a wide range of user actions and their implications (a toy reward combining these signals is sketched after the list):
Fast season completion: Completing a TV show season quickly is seen as a strong positive signal.
Negative feedback after completion: Finishing a show over weeks but giving it a thumbs-down indicates low satisfaction despite significant time investment.
Brief engagements: Short viewing sessions are ambiguous and require careful interpretation.
Genre discovery: Engaging with new genres after recommendations is viewed very positively.
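To make this concrete, here is a toy proxy-reward function combining some of the signals above; the weights and field names are invented for illustration and are not Netflix's actual reward definition.

```python
def proxy_reward(interaction):
    """Toy proxy reward built from observed feedback signals
    (weights are illustrative, not Netflix's)."""
    reward = 0.0
    if interaction.get("clicked"):
        reward += 0.1
    if interaction.get("completed_season_quickly"):
        reward += 1.0   # fast season completion is a strong positive signal
    if interaction.get("thumbs_down"):
        reward -= 1.0   # negative feedback outweighs the time invested
    if interaction.get("new_genre_engagement"):
        reward += 0.5   # genre discovery is viewed positively
    return reward
```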
Reward Engineering
Netflix employs an iterative process called reward engineering to refine their proxy reward functions. This process involves four stages:
Hypothesis formation
Defining a new proxy reward
Training a new bandit policy
A/B testing
This approach allows Netflix to continuously improve their reward functions based on observed data and member behavior.
Dealing with Delayed Feedback
A significant challenge in Netflix's recommendation system is handling delayed or missing feedback. Users may take weeks to complete a show or may never provide explicit feedback like ratings. To address this, Netflix has developed a sophisticated approach:
Predict Missing Feedback: Instead of waiting for delayed feedback, Netflix predicts it using already observed data and other relevant information up to the training time.
Proxy Reward Calculation: The proxy reward is calculated using both observed and predicted feedback for each training example.
Bandit Policy Update: These training examples are then used to update the bandit policy.
This approach allows Netflix to update their recommendation policy quickly while still incorporating long-term feedback signals.
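A simplified sketch of that observed-plus-predicted idea: at training time, feedback that has not arrived yet is imputed by a model, merged with what has been observed, and the proxy reward is computed on the combination. The predictor and field names are placeholders, not Netflix's implementation.

```python
def training_reward(example, feedback_predictor, proxy_reward_fn):
    """Combine observed feedback with model-predicted feedback for
    signals that have not arrived yet (illustrative sketch)."""
    observed = example["observed_feedback"]              # e.g. clicks, partial plays
    # Impute delayed signals (eventual completion, rating, ...) from
    # whatever has been observed up to training time.
    predicted = feedback_predictor(observed, example["context"])
    merged = {**predicted, **observed}                   # observed values take priority
    return proxy_reward_fn(merged)
```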
There is also a paper that goes into much more detail about the reward building process as well as technical implementation details.
Libraries
Byte Latent Transformer (BLT) is a new byte-level LLM architecture that, for the first time, matches tokenization-based LLM performance at scale, with significant improvements in inference efficiency and robustness. BLT encodes bytes into dynamically sized patches, which serve as the primary units of computation. Patches are segmented dynamically based on the entropy of the next byte, allocating more compute and model capacity where the data is more complex. The BLT architecture includes new attention mechanisms to maximize the information flow between byte and patch hidden representations and a new type of byte-sequence memory.
Harbor is a containerized LLM toolkit that lets you run LLMs and additional services. It consists of a CLI and a companion app for managing and running AI services, so you can run LLM backends, APIs, frontends, and other services with one command.
DolphinScheduler is a modern data orchestration platform, designed for building high-performance workflows with low code. It also provides a powerful user interface, dedicated to solving complex task dependencies in the data pipeline, with various types of jobs available out of the box.
The key features for DolphinScheduler are as follows:
Easy to deploy, with four deployment options: Standalone, Cluster, Docker, and Kubernetes.
Easy to use: workflows can be created and managed in several ways, including the Web UI, Python SDK, and Open API.
Highly reliable and highly available: decentralized architecture with multiple masters and workers, and native support for horizontal scaling.
High performance: considerably faster than other orchestration platforms and able to support tens of millions of tasks per day.
Cloud native: supports orchestrating multi-cloud/data-center workflows as well as custom task types.
Versioning of both workflows and workflow instances (including tasks).
Various state controls for workflows and tasks, with support for pausing, stopping, and recovering them at any time.
Multi-tenancy support
Other features such as backfill support (native in the Web UI) and permission control covering projects and data sources.
flow_matching is a PyTorch library for Flow Matching algorithms, featuring continuous and discrete implementations. It includes examples for both text and image modalities. This repository is part of Flow Matching Guide and Codebase.
Polars is a DataFrame interface on top of an OLAP query engine implemented in Rust, using the Apache Arrow columnar format as the memory model (a minimal lazy-query example follows the feature list below):
Lazy | eager execution
Multi-threaded
SIMD
Query optimization
Powerful expression API
Hybrid Streaming (larger-than-RAM datasets)
Rust | Python | NodeJS | R | ...
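As a small taste of the lazy API, here is a minimal query; the input file name is hypothetical, but the calls used (scan_csv, filter, group_by, agg, collect) are standard Polars.

```python
import polars as pl

# Build a lazy query: nothing executes here, so Polars can optimize the
# whole plan (predicate pushdown, projection pruning) before running it.
lazy = (
    pl.scan_csv("listings.csv")   # hypothetical input file
    .filter(pl.col("price") < 200)
    .group_by("city")
    .agg(pl.col("price").mean().alias("avg_price"))
)

# collect() triggers query optimization and multi-threaded execution.
df = lazy.collect()
print(df)
```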