Swiggy Improves Search Autocomplete Using Real Time Machine Learning Ranking
Briefly

Autocomplete requests require very low latency because each keystroke triggers a new query. The system uses a two-stage pipeline. Candidate generation retrieves a broad set of suggestions using OpenSearch lexical retrieval plus embedding-based similarity search, optimized for recall and fast response. A ranking layer then reorders candidates using machine learning models that predict relevance. The ranking uses real-time signals such as user interaction history, click behavior, query context, and item popularity. Offline-trained models are deployed for online inference. A feature store serves both precomputed and streaming features to avoid expensive real-time computations while still incorporating recent behavior. Learning-to-rank is integrated directly into OpenSearch, avoiding extra services or network hops.
"Swiggy detailed the architecture of the company's real-time machine-learning ranking system for autocomplete search suggestions, describing how the platform combines OpenSearch retrieval, feature stores, and learning-to-rank models while operating under strict latency requirements. The system replaced a hand-tuned heuristic ranking approach with a learned ranking model running directly inside OpenSearch, avoiding additional services or network hops while improving autocomplete relevance."
"According to the company, autocomplete requests are particularly sensitive to latency because every keystroke can trigger a new search query. Traditional autocomplete systems, therefore, tend to rely on lexical matching and static ranking rules optimized for speed. Swiggy's newer approach separates the workflow into two stages: candidate generation and ranking."
"When a user begins typing, the system first retrieves a broad set of candidate suggestions using OpenSearch lexical retrieval combined with embedding-based similarity search. This retrieval layer is optimized for recall and fast response times. The candidate suggestions are then passed into a ranking layer where machine learning models reorder results based on predicted relevance."
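The retrieve-then-rank split described above can be illustrated with a minimal, self-contained sketch. It is not Swiggy's implementation: the toy `CATALOG`, the two-dimensional embeddings, the similarity threshold, and the hand-set linear weights (standing in for a learned ranking model) are all assumptions made for illustration, and no OpenSearch API is used.

```python
import math

# Hypothetical toy catalog: suggestion -> (embedding, popularity in [0, 1]).
CATALOG = {
    "pizza": ([0.9, 0.1], 0.8),
    "pizza hut": ([0.8, 0.2], 0.6),
    "pasta": ([0.7, 0.3], 0.5),
    "paneer tikka": ([0.1, 0.9], 0.7),
    "burger": ([0.4, 0.4], 0.9),
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_candidates(prefix, query_embedding, min_similarity=0.9):
    """Stage 1: recall-oriented union of lexical (prefix) and embedding matches."""
    lexical = {s for s in CATALOG if s.startswith(prefix)}
    semantic = {
        s for s, (emb, _) in CATALOG.items()
        if cosine(query_embedding, emb) >= min_similarity
    }
    return lexical | semantic


def rank(candidates, query_embedding, user_clicks, weights=(0.5, 0.3, 0.2)):
    """Stage 2: reorder candidates by a relevance score.

    The linear score is a stand-in for a trained learning-to-rank model.
    """
    w_sim, w_pop, w_click = weights

    def score(suggestion):
        emb, popularity = CATALOG[suggestion]
        clicked = 1.0 if suggestion in user_clicks else 0.0
        return (w_sim * cosine(query_embedding, emb)
                + w_pop * popularity
                + w_click * clicked)

    return sorted(candidates, key=score, reverse=True)


def autocomplete(prefix, query_embedding, user_clicks, k=3):
    """Run both stages and return the top-k suggestions."""
    candidates = generate_candidates(prefix, query_embedding)
    return rank(candidates, query_embedding, user_clicks)[:k]
```

In this sketch the first stage deliberately over-retrieves ("pasta" is returned for the prefix "pi" because its embedding is close to the query), and the second stage decides the final order, so a user's past click on "pizza hut" can lift it above the more popular "pizza".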
"The ranking system incorporates real-time signals such as user interaction history, click behavior, query context, and item popularity. These features are combined with offline-trained models that are deployed for online inference. A feature store is used to serve both precomputed and streaming features, enabling the system to avoid expensive real-time computations while still reacting to recent user behavior."
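The idea of one lookup returning both precomputed and streaming features can be sketched as follows. This is a simplified in-memory illustration, not the feature store Swiggy uses: the `FeatureStore` class, its method names, the `recent_clicks` feature, and the sliding-window length are all hypothetical.

```python
import time
from collections import defaultdict, deque


class FeatureStore:
    """Toy feature store: batch features plus a sliding-window click counter."""

    def __init__(self, window_seconds=300.0):
        self.window_seconds = window_seconds
        self._batch = {}                    # item -> precomputed features
        self._clicks = defaultdict(deque)   # item -> recent click timestamps

    def load_batch(self, features):
        """Load features computed offline (for example, by a nightly job)."""
        self._batch.update(features)

    def record_click(self, item, ts=None):
        """Ingest a streaming click event; timestamps are assumed to arrive in order."""
        self._clicks[item].append(time.time() if ts is None else ts)

    def get_features(self, item, now=None):
        """Return precomputed features merged with the recent click count."""
        now = time.time() if now is None else now
        events = self._clicks[item]
        # Evict events that have fallen out of the sliding window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        features = dict(self._batch.get(item, {}))
        features["recent_clicks"] = len(events)
        return features
```

A lookup at ranking time is then a dictionary read plus a cheap window trim, which reflects the trade-off the article describes: the expensive aggregation happens ahead of time, while recent behavior still reaches the model.

```python
store = FeatureStore(window_seconds=60)
store.load_batch({"pizza": {"popularity": 0.8}})
store.record_click("pizza", ts=100)
store.record_click("pizza", ts=150)
features = store.get_features("pizza", now=155)
```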
Read at InfoQ