
"At its core, S2R is a technology that directly interprets and retrieves information from a spoken query without the intermediate, and potentially flawed, step of having to create a perfect text transcript. It represents a fundamental architectural and philosophical shift in how machines process human speech."
"The move to S2R-powered voice search isn't a theoretical exercise; it's a live reality. In a close collaboration between Google Research and Search, these advanced models are now serving users in multiple languages, delivering a significant leap in accuracy beyond conventional cascade systems."
"🆕 Huge update for Voice Search -> now its powered by Speech-to-Retrieval engine and this new process don't convert speech to a text transcript & then do a web search rather this new technique uses an audio encoder for converting sound into audio embeddings which then is used to... https://t.co/iv2q4Kp0Qt pic.twitter.com/bCGwIfKNEh" - Gagan Ghotra (@gaganghotra_) October 8, 2025
Speech-to-Retrieval (S2R) replaces the conventional cascade (automatic speech recognition produces a transcript, then a text search runs over that transcript) by interpreting audio directly and retrieving relevant results without producing an intermediate text transcript. S2R converts spoken queries into audio embeddings with an audio encoder, so that retrieval models can match the meaning of the spoken query against documents. This avoids errors from imperfect speech-to-text transcription, where a single misheard word can change the query's meaning and return the wrong results. Through a collaboration between Google Research and Search, S2R models are now deployed in multiple languages, delivering higher accuracy and faster responses. The change represents an architectural shift in how machines process human speech for search applications.
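The embedding-then-retrieve flow described above can be illustrated with a minimal sketch. It assumes a dual-encoder setup in which documents have already been embedded into the same vector space as the audio query and are ranked by cosine similarity. `encode_audio`, `retrieve`, the document IDs, and every vector value are hypothetical stand-ins chosen for illustration; this is not Google's implementation, and a real audio encoder is a trained neural network rather than the toy pooling used here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def encode_audio(waveform: Sequence[float], dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a trained audio encoder.

    Averages the raw samples in `dim` contiguous chunks to get a fixed-size
    vector. Note that no text transcript is produced at any point.
    """
    chunk = max(1, len(waveform) // dim)
    return [sum(waveform[i * chunk:(i + 1) * chunk]) / chunk for i in range(dim)]


def retrieve(query_emb: Vector, doc_index: Dict[str, Vector],
             k: int = 2) -> List[Tuple[str, float]]:
    """Rank pre-embedded documents by similarity to the audio query embedding."""
    scored = [(doc_id, cosine(query_emb, emb)) for doc_id, emb in doc_index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Usage with made-up document embeddings (hypothetical IDs and values).
doc_index = {
    "doc_a": [0.8, 0.1, 0.0, 0.3],
    "doc_b": [0.0, 0.9, 0.4, 0.0],
    "doc_c": [0.5, 0.0, 0.5, 0.5],
}
query_embedding = encode_audio([0.9, 0.9, 0.1, 0.1, 0.0, 0.0, 0.2, 0.2])
results = retrieve(query_embedding, doc_index, k=2)
print(results)
```

Because the query never passes through a transcript, there is no transcription step where a misrecognized word could derail retrieval; the trade-off is that both encoders must be trained so that audio and document embeddings land in a shared space.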
Read at Search Engine Roundtable