Building Embedding Models for Large-Scale Real-World Applications
Briefly

"What happens under the hood? How is the search engine able to take that simple query, look for images in the billions, trillions of images that are available online? How is it able to find this one or similar photos from all that? Usually, there is an embedding model that is doing this work behind the hood."
"I'm co-leading the team at Google that's building the Gemini embedding models, as well as the infrastructure. Recently, I had the pleasure to work on the Gemini Embedding paper. I'm really proud of this team, because together, we have built the best embedding model that's available on all the known benchmarks."
"How are these models formed? How are they able to generate these embeddings? Next, we'll look at the training techniques. How are these models trained? Then we'll see, once you have trained these larger size models, how are you going to distill them into smaller models that can actually be used in production? Next, we will see how we can evaluate these models. It might be non-trivial."
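The talk mentions distilling large trained models into smaller production models. As a rough illustration (not the Gemini team's actual method), embedding distillation is often framed as training a small student model to reproduce a large teacher's embedding directions; the sketch below computes one common objective, the mean cosine distance between student and teacher embeddings. All names here are hypothetical.

```python
import numpy as np

def distill_loss(student_embs: np.ndarray, teacher_embs: np.ndarray) -> float:
    """Mean (1 - cosine similarity) between matched student/teacher embeddings.

    Both inputs are (n_items, dim) arrays; rows are paired. This is an
    illustrative objective only, not the loss used for Gemini embeddings.
    """
    s = student_embs / np.linalg.norm(student_embs, axis=1, keepdims=True)
    t = teacher_embs / np.linalg.norm(teacher_embs, axis=1, keepdims=True)
    # Row-wise dot product of unit vectors = cosine similarity.
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))
```

A student that exactly matches the teacher's directions scores a loss of 0; orthogonal embeddings score 1.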
Embedding models map queries and items into vector representations to enable similarity search and retrieval across billions or trillions of objects. These models power retrieval tasks such as image search by transforming diverse modalities into comparable embeddings. Key engineering concerns include model architecture, training techniques, and evaluation, which is often non-trivial. Large trained models often require distillation into smaller, production-ready models to meet latency and resource constraints. Practical deployment also requires supporting infrastructure and strategies for mitigating large-scale serving challenges. The Gemini embedding models and accompanying infrastructure achieved top performance on known benchmarks, demonstrating state-of-the-art capabilities.
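The core retrieval step the summary describes can be sketched in a few lines: once queries and items live in the same vector space, search reduces to ranking items by cosine similarity to the query embedding. The function below is a minimal brute-force sketch (real systems at billion-item scale use approximate nearest-neighbor indexes instead); the names and dimensions are assumptions for illustration.

```python
import numpy as np

def top_k(query_vec: np.ndarray, item_vecs: np.ndarray, k: int = 3):
    """Return indices and cosine-similarity scores of the k items
    most similar to the query. item_vecs has shape (n_items, dim)."""
    # Normalize so that a plain dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    scores = items @ q
    # Sort descending by score and keep the top k.
    idx = np.argsort(-scores)[:k]
    return idx, scores[idx]
```

In production, the exhaustive `items @ q` scan would be replaced by an approximate index so retrieval stays sublinear in the number of items.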
Read at InfoQ