GitHub Introduces New Embedding Model to Improve Code Search and Context

"GitHub has introduced a new embedding model for Copilot, now integrated into Visual Studio Code. The model is designed to improve how Copilot understands programming context, retrieves relevant code, and suggests completions. According to GitHub, this update provides a 37.6% improvement in retrieval quality, doubles throughput speed, and reduces memory usage for code indexing by a factor of eight. The new model powers all of Copilot's main modes: chat, agent, edit, and ask."

"To train the system, GitHub used contrastive learning with InfoNCE loss and introduced a method called Matryoshka Representation Learning. This technique allows the model to handle embeddings at multiple levels of granularity, meaning it can represent both small code fragments and entire files effectively. The training process also relied on hard negatives, examples of code that appear similar but are functionally incorrect, to help the model better distinguish between valid and invalid suggestions."

GitHub's new Copilot embedding model, now integrated into Visual Studio Code, improves how Copilot understands programming context, retrieves relevant code, and suggests completions. GitHub reports a 37.6% improvement in retrieval quality, with the average embedding score rising from 0.362 to 0.498, along with doubled throughput and an eightfold reduction in the memory used for code indexing. The model powers Copilot's chat, agent, edit, and ask modes. For developers using C# and Java, accepted suggestion rates roughly double. Training used contrastive learning with InfoNCE loss, Matryoshka Representation Learning for multi-granularity embeddings, and hard negatives to reduce erroneous retrievals. The smaller embedding index makes local indexing within VS Code more efficient.
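
The training recipe named above can be sketched in a few lines. This is a minimal illustration, not GitHub's implementation: the vectors, the temperature of 0.05, the prefix sizes, and the helper names (`cosine`, `info_nce`, `matryoshka_info_nce`) are all assumptions made for the example. It follows the published Matryoshka technique, in which granularity refers to nested prefixes of the embedding vector and the base loss is summed over those prefixes. The article does not say whether GitHub combines the two in exactly this way.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE loss for one query: negative log-softmax of the positive's score."""
    logits = [cosine(query, positive) / temperature]
    logits += [cosine(query, neg) / temperature for neg in negatives]
    # Numerically stable log-sum-exp over the positive and all negatives.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum_exp - logits[0]


def matryoshka_info_nce(query, positive, negatives, dims=(2, 4)):
    """Sum InfoNCE over nested prefixes so short prefixes stay useful on their own."""
    total = 0.0
    for d in dims:
        total += info_nce(query[:d], positive[:d], [neg[:d] for neg in negatives])
    return total


# Hypothetical 4-dimensional embeddings, chosen only for illustration.
query = [0.9, 0.4, 0.1, 0.1]
positive = [0.8, 0.5, 0.2, 0.0]
hard_negative = [0.7, 0.6, 0.0, 0.3]   # close to the query, but not the right match
easy_negative = [0.1, 0.9, 0.0, 0.3]   # clearly unrelated

loss_hard = matryoshka_info_nce(query, positive, [hard_negative])
loss_easy = matryoshka_info_nce(query, positive, [easy_negative])
print(f"loss with hard negative: {loss_hard:.4f}")
print(f"loss with easy negative: {loss_easy:.6f}")
```

The hard negative yields the larger loss, which is why such examples carry more training signal: the model has to separate items that look alike. Summing the loss over prefixes is what lets a shortened embedding be stored and searched in place of the full one.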

#copilot-embeddings #matryoshka-representation-learning #contrastive-learning-infonce #vs-code-integration

Read at InfoQ


