New project makes Wikipedia data more accessible to AI | TechCrunch
Briefly

"On Wednesday, Wikimedia Deutschland announced a new database that will make Wikipedia's wealth of knowledge more accessible to AI models. Called the Wikidata Embedding Project, the system applies a vector-based semantic search - a technique that helps computers understand the meaning and relationships between words - to the existing data on Wikipedia and its sister platforms, consisting of nearly 120 million entries."
"Wikidata has offered machine-readable data from Wikimedia properties for years, but the pre-existing tools only allowed for keyword searches and SPARQL queries, a specialized query language. The new system will work better with retrieval-augmented generation (RAG) systems that allow AI models to pull in external information, giving developers a chance to ground their models in knowledge verified by Wikipedia editors."
Wikimedia Deutschland announced the Wikidata Embedding Project, which applies vector-based semantic search to nearly 120 million entries across Wikipedia and its sister platforms. The system pairs vector embeddings with support for the Model Context Protocol (MCP) so that large language models can query the data in natural language. Wikimedia Deutschland collaborated with Jina.AI and DataStax on the development. Earlier tools were limited to keyword searches and SPARQL queries; the new system is designed to work with retrieval-augmented generation (RAG), letting models pull in information verified by Wikipedia editors. The data includes structured semantic context, translations, images, and extrapolations of related concepts. The database is publicly accessible on Toolforge, and a developer webinar is scheduled for October 9.
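To make "vector-based semantic search" concrete, here is a minimal sketch of the general technique, not the project's actual implementation. It assumes each entry has already been turned into an embedding vector; the three-dimensional vectors, the entry labels, and the `search` helper are hypothetical, and a real system would use embeddings produced by a model and stored in a vector database.

```python
import math

# Toy 3-dimensional vectors standing in for real embeddings (hypothetical values).
ENTRIES = {
    "Albert Einstein, theoretical physicist": [0.9, 0.1, 0.0],
    "Marie Curie, physicist and chemist": [0.8, 0.3, 0.1],
    "Berlin, capital of Germany": [0.0, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: closer to 1.0 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, entries, top_k=2):
    """Return the labels of the top_k entries most similar to the query vector."""
    scored = [
        (cosine_similarity(query_vector, vector), label)
        for label, vector in entries.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [label for _, label in scored[:top_k]]


# A query vector assumed to represent something like "famous physicists".
results = search([1.0, 0.1, 0.0], ENTRIES)
print(results)
```

In a RAG setup, the entries returned by a search like this would be passed to the language model as context for its answer; keyword search, by contrast, would only match entries that share literal words with the query.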
Read at TechCrunch