Wikidata Embedding Project Makes Wikipedia Data AI-Friendly
Wikimedia Deutschland has launched the Wikidata Embedding Project, designed to make Wikipedia’s massive knowledge base more accessible to artificial intelligence systems through semantic search and modern data protocols.
Wikipedia has long been a critical source of information, but access to its structured data has relied heavily on keyword searches and SPARQL queries, which limits ease of use. With AI models demanding high-quality, verified datasets, this new project bridges the gap between open knowledge and advanced AI.
The Wikidata Embedding Project was developed in collaboration with Jina.AI and DataStax (IBM-owned). It applies vector-based semantic search, allowing AI systems to understand relationships between terms, and introduces support for the Model Context Protocol (MCP) to improve natural language queries.
This upgrade makes nearly 120 million entries from Wikipedia and its sister platforms more usable for retrieval-augmented generation (RAG), helping AI models ground responses in fact-checked content.
For example, a search for “scientist” returns not only nuclear scientists but also Bell Labs researchers, translations of the word, related terms such as “scholar”, and Wikimedia-approved images.
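To illustrate the general idea behind vector-based semantic search, here is a minimal Python sketch. It is not the project's implementation: the three-dimensional vectors, the labels, and the names `TOY_EMBEDDINGS`, `QUERY_SCIENTIST`, and `semantic_search` are all hypothetical, chosen by hand so that related concepts point in similar directions. A real system would obtain vectors from an embedding model and store them in a vector database.

```python
import math

# Hypothetical, hand-picked 3-dimensional vectors for illustration only.
# Real embeddings come from a trained model and have many more dimensions.
TOY_EMBEDDINGS = {
    "nuclear physicist": [0.9, 0.8, 0.1],
    "scholar": [0.8, 0.6, 0.2],
    "Bell Labs researcher": [0.85, 0.7, 0.3],
    "football stadium": [0.1, 0.2, 0.9],
}

# Hypothetical vector standing in for the embedded query "scientist".
QUERY_SCIENTIST = [0.9, 0.7, 0.2]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vector, embeddings, top_k=3):
    """Rank stored items by similarity to the query and keep the top_k."""
    scored = [
        (label, cosine_similarity(query_vector, vector))
        for label, vector in embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


results = semantic_search(QUERY_SCIENTIST, TOY_EMBEDDINGS)
for label, score in results:
    print(f"{label}: {score:.3f}")
```

In this toy setup, the three science-related entries rank above the unrelated one even though none of them contains the word “scientist”, which is the behavior that keyword search cannot provide.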
Project manager Philippe Saadé stated:
“This Embedding Project launch shows that powerful AI doesn’t have to be controlled by a handful of companies. It can be open, collaborative, and built to serve everyone.”

