

AI Translations Are Adding ‘Hallucinations’ to Wikipedia Articles


AI translated articles swapped sources or added unsourced sentences with no explanation, while others added paragraphs sourced from completely unrelated material.

Wikipedia editors have implemented new policies and restricted a number of contributors who were paid to use AI to translate existing Wikipedia articles into other languages, after the editors discovered that these AI translations added “hallucinations,” or errors, to the resulting articles.

The new restrictions show how Wikipedia editors continue to fight to keep the flood of generative AI across the internet from diminishing the reliability of the world’s largest repository of knowledge. The incident also reveals how even well-intentioned efforts to expand Wikipedia are prone to errors when they rely on generative AI, and how those errors are remedied by Wikipedia’s open governance model.

The issue in this case starts with the Open Knowledge Association (OKA), a nonprofit dedicated to improving Wikipedia and other open platforms.

“We do so by providing monthly stipends to full-time contributors and translators,” OKA’s site says. “We leverage AI (Large Language Models) to automate most of the work.”

The problem is that editors started to notice that some of these translations introduced errors to articles. For example, a draft translation for a Wikipedia article about the French royal La Bourdonnaye family cites a book and specific page number when discussing the origin of the family. A Wikipedia editor, Ilyas Lebleu, who goes by Chaotic Enby on Wikipedia, checked that source and found that the specific page of that book “doesn't talk about the La Bourdonnaye family at all.”

“To measure the rate of error, I actually decided to do a spot-check, during the discussion, of the first few translations that were listed, and already spotted a few errors there, so it isn't just a matter of cherry-picked cases,” Lebleu told me. “Some of the articles had swapped sources or added unsourced sentences with no explanation, while 1879 French Senate election added paragraphs sourced from material completely unrelated to what was written!”

As Wikipedia editors looked at more OKA-translated articles, they found more issues.

“Many of the results are very problematic, with a large number of [...] editors who clearly have very poor English, don't read through their work (or are incapable of seeing problems) and don't add links and so on,” a Wikipedia page discussing the OKA translation said. The same Wikipedia page also notes that in some cases the copy/paste nature of OKA translators’ work breaks the formatting on some articles.

Wikipedia editors investigated how OKA was operating and found that it was mostly relying on cheap labor from contractors in the Global South, and that these contractors were instructed to copy/paste articles to popular LLMs to produce translations.

For example, a public spreadsheet used by OKA translators to keep track of what articles they’re translating instructs them to “pick an article, copy the lead section into Gemini or chatGPT, then review if some of the suggestions are an improvement to readability. Make edits to the Wiki articles only if the suggestions are an improvement and don't change the meaning of the lead. Do not change the content unless you have checked that what Gemini says is correct!”

Lebleu told me, and other editors have noted in their public on-site discussion of the issue, that these same instructions previously told OKA translators to use Grok, Elon Musk’s LLM, for the same purpose. Grok, which also produces an entirely automated alternative to Wikipedia called Grokipedia, is prone to errors precisely because it does not use humans to vet its output.

“The use of Grok proved controversial, notably given the reasons for which Grok has been in the news recently, and a recent in-house study showed ChatGPT and Claude perform more accurately, leading them to switch a few days ago, although they still recommend Grok as ‘valuable for experienced editors handling complex, template-heavy articles,’” Lebleu told me.

Ultimately the editors decided to implement restrictions against OKA translators who make multiple errors, but not block OKA translation as a rule.

“OKA translators who have received, within six months, four (correctly applied) warnings about content that fails verification will be blocked without further warning if another example is found,” the Wikipedia editors wrote. “Content added by an OKA translator who is subsequently blocked for failing verification may be presumptively deleted [...] unless an editor in good standing is willing to take responsibility for it.”

A job posting for a “Wikipedia Translator” from OKA offers $397 a month for working up to 40 hours per week. The job listing says translators are expected to publish “5-20 articles per week (depending on size).”

“They leverage machine translation to accelerate the process. We have published over 1500 articles and the number grows every day,” the job posting says.

“Given this precarious status, I am worried that more uncertainty in the translator duties may lead to an overloading of responsibilities, which is worrying as independent contractors do not necessarily have the same protections as paid employees,” Lebleu wrote in the public Wikipedia discussion about OKA.

Jonathan Zimmermann, the founder and president of OKA, who goes by 7804j on Wikipedia, told me that translators are paid hourly, not per article, and that there is no fixed article quota.

“We emphasize quality over speed,” Zimmermann told me in an email. “In fact, some of the problematic cases involved unusually high output relative to time spent, which in retrospect was a warning sign. Those cases were driven by individual enthusiasm and speed rather than institutional pressure.”

Zimmermann told me that “errors absolutely do occur,” but that OKA’s process includes human review and requires translators to check their content against cited sources, and that “senior editors periodically review samples, especially from newer translators.”

“Following the recent discussion, we have strengthened our safeguards,” Zimmermann told me. “We are now rolling out a second, independent LLM review step. Translators must run the completed draft through a separate model using a dedicated comparison prompt designed to identify potential discrepancies, omissions, or inaccuracies relative to the source text. Initial findings suggest this is highly effective at detecting potential issues.”

Zimmermann added that if this method proves insufficient, OKA is considering introducing formal peer review mechanisms.

Using AI to check the output of AI for errors is a method that is historically prone to errors. For example, we recently reported on an AI-powered private school that used AI to check AI-generated questions for students. Internal testing found it had at least a 10 percent failure rate.

“I agree that using AI to check AI can absolutely fail, and in some contexts it can fail at very high rates. We’re not assuming the secondary model is reliable in isolation,” Zimmermann said. “The key point is that we’re not replacing human verification with automated verification. The second model is a complement to manual review, not a substitute for it.”

“When a coordinated project uses AI tools and operates at scale, it’s going to attract attention. I understand why editors would examine that closely. Ultimately, the outcome of the discussion formalized expectations that are largely aligned with our existing internal policies,” Zimmermann added. “However, these restrictions apply specifically to OKA translators. I would prefer that standards apply equally to everyone, but I also recognize that organized, funded efforts are often held to a higher bar.”


Grokipedia Is the Antithesis of Everything That Makes Wikipedia Good, Useful, and Human


I woke up restless and kind of hungover Sunday morning at 6 am and opened Reddit. Somewhere near the top was a post called “TIL in 2002 a cave diver committed suicide by stabbing himself during a cave diving trip near Split, Croatia. Due to the nature of his death, it was initially investigated as a homicide, but it was later revealed that he had done it while lost in the underwater cave to avoid the pain of drowning.” The post linked to a Wikipedia page called “List of unusual deaths in the 21st century.” I spent the next two hours falling into a Wikipedia rabbit hole, clicking through all manner of horrifying and difficult-to-imagine ways to die.

A day later, I saw that Depths of Wikipedia, the incredible social media account run by Annie Rauwerda, had noted the entirely unsurprising fact that, behind the scenes, there had been robust conversation and debate by Wikipedia editors as to exactly what constitutes an “unusual” death, and that several previously listed “unusual” deaths had been deleted from the list for not being weird enough. For example: People who had been speared to death with beach umbrellas are “no longer an unusual or unique occurrence”; “hippos are extremely dangerous and very aggressive and there is nothing unusual about hippos killing people”; “mysterious circumstances doesn’t mean her death itself was unusual.” These are the types of edits and conversations that have collectively happened billions of times that make Wikipedia what it is, and which make it so human, so interesting, so useful.

recently discovered that wikipedia volunteers have a hilariously high bar for what constitutes "unusual death"
depths of wikipedia (@depthsofwikipedia.bsky.social) 2025-10-27T12:38:42.573Z


Wednesday, as part of his ongoing war against Wikipedia because he does not like his page, Elon Musk launched Grokipedia, a fully AI-generated “encyclopedia” that serves no one and nothing other than the ego of the world’s richest man. As others have already pointed out, Grokipedia seeks to be a right wing, anti-woke Wikipedia competitor. But to even call it a Wikipedia competitor is to give the half-assed project too much credit. It is not a Wikipedia “competitor” at all. It is a fully robotic, heartless regurgitation machine that cynically and indiscriminately sucks up the work of humanity to serve the interests, protect the ego, amplify the viewpoints, and further enrich the world’s wealthiest man. It is a totem of what Wikipedia could and would become if you were to strip all the humans out and hand it over to a robot; in that sense, Grokipedia is a useful warning because of the constant pressure and attacks by AI slop purveyors to push AI-generated content into Wikipedia. And it is only getting attention, of course, because Elon Musk does represent an actual threat to Wikipedia through his political power, wealth, and obsession with the website, as well as the fact that he owns a huge social media platform.

One needs only spend a few minutes clicking around the launch version of Grokipedia to understand that it lacks the human touch that makes Wikipedia such a valuable resource. Besides often having a conservative slant and having the general hallmarks of AI writing, Grokipedia pages are overly long, poorly and confusingly organized, have no internal linking, have no photos, and are generally not written in a way that makes any sense. There is zero insight into how any of the articles were generated, how information was obtained and ordered, any edits that were made, no version history, etc. Grokipedia is, literally, simply a single black box LLM’s version of an encyclopedia. There is a reason Wikipedia editors are called “editors” and it’s because writing a useful encyclopedia entry does not mean “putting down random facts in no discernible order.” To use an example I noticed from simply clicking around: The list of “notable people” in the Grokipedia entry for Baltimore begins with a disordered list of recent mayors, perhaps the least interesting but lowest hanging fruit type of data scraping about a place that could be done.

On even the lowest of stakes Wikipedia pages, real humans with real taste and real thoughts and real perspectives discuss and debate the types of information that should be included in any given article, in what order it should be presented, and the specific language that should be used. They do this under a framework of byzantine rules that have been battle tested and debated through millions of edit wars, virtual community meetings, talk page discussions, conference meetings, inscrutable listservs which themselves have been informed by Wikimedia’s “mission statement,” the “Wikimedia values,” its “founding principles” and policies and guidelines and tons of other stated and unstated rules, norms, processes and procedures. All of this behind-the-scenes legwork is essentially invisible to the user but is very serious business to the human editors building and protecting Wikipedia and its related projects (the high cultural barrier to entry for editors is also why it is difficult to find new editors for Wikipedia, and is something that the Wikipedia community is always discussing how they can fix without ruining the project). Any given Wikipedia page has been stress tested by actual humans who are discussing, for example, whether it’s actually that unusual to get speared to death by a beach umbrella.

Grokipedia, meanwhile, looks like what you would get if you told an LLM to go make an anti-woke encyclopedia, which is essentially exactly what Elon Musk did.

As LLMs tend to do, some pages on Grokipedia leak part of its instructions. For example, a Grokipedia page on “Spanish Wikipedia” notes “Wait, no, can’t cite Wiki,” indicating that Grokipedia has been programmed to not link to Wikipedia. That entry does cite Wikimedia pages anyway, but in the “sources,” those pages are not actually hyperlinked.

I have no doubt that Grokipedia will fail, like other attempts to “compete” with Wikipedia or build an “alternative” to Wikipedia, the likes of which no one has heard of because the attempts were all so laughable and poorly participated in that they died almost immediately. Grokipedia isn’t really a competitor at all, because it is everything that Wikipedia is not: It is not an encyclopedia, it is not transparent, it is not human, it is not a nonprofit, it is not collaborative or crowdsourced, in fact, it is not really edited at all. It is true that Wikipedia is under attack from powerful political figures, the proliferation of AI, and related structural changes to discoverability and linking on the internet like AI summaries and knowledge panels. But Wikipedia has proven itself to be incredibly resilient because it is a project that specifically leans into the shared wisdom and collaboration of humanity, our shared weirdness and ways of processing information. That is something that an LLM will never be able to compete with.

