Wikipedia is an error multiplier

May 29, 2014 • 1:13 pm

by Greg Mayer

Close readers of WEIT will know that I rarely cite or link to Wikipedia (other than for images), and that I have occasionally promised to at some point say more about this. This won’t be a full account, but a recent spectacular example of Wikipedia’s ability to spread error has been reported by Eric Randall at The New Yorker, and deserves a mention: the coati has been widely cited as the “Brazilian **rdv*rk”! (See note below.)

A coati, not a member of the Tubulidentata (by Vassil, from Wikipedia;))

Coatis are New World members of the order Carnivora, in the same family as raccoons. Indeed, they look very much like raccoons with long noses and skinny tails. They are not at all closely related to aardvarks, which are, of course, African, and members of the very distinctive mammalian order Tubulidentata. (Their name means ‘earth pig’, from Dutch/Afrikaans.) Here’s how the coatis’ new name got started and spread:

In July of 2008, Dylan Breves, then a seventeen-year-old student from New York City, made a mundane edit to a Wikipedia entry on the coati. The coati, a member of the raccoon family, is “also known as … a Brazilian **rdv*rk,” Breves wrote. He did not cite a source for this nickname, and with good reason: he had invented it….

Adding a private gag to a public Wikipedia page is the kind of minor vandalism that regularly takes place on the crowdsourced Web site. When Breves made the change, he assumed that someone would catch the lack of citation and flag his edit for removal.

Over time, though, something strange happened: the nickname caught on. About a year later, Breves searched online for the phrase “Brazilian **rdv*rk.” Not only was his edit still on Wikipedia, but his search brought up hundreds of other Web sites about coatis. References to the so-called “Brazilian **rdv*rk” have since appeared in the Independent, the Daily Mail, and even in a book published by the University of Chicago. Breves’s role in all this seems clear: a Google search for “Brazilian **rdv*rk” will return no mentions before Breves made the edit, in July, 2008. The claim that the coati is known as a Brazilian **rdv*rk still remains on its Wikipedia entry, only now it cites a 2010 article in the Telegraph as evidence.

This kind of feedback loop—wherein an error that appears on Wikipedia then trickles to sources that Wikipedia considers authoritative, which are in turn used as evidence for the original falsehood—is a documented phenomenon. There’s even a Wikipedia article describing it.

The erroneous name has now been removed from Wikipedia, and a note on its origin and fate, citing Randall’s piece, has been appended to the coati article.

This episode reminded me of one of my own earliest experiences as a Wikipedia editor: getting rid of an article about an “event” made up by another Wikipedia editor. Sometime around early 2006, I became aware of an article in Wikipedia on the “W*ll**ms R*v*l*t**n”. This was supposed to be a development in the history of evolutionary biology brought about by George C. Williams (who was indeed one of the 20th century’s great and influential evolutionary biologists). But I had never heard of such a thing, and I’m an at least reasonably well-read evolutionary biologist; plus, I knew Williams at Stony Brook. I tried to find out if anyone else had ever heard of it. Here’s what I posted on Williams’ Wikipedia talk page:

I’ve already noted this on the talk page for “W*ll**ms R*v*l*t**n”, but this term seems to be a strictly Wikipedia term, invented for Wikipedia. All the references I can find to it online, including in chat groups, seem traceable to the Wikipedia entry. I’ve never encountered it in the literature of evolutionary biology, or anywhere else in print. It’s also not a terribly appropriate term. I have nothing but the greatest admiration and appreciation for Williams’ contributions, most notably his Adaptation and Natural Selection, but his critique of group selection and advocacy of gene-level selection were much more a “restoration” than a revolution (Darwin clearly rejected group selection, with the clear exception that he contemplated it as a possibility in social insects); furthermore, a number of others at about the same time (e.g. W.D. Hamilton) and slightly later (e.g. Richard Dawkins) had as much or more to do with the elaboration of a strictly gene-centered view (especially as opposed to an individual selection view) as did Williams, so it doesn’t seem as if it should bear his name, or at least not his alone. But, regardless, Wikipedia should not be in the business of inventing terms. 08:34, 8 January 2006 (UTC)

Some Wikipedia editors were sure they had heard the term, but on checking their supposed sources, none could find any use of the term that had not originated in Wikipedia. Another, more experienced Wikipedia editor, Samsara (at the time an evolutionary genetics grad student at Edinburgh), joined me in the attempt to verify the term, but we found no uses of it that did not trace back to Wikipedia. The article was deleted. (Most of the discussion of this was on the now-deleted talk page of the now-deleted article.)

When Wikipedia is used as a source, errors can spread rapidly, because it’s used not just by lazy students in term papers, but also by legitimate newspapers and publishers, and especially because there are whole websites that simply copy from Wikipedia, and thus appear to form independent confirmations of the errors. Of course, errors in the old print Encyclopedia Britannica could be perpetuated and recycled too, but the internet allows errors to spread faster and further, and the Encyclopedia Britannica would never have let a not particularly knowledgeable 17-year-old author an article.

Note: in order to prevent Google searches turning up yet more usages of the spurious terms (and thus testaments to their use and verifiability), I have not used either neologism in this post, replacing vowels with ‘*’s.

h/t Tracy Walsh at The Dish

74 thoughts on “Wikipedia is an error multiplier”

  1. Coatis are beautiful little creatures, which you’ll see if you ever visit the Iguacu Falls in Brazil/Argentina. They like begging ice-cream off tourists…

    1. The ones in Iguazu will climb over you growling and jump on your food. Quite nasty and scary, not one of my fond memories from the place.

  2. It’s not just Wikipedia: ANYWHERE something is wrongly reported, that’s the account that will be cited going forward. I’ve been incorrectly cited in scientific publications (not by much, but enough to notice), and the incorrect figure was the one cited going forward.

    Same thing with an incorrectly copied address.

      1. On the time scale of years, Wikipedia is always corrected; if not, then no one cares about the subject. That is faster than textbooks, which are revised on the time scale of decades, but which rarely need correction.

        1. Quite true, and I had to change textbooks when I inherited an undergrad Basic Biochemistry class because the old one was rife with errors.

          The advantage I see with Wikipedia, on top of its wide-ranging articles and its constant updates and corrections, is that everyone knows it is a source of dubious reliability. I have more faith in the Encyclopedia Britannica than in Wikipedia but I really shouldn’t: every source should be double-checked. At least, with Wikipedia, most people know that going in.

  3. It’s clear that the people who first named animals had only had contact with pigs before seeing these new animals. Hence all those “Schwein” (pig) animals that we learned about a few days ago, and now this one! 🙂

    1. The mouseover text for that cartoon mentions a book based on a flawed Wikipedia page.

      I’ve always assumed he was referring to the “list of top historical atrocities” in Pinker’s “Better Angels of our Nature”.

        1. He didn’t, but the guy who compiled the list of historical atrocities had put it up on a Wikipedia page, and has since published it as a book.

  4. Wikipedia is good fun and very useful, but should never be cited. Rather, it should be used as a subject funnel pointing serious researchers to the cited sources. If a fact doesn’t have a cited source, then it needs to be researched further, and if it does, the original source needs to be verified and checked. However, Wikipedia CAN be referenced as an article for further reading as long as it is properly cited.

    However, if some kid started a craze by stating that the coati is also known as a Brazilian Aardvark, and the coati then becomes widely known as a Brazilian Aardvark, is that not now actually true? I mean, skunks aren’t felines, yet are called polecats, and guinea pigs aren’t swine (nor are aardvarks actually “earth pigs”). So, technically, isn’t it now true that the coati is also known as a Brazilian Aardvark, because it has acquired a new common name?

    1. Wikipedia can also profitably be used as a beginner’s guide, if the article is good and basic (and well referenced). It has few such articles, but its structure is geared toward that, often presenting article hierarchies that build from the basics toward specifics.

    2. I agree that Wikipedia is an excellent, easy starting place for finding things out. It has proved very (but not quite perfectly) accurate in areas where I am well informed, such as history and language/linguistics. But the nature of the site means that everything needs to be corroborated. I suspect its poor reputation in certain quarters stems from the constant doctoring or spinning of contentious issues to do with politics, celebrities, etc.

      1. Wikipedia is brilliant at supplementing research just outside of one’s field of specialty. Basically, if it’s important Wikipedia is never the only source. That makes for easy checking.

    3. Yep. As this notes, Jerry, you’ve effectively posited a theory here that doesn’t seem to obtain in practice. All the smart kids working on Wikipedia have produced something that is actually just as good a result, contrary to your hypothesis.

      1. I can’t tell if you’re joking or not, but Jerry didn’t write this and it’s just one example of how falsehoods can spread via Wikipedia; it doesn’t take much of an imagination to see how this could cause trouble when not discussing animals’ names.

      2. As best I can tell, you’ve effectively posited that Greg is not the author of this post. Fancy that, a correction in need of correction. There’s a meme for that, isn’t there?

    1. This Cambridge University Press article, by Neil Safer, is an astounding and amusing farrago of error (at least as regards coatis). Safer thinks he is quoting Buffon about the coati, which Safer thinks is also a type of aardvark. But on the cited page of Buffon, Buffon is discussing the agouti (a South American rodent), not the coati. Safer translates the French “l’agouti” into English as “the aardvark”. Amazingly, Buffon is in this passage making the point that you must examine the animals directly yourself, and not rely upon what other people have written; as Safer puts it “multiplication of errors was one of the most common features of eighteenth-century natural history, implying that one had to guard against such proliferated errors by always returning to the original animal in question.” Safer, in multiplying errors and creating new ones, is either making a very dry joke or committing the very errors Buffon decries!

      GCM

      1. I hope you’ve alerted CUP to this. Too late for the print version perhaps, but the eBook versions can be corrected.

  5. and the Encyclopedia Britannica would never have let a not particularly knowledgeable 17-year-old author an article.

    So close, yet an otherwise very good and useful article stumbled on the very last part of the very last sentence.

    “Several studies have been done to assess the reliability of Wikipedia. An early study in the journal Nature said that in 2005, Wikipedia’s scientific articles came close to the level of accuracy in Encyclopædia Britannica and had a similar rate of “serious errors”.[2] The study by Nature was disputed by Encyclopædia Britannica,[3] and later Nature replied to this refutation with both a formal response and a point-by-point rebuttal of Britannica’s main objections.[4]” [This is followed by an account of how Wikipedia is not reliable enough for many or most graduate students; http://en.wikipedia.org/wiki/Reliability_of_Wikipedia ]

    If Wikipedia is no better and no worse than other encyclopedias, its average error rate is not Wikipedia’s fault. The fault stems from the usual (lack of) quality, compounded by web mechanisms such as the rapid loop problem.

    The usual response to the claim that the web makes for fast errors is that it also makes for fast error correction. Try correcting Encyclopædia Britannica in every library!

    1. “The usual response to the claim that the web makes for fast errors is that it also makes for fast error correction.”

      That. 😉

      Also, misinformation multiplies over the internet via every platform and website you can think of. I’m sure that includes Tw*tter.

    2. Also, the problem was not that the 17-year-old was not particularly knowledgeable. The problem was that he was _more_ knowledgeable than the average reader, and decided that his kind of fun was funnier to him than typical editing work.

      1. The problem was that he was _more_ knowledgeable than the average reader,

        … on this particular topic. Or he thought that he was.
        I’m just trying to get my head around the idea of not being more knowledgeable than the average reader on any subject at all. It’s tempting me to actually go out and READ some Nietzsche, in order to get the full depth of despair behind that “the abyss stares back at you” aphorism.

        1. You don’t have to try very hard to know more than the average person, which is sad because knowing more than the average person can be isolating.

          1. This is a problem for the people on the other side of the average. IM(not so)HO.

    3. I recall that 2005 article; I haven’t re-read it in quite a while, but was unconvinced by its claim of Britannica-Wikipedia equivalence at the time. And I stand by the claim that Britannica would not commission articles by inexpert teenagers.

      GCM

      1. Indeed they wouldn’t. But you haven’t advanced evidence for your hypothesis that this is systemically a bad thing, and particularly not evidence that contradicts the evidence advanced so far.

        How would you demonstrate the hypothesis you’ve advanced?

        The other thing about Wikipedia is that it’s the first encyclopedia in history that adults actually consult on any sort of regular basis. Britannica is something people remember from high school, if their high school was lucky enough to be able to afford a copy.

        So, multiple studies showing it as comparable in quality to Britannica, and unlike Britannica people actually look at it.

        1. Are you really saying that a policy of commissioning articles by inexpert teenagers would be a good thing?

          The way to test that would be to commission a number of articles by inexpert (and, to truly replicate the situation, impish) teenagers, and then evaluate what they come up with. I can’t imagine anyone taking the time or effort to do this.

          GCM

          1. “Are you really saying that a policy of commissioning articles by inexpert teenagers would be a good thing?”

            No, I’m saying that you’ve asserted that this is the key difference, with the implication that this would lead to no circular references – and you just haven’t demonstrated your hypothesis. Further, you’ve handwaved away the evidence against your hypothesis, that Wikipedia has repeatedly tested as being as good as Britannica.

            That is: I’m noting errors in your hypothesis, not advancing one of my own.

            1. Actually, I haven’t advanced an hypothesis other than that having inexpert, impish teenagers write articles is a bad idea; i.e. it will lead to more errors. I have no statistical evidence for this hypothesis; I’ve already outlined how to test it, but I can’t see that it’s worth carrying out the test.

              Having articles written by inexpert teenagers is not what leads to circular references; it’s poor citation practice that does that. Wikipedia (and all high speed media) merely facilitates the practice. The xkcd comic referenced by BigDaveSB above (which I’d not seen before) captures it more or less perfectly.

              GCM

              1. Of course the policy of allowing anybody to edit or create articles leads to more errors. But it also leads to more articles. The English version of Wikipedia has about 70 times as many articles as Britannica. And as has been pointed out several times, the available evidence suggests that the overall quality of Wikipedia articles is about on par with Britannica.

                The occasional impish teenager is simply the price you pay for scaling up article production by two orders of magnitude.

  6. There is still one **rdv*rk with ‘a’s in the article and at least one commenter appears to disagree with your efforts.

  7. A bigger problem is that this method of incestuous loop referencing is an actual strategy used by propagandists of all types and you see it on the news all the time.

    1. It’s at the heart of creative judicial reasoning. Judge in case 1 postulates x, judge in case 2 repeats it and by case 3 we have a new legal rule. All 3 cases may involve the same appeal court or even the same judge. Perfect example of the Bellman’s rule of three.

  8. This reminds me of a recent study by a group of Canadian researchers which found that Wikipedia is cited in thousands of health sciences papers:
    Bould, MD, et al. “References that anyone can edit: review of Wikipedia citations in peer reviewed health science literature”. BMJ 348 (2014): g1585. http://www.bmj.com/content/348/bmj.g1585 [open access].

    Some of these citations may be relatively unproblematic (e.g., using a Wikipedia article as an example of a popular (mis)conception). Nonetheless, I felt surprised and concerned by the findings.

  9. I REALLY appreciate the specific case given by Dr. Coyne… I would have been skeptical otherwise, having long ago been seduced by the comprehensiveness of Wikipedia, even on very deep specialty topics in molecular biology.

    From a broader perspective, it’s kind of amazing that something like Wikipedia works at all, given how the Internet and the public commons operate. There are some good criticisms here, but I would contend that Wikipedia has discovered a process of self-correction and criticism that has elements of the scientific method and academic peer review.

    You have to give it a little credit along with the criticism.

    Besides: http://en.wikipedia.org/wiki/Jerry_Coyne

    Wikipedia loves you, Dr. Coyne.

    1. I was curious and had to look into the edit history. Predictably, the page has been bashed at quite a bit, with the latest round of criticism judged to be vandalism (see the criticism section), on par with criticisms leveled by the same editor(s?) who have gone after strong atheist Stefan Molyneux (host of Freedomain Radio). Somebody do correct me if Molyneux is not a strong atheist… I got that from Wikipedia.

      1. Molyneux is indeed a strong atheist. But what he is mostly is (a) a libertarian, specifically an anarcho-capitalist (b) a self-proclaimed philosopher so bad that even mises.org called his arguments “preposterously bad” (c) a misogynist[*] and Men’s Rights Activist, who will be speaking at A Voice For Men’s June conference. The Molyneux page on Wikipedia is a puffed-up fanboy piece.

        Your link is to a specific revision, edited by an IP. What is your specific claim?

        [*] this is a personal value judgement. I reached this opinion after five minutes of one of his wonderful videos on the subject.

        1. No specific claim per se, except that if I were to make one now, it would be that in this instance the wiki appeared to be self-correcting (i.e., a quick read of the edit history seemed to indicate that people were throwing philosophical conniption fits based on entries in this… ur… website, that the entire criticism section had been redacted as of January, and that persons who keep pounding away at their repeatedly refuted criticisms were crossing into vandal territory).

        2. The main criticisms, as I saw them, were of the variety of “how can Jerry so easily dismiss [x], when he himself claims not to be an appointed expert of [x], especially when unacquainted with [a], [b], [c], [d], etc…” where [a] through [infinity] are compendia of unevidenced hogwash from theology and/or fringe philosophy, usually invoking deities or the necessity of some other form of mind-body dualism.

          The ironic note I make now is that such arguments have been dismissed by Jerry merely by pointing out that the tables should be turned on these wishful thinkers by insisting that they have nothing more to say about biology (or by extension, neuroscience) until they completely devour and comprehend everything written by Hawking, Feynman, Dirac, Weinberg, Einstein, Darwin, etc… the fundamental basis of what we know now to be true on the basis of mountains of evidence. A nice thought.

  10. With respect, I think there’s merit to Wikipedia even allowing for the occasional errors that appear in its pages. I have three reasons for believing this:

    1. Certain Wikipedia pages, e.g., those for certain “controversial” figures in public life, tend to attract vandals more than other pages do. (I had a reference for this but can’t find it right now. And no, it wasn’t a ref to Wikipedia.)

    2. The speed with which Wikipedia can be edited can make it seem more subject to vandalism than other sources, but if you look at the rate of vandalism as a percentage of the number of edits, I think most people would be surprised by how low it actually is. (Again, no reference, but I suspect that data exists somewhere.)

    3. Besides the (in)famous Nature study, there has been recent work intended to better measure Wikipedia’s reliability (https://blog.wikimedia.org/2012/08/02/seven-years-after-nature-pilot-study-compares-wikipedia-favorably-to-other-encyclopedias-in-three-languages/). And it seems to continue to do quite well in this regard.

  11. I often use Wikipedia for a quick check or as a source of sources; but no one should use it without corroboration for any serious work.

    The errors themselves are at times of interest. Politically significant pages show clear signs of re-editing by those with axes to grind, a fertile field of study by those with nothing else to do. At times pages show clear signs of boosterism (I have seen a great scholar’s achievements inflated by someone who did not understand them, but wanted to glorify the culture he lived in).

    But this sort of thing happens even without Wikipedia. Someone (Stephen Fry) invented the facetious collective term “a flanj [deliberate mis-spelling] of baboons”, and the term “flanj of baboons” has now found its way into the technical literature.

  12. Without taking away from your clear demonstration of one of the ways in which Wikipedia is most certainly an error multiplier, I’d only add that there seems to exist something deep in the human brain pan that loves to make up facts whole cloth.

    My own obsession has been the world of invented not-so-common but often insidious quotes: The kind of authoritative sounding sentences on which hang serious judgements or that drive ideologies. The Internet proliferates made-up quotes at a seemingly exponential rate, using the mechanisms you’ve described here. I’ve had an open invitation to correct me on (probably invented) quotes from Gabriel Garcia Marquez, William Cowper; and to correct the context often left off quotes from C.S.Lewis, official proclamations from the Vatican, and Eugene Robinson. The stuff does often seem to get made up faster than anyone can follow up with research and correction.

  13. It’s interesting to see how some articles are yanked back and forth by duelling contributors, where a particularly contentious sentence may be edited but the surrounding text left unchanged.

    I once read a wiki entry on Old Testament polygamy where the author spent the entire article trying to deny it existed.

    I noticed a fairly common theme: anything critical about the Catholic Church is quickly rewritten; check out most articles about the Church and the Nazis. I even invented the term “blackwashing” to cover it, though this is the first time I’ve used it publicly; maybe it will get into Wiki.

    1. Sorry but the term “Blackwash” already exists and is referenced in wikipedia already.

      It refers to the West Indies cricket team’s 5-0 annihilation of England in the 1984-85 series.


      1. 1984, not 1984-85. Also 1985-86.

        Ah, the bad old days when ENG used to lose 5-0. Good job that doesn’t happen any more.

  14. “in order to prevent Google searches turning up yet more usages of the spurious terms (and thus testaments to their use and verifiability), I have not used either neologism in this post, replacing vowels with ‘*’s”

    As a result, there will be one fewer Google hit that debunks the terms, and people looking for such evidence will not find this post. Combining each use of the actual terms with the modifier “mythical” or “erroneous” would have been a better tactic, I think.

  15. Working in a library with doctors, nurses, academics & students, I have come to appreciate Wikipedia as a flawed tool. I use it as a starting point but then check the references. In this case I would have used the OED to see if that term had a basis in usage. Now it may be adopted and come into use, in which case the dictionary would reflect the usage; that is a matter of language rather than scientific fact, though.

    On Wikipedia as a source, this recent article –
    Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions
    http://www.jaoa.org/content/114/5/368.full

    The nub is, “Most Wikipedia articles representing the 10 most costly medical conditions in the United States contain many errors when checked against standard peer-reviewed sources. Caution should be used when using Wikipedia to answer questions regarding patient care.”

    Also on BBC
    http://www.bbc.co.uk/news/health-27586356

    1. It is true that Wikipedia contains errors, but they are not evenly distributed across the various disciplines. This is important because it means that people in some disciplines (e.g., mine – engineering) *can* rely on Wikipedia, whereas people in other disciplines probably shouldn’t. This is a significantly different situation than if the errors were spread evenly throughout it.

      1. That’s been my experience as well. Some fields are great (old computers) some are spotty (logic) some are pretty lousy for the most part (philosophy), or ridiculous (the ones that suffer edit wars, locking etc.).

  16. A couple of years ago I read a rather disturbing story in Der Spiegel (in German). The journalist writing the story described how he was having breakfast one morning and there was a knock at the door. He opened the door to three officials (a woman and two men) and a security guard. They were from the tax office and had a warrant to search his apartment as they suspected him of tax evasion. He pointed out that he was still in his pyjamas and could they come back later. They said that was out of the question and demanded to be let in immediately. When he asked what they would have done if he hadn’t been home the lady said something like “we would have effected entry to your apartment”. The journalist was sure his tax affairs were in order and asked why they suspected him.

    The lady opened her case and produced the evidence against him…you’ve guessed it, a print out of the journalist’s wikipedia entry! It mentioned him publishing a book in the USA and the lady said he’d never paid any tax on the proceeds from it. He pointed out that the book was taxed in the states and that Germany and the US have an arrangement in such cases so that people don’t get taxed twice. Nevertheless, they insisted on confiscating all his financial records (several boxes of papers). Months went by without him being able to find out what was going on and then finally he received a letter informing him that the case against him was being dropped and that he should collect his papers from the tax office. If he failed to do so by a certain date they would be destroyed!

    Der Geist von Kafka lebt noch! (Kafka’s ghost lives on!)

      1. Presumably someone processing his tax return saw he was a published journalist and googled his name, then thought they’d “caught him out”. Terrifying to think that a civil servant would use a source anyone can edit as a basis for getting a warrant to search someone’s home.
