by Alex Kyrios
Baga, Hoover, and Wolverton recently assembled a “webliography” of free online resources for catalogers (Baga, Hoover, & Wolverton, 2013). The webliography is itself a very helpful resource, especially for institutions short on personnel, money, or both. But the authors neglected to mention a prominent source for cataloging tasks such as classification and authority control, as well as help for other library professionals, from reference librarians to resource selectors. Like those listed by Baga et al., this source is free online. That resource is Wikipedia.
Wikipedia has been a fixture of the internet for years now. The site was formally launched on January 15, 2001. It came to prominence some years later and is, as of April 19, 2013, the sixth most popular website on the internet according to Alexa.com. (While Wikipedia is a multilingual project, with 286 languages represented as of April 19, 2013, this study will only examine the English Wikipedia, the largest and oldest version.) Perhaps the best-known aspect of Wikipedia is its openness. Generally speaking, anyone may contribute to its encyclopedic articles by adding or removing anything. This fact colors most people’s judgments of the site. If you believe in the virtues of “crowdsourcing,” you’re likely to see Wikipedia as one of humanity’s most impressive knowledge-organizing ventures. But if you’re skeptical of crowd wisdom, you’re more likely to see the project as an impressive sandcastle on a beach, just waiting for a malicious child or the tides to destroy it. It would be wrong to discount the dangers Wikipedia’s open model represents. In most cases, there’s nothing stopping a person from adding false information to an article (or deleting legitimate information from it), so taking anything on Wikipedia at face value is ill-advised. But it would equally be wrong to discount the legitimacy and utility of Wikipedia on such grounds.
It’s no wonder librarians can be skeptical of Wikipedia. At first glance, its mission seems incompatible with ours. Even as librarians make active efforts to engage users, such as through demand-driven acquisitions, the library remains a largely top-down model. Bibliographic experts select the best resources to make available to their users. No one would assert that every library resource is of unimpeachable reliability, but ideally, at least, every library book has an invisible seal of approval, an indication that the book is useful and good in some sense. (This can range from obvious quality educational materials, such as Stephen Hawking’s A Brief History of Time, to popular leisure reading like the latest John Grisham novel, to primary source documents whose educational value is quite removed from their actual message, such as Mein Kampf.) But Wikipedia appears to have no such filters. In a system without the sort of filters that have served libraries so well, the reliability of information seems compromised. One popular writer has dismissed Wikipedia by summing up its philosophy as “Experts are scum” (Sjöberg, 2006). But this reflects a fundamental misunderstanding of how Wikipedia works, even if it is one that many librarians may share. Used properly, Wikipedia can be a powerful tool for librarians of all types. And a librarian who knows how to make the most of Wikipedia can be a great resource for users of any library.
There’s no shortage in scholarly literature—library literature in particular—of voices warning of the danger of Wikipedia. Behrends says, “Perhaps no website has been the object of as much derision by the library community as Wikipedia” (Behrends, 2012). Popular sources haven’t spared Wikipedia, either. The encyclopedia represents everything that worries businessman Andrew Keen. In his The Cult of the Amateur, he decries Wikipedia’s “citizen-editors” for “defining, redefining, then reredefining truth, sometimes hundreds of times a day” (Keen, 2007, p. 20). The author of Lazy Virtues: Teaching Writing in the Age of Wikipedia, Robert E. Cummings, opens his book by saying “If you know anything about Wikipedia, chances are… you fall into one of two groups: you are either curious about Wikipedia… or you are worried by it” (Cummings, 2009). I would add that if you know anything about Wikipedia, you probably know that it’s open for anyone to edit its articles. This is broadly true. But in practice, Wikipedia has a multitude of defense mechanisms designed to prevent people from maliciously exploiting its open nature. Probably the simplest form of such malicious activity is referred to as vandalism. Popular and scholarly definitions of Wikipedia vandalism can vary from the official description, which excludes other behaviors classified more broadly as “disruptive editing.” For the purposes of this discussion, vandalism will entail any editing not made in good faith, including insertion of nonsense, deletion of legitimate information, or deliberately introducing factual inaccuracies.
In Wikipedia’s early days, vandalism posed a very real threat to the integrity of the project. Then, a determined vandal could quickly go from page to page defacing or deleting content, leaving it to good-faith editors to clean up his or her mess. Especially if undertaken at hours when most Wikipedians could be expected to be asleep, such attacks could cause serious, if not permanent, damage. But even in those early days, Wikipedia automatically tracked the changes made with every single edit. Older versions can easily be restored, so an individual act of vandalism can be reversed in about a minute. It wasn’t long before script-savvy Wikipedians began designing “bots,” automated accounts made to perform repetitive tasks, including identifying vandals and undoing their work (Adler, de Alfaro, Mola-Velasco, Rosso, & West, 2011). Automated tools such as Huggle and Twinkle are granted to editors in good standing, allowing them to revert vandalism in mere seconds, in just a few clicks of a mouse. Wikipedia’s software also allows administrators to protect certain articles, disallowing either anonymous users or, in extreme cases, all users from editing them. Article protection can occur for fixed periods or indefinitely. For example, the article on Yolo, California was temporarily protected after suffering extended vandalism related to the popular motto “you only live once” (YOLO). As of April 21, 2013, the article on Barack Obama is indefinitely protected from anonymous editors and brand-new accounts, having been the target of extensive derogatory vandalism. Vandalism will be a fact of Wikipedia at least as long as it allows anonymous editing, and founder Jimmy Wales has made clear that this essential feature of Wikipedia will not be discontinued. Even so, the threat it represents has been effectively diminished from a horde of barbarians at the gates to an irksome fly, easily combated and ultimately harmless.
Vandals aren’t the only bad-faith users on Wikipedia, but they’re generally the only kind that will compromise the encyclopedic content. For example, trolls (users who goad good-faith users into attacks or pointless arguments) have also been identified as a threat to Wikipedians’ morale (Shachaf & Hara, 2010). However, unless trolls themselves engage in vandalism, there is little reason to think their behaviors could directly introduce errors into articles.
These hazards should not blind librarians to the powerful tool Wikipedia can be in the right hands. Some of these benefits, in fact, come directly from the hazards. For reference and instruction librarians at the academic level, school librarians, or any information professional helping others assess the quality of resources, the ever-present possibility of errors on Wikipedia offers an endless stream of teachable moments in resource assessment. A statement on Wikipedia without a corresponding reference (generally formatted as a footnote) should be counted no better than a rumor. A statement with a reference the user can follow and verify against, however, will give that user experience in critically assessing specific claims of fact. (For bonus points, press them to find more resources to confirm the observation further.)
But focusing on Wikipedia’s shortcomings is to miss the forest for a few dead trees. Wikipedia has rightly been praised as “amazing” and “one of the best encyclopedias” (Sunstein, 2007, p. 12). For every bored teen who inserts obscenities into an article, there’s a competent researcher introducing real information supported by quality sources. For better or worse, users are increasingly relying on Wikipedia over libraries (Ockerbloom, 2013), and many of them consider it a credible source (Doueihi, 2011, p. 80). A librarian who teaches those users how to responsibly assess credibility does a service to Wikipedia and the profession alike.
This is not to say that librarians must reach out to Wikipedia; the relationship works both ways. Libraries have much to offer Wikipedia, whose references exhibit FUTON (FUll Text On the Net) bias—libraries’ print holdings represent an area of opportunity for collaboration. Wikipedia volunteers know this. The Wikipedia Loves Libraries initiative, started in 2011, seeks to improve engagement between Wikipedia and libraries. Several types of events are coordinated by Wikipedia Loves Libraries, but perhaps the most promising type is the “edit-a-thon.” In these, a library (or archive) hosts local Wikipedia editors who use the library’s resources to add sources to Wikipedia, improving the encyclopedia and bringing people into the library. Furthermore, these sessions are an opportunity for librarians to familiarize people with their collections and form community partnerships. Some edit-a-thons are free-for-alls, while others focus on a specific topic. The Smithsonian hosted an edit-a-thon dedicated to improving coverage of women scientists; the University of North Carolina at Chapel Hill hosted one focusing on notable African Americans in North Carolina.
Because Wikipedia is a digital resource, it has been able to incorporate library data in many ways, and researchers on both sides are seeking more. Some of these are quite simple. For example, users citing a book as a source on Wikipedia can provide the book’s OCLC number, which will automatically generate a WorldCat link for that book. Casual editors may more often only have the book’s ISBN, but it is entirely within the realm of possibility for someone to design a bot to match up ISBNs with OCLC numbers and supply the latter number in Wikipedia articles. In fact, one bot has already started on a similar task. Following a 2012 OCLC proposal (Klein & Proffitt, 2012), many biographical articles now include authority file identifiers, unobtrusively located at the bottom of the articles. An OCLC-designed bot matched Wikipedia article names with VIAF authority files, which can consequently be used to integrate other authority identifiers, such as LCCN or GND, the German National Library’s system. In the future, similar endeavors could link library data into Wikipedia on geographic and topical entities. (Keen might feel vindicated, however, at the reaction of a few Wikipedians to OCLC’s proposal. Unfamiliar with the concept of authority control, they attacked the phrase as “authoritarian,” “fascistic,” and “totalitarian.”) OCLC has hired a “Wikipedian-in-Residence,” who has coordinated the authority control initiative and presented on ways to integrate the missions of Wikipedia and libraries (Klein, 2012).
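The two linking steps described above, turning an OCLC number into a WorldCat link and matching an ISBN to an OCLC number, can be illustrated with a minimal Python sketch. It assumes the WorldCat permalink pattern https://www.worldcat.org/oclc/ followed by the number, and it stands in for a real ISBN-to-OCLC lookup service with a small table passed in by the caller. The function names are hypothetical and are not taken from any existing Wikipedia bot.

```python
import re

# Assumed WorldCat permalink pattern; not taken from the article.
WORLDCAT_OCLC_URL = "https://www.worldcat.org/oclc/{}"


def isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit for the first twelve digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def normalize_isbn(raw):
    """Return a 13-digit ISBN string, or None if the input is not a valid ISBN."""
    chars = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", chars):
        # ISBN-10 check: the weighted sum (weights 10 down to 1) must divide by 11.
        values = [10 if c == "X" else int(c) for c in chars]
        if sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 != 0:
            return None
        core = "978" + chars[:9]
        return core + str(isbn13_check_digit(core))
    if re.fullmatch(r"[0-9]{13}", chars):
        return chars if isbn13_check_digit(chars[:12]) == int(chars[12]) else None
    return None


def worldcat_link(oclc_number):
    """Build a WorldCat link from an OCLC number given as digits."""
    digits = str(oclc_number).strip()
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError("OCLC number must contain only digits")
    return WORLDCAT_OCLC_URL.format(digits)


def link_for_isbn(raw_isbn, isbn_to_oclc):
    """Look up an ISBN in a caller-supplied table and return a WorldCat link, or None."""
    isbn13 = normalize_isbn(raw_isbn)
    oclc = isbn_to_oclc.get(isbn13) if isbn13 else None
    return worldcat_link(oclc) if oclc else None


if __name__ == "__main__":
    # Placeholder table: the OCLC number below is illustrative, not a real record.
    table = {"9780306406157": "123456789"}
    print(link_for_isbn("0-306-40615-2", table))
```

Normalizing every ISBN to its 13-digit form before the lookup means a citation that gives the 10-digit form and one that gives the 13-digit form resolve to the same entry. A working bot would also need a trustworthy source for the ISBN-to-OCLC table, which this sketch leaves open.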
Coming back to the idea of Wikipedia as a cataloging resource, there are several functions Wikipedia can offer a cataloger. First is its impressive structure of categories. Casual readers of Wikipedia may never notice categories, which are listed at the bottom of every page (only some new articles lack categories altogether, and volunteers quickly categorize such articles). The article for Melvil Dewey, for example, has been placed in categories such as “American librarians,” “Amherst College alumni,” and “People from Jefferson County, New York.” The solenoid, a coil device used in physics and engineering, is in the “Electromagnetic coils” category. These topical categories frequently correspond to library knowledge organization systems such as Library of Congress Subject Headings (LCSH). Wikipedia’s categories are structured hierarchically, like LCSH’s. As a cataloger, I’ve found these categories invaluable guides in classifying materials on subjects with which I’m unfamiliar. For example, I may have a thesis about a certain species of fish which does not have a discrete LCSH term for its genus or species. But if I can find a Wikipedia article on the species, I can follow its categories up through the species taxonomy until I find a grouping with an established LCSH term. Generally speaking, Wikipedia categories are especially reliable. They’re not visible enough to casual readers to be the target of vandalism, and categories themselves are very difficult to vandalize. A savvy vandal could still place an article in an erroneous category, however. As with the encyclopedic content, Wikipedia should only be a starting point.
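The habit of following categories upward until one matches an established heading can be sketched as a breadth-first search. The Python below is only an illustration under stated assumptions: the category graph and the set of established headings are toy data invented for the example (not real Wikipedia categories or LCSH records), and the function name is hypothetical. A cataloger would still verify any match against LCSH itself.

```python
from collections import deque


def nearest_established_heading(start, parents_of, established):
    """Walk up a category graph breadth-first from `start`.

    Returns the path from `start` to the closest category that appears in
    `established`, or None if no ancestor matches.
    """
    queue = deque([[start]])
    seen = {start}  # avoids revisiting categories if the graph contains loops
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current in established:
            return path
        for parent in parents_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None


if __name__ == "__main__":
    # Toy data for illustration only.
    toy_parents = {
        "Example species": ["Example genus"],
        "Example genus": ["Example family", "Fish described in an example year"],
        "Example family": ["Example order"],
        "Example order": ["Fish"],
    }
    toy_established = {"Example family", "Fish"}
    print(nearest_established_heading("Example species", toy_parents, toy_established))
```

Breadth-first order matters here because it returns the most specific matching grouping first, which is usually the one a cataloger wants; in the toy data it stops at "Example family" rather than continuing up to "Fish".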
Wikipedia can also be a good resource for authority work. If a person is the subject of a Wikipedia article, there’s a good chance there are birth and death years in the article. The categories come in handy for this as well—articles on people are categorized by years of birth and death. Dewey, for example, is also in the “1851 births” and “1931 deaths” categories. Especially for living subjects, the articles may also contain links to official websites or CVs, both of which can be invaluable for the creation of an authority file. Additionally, Wikipedia’s human name disambiguation pages collate existing articles with shared names, from the dozens listed at John Smith to the two at Aníbal Acevedo. (Occasionally these pages won’t exist when there are just two entries. As of April 21, 2013, the article at Sanford Berman describes the cataloger, with a link at the top of the page to Sanford I. Berman, a philanthropist.)
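The birth and death categories mentioned above follow a regular naming pattern, so pulling the years out of a list of category names is straightforward. The following Python sketch assumes category names of the form "1851 births" and "1931 deaths" as quoted for Dewey; the function names are hypothetical, and the date string it produces is only a candidate to be checked against other sources before it goes into an authority record.

```python
import re

# Matches names such as "1851 births" or "1931 deaths"; other forms are ignored.
_YEAR_CATEGORY = re.compile(r"^([0-9]{1,4}) (births|deaths)$")


def life_dates(categories):
    """Return (birth_year, death_year) found in category names; None where absent."""
    dates = {"births": None, "deaths": None}
    for name in categories:
        match = _YEAR_CATEGORY.match(name.strip())
        if match:
            dates[match.group(2)] = int(match.group(1))
    return dates["births"], dates["deaths"]


def heading_dates(categories):
    """Format the years as a date range such as "1851-1931", or "" if none are found."""
    born, died = life_dates(categories)
    if born is None and died is None:
        return ""
    return f"{born or ''}-{died or ''}"


if __name__ == "__main__":
    dewey = ["American librarians", "Amherst College alumni", "1851 births", "1931 deaths"]
    print(heading_dates(dewey))
```

For a living subject whose article carries only a birth category, the function returns an open range such as "1950-".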
Some librarians may regard Wikipedia with skepticism, distrust, or even jealousy. But the free encyclopedia shares many of the values and goals that libraries do, and many of its editors are motivated by the same values that guide many librarians. Wikipedia’s open model may pose some risks, but its overall reliability has been vetted and found comparable to traditional, top-down encyclopedias (Giles, 2005). Especially compared to those competitors, Wikipedia offers many opportunities to promote library resources and forge partnerships to keep libraries relevant in the digital age. In an interview early in Wikipedia’s lifespan, founder Jimmy Wales described the project, saying, “Imagine a world in which every single person on the planet is given free access to the sum of all human knowledge. That’s what we’re doing” (Miller, 2004). This quote has frequently been referred to as a summary of what Wikipedia stands for. It’s a utopian sentiment. But it’s one that many librarians have been pursuing at least since Alexandria. Libraries can join this effort, and in doing so we’ll improve Wikipedia and the institution of libraries alike.
Adler, B.T., de Alfaro, L., Mola-Velasco, S.M., Rosso, P., & West, A.G. (2011). Wikipedia vandalism detection: Combining natural language, metadata, and reputation features. In A. Gelbukh (Ed.), Computational Linguistics and Intelligent Text Processing: 12th International Conference, CICLing 2011, Tokyo, Japan, February 20-26, 2011, Proceedings, Part II. Berlin: Springer. doi: 10.1007/978-3-642-19437-5_23
Baga, J., Hoover, L., & Wolverton, Jr., R.E. (2013). Online, practical, and free cataloging resources: An annotated webliography. Library Resources & Technical Services, 57(2), 100-117.
Behrends, S. (2012). Libraries vs. Google in the 21st century. The Idaho Librarian, 62(2). http://theidaholibrarian.wordpress.com/2012/11/12/libraries-vs-google/
Cummings, R.E. (2009). Lazy virtues: Teaching writing in the age of Wikipedia. Nashville, TN: Vanderbilt University Press.
Doueihi, M. (2011). Digital cultures. Cambridge, MA: Harvard University Press.
Giles, J. (2005). Internet encyclopaedias go head to head. Nature, 438(7070), 900-901. doi: 10.1038/438900a
Keen, A. (2007). The cult of the amateur: How today’s internet is killing our culture. New York, NY: Doubleday.
Klein, M. & Proffitt, M. (2012). Linking library data to Wikipedia, Part 1 [Video file]. Retrieved from http://www.youtube.com/watch?v=uwwTNmJUQ8w
Klein, M. (2012). Wikipedia and libraries: What’s the connection? [Video file]. Retrieved from http://www.youtube.com/watch?v=jcWmYIF5TMs
Miller, R. (2004, July 28). Wikipedia founder Jimmy Wales responds. Slashdot. Retrieved from http://slashdot.org/story/04/07/28/1351230/wikipedia-founder-jimmy-wales-responds
Ockerbloom, J.M. (2013). From Wikipedia to our libraries [Web log comment]. Retrieved from http://everybodyslibraries.com/2013/03/04/from-wikipedia-to-our-libraries/
Shachaf, P. & Hara, N. (2010). Beyond vandalism: Wikipedia trolls. Journal of Information Science, 36(3), 357-370. doi: 10.1177/0165551510365390
Sjöberg, L. (2006, April 19). The Wikipedia FAQK. Wired. Retrieved from http://www.wired.com/software/webservices/commentary/alttext/2006/04/70670
Sunstein, C.R. (2007). Republic.com 2.0. Princeton, NJ: Princeton University Press.
Alex Kyrios, Metadata and Cataloging Librarian, University of Idaho