by Kate Baker
Folksonomies were recognized around the beginning of the 21st century as new internet phenomena in which users, not professionals, added their own keywords (tags) to information objects. These tags could then be used by anyone to sort and share items. Folksonomy, a portmanteau of folk and taxonomy (Vander Wal, 2007), became the word most commonly used to refer to this system of tagging, though ethnoclassification, social classification, and distributed classification persist as commonly used synonyms. As Park notes in her article A Conceptual Framework to Study Folksonomic Interaction, many of the terms in this field are used interchangeably, such as tagging system and folksonomy (Park, 2011, p. 516). However, there are key distinctions between terms. Tagging is the actual process of creating one or more keyword labels (tags) and associating them with a digital information object, such as a website, picture, video, or even a library catalog record. A folksonomy is the classification system that arises from these tags. This paper explores examples of web-based folksonomies as well as how libraries are integrating folksonomies into their catalogs; advantages and disadvantages associated with the use of tags and folksonomies; and where the technology may be headed in the next few years.
Folksonomies vs. traditional classification
Folksonomic classification differs from established metadata schemes in a number of ways. Library classification schemes, to this point, have focused on a top-down categorization using indexes, controlled vocabularies, and hierarchies (Park, 2011, pp. 519-20). However, as most librarians who provide reference services discover, this top-down categorization often uses words that are either unfamiliar or unintuitive to most online public access catalog (OPAC) users.
An example of this is the children’s picture book Olivia by Ian Falconer. Olivia is a spunky, red-dress-wearing trouble-maker. She is also a pig. A common keyword search combination created by a patron who can’t remember the title of the book and is used to using Google and Amazon might be “children’s book pig dress.” An Amazon general keyword search in all categories returns Olivia by Ian Falconer as the fourth result (amazon.com, March 3, 2012). This same keyword search in Google returns the official Olivia website as the tenth result with several earlier results referencing the book or author (google.com, March 3, 2012). When this keyword string is entered into the OPAC search for the LYNX! Consortium, the search garners only one result which is not Olivia (www.mld.org, March 3, 2012). Even in WorldCat, Olivia doesn’t show up on the default relevance sorted list of results until sixty-eighth (worldcat.org, March 3, 2012). A look at the subject headings shows the lengthy “Olivia (Fictitious character : Falconer) – Juvenile fiction” and “Swine – Juvenile fiction” as the first two related subjects in WorldCat (worldcat.org, 2012) while “Swine – Juvenile fiction” and “Children – Conduct of life – Juvenile fiction” are the first two subject headings from the LYNX! Consortium (www.mld.org, March 3, 2012).
As evidenced by this example, it is unlikely that a patron will find Olivia through either a keyword search (if they don’t recall the title) or by subject headings. It is even less likely that a child searching for the book would find it in this way. While many argue that the role of librarians is to facilitate this type of search and to provide the necessary reference for these patrons, the fact is that library search systems are increasingly being developed to allow patrons to perform searches on their own. Indeed, most OPAC systems are accessible to patrons 24 hours a day via the internet. This is a service that many patrons have come to desire and expect. The likelihood is that library cataloging (through the use of subject headings) has not kept up with improved service to patrons. This means that libraries have provided patrons with the services they’ve asked for without ensuring that patrons know how to use these services.
One of the main problems with LC subject headings is the structure. The hierarchical subcategories separated by dashes are obscure for most patrons. Further, many find the selection of the words used in the subject headings to be just as confusing. For example, while it may seem perfectly reasonable to a cataloging professional to search for items about World War II under the Library of Congress subject heading “World War, 1939-1945,” most patrons using the catalog will be unacquainted with this type of terminology. This is the type of categorization Shirky discusses in his article Ontology is Overrated: Categories, Links, and Tags where he argues that a collaborative agreement created from the bottom up, that is by the users, is more valid than one view imposed from the top down, by professionals (Shirky, 2005). Willey, however, cites Mathes’s argument that the community’s influence over users’ tags creates a de facto top-down organization. For example, when users add or edit their tags for an object based on what others in the community have done, the community creates a controlled vocabulary on its own terms (Willey, 2011). Furthermore, Willey cites research from Stvilia and Jorgensen which indicates that group moderated collections within a folksonomy also exhibit a top-down mentality (Willey, 2011).
Speller continues this discussion by explaining Surowiecki’s four steps needed to develop this collaborative agreement: diverse opinions, independent decision-making, decentralization of power, and a way of aggregating opinions (Speller, 2007). Tagging and folksonomies provide users with the means of reaching this collaborative agreement. Because most current systems that facilitate tagging don’t require any sort of text verification or controlled vocabulary, the diversity of opinions allowed in tagging is limitless and users independently select which tags they will use. As there are no professionals or catalogers, the power in the system is always in the hands of the users. Finally, folksonomies provide the aggregation of these opinions in the form of systems such as Flickr, LibraryThing, and Pinterest.
Park relates tagging and folksonomies to information foraging and scent theory. Park explains foraging as “looking to see what is available, browsing and gathering, relatedness” (p. 518) while information scent is explained as the “user’s perception of the value and cost of accessing a piece of information based on perceptual clues available” (p. 516). Folksonomies, by presenting tag clouds, improve the user’s perception of related resources. A tag cloud is a visual grouping of the words that all users have attached to an object through tagging. Words are usually ranked in a tag cloud by frequency, so that the larger a word appears in the grouping, the more people have used that tag in association with the object. This collection of perceptual clues gives the user the opportunity to follow the information scent to objects that other users have labeled as relevant. If a user searched for the tags “dog training,” a tag cloud might appear with “dog behavior” and “Cesar Millan,” allowing the user to click through and access other objects through foraging and scent.
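The frequency ranking behind a tag cloud can be illustrated with a minimal sketch. The function name tag_cloud, the font-size range, the linear scaling, and the tag counts are all assumptions made for illustration; real tagging sites may weight tags differently.

```python
from collections import Counter


def tag_cloud(tag_events, min_size=10, max_size=32):
    """Map each tag to a font size scaled linearly by how often it was applied.

    tag_events is a list with one entry per time a user applied a tag.
    The size range (10 to 32) is a hypothetical display choice.
    """
    counts = Counter(tag_events)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every tag was used equally often, all tags get the smallest size.
        weight = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_size + weight * (max_size - min_size))
    return sizes


# Hypothetical tagging activity for one object.
events = ["dog training"] * 5 + ["dog behavior"] * 3 + ["Cesar Millan"] * 1
cloud = tag_cloud(events)
for tag, size in sorted(cloud.items(), key=lambda pair: -pair[1]):
    print(f"{tag}: {size}pt")
```

The most frequently applied tag is drawn largest, which is the perceptual clue a user follows when foraging.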
Opponents have been quick to point out that tagging and folksonomies present a number of difficulties in terms of information organization. First, and foremost, is the lack of any regulation or standardization for tags. Free text means just that: users can enter whatever text they want, from keywords to punctuation to numbers to sentences. Other contributing factors to this lack of precision include variations between sites in what types of tags are allowed. Some only allow one word, so spaces may be left out between words to comply with this requirement (such as spaceshuttle) while other tagging sites allow anything within a certain character count. This means even a single user may not be able to guarantee uniformity in their own tags spread across a number of different folksonomies. Combined with differences in languages, plural and singular word forms, synonyms, and misspellings, it seems unlikely that any defined set of useful tags would arise from varied users (Willey, 2011). However, Avery points out that “one of collective indexing’s greatest strengths” [is] “the ability to quickly overcome sub-par agent-level indexing through a rapid emergence of stability at the global level” (Avery, 2010).
Several studies have shown that folksonomies provide a number of user services that are not currently being met through traditional cataloging. According to the research quoted by Willey, LCSH have a limited cross-over with user-created tags, meaning objects are classified using two sets of distinct terminology. As a result, user queries have fewer zero-hit results, since their search terms are likely to be found either in the user-created tags or in LCSH, which improves access to objects (Willey, 2011). The tendency of folksonomies to self-correct means that, generally, the more useful an object, the more users tag it. The more users tag it, the greater the number of tags and the easier the object becomes to discover through those tags.
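The reduction in zero-hit results can be illustrated with a small sketch that searches a record first by subject headings alone and then by headings and tags together. The record layout, the matches function, and the user tags shown are hypothetical; only the subject heading is taken from the Olivia example discussed earlier.

```python
def matches(query, record, fields):
    """Return True when every query word appears in the chosen record fields."""
    terms = set(query.lower().split())
    words = set()
    for field in fields:
        for entry in record[field]:
            # Treat the dashes in subject headings as word separators.
            words.update(entry.lower().replace("-", " ").split())
    return terms <= words


# Hypothetical catalog record; the tags are invented for illustration.
olivia = {
    "title": "Olivia",
    "subject_headings": ["Swine -- Juvenile fiction"],
    "tags": ["pig", "dress", "picture book"],
}

headings_only = matches("pig dress", olivia, ["subject_headings"])
hybrid = matches("pig dress", olivia, ["subject_headings", "tags"])
print(headings_only, hybrid)
```

Under these assumptions the patron's words miss the record when only subject headings are searched, and find it once user tags are searched as well.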
The three most common types of metadata shed light on the likelihood that folksonomies will replace traditional library cataloging. As enumerated by the 2004 NISO booklet Understanding Metadata, metadata is categorized as descriptive, structural, or administrative. Descriptive metadata refers to information about an object that can be used for discovery, such as title, author, and subject. This is the type of metadata that tagging can best capture, as most users tag descriptively to help them find articles or images later. Structural metadata describes “how compound objects are put together,” while administrative metadata gives information about when and how an object was created as well as how that object has been modified or adapted over time (NISO, 2004, p. 1). Tagging has been least successful at capturing structural or administrative metadata simply because most users don’t find it helpful in accessing objects in the future.
To further understand this, it is helpful to examine the needs of three types of metadata creators: professionals, authors, and users (Speller, 2007). Up until the last few decades, nearly all cataloging and categorization has been completed by professionals: information scientists and librarians. Professionals are deeply committed to publishing all three types of metadata. Descriptive metadata helps them find objects (and helps them help users find objects); structural metadata provides necessary information about the organization of an object; and administrative metadata helps professionals track and assess the authority of an object, as well as ensure proper maintenance, rights management, and preservation.
As authors move beyond creating content to publishing their content (whether physically or online), it has become more useful for them to create metadata associated with that object. Authors are likely to find administrative metadata quite useful. An author who publishes digital pictures may find that having metadata recording the creation date as well as adaptations and changes to the object will help them manage their photos and collection over time. Additionally, this author is likely to include rights management metadata in the object to ensure proper attribution. Finally, authors may also use descriptive metadata, though to a lesser degree than users, to help them organize, group, and find objects.
End users generally don’t require any of the administrative metadata that professionals and authors find valuable. Because users are most often using metadata (in the form of tags) to store resources for later recall, their choices will be specific to that purpose. For example, if a user is compiling resources about kitchen design, structural and administrative metadata do not prove useful for future retrieval via keyword (tag) searches, while descriptive metadata does. Adding descriptive tags such as “kitchen” to all objects in their collection, “table” to some objects, “lighting” to another subset of objects, and so on, the user will be able to access any or all of these related items through querying a single word. However, if the user adds administrative tags describing the creation date of objects in the collection, there will be little significance in the groupings, and, in the absence of descriptive metadata, the user would have to seek out each item individually by recalling the creation date.
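A minimal sketch of the kitchen design example shows why descriptive tags support recall with a single word. The file names and the build_tag_index helper are hypothetical; the tags are the ones named in the paragraph above.

```python
from collections import defaultdict


def build_tag_index(collection):
    """Invert an item -> tags mapping into a tag -> items index."""
    index = defaultdict(set)
    for item, tags in collection.items():
        for tag in tags:
            index[tag].add(item)
    return index


# Hypothetical saved resources, each with the user's descriptive tags.
collection = {
    "farmhouse-kitchen.jpg": ["kitchen", "table"],
    "pendant-lamps.html": ["kitchen", "lighting"],
    "island-layout.pdf": ["kitchen"],
}

index = build_tag_index(collection)
kitchen_items = sorted(index["kitchen"])    # every item in the collection
lighting_items = sorted(index["lighting"])  # only the lighting subset
print(kitchen_items)
print(lighting_items)
```

Querying “kitchen” returns the whole collection and “lighting” returns its subset, whereas an index keyed on creation dates would produce groupings with little meaning for the user.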
Images on the web pose a unique set of problems. While a number of folksonomies exist for images, such as Flickr, the very personal nature of many images, and the specificity of terms used to facilitate retrieval, make tags nearly meaningless for collaborative purposes. For example, names are a common tag type in Flickr, and unless the person named is famous, they hold little value for other users. The flip-side of this argument is the low likelihood that the images themselves will hold value for other users.
A website that attempts to address this issue is Games with a Purpose (Gwap). Gwap provides a number of “games” that encourage players to generate metadata for images, songs, and videos. In the ESP Game, two random users are paired and shown the same image. Users type in words that they believe describe the image and/or that their partner may choose. When both agree on a word, it is recorded and the players move on to a new picture. These tags are then associated with the images and affect search engine query results in the future (Gwap, 2008).
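The agreement step of such a game can be sketched as follows. This is an illustration under stated assumptions, not Gwap's actual implementation: the function name, the alternating turn order, and the case-insensitive comparison are all hypothetical.

```python
from itertools import zip_longest


def first_agreed_label(guesses_a, guesses_b):
    """Return the first word both players have typed, or None if they never agree.

    Guesses are assumed to arrive in alternating turns (player A, then B),
    and are compared case-insensitively.
    """
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            a = a.strip().lower()
            seen_a.add(a)
            if a in seen_b:
                return a
        if b is not None:
            b = b.strip().lower()
            seen_b.add(b)
            if b in seen_a:
                return b
    return None


# Hypothetical guesses from two paired players looking at the same image.
label = first_agreed_label(["animal", "pig", "red"], ["dress", "Pig", "book"])
print(label)
```

Because a word is recorded only when two independent players both supply it, purely personal descriptors are unlikely to become tags.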
The future of folksonomies
Folksonomies have the potential to change library catalogs. Based on the data and studies available at present, it seems unlikely that folksonomies will replace traditional metadata structures. The evidence does suggest that a hybrid approach can improve recall, search result relevancy, and usability. While not all integrated library systems (ILS) will support incorporating folksonomic techniques into the user interface, Willey offers a number of examples of libraries that have integrated tagging into their catalogs. The University of Pennsylvania’s PennTags system allows users to tag URLs, journal articles, and OPAC records without limits (Willey, 2011). The California State University-Northridge Oviatt Library uses LibraryThing for Libraries tags (Willey, 2011). LibraryThing attempts to mitigate problems with tagging by providing a database of 84 million tags, and using a review process to determine whether new descriptors facilitate location (LibraryThing for Libraries, 2010).
Folksonomies, and the ways that they support user-generated metadata, are evolving. Organizations like Gwap give users incentive to improve search engine query results by providing entertainment. Such creative approaches are low cost and may prove even more effective as user data drives improvements. New and better social tagging sites are made available each year. Some, like Delicious, Flickr, and Pinterest, garner huge public responses and support, rapidly developing into giant folksonomies almost overnight.
The success of folksonomies offers libraries an opportunity to provide better service. By understanding search behaviors such as information scent, foraging, and community generated subject terminology, libraries can adapt tools to meet the needs of their patrons. Making library processes more user-friendly and intuitive increases their relevance and utility. In an increasingly visual world, descriptive metadata has become king. What better way to capture the discoverability of objects through descriptive metadata than by providing federated search systems that incorporate folksonomies?
Kate Baker is the Bookmobile Coordinator for the Meridian Library District and is working on her MLIS.
Avery, J.M. (2010). The democratization of metadata: Collective tagging, folksonomies and Web 2.0. Library Student Journal, 5. Retrieved from http://www.librarystudentjournal.org/index.php/lsj/article/view/135/268
Gwap. (2008, May 13). Hello world [Web log post]. Retrieved from http://blog.gwap.com/2008/05/hellow-world.html
LibraryThing for Libraries. (2010). Catalog enhancements. Retrieved from http://www.librarything.com/forlibraries/
National Information Standards Organization. (2004). Understanding metadata. Retrieved from http://www.niso.org/publications/press/UnderstandingMetadata.pdf
Park, H. (2011). A conceptual framework to study folksonomic interaction. Knowledge Organization, 38(6), 515-529. Available from http://www.isko.org/ko.html
Shirky, C. (2005). Ontology is overrated [Web log post]. Retrieved from http://www.shirky.com/writings/ontology_overrated.html
Speller, E. (2007). Collaborative tagging, folksonomies, distributed classification or ethnoclassification: a literature review. Library Student Journal, 2(1), 1. Retrieved from http://www.librarystudentjournal.org/index.php/lsj/article/view/45/58
Vander Wal, T. (2007, February 2). Folksonomy coinage and definition [Web log post]. Retrieved from http://vanderwal.net/folksonomy.html
Willey, E. (2011). A cautious partnership: The growing acceptance of folksonomy as a complement to indexing digital images and catalogs. Library Student Journal, 15. Retrieved from http://www.librarystudentjournal.org/index.php/lsj/article/view/227/314