posted 7 Feb 2002 in Volume 5 Issue 5
Do you need a taxonomy strategy?
A primer on information architecture and taxonomy development
The evolution of the web from novel technology to critical business application has spurred a surge of interest in taxonomy and information architecture development, although there is still general confusion about what each term actually means. Samantha Bailey discusses the practical meaning of each concept and examines how they can be leveraged.
The terms ‘taxonomy’ and ‘information architecture’ are both being encountered with increasing frequency in the business community, and their emergence in new contexts has raised both interest and confusion. Despite their novelty in the business vernacular, both terms rely on concepts and practices that have existed for decades and, in some cases, hundreds of years. As with so many fields, the advent of the web and the evolution of the internet from novel technology to critical business application has infused these concepts with a sense of excitement and urgency. Knowledge managers are at different points on the spectrum of familiarity and comfort with these concepts; if you’re just beginning your taxonomy work or are curious to learn more, this article is for you. This is a primer on what these concepts mean and how they can be leveraged in our work.
Organising information is a critical component of knowledge management. Information workers are both blessed and cursed by the availability of information and by the ever-expanding technologies that allow us to manipulate it, and it has become almost a cliché to observe that we’re in the midst of an information explosion in which the amount of information produced continues to outpace technical advances in managing it. Far from facing a dearth of possibilities, knowledge managers are overwhelmed by their options for approaching, organising and disseminating content. Enter information architecture: a discipline informed by library and information science, design, and even anthropology, which has emerged as both a fundamental approach to organising information in digital environments and a new profession in its own right.
In the design community, information architecture is a term commonly attributed to Richard Saul Wurman and his book Information Architects. Wurman was fascinated by information design and by the way the presentation of information (in his case, in the print medium) influenced its meaning and how well it could be understood. He defined an information architect as “the individual who organises the patterns inherent in data, making the complex clear”. As the internet reached critical mass in academia, two entrepreneurs with library and information science backgrounds, Lou Rosenfeld and Peter Morville, began adapting Wurman’s ideas into a practice suited to the web and to internet and intranet applications, grounding their work in the principles of library and information science. Rosenfeld and Morville wrote the seminal book on the topic, Information Architecture for the World Wide Web.
The definition of information architecture has been evolving since the term was coined, and the discipline has emerged as an active field with a burgeoning community. While it is unlikely that any two practising information architects will give identical definitions of the term, there is consensus that information architecture has organisation at its root. Basing my understanding on Morville and Rosenfeld’s approach, I define information architecture as the art and science of organising information so that it is findable, manageable and useful. This definition is a content-intensive interpretation, indicating my bias that information architecture skills are most critical in content rich environments. It also draws on the information retrieval roots of library science, emphasising the importance of being able to find that which one seeks, whether known or unknown. Finally, information architecture is a user-centred discipline, understanding that usability is at the heart of a successful information-based interaction. (Note that information architecture, as it is discussed in the context of this article, is not to be confused with enterprise architecture [such as the Zachman framework for enterprise architecture, a method for understanding enterprise information infrastructures] or data architecture.)
An information architecture strategy is derived by systematically analysing context, content and users. For knowledge managers, context is critical, as it is the sum of our business realities. It might be possible to design the perfect information management or retrieval environment given unlimited time, budget and human resources, but I’ve never encountered anyone who has such luxuries available. Instead, we explore the business context to realistically assess what can be accomplished and to cull the must-haves from the nice-to-haves. We assess the nature of the content to understand how it can best be manipulated and presented. And we examine the site’s users – our audience, clients and customers – to understand both what they want and what they need.
The information architecture strategy refers to the conceptual approach that will be taken with the site. It answers fundamental questions about how information will be made most accessible to a site’s users. For example, will the site be oriented around products and services or will it take a needs-based approach to organising information? Will there be a primary organisation scheme supported by alternate routes to the information via tools like a search engine and FAQs, or will the site have several prominent organisation schemes, such as schemes organised around topic, audience and task? Will searching be based on full-text indexing or natural language retrieval algorithms?
The information architecture itself consists of the organisation scheme (which is where classification and taxonomies enter the picture), navigation systems, labelling systems and search. Gathering, analysing and synthesising information about context, content and users gives the information architect the insight needed to develop the organisational structures and schemes that form the foundation of the website. Navigation schemes are introduced to provide access to that structure. Information retrieval is further supported by labelling applied throughout the site, by supplementary navigation tools such as indexes, guides and FAQs, and by search.
Knowledge management is a field that can be beleaguered by the impression that it is too esoteric to provide real value to the hard-nosed business world, where successful investments are measured by ROI that can be demonstrated in dollars and cents. Information architecture, particularly for the uninitiated, can fall victim to the same fate. A taxonomy, on the other hand, appears to be a far more tangible entity. In its earliest usage, the term taxonomy referred exclusively to the classification of plants and animals according to their presumed natural relationships. Much like the term ‘thesaurus’, which means something different in information retrieval circles from the desk reference many people are familiar with, ‘taxonomy’ has been adopted as the preferred term for the classification schemes and controlled vocabularies employed on websites and intranets.
Before delving into the application of taxonomies, it is useful to understand a few other terms that are used either synonymously or in conjunction with taxonomy. ‘Classification schemes’ and ‘controlled vocabularies’, for example, come from the library science realm. Classification schemes are used for the placement of objects (like books, articles, web pages) into a systematic structure that will support information retrieval; basically developing a system so that you can find what you’ve put away once you are dealing with more than a handful of materials. Controlled vocabularies are similar to classification schemes but have typically been used for indexing, which is often both more precise and more flexible than classification. Indexing is the process of assigning one or more terms to represent a concept, thereby allowing retrieval at a more granular level. For example, we may use the classification scheme to retrieve a book and the index to locate the specific page or idea we seek. In relation to the web, taxonomies are being used in reference to both classification and indexing.
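To make the distinction between classification and indexing concrete, here is a minimal sketch using hypothetical documents and terms (none of them come from a real site). The classification scheme places each document in exactly one position, like a book on a shelf, while indexing with a controlled vocabulary assigns each document several terms, so it can be retrieved at a finer grain.

```python
from collections import defaultdict

# Hypothetical classification scheme: each document gets exactly one place.
classification = {
    "Annual report 2001": "Finance > Reporting",
    "Mortgage rates FAQ": "Products > Lending",
}

# Hypothetical indexing: each document gets one or more controlled terms.
index_terms = {
    "Annual report 2001": ["financial statements", "shareholders"],
    "Mortgage rates FAQ": ["mortgages", "interest rates", "home buying"],
}

def build_index(assignments):
    """Invert document -> terms into term -> documents, as a back-of-book index does."""
    inverted = defaultdict(list)
    for doc, terms in assignments.items():
        for term in terms:
            inverted[term].append(doc)
    return dict(inverted)

index = build_index(index_terms)
print(classification["Mortgage rates FAQ"])  # Products > Lending
print(index["interest rates"])               # ['Mortgage rates FAQ']
```

The classification answers "where does this item live?", while the index answers "which items deal with this concept?", which is why the two are so often used together.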
A taxonomy is a structure, a ‘thing’ – we may not be able to touch it or smell it, but it has a recognisable shape. I am convinced that this is part of the reason taxonomies have been embraced; in a world where information management is both hard to describe and hard to do, a taxonomy offers a comforting solidity (and the association the term suggests with Darwin’s survival of the fittest can’t hurt). The problem lies in this very simplicity. It is human nature to long for a silver bullet, and it is the nature of reality to defy simple solutions. This is not to suggest that taxonomies are simple things. On the contrary, taxonomies (or classification systems or controlled vocabularies) are powerful tools for organising information and they can run the gamut from concise lists to extremely complex, polyhierarchical structures. The important thing to take away, however, is that information management is complex; it requires a holistic approach – a robust information architecture with all its components, of which the taxonomy is just one piece.
Taxonomy has become a popular and widely accepted term in the business community and the field is divided between those who hold that the term taxonomy is synonymous with classification scheme, controlled vocabulary and even thesaurus, and those who insist that the term suggests something entirely different. One of the more carefully justified definitions of taxonomy comes from research conducted by Alan Gilchrist and Peter Kibby, of TFPL and CMG Admiral respectively, in the executive summary of the report Taxonomies for Business: Access and Connectedness in a Wired World. They define taxonomy as “a correlation of the different functional languages used by the enterprise to support a mechanism for navigating and gaining access to the intellectual capital of the enterprise”. Another definition of taxonomy is offered in an article by Jean Graef of the Montague Institute: “Structures that provide a way of classifying things (living organisms, products, books) into a series of hierarchical groups to make them easier to identify, study, or locate.” Graef goes on to explain that taxonomies consist of two parts: structures and applications. “Structures consist of the categories (or terms) themselves and the relationships that link them together. Applications are the navigation tools available to help users find information.”
The primary difference between these two definitions of taxonomy and the definitions of controlled vocabulary and classification scheme is that a taxonomy is seen as both an organisation system and a source of navigation. Neither of the above definitions insists that the taxonomy be the primary or sole source of navigation, but my sense is that when businesses strive to create enterprise taxonomies for their websites they are almost without fail using the term taxonomy to describe a Yahoo!-like hierarchical structure that serves as the primary interface to the content. This puts tremendous stress on the concept of the taxonomy (remember, there are no silver bullets), and it may encourage companies to over-focus on a single be-all and end-all taxonomy when what is really needed are several smaller, more focused schemes, such as those employed in faceted classification, where the idea is to develop ‘pure’ schemes that each address a single facet of a concept.
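The faceted alternative can be sketched as a few small, independent schemes rather than one large hierarchy. The facet names, values and documents below are hypothetical, chosen only to illustrate the idea: each facet covers one aspect of the content, each document takes one value per facet, and users combine facets in whatever order suits them.

```python
# Hypothetical facets: each one is a small, 'pure' scheme covering a single aspect.
FACETS = {
    "topic": {"lending", "investing", "accounts"},
    "audience": {"customers", "employees"},
    "format": {"faq", "form", "policy"},
}

documents = {
    "Mortgage application": {"topic": "lending", "audience": "customers", "format": "form"},
    "Lending policy manual": {"topic": "lending", "audience": "employees", "format": "policy"},
    "Fund FAQ": {"topic": "investing", "audience": "customers", "format": "faq"},
}

def validate(doc_facets):
    """Check that every value assigned to a document comes from that facet's own scheme."""
    return all(value in FACETS[facet] for facet, value in doc_facets.items())

def select(**criteria):
    """Return the documents matching every requested facet value, in any combination."""
    return sorted(
        name for name, facets in documents.items()
        if all(facets.get(facet) == value for facet, value in criteria.items())
    )

assert all(validate(f) for f in documents.values())
print(select(topic="lending"))                        # ['Lending policy manual', 'Mortgage application']
print(select(topic="lending", audience="customers"))  # ['Mortgage application']
```

Because no facet has to anticipate the others, none of the three schemes needs to grow into a single be-all and end-all hierarchy.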
Classification schemes, controlled vocabularies and thesauri can all be used to great effect on a website, and they can be used as a primary path for navigation. However, they don’t have to be part of the primary navigation system to be effective. They can also be used behind the scenes or in combination with other approaches. For example, a site might have a loosely defined structure that doesn’t really fit the definition of classification scheme or taxonomy very well and it may also have a controlled vocabulary that is employed in indexing the content, giving the site a very powerful search engine and keyword index. In this case, the controlled vocabulary wouldn’t be used as the primary navigation scheme (or, by some definitions, as navigation at all) and yet it would play a critical role in the information architecture of the site. The point here is that all sites with successful information architectures must address classification – how the content is organised – but a taxonomy or controlled vocabulary is only part of the solution.
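As a rough illustration of a controlled vocabulary working behind the scenes, the sketch below maps users’ own wording (entry terms) to a single authorised preferred term before searching an index built on preferred terms only. The vocabulary and documents are hypothetical, and real search engines handle normalisation far more thoroughly; the point is only that the vocabulary never appears as navigation, yet it decides what a search returns.

```python
# Hypothetical controlled vocabulary: entry terms (synonyms) -> preferred term.
PREFERRED = {
    "home loan": "mortgages",
    "home loans": "mortgages",
    "mortgage": "mortgages",
    "mortgages": "mortgages",
    "cd": "certificates of deposit",
    "certificate of deposit": "certificates of deposit",
    "certificates of deposit": "certificates of deposit",
}

# Hypothetical index: documents are indexed with preferred terms only.
INDEX = {
    "mortgages": ["Mortgage rates FAQ", "Refinancing guide"],
    "certificates of deposit": ["CD rates table"],
}

def search(query):
    """Normalise the query to its preferred term, then look that term up in the index."""
    term = PREFERRED.get(query.strip().lower())
    if term is None:
        return []  # term not in the vocabulary; a real site might fall back to full text
    return INDEX.get(term, [])

print(search("Home loan"))  # ['Mortgage rates FAQ', 'Refinancing guide']
print(search("CD"))         # ['CD rates table']
print(search("savings"))    # []
```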
In traditional definitions of classification schemes and controlled vocabularies the focus on navigation is either not included or is not integral to the definition. When a classification scheme or controlled vocabulary is leveraged directly for navigation in a site, perhaps it is reasonable to call it a taxonomy. From my perspective, the name is not critical and, at present, I have yet to encounter something referred to as a taxonomy on a website that doesn’t fit the definition of either a classification scheme or a controlled vocabulary. One of the most valuable things in a name or label is the point of common understanding; we use labels to clarify what we are describing and to limit confusion. At present, I am equally comfortable with calling a classification scheme or a controlled vocabulary on a website a taxonomy and, since taxonomy has become a widely understood term, perhaps that makes it the best label.
The answer to the question, ‘do you need a taxonomy strategy?’ requires the question to be reframed. Making the information on your site accessible requires classification. There are a variety of ways this can be achieved, and a taxonomy or controlled vocabulary is one of them, representing a powerful tool for organising information and communicating information structure. But classification is not all that will be required. You will also need an information architect who can assess the overall organisational needs of your site and develop a strategy for its structure, navigation, labelling, finding aids and search. On a complex site with a significant amount of content, one or more taxonomies will almost certainly play a central role in a successful information architecture.
Pullout 1: A note on terminology
The term ‘taxonomy’ began cropping up in business circles in reference to organising material on the web around 1999. Initially, I reacted with consternation – didn’t taxonomy have something to do with classifying flora and fauna? In the dim reaches of memory I recalled a mnemonic device about ‘King Philip crossing over from Greece to Spain’, but this seemed to have little to do with my training in information science or my practical experience organising information for Fortune 500 websites and intranets. Ever a librarian at heart, I turned to my trusted reference resources and looked for definitions of and references to taxonomy. As I suspected, virtually everything I found in popular reference put the meaning of taxonomy firmly in the hard sciences camp. Even today, with taxonomy a popular buzzword in information management circles, nine of the top ten results for a Google search on ‘taxonomy’ relate directly to the traditional hard sciences definition.
I soon discovered that this term, which was growing in popularity in management consulting circles, was being used synonymously with ‘classification scheme’ and ‘controlled vocabulary’. I was familiar with those terms, but they lack both the pithiness of a single word and, I suspect, the association with the hard sciences, which lends a legitimacy that was longed for even in the heyday of web design, when traditionalism was suspect at best. I wrote a brief for my colleagues, urging them to stem the flow and explain to our clients that, impressive as the word taxonomy may be, it described much of what was being done or proposed less accurately than classification scheme, controlled vocabulary or thesaurus, depending on the specific context. Time and time again I’ve entered into spirited discussions on the topic and have carefully explained my reasoning, only to be met with the equivalent of, “I see what you mean, but I still like the way taxonomy rolls off the tongue,” or perhaps more forgivably, “Taxonomy sells.”
There’s an irony here, in that people who work with vocabularies, be they schemes or taxonomies, tend to be the very souls inclined to appreciate semantic subtleties. I’ll admit that my attempt to turn the tide of popular usage has been a complete failure. Sometime in mid-2000 I adopted the attitude that if you can’t beat ’em, join ’em, but I still feel compelled to qualify my use of the term.
Pullout 2: Glossary of terms used in this article
Classification – The systematic arrangement in groups or categories according to established criteria.
Classification scheme – A scheme for arrangement of a collection of information in a systematic sequence, according to subject, and, to a lesser extent, form.
Controlled vocabulary – A limited set of authorised terms (also known as ‘preferred terms’) to be used in indexing (classifying) documents in an information system. May also be used in searching for documents in an information system.
Information architecture – Conceptually: the art and science of organising information so that it is findable, manageable and useful. When applied: the development of classification schemes, organisation structures, navigation schemes, labelling systems, supplementary navigation systems and search capabilities for a website or intranet.
Information retrieval – The process of finding documents or information.
Taxonomy – In the context of this article, a taxonomy is a structured collection of terms, generally hierarchical, that is used for both classification and navigation.
Thesaurus – A compilation of words and phrases showing synonyms, hierarchical and other relationships and dependencies, the function of which is to provide a standardised vocabulary (see ‘controlled vocabulary’ above) for information storage and retrieval systems. In information retrieval, a thesaurus is used to group terms together, allowing all documents associated with a single concept to be found, regardless of the variations in terminology used to describe the concept.
Pullout 3: Taxonomies in action
Controlled vocabularies and taxonomies are in use with impressive results across the web. For example:
- MeSH – While Yahoo! and Amazon have taxonomies or classification schemes that are widely recognisable to the general public, MeSH, the Medical Subject Headings of the National Library of Medicine, is highly regarded by information and medical professionals;
- Epinions.com – Epinions uses a hierarchical classification scheme (similar to those used by Yahoo! and Amazon) to represent the broad array of topics on its site. The scheme is used as one of the primary navigation paths from the main page and is presented in full as a site index;
- Vanguard.com – Vanguard uses a controlled vocabulary with synonyms and related terms in support of its site index.
1. The term ‘taxonomy’ is generally associated with classification and systematics. While ‘modern’ methods of taxonomic classification are attributed to Linnaeus, who introduced his methodology in the 1700s, Aristotle developed a system of classification in 300BC. Information architecture is associated with librarianship, another field with ancient roots reaching back to the library in Alexandria in 245BC.
2. A study by Hal Varian at the University of California Berkeley estimated that “the world produces between one and two exabytes of unique information per year, or roughly 250 megabytes for every man, woman and child on earth. This is equivalent to the textual content of 250 books.”
3. Bradford, P. and Wurman, R.S., Information Architects (Graphis Press, 1996)
4. Rosenfeld, L. and Morville, P., Information Architecture for the World Wide Web (O’Reilly & Associates, 1998)
5. The American Society for Information Science and Technology has a special interest group devoted to information architecture, for example.
7. “Just as reliable as fireworks on the 4 July is another annual event – the recurring quasi-philosophical question of whether knowledge management is in or out.” Kounadis, T., ‘Getting down to brass tacks with knowledge management’ in DM Review (July 2001)
Samantha Bailey is assistant vice president, Information Architecture at First Union National Bank.