posted 14 Mar 2005 in Volume 8 Issue 6
Zen and the art of taxonomy maintenance; part III: Designing for quality and logical consistency
A masterclass covering the creation, implementation and maintenance of taxonomies in a corporate context. By Jan Wyllie.
If you can’t define Quality, there’s no way you can subordinate it to any intellectual rule. The aestheticians can have nothing more to say. Their whole field, definition of quality, is gone… Life [without Quality] would just be living without any values or purpose at all. Since the world obviously doesn’t function normally when Quality is subtracted, Quality exists, whether it is defined or not…
You take your analytic knife, put the point directly on the term Quality and just tap, not hard, gently, and the whole world splits, cleaves, right in two – hip and square, classic and romantic, technological and humanistic – and the split is clean. There’s no mess. No slop. No little items that could be one way or another.
Pirsig, R.M., Zen and the Art of Motorcycle Maintenance (1974)
From purposes to challenges
Last month, the focus was on clarifying and agreeing purposes of a taxonomy project, and making a business case. The three main purposes of using taxonomies were identified as information retrieval, intelligence discovery and supporting workgroup collaboration.
This month, the focus is on designing and building high-quality taxonomies that are fit for purpose. Note the use of the term ‘quality’ here. Taxonomies are judged as much by quality – aesthetics, meaningfulness, user experience – as by logical consistency and grammar, what Pirsig called the ‘square’ aspect of reality. The paradox is that taxonomies combine both rational and emotional aspects at the same time, contrary to Pirsig’s assertion that “no little items… could be one way or another”. Taxonomies seem to be one rather significant exception to this rule.
This inherent confusion creates one of the first challenges that any prospective taxonomy developer will face. Everybody uses taxonomies of sorts to organise their thinking, as well as the objects and documents around them. Food is stored in the kitchen; books go on bookshelves; files go in the filing cabinet, or on ‘that pile over there’. People are very good at remembering and managing their stuff, according to their own handmade taxonomies, or personal ‘folksonomies’ as they are now being called.
Fighting articles of faith
It is not surprising that people are very much attached, both emotionally and logically, to their own ways of categorising and organising things, because they have invested so much time and effort into developing, maintaining and remembering them that the processes involved are almost second nature.
Along comes the taxonomy developer, full of bright ideas about how to do things better, asking for extra effort and presenting more to remember, while consigning all current practice to the dustbin. No wonder so many taxonomy-design projects become bogged down at the initial stages of consultation with information providers and users in heated arguments about the definition of terms, and an atmosphere of sullen disinterest and seemingly wilful incomprehension.
So, once the business case has been made, the first challenge is achieving initial buy-in among all the necessary participants, whose primary job is not developing taxonomies.
Like learning a language
There is no way of avoiding the fact that designing and implementing a taxonomy requires both motivation and discipline. So be up front about it. Learning to use a taxonomy is like learning a new language, except that it is much easier. Once mastered, taxonomy-based working becomes second nature, just like a folksonomy. The case can then be made that learning this new language will give its users whole new vistas of understanding and possibilities for knowledge sharing. It is exciting stuff.
Good taxonomy designers must engage people’s intelligence and imagination in order to win their commitment to the new taxonomy and its related disciplines and processes. Everybody’s time and effort must be respected in the design – for example, repetitive tasks should be automated and thinking tasks should be highlighted.
If taxonomies are rolled out without the commitment of all those involved, then they are unlikely to be a long-term success.
Guided collaborative process
If possible, a guided collaborative process should be used for the development of the first two or three drafts of a taxonomy design. In order to avoid interminable arguments about words and definitions, all participants could sign up to a declaration stating something like, ‘Because taxonomies are social constructions, they cannot be judged as right or wrong, but rather as more or less useful to the group.’ Agreeing to leave final decisions to the taxonomy designer is also a useful rule for resolving deadlocks.
The other problem facing organisations that want to develop and use taxonomies is a lack of expertise and any recognised qualifications, outside the rarefied realms of library and information science, which do not really relate to the problems of an organisational-taxonomy developer. Nevertheless, it is in this group that most of the expertise exists, especially for information-retrieval taxonomies. Expertise in intelligence taxonomies is rarer nowadays, even, it seems, within government intelligence agencies.
The rental option
If creating taxonomies in house and from scratch seems too ambitious, it is possible to buy the rights to use existing taxonomies and thesauri, particularly those focused on information retrieval. There are, though, three drawbacks to this approach. One is that a taxonomy designed for the purposes of another organisation may not fit the current context. The second is that users will feel that the bought-in taxonomy is being foisted on them. Last, it is hard to decide whether a taxonomy is the right one without using it first for a trial period, something which licences tend to discourage. Also, be aware that the intellectual-property rights to taxonomies and their structures are legally untested. Under these conditions, the wise course of action may be to license the use of a big, standard classification system, such as Dewey Decimal, for information retrieval.
Assuming that an organisation chooses to develop its own taxonomy, the next step is to create an outline. The first questions to ask concern the taxonomy’s dimensions and scope. How many words and how many heading levels should it have? Once again, this depends on its purpose. Taxonomies designed for information retrieval tend to have more words and levels than those designed for intelligence analysis or collaboration.
One rule of thumb is perhaps worth following: limit the choices at any one level to fewer than ten, and the number of levels to fewer than five. Otherwise, users will become confused about where they are, or irritated by having to scroll. If further specification is required, use a free-text-retrieval system, ideally assisted by a thesaurus.
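The rule of thumb above is easy to check mechanically on a draft. The following is only a minimal sketch: it assumes the draft is held as nested Python dictionaries (heading to sub-headings), and the names `check_taxonomy`, `MAX_CHOICES` and `MAX_LEVELS` are invented for illustration.

```python
# Assumed representation: each key is a heading, each value a dict of its sub-headings.
MAX_CHOICES = 9  # "fewer than ten" choices at any one level
MAX_LEVELS = 4   # "fewer than five" levels


def check_taxonomy(tree, level=1, path=()):
    """Return a list of warnings for parts of the draft that break the rule of thumb."""
    warnings = []
    if not tree:
        return warnings  # a heading with no sub-headings adds no further level
    where = " > ".join(path) or "(top level)"
    if level > MAX_LEVELS:
        warnings.append(f"{where}: level {level} is deeper than {MAX_LEVELS} levels")
    if len(tree) > MAX_CHOICES:
        warnings.append(f"{where}: {len(tree)} choices is more than {MAX_CHOICES}")
    for heading, children in tree.items():
        warnings.extend(check_taxonomy(children, level + 1, path + (heading,)))
    return warnings


# Hypothetical draft using the headings discussed in this article.
sample = {
    "Electronic equipment": {
        "Computers": {"Portables": {}, "Micros": {}, "Minis": {}},
        "Radios": {},
        "Televisions": {},
    },
}

for warning in check_taxonomy(sample):
    print(warning)
```

The limits are a matter of judgement rather than fixed thresholds, so they are kept as named constants that a project team can adjust.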
A decision must also be made about whether the taxonomy is designed to help people find only specific items being sought, or sets of meaningfully related items. The first option would be said to have more ‘granularity’ than the second.
As for scope, the key considerations are: which subjects will the taxonomy cover? What will the sources of the data be? Which items of data will be included/excluded? It is time to step back and consider the project from as broad a standpoint as possible. This is a starting point for the process of consultation which, if successful, will bring specific subject-coverage requirements into focus.
One final decision must be taken before embarking on the process of designing and implementing a taxonomy: is it better to do an information audit of possible sources, categorising from the bottom up, or is it better to start with a few very broad categories, assuming that the source base can be meaningfully classified under obvious broad headings such as ‘Research’, ‘Production’ and ‘Finances’? The answer is, look for the broadest headings first in the audit and content analysis of the sources of information to which the taxonomy will be applied. Once the broadest categories and the facets are agreed and tested, the process of establishing lower-level terms becomes predictably iterative and politically much less controversial.
For the most basic document tagging, use the appropriate standards – Dublin Core for content and XML for software compatibility.
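As an illustration of what such basic tagging might look like, the sketch below writes a few Dublin Core elements as XML using Python's standard library. The `record` wrapper element, the function name and the sample values are assumptions made for the example; only the Dublin Core element names and namespace come from the standard itself.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(title, creator, date, subjects):
    """Build a small XML record; taxonomy terms are carried in dc:subject."""
    record = ET.Element("record")  # hypothetical wrapper, not part of Dublin Core
    for name, value in (("title", title), ("creator", creator), ("date", date)):
        ET.SubElement(record, f"{{{DC_NS}}}{name}").text = value
    for term in subjects:
        ET.SubElement(record, f"{{{DC_NS}}}subject").text = term
    return ET.tostring(record, encoding="unicode")


# Made-up document details, for illustration only.
print(dublin_core_record(
    title="Quarterly supply report",
    creator="A. N. Author",
    date="2005-03-14",
    subjects=["Computers", "Supply"],
))
```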
Types of taxonomies
Now the choice is which type of taxonomy to use.
Hierarchical taxonomies are the best known and easiest to understand. Under ‘Electronic equipment’, find ‘Computers’, ‘Radios’, ‘Televisions’ etc; under ‘Computers’ find ‘Portables’, ‘Micros’, ‘Minis’ etc. The problem with hierarchical taxonomies is that they have a very limited, literally one-dimensional descriptive power. They are inflexible and require a lot of shoehorning of items into inappropriate categories. For example, ‘Radios’ and ‘Televisions’ could just as well be found in a completely different broader category than ‘Electronic equipment’, perhaps ‘Media receivers’. While this kind of flexibility is ruled out in hierarchical taxonomies, it is permitted in what are known as poly-hierarchical taxonomies. The advantage of the poly-hierarchy is that it provides more than one channel of access. The difficulty is that the logic of its structure is not necessarily consistent. Nevertheless, poly-hierarchies are very powerful information-retrieval aids. They are also of enormous value when building a system from existing incompatible departmental taxonomies.
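The difference between the two structures can be made concrete with a small sketch. It assumes each term simply records its broader terms: one broader term per entry gives a strict hierarchy, while several give a poly-hierarchy. The `BROADER` table and `access_paths` function are illustrative names, and the code assumes the structure contains no circular references.

```python
# Each term maps to its broader term(s). 'Radios' and 'Televisions' have two
# broader terms, which is exactly what a strict hierarchy forbids.
BROADER = {
    "Computers": ["Electronic equipment"],
    "Portables": ["Computers"],
    "Radios": ["Electronic equipment", "Media receivers"],
    "Televisions": ["Electronic equipment", "Media receivers"],
}


def access_paths(term):
    """Return every route from a top-level heading down to the given term."""
    parents = BROADER.get(term, [])
    if not parents:
        return [[term]]  # a top-level heading is its own path
    return [path + [term] for parent in parents for path in access_paths(parent)]


def is_strict_hierarchy(broader):
    """True if no term has more than one broader term."""
    return all(len(parents) <= 1 for parents in broader.values())


for path in access_paths("Radios"):
    print(" > ".join(path))
print(is_strict_hierarchy(BROADER))
```

Each extra path is an extra channel of access for the user, which is the advantage described above. The price is that the same term now appears in more than one place.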
Multifaceted taxonomies combine the logical rigour of hierarchies with the flexibility of poly-hierarchies. Items tagged as ‘Computers’ in a subject hierarchy could also be tagged with issues, such as ‘Supply’, ‘Demand’, ‘Performance’, ‘Standards’, making a two-faceted taxonomy. The item’s date would add a third facet; its author would be a fourth. If starting from scratch, multifaceted taxonomies are best for all three kinds of taxonomy use – information retrieval, intelligence and collaboration. They can be used to describe as many different facets of an information item as desired.
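A four-faceted scheme of this kind might be sketched as follows. The facet names follow the example above (subject, issue, date, author); the `Item` class, the `find` helper and the sample items are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    title: str
    subject: str  # facet 1: position in the subject hierarchy
    issue: str    # facet 2: 'Supply', 'Demand', 'Performance', 'Standards'
    date: str     # facet 3
    author: str   # facet 4


# Made-up items, for illustration only.
ITEMS = [
    Item("Chip shortage hits laptops", "Computers", "Supply", "2005-01-10", "Smith"),
    Item("New wireless standard agreed", "Radios", "Standards", "2005-02-02", "Jones"),
    Item("Server benchmarks compared", "Computers", "Performance", "2005-02-20", "Smith"),
]


def find(items, **facets):
    """Return the items whose tags match every facet value given."""
    return [
        item for item in items
        if all(getattr(item, name) == value for name, value in facets.items())
    ]


for item in find(ITEMS, subject="Computers", author="Smith"):
    print(item.title)
```

Because each facet is independent, any combination can be used as a query without adding new headings to the subject hierarchy.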
As noted in part one of this series, a thesaurus-based approach should also be considered. Traditionally, thesauri have been used as back-of-the-book indexing tools, distinct from a front-of-the-book content-organising tool. If the nature of traditional books is to be any guide, then the ideal solution is to combine both approaches: content organisation and indexing. Taxonomy developers should look at thesaurus software such as Synaptica (www.synaptica.com) if they want to take the combined approach.
It should now be possible to complete the process of developing a useable proof of concept or preferably a draft of the proposed taxonomy, plus a description of how it would work in practice.
A question of automation
Once a draft of the taxonomy is available, it is time to think about the issue of automated classification. There is no doubt that software can carry out certain types of classification, such as identifying a geographic location or sorting companies into a taxonomy of industries, using simple, easy-to-understand rules. Rule-based software is also able to track sets of user-chosen keywords, the presence of which forces the software to tag the item with a given category. One of the benefits of using this kind of rule-based software is that the processes it uses are relatively transparent to users.
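The keyword-tracking approach can be illustrated in a few lines. This is a deliberately simple sketch, not a description of any vendor's product: the `RULES` table and its keywords are hypothetical, and matching is on whole lower-case words only.

```python
import re

# User-chosen keywords for each category (hypothetical examples).
RULES = {
    "Computers": {"laptop", "server", "mainframe"},
    "Media receivers": {"radio", "television"},
}


def classify(text, rules=RULES):
    """Tag the text with every category whose keywords appear in it.

    The matched keywords are returned alongside each category, so a user
    can see exactly why a tag was applied.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {
        category: sorted(words & keywords)
        for category, keywords in rules.items()
        if words & keywords
    }


print(classify("The new laptop ships with a built-in radio tuner"))
```

Returning the matched keywords with each tag is what keeps the process transparent: every classification decision can be traced back to a rule that a user wrote.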
Software that classifies according to its own semantic links between words is less transparent, although vendors ought to be able to describe and give examples of how the linguistic analysis works, providing potential users with at least a theoretical understanding of what the software is doing. A third type of automatic-classification software uses statistical methods, such as vector analysis and rough sets, which, from the potential user’s perspective, can be little more than a black box with inputs and outputs.
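To give a feel for what the statistical family is doing inside the box, here is a toy vector-space sketch. It assumes the simplest possible technique, word counts compared by cosine similarity, and is not a rendering of rough sets or of any commercial product. The example texts and function names are invented.

```python
import math
import re
from collections import Counter


def vectorise(text):
    """Turn a text into a vector of word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# One made-up example text per category stands in for a training set.
EXAMPLES = {
    "Finances": "budget invoice cost revenue forecast budget",
    "Research": "experiment hypothesis laboratory results study",
}


def nearest_category(text):
    """Return the best-scoring category and the full table of scores."""
    doc = vectorise(text)
    scores = {category: cosine(doc, vectorise(example)) for category, example in EXAMPLES.items()}
    return max(scores, key=scores.get), scores


category, scores = nearest_category("The revenue forecast exceeds the budget")
print(category, scores)
```

Even in this toy form, the output is a set of numbers rather than a stated rule, which is why such systems are harder for users to inspect than keyword rules.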
The question is not whether to use automated or human classification, but rather how to combine the two so that human intelligence and machine intelligence are each used for the functions they suit best. Human intelligence gives meaning to concepts, something machine intelligence cannot do. Human understanding and decisions are based on the assumption that other people understand in similar ways. A computer executing a rule or making a statistical analysis is doing something qualitatively different from understanding. Attempts by salesmen to attribute qualities such as understanding and even intelligence to their systems should be treated with a great deal of scepticism.
Only once you are armed with an early draft of your taxonomy, and with clear ideas about the desired roles and relationships between automated and human classification, is it time to look for software vendors. One of the best places to start on this long and arduous journey through claim and counter-claim, marketing hype and numbing jargon is the new, updated edition of the Taxonomies: Frameworks for Corporate Knowledge report, published by Ark Group (www.ark-group.com). The latest edition features a section dedicated to technology, incorporating in-depth evaluations of individual tools and vendors.
Use a standard form to query all vendors, specifying taxonomy purpose and use, estimating the size of application and update frequencies, and describing your desired level of user interaction. Questions about scalability, ease of integration with existing systems and business continuity should also be included on the form. Your objective should be to force vendors to respond in terms of a standard framework, so responses can be easily compared. Finally, before purchase, test the vendor’s software using samples of the taxonomy and the data to be used. Also, test its performance compared with human classifiers. Then, test it again.
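One way to put a number on the comparison with human classifiers is sketched below. It assumes the same sample items have been tagged once by a person and once by the software, with one category per item. Cohen's kappa is used here as a common statistic for agreement beyond chance; it is this sketch's choice, not something the evaluation process above prescribes.

```python
from collections import Counter


def agreement(human, machine):
    """Return (raw agreement rate, Cohen's kappa) for two lists of category tags."""
    if not human or len(human) != len(machine):
        raise ValueError("both lists must be non-empty and of equal length")
    n = len(human)
    observed = sum(h == m for h, m in zip(human, machine)) / n
    h_counts, m_counts = Counter(human), Counter(machine)
    # Agreement expected by chance, given how often each side uses each category.
    expected = sum(h_counts[c] * m_counts[c] for c in h_counts) / (n * n)
    if expected == 1:
        return observed, 1.0  # both sides used one identical category throughout
    return observed, (observed - expected) / (1 - expected)


# Made-up trial results, for illustration only.
human_tags = ["Research", "Research", "Finances", "Finances"]
machine_tags = ["Research", "Research", "Finances", "Research"]

rate, kappa = agreement(human_tags, machine_tags)
print(f"raw agreement {rate:.2f}, kappa {kappa:.2f}")
```

A raw agreement rate can flatter software that assigns the most common category to everything, so a chance-corrected figure is worth reporting alongside it.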
Now, if all has gone well so far, the time has come to implement and maintain a trialled and tested taxonomy within your organisation. This will be the subject of the final article in this series.
Please send comments and questions to firstname.lastname@example.org.