posted 10 Jun 2003 in Volume 6 Issue 9
Is a whale a hippo or a horse?
Taxonomies have stepped out of the life-sciences world and are demonstrating their value within a business context. Mark Field outlines three basic steps to creating a corporate taxonomy that touch on ninja librarians, aquatic carnivorous horses and anorak sideshows.
Taxonomy is an intellectual and practical discipline traditionally located within the life-sciences sector. It is the labelling, systematising and general tidying up of the results of fieldwork, laboratory work and theorising in botany and zoology. Biologists who concern themselves more with taxonomy than original research are often characterised as slightly dull, pedantic and a little anal-retentive. However, the life sciences without taxonomy would scarcely be sciences, assuming they could be done at all.
I’m not going to offer a tight definition of the idea of a business taxonomy and its relationship with classification, metadata, thesauri and information architecture. Current understanding is fuzzy, but that has meant that many disciplines have come together in this emerging field, and that is a good thing; we need all the methodology we can get. Taxonomy is about describing things and their relationships, particularly their similarities. But what has this got to do with knowledge and, in particular, content management?
What indeed? Bear with me for a few paragraphs of highly challenging technical discussion. Content is stuff, stuff that is not a great homogeneous blancmange, but all sorts of different stuff, and some of it is useful. Some of it is critical to the survival of organisations. We might say that Enron and Barings died from a lack of stuff. That stuff is information and it comes in many different forms. So, we have a lot of information in many formats. That’s nice.
Well, it would be if we could only understand what it all was and how we can use it. That is, really use it: turn it into money or efficiency or saved lives or bins emptied or whatever it is we do.
For example: intranets. Some organisations have had an intranet, or several intranets, for five years or more. Some are just constructing their first intranet. These are the lucky ones, because they have a chance to create an intelligible structure for this fast developing means of sharing information. Most five-year-old intranets are, by now, almost unmappable. They are riven with departmental silos, irreconcilably different records for almost identical information, dead information, partial information and just plain wrong information.
There is a quaint belief that unmapped wastes of information can be explored à la Star Trek with search engine ‘sensors’. This is a religious matter. Either you believe it or you don’t. I think it is the stupidest load of tosh I have ever heard. I love search engines – I have the best freeware search engine installed at home: Personal Librarian, £1,500 a seat to license in 1990, and now free. I know what it can do, and what it can’t do is detect complex and marginal, but nonetheless highly valuable, associations between documents.
Let’s continue with the technical discussion: ‘stuff’ is made up of ‘things’. The nature of things is often complicated by the context of the person wanting to use the things. For example, the legal and the technical approaches to a document on environmental remediation (let’s say, the removal of heavy metals from brownfield sites prior to redevelopment as residential/retail mix) are different. For the environmental scientist and for the lawyer, that document will be associated with different sets of other documents. Historically, one group has ‘owned’ it, and it may well have been effectively hidden from the other. But even the crudest description of the document, sensibly listed in a central place, changes all that. I must emphasise that this idea of sensible lists – which is what taxonomies, thesauri, classifications and indexes are – is smarter and more agile than may first appear. I’ll return to this later.
Taxonomies can therefore make sense of what appears to be a mass of jumbled things. It doesn’t much matter whether we’re looking at rats, elephants or cod, or documents, scanned pamphlets, multi-format hypertext compound documents or pictures of the kids. A taxonomy that describes what these things are, and describes the relationships between them, is bound to do some good. If you can make it work without the dull, pedantic people taking over. But perhaps they’re not so dull and pedantic. It is unlikely that you have employed a herd of biologists to manage your information; but it is likely that you’ve got a cadre of information technologists (cadre being a good word for secret societies like IT professionals), and you may even have a few librarians tucked away somewhere. Using what you’ve got, without spending a lot on a 56th-generation psycho-semantic content-engineering environment with a touch-screen interface, can you create effective taxonomies that improve the exploitation of your content and can be managed within the current scale of operating costs? Oh yes.
There is a problem: finding workable models. The life-sciences approach works for the life sciences because, while it can tolerate and even inform debate about whether a whale is a hippo or a horse (smart money reckons that it’s a horse, descended from a large aquatic carnivorous horse), ultimately it must be a clear system to which every biologist in the world subscribes: Rattus rattus is Rattus rattus, and not Rattus norvegicus. But rats can be all sorts of things: pests, totems, gods, a food resource, human adulterers, a popular evocation of late Edwardian upper-class male society – OK, strictly speaking that’s a water vole, but you get the drift.
In the real world, we do not all agree on how we work with things: we want – we need – to work with the same objects, mostly documents, each in our own different ways. We have different views of the world. Views are important, and views are not catered for by the biological model of taxonomy. Life sciences would become a fantasy if one group of biologists decided to lump all ‘long’ animals together and created a class that included snakes, stoats, centipedes and dachshunds, and another group decided to create a class of ‘pink’ animals comprising flamingos, tropical fish and mice that had been dyed pink. Biological classification is a powerful global language, which moves quickly from debate to discipline. As a model for corporate taxonomies it would encourage an increasing fixity of approach to markets, products and customers that would kill a modern business.
Bibliographic classification presents us with a quite different model: there is still an unambiguous view of the thing, the object, again, mostly documents, but there is also a facility to describe the document in different contexts. Librarians look at content in the context of the needs of their organisations. If we had organised the stores of information we are responsible for by whimsy rather than by the needs of the business, there would be fewer of us around the corporate sector than there currently are. Which is not saying much. That lack of skilled intervention in the content design of information architectures, and intranets in particular, is why so many are messy and obstructive.
Enough fooling around. The needs of the business must determine the shape and operation of information architectures, and most particularly the corporate taxonomy. Taxonomies designed to match organisational reporting are fine for the HR department but they do not reflect day-to-day operations: the working information capital is generated in a complex of formal and informal processes shared in varying degrees by formal and informal groups.
Corporate taxonomies must be shaped by corporate objectives. If the nature of the business is to do with delivering public services, then the ‘public’ and the ‘services’ are the defining core elements of the taxonomy. If, to continue in this simplistic vein, the nature of the business is about being the biggest toiletries manufacturer in the world, then it is ‘markets’ and ‘products’. ‘Markets’ immediately finds a tension: is the next layer of detail organised by geography or product type? Unilever and Procter and Gamble have, in very crude terms, swung between seeing their operations as primarily regional or primarily product focused.
How is a good corporate taxonomy created? There are a few simple steps that seem to appear in all successful taxonomy projects. It would be impossible to be exhaustive, but we can look at some obvious steps, and some of the detail within those.
Step one: surveying. If you don’t know what you’ve got you can’t do anything with it. Incredibly (to me, at least) many taxonomy projects are entirely driven by a small clique, with no reference to the organisation at large. These will fail.
This stage is as much about mapping the human geography of the organisational information landscape as recording and characterising the actual stores of information: we are looking for the formal and informal groups that create, use and exchange information. Who holds what information for what purpose? With whom do they share it? The map of corporate information can become pretty complex, unless a clear sense of business process informs it: manufacturing has a clear sense of its business process; sales has a clear idea of its processes, as has R&D; all are equally valid and all can be accommodated in a well thought-through taxonomy.
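The survey questions above (who holds what, for what purpose, and with whom they share it) can be recorded in a very plain structure. What follows is a minimal sketch in Python, not a prescribed method: the `Holding` record, its field names and the sample groups are all hypothetical, invented here for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Holding:
    """One survey answer: who holds what information, and why."""
    group: str          # formal or informal group (hypothetical names below)
    information: str    # what they hold
    purpose: str        # why they hold it
    shared_with: set = field(default_factory=set)  # groups it is shared with


# Hypothetical survey results, for illustration only.
survey = [
    Holding("Environmental science", "remediation guide", "site operations", {"Legal"}),
    Holding("Legal", "remediation guide", "risk policy"),
    Holding("Sales", "customer contacts", "account management", {"R&D"}),
]


def holders_by_information(holdings):
    """Index the survey by item, so duplicated holdings stand out."""
    index = defaultdict(set)
    for holding in holdings:
        index[holding.information].add(holding.group)
    return dict(index)


def unshared(holdings):
    """Holdings that are shared with nobody: candidate silos."""
    return [(h.group, h.information) for h in holdings if not h.shared_with]
```

Indexing the same answers by item rather than by group is what exposes the near-identical records held in different departments; listing unshared holdings gives a first, rough view of the silos.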
Step two: designing the structure and management. I’ll mostly deal with the structure here. The design of taxonomy-management models is so dependent on local conditions that it is difficult to extract some simple management rules. Every instance of a taxonomy project that looks halfway feasible has tended to have within it a group that is capable of both advocacy and technical implementation, but these groups are very different in each case.
There are, however, some basic taxonomy structures that have been shown to apply in almost any setting. Many, maybe most, corporate taxonomies are polyhierarchies with descriptors. This is not quite as obscure as it seems. A hierarchy is something we all understand: when we visualise it we think of an upside-down tree, with a small number of encompassing groups or categories at the top, each broken down into larger numbers of smaller or more specific groups or categories, which in turn may include further smaller groups. It is a structure humans are peculiarly comfortable with, crossing all cultural boundaries.
Simple hierarchies are easy to explain, and in simple organisations, or extremely formal organisations, they are robust and quickly become embedded in everyday life. In complex systems, like large corporate organisations, they have some serious drawbacks. Going back to our remediation document, for the scientist it belongs to a category of operational documents, it may refer to various series of more specific technical guides, and it may itself be listed in a higher-level guidance document with similar operational documents: all nicely hierarchical. However, for the lawyer this same document may be used at, or close to, policy level, and may be located in a group of documents dealing with risk at the highest levels in the organisation, listed in a document only used by the chief technology officer of the organisation. The document now has not only justified places at two different levels in the hierarchy, it has more than one ‘parent’ document, and different sets of ‘child’ and ‘sister’ documents.
The simple hierarchy has broken down. But all is not lost. A polyhierarchy acknowledges and describes these multiple relationships. A polyhierarchy not only accommodates the business process models of manufacturing, sales and R&D; it can also be used to provide qualitatively better information for each of these domains and for the general policy domain of the company. If the remediation risk domain is suddenly shifted by new legislation or new discoveries, or by a highly newsworthy industrial accident, the taxonomy allows fast discovery of related technical documents, which may counter all sorts of nastiness, and maybe locate technical solutions to the new problem.
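The remediation document's two parents can be made concrete with a small data structure. This is a rough Python sketch under stated assumptions, not a description of any particular taxonomy tool: the `Polyhierarchy` class and the category names are hypothetical, and the only idea taken from the text is that a node may sit under more than one parent.

```python
from collections import defaultdict


class Polyhierarchy:
    """A hierarchy in which a node may have more than one parent."""

    def __init__(self):
        self.parents = defaultdict(set)
        self.children = defaultdict(set)

    def add(self, parent, child):
        """Record that `child` sits directly under `parent`."""
        self.parents[child].add(parent)
        self.children[parent].add(child)

    def ancestors(self, node):
        """Every category above `node`, following all parent links."""
        seen = set()
        stack = [node]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def siblings(self, node):
        """Nodes that share at least one parent with `node`."""
        found = set()
        for parent in self.parents.get(node, ()):
            found |= self.children[parent]
        found.discard(node)
        return found


# Hypothetical categories: the scientist's branch and the lawyer's branch.
taxonomy = Polyhierarchy()
taxonomy.add("Operations", "Operational guidance")
taxonomy.add("Operational guidance", "Remediation guide")
taxonomy.add("Operational guidance", "Site survey guide")
taxonomy.add("Corporate policy", "Risk")
taxonomy.add("Risk", "Remediation guide")
taxonomy.add("Risk", "Liability briefing")
```

Asking for the ancestors of "Remediation guide" returns categories from both branches, and its siblings include both the scientist's and the lawyer's neighbouring documents, which is the fast discovery of related material that a single-parent tree cannot offer.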
A note: real-world taxonomies are very often ugly as sin. It is unlikely that they will be symmetrical, they probably do not have the same degree of detail in every ‘branch’ of the hierarchy, and there is no point in creating information structures for things that you do not do.
What about descriptors? Life is occasionally messier and more cussed than even the most cunningly designed polyhierarchical taxonomy. One or more categories of things can be isolated and re-applied, at any level and in such a way that… you know what, this is where you may need professional help.
One last thing on structure. There is a taxonomical device so powerful and adaptable that it can describe anything, place it in a framework that can change with your business, second by second, and still deliver new value, new insights, in endlessly recombined information. It is object-oriented, and highly scalable. It is analytico-synthetic classification and, should I tell you any more about it, I will have broken the secret code of librarianship and ninja librarians would come and kill me this very evening. So on to step three (or contact me on my e-mail below).
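In the library-science literature, analytico-synthetic classification is usually described as faceted classification: a subject is analysed into independent facets, and a description is synthesised by combining one value from each. The Python sketch below illustrates only that general idea. The facet names, values and the `find` helper are hypothetical, and nothing here is drawn from a specific scheme.

```python
# Hypothetical facets and values, chosen only for illustration.
documents = {
    "remediation-guide": {"activity": "remediation", "material": "heavy metals",
                          "site": "brownfield", "audience": "science"},
    "liability-briefing": {"activity": "remediation", "material": "heavy metals",
                           "site": "brownfield", "audience": "legal"},
    "asbestos-survey": {"activity": "survey", "material": "asbestos",
                        "site": "brownfield", "audience": "science"},
}


def find(docs, **facets):
    """Return the names of documents matching every requested facet value."""
    return sorted(
        name for name, described in docs.items()
        if all(described.get(facet) == value for facet, value in facets.items())
    )


# Each query recombines the same descriptions into a different view.
scientist_view = find(documents, site="brownfield", audience="science")
remediation_view = find(documents, activity="remediation")
```

Because no facet is privileged as the top of a tree, a new view is only a new combination of values, which is one plausible reading of 'endlessly recombined information'.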
Step three: implement and review. Many successful taxonomy projects have used a group of representatives from each operational or stakeholder group in the organisation to advise on, or actually do, the surveying and design of the taxonomy. This is a ‘good clique’. Getting this group right is critical. Appoint too many taxonomical enthusiasts and they’ll forget that the only point of this exercise is to make the business more efficient and agile, a better service provider, or more competitive, without creating crippling new procedures. The taxonomy will risk being marginalised as an anorak sideshow.
Without enthusiasts, or rather, the sort of enthusiast that is prepared to go out again and again to explain what it is, why it’s being done and what the benefits will be, the project will wither and die. The organisation as a whole will be prepared to attempt to implement the programme set out by this group because it is credible: it will have acquired authority through being demonstrably effective and competent, or by unambiguous senior-management sanction of its objectives. There will always be a ‘credibility gap’ in which implementation runs ahead of the benefits it generates, and it can be numbered in months, not weeks. There are no glib solutions to the credibility gap: it can only be bridged by being persistent, patient and apparent.
Review must be built in as part of the continuing management of the project. In fact, a practical taxonomy has to be in a constant state of review. If it were possible to visualise a real-world working taxonomy, it would appear to be flickering as new terms emerged from business operations, and older terms ceased to be the preferred usage. Remember that this is not just an intellectual pleasure cruise; this is about creating a strategic and day-to-day understanding of content, robust and reliable enough to base decisions on, and smart enough to provide manageable and comprehensive views of every aspect of the business.
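The 'flickering' of preferred terms can be pictured as a lookup table that is revised as usage changes. This is a small Python sketch, assuming a simple thesaurus-style rule in which every known term points at one currently preferred term; the `TermList` class and the sample terms are hypothetical.

```python
class TermList:
    """Maps every known term to the term currently preferred for it."""

    def __init__(self):
        self._preferred = {}

    def add(self, term):
        """Register a term; until told otherwise it is its own preferred form."""
        self._preferred.setdefault(term, term)

    def prefer(self, new, old):
        """Make `new` the preferred usage for everything filed under `old`."""
        self.add(new)
        self.add(old)
        target = self.resolve(old)
        for term, current in self._preferred.items():
            if current == target:
                self._preferred[term] = new

    def resolve(self, term):
        """Return the preferred term, or the term itself if it is unknown."""
        return self._preferred.get(term, term)


# Hypothetical usage: the preferred term changes twice over time.
terms = TermList()
terms.prefer("intranet", "internal web")
terms.prefer("portal", "intranet")
```

Older terms are never deleted, only redirected, so a document indexed under yesterday's vocabulary can still be found under today's.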
Which brings us finally to the resources needed to make good taxonomies: people and technology. Big, complex, ugly taxonomies need tools to manage them effectively, without the management of the taxonomy becoming the sole activity of the organisation. An organisation devoted to describing itself in great detail is Kafka-esque, very stupid and will fail. An organisation that has no clear idea of the true state of its information resources (what they are, who creates and uses them, how they are traded internally and externally) is a blundering giant, forever skirting an abyss called Risk. The metaphor extends to include new opportunities, but it gets a bit twee on the way.
People and technology. People and tools. All tools need people to wield them, skilled people who understand the organisation from edge to edge and with some depth. I have no research to prove this, but anecdotally, again and again, organisations enjoy real productive information management when people who understand content and people who understand content delivery have found a way to collaborate. Put simply: when librarians and IT professionals work together, good things happen.
A report giving a good overview on taxonomies: Willie, J. & Skyrme, D., Taxonomies: Frameworks for Corporate Knowledge (Ark Group, 2003)
From a good information architecture page: www.boxesandarrows.com/archives/002570.php?page=discuss
A Delphi paper: www.delphigroup.com/coverage/taxonomy.htm (you’ll need to register, but it won’t hurt)
Good software article: www.ojr.org/ojr/technology/1015016550.php
Excellent links page: www.loc.gov/flicc/wg/taxonomy.html
Good on ontology: www.ontology.org/main/papers/faq.html
The best-kept secret in information consulting: Hunter, E. J., Classification Made Simple (Ashgate Publishing, 2002)
Mark Field is information and knowledge-management adviser at the Chartered Institute of Library and Information Professionals. He can be contacted at firstname.lastname@example.org