posted 7 Feb 2002 in Volume 5 Issue 5
The case for corporate taxonomy
Separating fiction from reality
The growth of internet technologies and the threat of information overload have contributed enormously to the adoption of the science of taxonomy in a corporate setting. Peter Kibby attempts to get past conventional, mechanistic views of taxonomies to find out what really makes them work in business.
For more than 20 years I have been fascinated by the question, ‘What do we mean by meaning?’ It makes me think of the Zen idea of a hand grasping its own elbow. To untie such knots you have to cheat – grasp the elbow with the other hand. The building of taxonomies can be a similar way of cheating. Taxonomies can make explicit the components of our understanding that are otherwise taken for granted. When we look for meaning in the world around us, especially in a rapidly changing work environment, the ability to create, modify and navigate taxonomies is what gives us a sense of place or purpose and a feel for what is coming next.
Taxonomy is an everyday, basic human activity. I use the term in its broadest possible sense to cover the labelling of things and their organisation into structures, and that includes classifications, categorisation, directories, thesauri, vocabularies, knowledge trees, even language. Linguistic and philosophical analysis of taxonomy started with the ancient Greeks, but in the last half century it has given way to the approaches of cognitive science, which have found links between the taxonomies we use and our biologically based capacities to receive and process phenomena.
We use the same word to describe both the activity and its result, but a taxonomy that is no longer active is a dead taxonomy. To be active, a taxonomy must be continually shifting. A classification scheme being updated to cover new areas is an explicit, expressed form of change, but there is a parallel, internal form. As people use a taxonomy, there is an internal process of categorisation that recreates a version of it in their minds. The internal activity of taxonomy makes phenomena understandable, whether these relate to the profusion of living organisms, the faces of the people we know or the contents of a museum. The beauty, harmony and diversity of the world of phenomena are captured, like butterflies in a glass case, by categories, by the labels and structures of myriad shifting taxonomies.
But how are we to understand the taxonomies that are created artificially for businesses? Do they have the same claim on our internal capacities to give meaning to the world? Are they real or just fiction?
The reason why taxonomy has leapt from intellectual to corporate conversation is easy to see. Internet technology has given us the web and the intranet, and these in turn have presented the headache of information overload: too much information in too many different formats via too many channels. Organising information in the age of paper was relatively easy. We had the benefit of hundreds of years of practice, with identifiable centres of excellence (old school libraries and publishers), and physically limited volumes of content. Organisers of information in the electronic age have not had nearly as much practice, have often ignored the lessons gained in the old school and can easily be exposed to more information over lunch than their forebears were in a lifetime. For a very large proportion of commercial and government organisations, correctly labelling and sorting information has moved from being a marginal accomplishment to a core skill.
No organisation in the knowledge economy is without knowledge systems. Every one has ways to create, store and share knowledge. These systems rely on information, and on labels and structures to navigate it, put it in context and make it understandable. Only a small proportion of them are consciously managed, but organisations are beginning to set about actively improving them. As the term ‘knowledge management’, if not KM theories, fades from consultant-speak, it is the pragmatic, craft-based approaches that are flourishing, including taxonomy development, organisational design, and document and records management.
The variety of forms that a taxonomy might take is almost infinite, with each one being driven by the role it serves. These roles are defined on the one hand by expectations of the people who will use the taxonomy and, on the other, by how it is to fit the business process it describes. But beyond this mechanistic view of the workings of a taxonomy, there is an extra, far less tangible, factor.
Fit to purpose is the key success criterion for a taxonomy. In practice, that fit is often the result of inspiration as well as rationalisation. The finest result of such a fit is what information researchers call ‘scent’ – the capacity of a categorisation at one level to suggest to its user what may be found at lower levels. Seen from the other end of the process, for the creator of the taxonomy, this is its ‘gist’ – a discipline, principle, idea or, sometimes, metaphor that pulls together the expressed, explicit taxonomy and the anticipated mental categorisation.
Explicitly created taxonomies therefore, in addition to being artificial, may not even be entirely rational. In a technical environment this can be difficult to see and may even be heretical to acknowledge, but in the corporate context it is in full view: restructuring can be almost continuous. If taxonomies are to some extent fiction, then maybe they are not as dull as many people assume. The slow pace of early development and the areas in which they were applied have been far from exciting to the artistically inclined (library catalogues, natural history and philosophy, for example) but all around, as art, commerce and government have become better organised, they too have generated explicit taxonomies. Consider the growth of specialist linguistic taxonomies in the form of jargon and acronyms, or organisational taxonomies like organisational charts and designs, or topographic taxonomies as used in network and traffic planning. And then, of course, came the internet, websites and intranets. The labels and structures presented by these have sparked off a whole new specialist field in information architecture.
Corporate taxonomy is expressed throughout an organisation, from the vision statement in an annual report, through the naming and dividing of its operations between departments and functions, right down to the humble navigation bar on the intranet. Only a few organisations, however, think of these in taxonomic terms and actively manage them to maximise knowledge growth. In order to understand different corporate taxonomies, a categorisation process is inevitable. I would make divisions along the lines of purpose, creator and origin; the purposes to which taxonomies are put are the largest factor in their definition, but they are also dependent on their creators and on the circumstances of their creation.
Purpose: communication, command and control
Frequently the main barrier that prevents disparate parts of an organisation communicating with one another is the lack of a common language. While literally true in many multinationals, this also occurs where specialist terminology grows up or where functional divisions in the organisation align with certain human traits. Common examples of the latter include the divisions between sales and marketing or research and development. Not only do the departments do different things, they do them in different ways. The two groups don’t think in the same way. A common structure, in addition to common labels, is required to bring them together.
With commitment from both researchers and marketing people, a team of information scientists to bring the two sides together, the Mind Manager software, technical support from Sopheon and leadership from Adrian Dale, Unilever in the UK created a structure and labels that allowed its scientists and marketers to be creative together. Their new common language enabled them to cross over between exclusively customer-oriented and scientific views to identify and link potential features to the benefits of possible products. In other words, innovation in the business was fostered by innovation in the shared labels and structures.
A valuable side effect of Unilever’s initiative was the demonstration of the group’s competence. In addition to showing where knowledge was strong or weak, the mind maps it had created became a way of expressing how the group understood the relationships and interdependencies between different areas. The maps provided a visual representation of its ‘knowledge space’, which could be impressively communicated to those outside the group, including customers and budget holders.
This example clearly illustrates that processes are dependent on communication, that communication is governed by taxonomy and that it can be positively engineered. This is also true in a context with which every organisation can identify. E-mail has established itself as a significant means of communication, and the need to confirm actions by means of printed memos and letters is diminishing. As a consequence, electronic records management is steadily growing in importance. Records managers have always used filing plans to divide up an archive so that relevant material can be retrieved, but as records are captured by end-users at the desktop, those end-users need filing categories that are relevant to their future as well as their current use.
Records provide evidence of an organisation’s actions. They can demonstrate compliance with regulation and they may be used to defend litigation from employees, neighbours or customers. In Hollywood films, when asked to disclose relevant documents, the often unfavourably portrayed defendant dumps a warehouse of undifferentiated documents on the hapless plaintiff. In order for a British court to be satisfied, however, the defendant needs, at least in the first instance, to be able to select and hand over just the relevant documents. If the records are filed without reference to this potential use, they can place an enormous load on a business, especially when they comprise many millions of documents. Relevant retrieval is a key factor in the success of a records programme. Although electronic records projects often tend to focus on the technical challenges, the filing plan is at the heart of the implementations that I work on at CMG Admiral.
Portals to an organisation’s information assets face the same challenge, namely enabling people to find material relevant to their part of the business process when it was originally produced for another purpose entirely. A white paper from Microsoft introducing its digital dashboard technology for intranet portals accepted that “the most significant roadblock to effectively integrating information resources for digital dashboards is the lack of a common way of labelling or tagging information”. Since this was written, the number of products that categorise or tag material has increased, but we are no nearer finding a common way. Of course the technology is less the issue than the labels themselves. With different departments using different categorisations for similar tasks and materials, and few using robust structures or consistent labels, the addition of a portal can make information overload worse, not better. I have even seen an instance of more than one portal being developed simultaneously by different parts of the same business: both teams believed their view of the organisation was the correct one, rather than that success would lie in creating a unified view.
And what if the different parts of a business process are not within just the one organisation? The different organisations need a shared taxonomy in order to communicate and share information, whether this is analysis and research documents or part numbers and ordering data. The promise of supply chain portals is predicated on a common categorisation of the items to be traded or shared. Gaining agreement on such common, linking components has in the past taken a great deal of time, with each player jockeying for position. The game playing familiar to those in the technology industry over the introduction of standards like DVD or, even worse, HTML, is identical to that in the development of electronic data interchange (EDI) and SGML standards. A format like XML may be ideally suited to the development of taxonomy standards (eg, topic maps), but history shows that agreement is only achieved rapidly when one player offers a de facto standard based on market or technical dominance.
A taxonomy that spreads outside its own organisation can be very advantageous for that enterprise. In a general sense, like thought leadership, an organisation’s way of thinking can set the agenda for its competitors. More specifically, when a fully-fledged classification scheme is adopted by its customers and competitors, the organisation that developed the classification benefits hugely. A recent example of this is the Ministry of Defence and the Defence Evaluation and Research Agency (DERA), which had developed a technology taxonomy. While good information architecture can make a website easier to use, in DERA’s case, good taxonomy actually increased its users’ understanding of the world they were working in. Other parts of the European defence sector, lacking such a scheme, rapidly adopted it, giving DERA an enviable prize of both thought leadership and an entrée to future e-business developments.
Maintaining this sort of advantage requires dominance either in the market or in a particular technical area. The maintenance and continuing development of the taxonomy may itself be a sufficient technical headstart, but this also means continuing costs. The web used to be littered with sites that aped the structure and labels used by Yahoo!. Nowadays, given that you cannot simply copy Yahoo!’s taxonomy, the sheer scale of the undertaking has seen off many competitors. Nevertheless, the site’s directory look is indeed widely copied. The success of Yahoo!’s taxonomy makes it easier to tempt users to explore any similar looking directory (even if the actual choices made by Yahoo! are not to everyone’s taste). Another example of such taxonomy genres can be found among the major news organisations. On their websites there is a continuing battle to be similar enough to the others to make comprehension quick and easy, and yet to add value with idiosyncrasy.
These examples are artificially engineered taxonomies, created by humans. The expectations and abilities of their creators have formed them into recognisable hierarchical and branching structures, like mind maps and classification schemes. The circumstances of their creation, including the availability of leadership and finance, have enabled them to take advantage of the best thinking in these techniques. This is not always the case, however.
Flexibility, granularity and chaos
Control freaks may gain comfort from the importance of the categorisation process and the ubiquity of taxonomy, but this feeling will be short-lived. Interruptions to the status quo are essential for innovation, and a good taxonomy is never finished: it exists in a state of flux, keeping up with the categorisations and understanding of its users and creators. The most obvious failure of a taxonomy is when this process stops, for instance because the role responsible for it has been eliminated. Alternatively, the creator of the taxonomy may not have allowed for change, either within its structure or by ignoring the need or budget for maintenance. Creators of taxonomies can also kill off their own work by setting it up as their property, making it difficult for others to contribute and criticise. In all such cases, the root cause of failure is inflexibility. The Dewey Decimal library classification may seem to be ancient history, but it is gradually evolving (it’s now in its 21st edition), and it is kept in robust health by the flexibility it allows libraries in fashioning it to their own requirements.
Flexibility is not the same as complexity. Taxonomies created by the human mind are almost infinitely flexible. They are limited, however, by the capacity of the minds in question to manage a taxonomy’s breadth, depth and complexity. During the 20th century a number of attempts were made to create or define complex taxonomies, like faceted classification schemes, but on the whole these were not a success, because even if their creators understood them, their users did not. Users may fail to recreate the taxonomy in their own minds and simply become confused. The limiting factor to complexity is the ability and, importantly, the willingness of users to deal with it. Give them complexity without obvious value and users ‘lose the plot’.
At one point in the late 1990s a government department had an intranet – or, to be more accurate, a large number of intranets sharing only a common piece of software – that it wanted to become ‘joined up’. To link the various components together, my colleagues and I looked at adding a business-process-based categorisation. This then gave the user a double layer of navigation: a top-level, process view to navigate between the intranets, and a lower-level, idiosyncratic view to navigate within a particular intranet. Piling one categorisation approach on another may seem to be creating complexity, but experience shows that people are happy to layer and nest categorisations, as long as their purpose and separation are clear. Lessons from the fictional world fit with this observation. After all, the use of stories within stories, even nested many times, is as old as the art of storytelling itself.
There are, though, limits on depth as well as on complexity. Studies of users’ navigation of resources show that they are more likely to become lost as a result of the depth of a taxonomy than as a result of its breadth. While depth requires the maintenance of a mental model – internal categorisation – breadth in a classification need not, particularly when it is supplemented by visual scanning of category items.
An intranet categorisation for one pharmaceutical company needed to integrate the processes of research and development, manufacturing, and marketing with the distribution of its operations across the world. With limits on the depth but a large number of items to accommodate – the entirety of a fully integrated organisation – we opted for a matrix approach, which enabled the two axes of process and location to be presented clearly and simultaneously. With the air of a database or spreadsheet, this provided a very ‘scientific’ view of the organisation, which fitted nicely, but it did not change the fact that the lower layers of the matrix required a large number of individual items. And as the number of items grows, the time users take in scanning them increases, which is why the aforementioned scent in the categorisation is so important.
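The matrix approach described above can be sketched as a small data structure. This is only an illustration under stated assumptions, not the scheme built for the company: the process names come from the text, while the location names, the item titles and the class `MatrixTaxonomy` with its methods are hypothetical.

```python
from collections import defaultdict

# Process axis taken from the article; location axis is hypothetical.
PROCESSES = ["research and development", "manufacturing", "marketing"]
LOCATIONS = ["Europe", "Americas", "Asia"]


class MatrixTaxonomy:
    """Two-axis categorisation: each item is filed in one (process, location) cell."""

    def __init__(self, processes, locations):
        self.processes = list(processes)
        self.locations = list(locations)
        self._cells = defaultdict(list)

    def add(self, item, process, location):
        # Reject labels that are not on either axis, so the matrix stays fixed.
        if process not in self.processes or location not in self.locations:
            raise ValueError(f"unknown cell: {process!r} / {location!r}")
        self._cells[(process, location)].append(item)

    def cell(self, process, location):
        """Items filed under one process at one location."""
        return list(self._cells.get((process, location), []))

    def row(self, process):
        """All items for one process, across every location."""
        return [item for loc in self.locations for item in self.cell(process, loc)]

    def column(self, location):
        """All items for one location, across every process."""
        return [item for proc in self.processes for item in self.cell(proc, location)]

    def fullest_cell(self):
        """The cell a user would spend longest scanning, with its item count."""
        if not self._cells:
            return None
        key = max(self._cells, key=lambda k: len(self._cells[k]))
        return key, len(self._cells[key])


# Hypothetical items, for illustration only.
matrix = MatrixTaxonomy(PROCESSES, LOCATIONS)
matrix.add("Plant safety manual", "manufacturing", "Europe")
matrix.add("Batch release procedure", "manufacturing", "Europe")
matrix.add("Launch plan", "marketing", "Asia")
```

The structure is only two levels deep however many items it holds, which is the point of the matrix. The cost moves to the length of the list in each cell, and `fullest_cell` is one rough way to see where scanning time is likely to build up.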
At some point the amount of nesting and layering, even when intelligible to the users, becomes difficult to maintain. One of the most far-reaching taxonomy developments is that championed by Dietrich Lehner at PricewaterhouseCoopers in Germany. PwC is both large and diverse. It has an enormous number of taxonomies for different markets and activities, but it has developed a model, or framework, to harmonise them, which could eventually facilitate access from any area to any other. The model has a core to which individual initiatives can make extensions, within certain rules. Such a taxonomic database is difficult to create, but its benefits include regulating and removing duplicate effort and facilitating the management of a greater level of granularity (ie, number of terms and dimensions or levels) than is possible by hand.
If the constraints on managing breadth and depth can be removed by the use of databases and software, and their maintenance automated, the next obvious step is to see if the information can be persuaded to categorise itself or, failing that, to get software to do it for us.
Lies, damned lies and statistics
The successful automation of the categorisation process has lately become a prize for a number of specialist software houses (for instance Semio, Autonomy, Gammasite, Stellent and Quiver) as well as those who have been labouring in the field for decades (companies such as Verity, Hummingbird and SmartLogik). Even Microsoft has joined the fray, while automatic categorisation features are appearing in other product areas like customer relationship management and e-mail management.
This is not the place for an examination of how these products work, and each has different capabilities and a different twist, but the application of labels to content in an automated fashion is based on the workings of predictive, probability-based statistics. Most products supplement this with, for example, linguistic or stylistic analysis or comparison with pre-existing categorisations, but probabilistic calculation remains at the roots. The mathematical relationship between documents and the words they contain is not one that can be represented simply or that people can easily understand. These structures are far from those seen in traditional taxonomies.
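To give a feel for what probability-based labelling means in practice, here is a minimal sketch of one textbook technique, a naive Bayes classifier with add-one smoothing. It is not a description of how any of the products named above work; the class name, the category labels and the training sentences are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Crude tokeniser: lower-case and split on whitespace."""
    return text.lower().split()


class NaiveBayesCategoriser:
    """Assigns a document to the category with the highest estimated probability."""

    def __init__(self):
        self.doc_counts = Counter()               # documents seen per category
        self.word_counts = defaultdict(Counter)   # word frequencies per category
        self.vocabulary = set()

    def train(self, text, category):
        self.doc_counts[category] += 1
        for word in tokenize(text):
            self.word_counts[category][word] += 1
            self.vocabulary.add(word)

    def scores(self, text):
        """Log-probability score of each known category for the given text."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        result = {}
        for category, n_docs in self.doc_counts.items():
            log_prob = math.log(n_docs / total_docs)  # prior: share of training docs
            total_words = sum(self.word_counts[category].values())
            for word in tokenize(text):
                count = self.word_counts[category][word]
                # Add-one smoothing so an unseen word does not zero the score.
                log_prob += math.log((count + 1) / (total_words + vocab_size))
            result[category] = log_prob
        return result

    def categorise(self, text):
        """Best-scoring category; raises ValueError if nothing has been trained."""
        scored = self.scores(text)
        return max(scored, key=scored.get)


# Invented training examples and category labels.
categoriser = NaiveBayesCategoriser()
categoriser.train("invoice payment overdue supplier", "finance")
categoriser.train("budget forecast quarterly payment", "finance")
categoriser.train("clinical trial compound results", "research")
categoriser.train("laboratory compound assay trial", "research")

label = categoriser.categorise("supplier invoice payment")
```

The category is chosen purely from word counts, with no notion of what the words mean, which is why the resulting structures feel so unlike a hand-built classification.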
The arcane nature and foreign feel of such taxonomies highlight their separation from normal reality. In addition, probability may not seem to be a reliable way of, for example, assigning documents to a file plan. In fact, this approach needs to be weighed against the weaknesses of human attempts; people make mistakes, sometimes lots of them. In a collection of hundreds of thousands or millions of documents, accumulated over time or in many locations, the challenge of creating consistency is either expensive or impossible to meet. At least if a machine makes a mistake it is likely to make the same mistake every time.
My own experience at CMG Admiral with the Hummingbird categorisation software and that of my colleagues with Autonomy suggests that the tightness of the fit between the technology and the required categorisation process is the key factor in the selection of the software and its effectiveness in replacing human effort.
The big question is, are the machine-generated categories, their relationships to each other and the labelling choices the software makes the same as or different from those made by people? My belief is that these things are the products of the human understanding that went into them. The more that the technology absorbs the depth, breadth and complexity of the original human activities, the closer it comes to emulating or replacing the human effect. The trouble is that the results are dependent to some extent on the content the software works on, so the practical answer is to test: test the software, test the documents and see if it works for you.
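One simple way to run such a test is to have people label a sample of documents, run the software over the same sample and compare the two sets of labels. The sketch below assumes exactly that setup; the function name `agreement_report` and the sample labels are hypothetical, and a real evaluation would need a larger sample and probably finer measures.

```python
from collections import Counter


def agreement_report(human_labels, machine_labels):
    """Compare two labellings of the same documents, in the same order.

    Returns the share of documents on which both agree, plus a count of
    each (human label, machine label) pair on which they differ.
    """
    if len(human_labels) != len(machine_labels):
        raise ValueError("both label lists must cover the same documents")
    if not human_labels:
        raise ValueError("at least one labelled document is required")
    pairs = list(zip(human_labels, machine_labels))
    matches = sum(1 for human, machine in pairs if human == machine)
    disagreements = Counter(pair for pair in pairs if pair[0] != pair[1])
    return matches / len(pairs), disagreements


# Invented sample: five documents labelled by a person and by software.
human = ["finance", "finance", "research", "legal", "research"]
machine = ["finance", "research", "research", "legal", "research"]

rate, disagreements = agreement_report(human, machine)
```

The disagreement counts are often more useful than the headline rate, because they show which categories the software and the people divide differently.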
The meaning of taxonomy
Taxonomies created by business share some significant features with good fiction: they have a strong structure, they are original but also have genres, they suggest what will be revealed in later or lower layers and they reward engagement by the user with increased understanding. The last of these seems to me to be the most important, that good taxonomy can increase people’s understanding – not by slavish attention to the literal truth but by recreating reality in a new light, using new labels and making new connections through structure.
Raymond Chandler (creator of the fictional detective Philip Marlowe) suggested that a good way to find out how things worked was to throw a spanner in them and see what broke. The way that taxonomies fail is indeed illuminating. In addition to the technically-based failures covered above – of not fitting the purpose, non-maintenance, excessive depth, unsupported breadth and unjustified complexity – there is, finally, the failure to provide a good story. Renaming a company or government department and re-labelling its components is not always successful, simply because of the names used. The labelling on intranet and website redesigns sometimes invites ridicule. On the other hand, companies that alter their name to reflect current rather than historical understanding or organisations that can identify, label and order the dozen or half dozen things that they stand for or exist to do, as the British Council did in the 1990s, find themselves rejuvenated. The common thread in these stories is plausibility: users of taxonomies will only stretch the use of labels so far. Go beyond the limits of plausibility – by suggesting novelty or innovation where none exists, or aesthetic or moral aspirations where there are only pretensions – and the reader responds with disbelief.
Only a limited number of organisations think, or can afford to think, about their business in a taxonomic way, but they include some of the most important in the economy. The danger for those just starting out is that they will ignore the first and most important rule of writing: know your reader. In the cinema and theatre, people show an enormous willingness to suspend their disbelief, but corporations that come up with a new way of looking at their world, whether in a vision statement or an intranet navigation scheme, frequently find that this generosity is not accorded to their creation. The distinction between fiction and reality is not so important as that between a good and a bad fiction.
Peter Kibby is a consultant at CMG Admiral. He can be contacted at: firstname.lastname@example.org