posted 1 Sep 2000 in Volume 4 Issue 1
The need for a structured method for managing knowledge is particularly pertinent in the financial sector, where the emphasis is on delivering a highly personalised service to individual customers. Adam Gersting, Franck Brice, and Scott Schaftlein describe how the use of taxonomies can provide a natural structure to content management systems and solutions.
In the financial services industry, we see considerable focus on the management and delivery of content in a personalised fashion to acquire, retain, and grow profitable relationships, and in a structured manner to improve operational efficiency. As the imperative for managed information, knowledge, and content continues to increase, so does the need to make efficient and effective use of taxonomies. A taxonomy is a hierarchical vocabulary used to organise content based on natural relationships. Such hierarchical vocabularies are used to organise information in many fields; a familiar example is the taxonomy used to classify all plants and animals. This taxonomy provides a valuable natural structure for this type of information, or ‘content’, and is sound enough to stand the test of time, yet flexible enough to allow newly discovered plants or animals to be classified and integrated. Similar taxonomies are critical to financial services content, which may be classified by type of investor, investment objectives, risk, and so on. Taxonomy-driven solutions are critical, regardless of the type of content being managed – learning or educational material, marketing material, financial product or service information, formalised knowledge, news, or articles.
In this article, we provide a point of view on the role of taxonomies in supporting the needs of financial services organisations; on approaches to developing, applying, and integrating taxonomies with knowledge and content management systems and solutions; and on maintaining taxonomies over time. Integrated throughout these sections are examples and recommended practices drawn from both our internal Andersen Consulting experience – in applying taxonomies to our own knowledge bases – and our knowledge and content management work with financial services clients.
Why the focus on taxonomies?
The large and growing amount of information stored in different formats in organisations makes it very difficult to know what content is available, and to access that content. Financial services organisations have thousands of content repositories, electronic folders, disparate file systems and formats, and websites. This makes content difficult to find and present. Search engines alone do not overcome these challenges, as they require structured, categorised content to provide meaningful results.
Organisations are turning to portal technologies to provide their employees and clients with fewer points of access to knowledge capital and content. To be efficient and effective, such portals must sit on top of structured information. There is a need to better organise (structure) information and to understand the relationships among the major topics represented by an organisation’s content, in order to efficiently manage and retrieve content and present it in a meaningful way through portals – whether the interface is a web browser, a PDA, or a WAP phone.
All of these conditions and needs are driving a focus on taxonomies. A taxonomy enables content to be classified and related. Once content is classified, it can be accessed and presented appropriately. For example, financial product information classified as being for high-risk investors can be found under that category in a content management system, and can be presented with related information, such as articles also classified for high-risk investors. This ability to present related information is key to personalisation, as users either explicitly state preferences for certain categories of content or indicate preferences through their actions (for example, by accessing information classified under certain categories).
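The idea of presenting related content based on stated or observed preferences can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular product: the content items, the category names (`risk`, `type`), and the functions `infer_preferences` and `recommend` are all hypothetical, and the scoring rule (a count of matching category values) is an assumption chosen for simplicity.

```python
from collections import Counter

# Hypothetical content items, each classified against two taxonomy categories.
CONTENT = {
    "fund-factsheet": {"risk": "high", "type": "product"},
    "emerging-markets-article": {"risk": "high", "type": "article"},
    "bond-ladder-guide": {"risk": "low", "type": "article"},
}


def infer_preferences(explicit, accessed, content):
    """Combine stated preferences with preferences implied by access history.

    explicit: dict of category -> value the user chose explicitly
    accessed: list of content ids the user has opened
    """
    counts = Counter()
    for item_id in accessed:
        for category, value in content[item_id].items():
            counts[(category, value)] += 1
    preferences = {}
    # most_common() yields the most frequently accessed values first,
    # so setdefault keeps the top value for each category
    for (category, value), _ in counts.most_common():
        preferences.setdefault(category, value)
    # Explicitly stated preferences override inferred ones
    preferences.update(explicit)
    return preferences


def recommend(preferences, content, exclude=()):
    """Rank items by how many preferred category values they share."""
    scored = []
    for item_id, classification in content.items():
        if item_id in exclude:
            continue
        score = sum(classification.get(c) == v for c, v in preferences.items())
        if score:
            scored.append((-score, item_id))
    return [item_id for _, item_id in sorted(scored)]
```

For a user who has only opened the high-risk fund factsheet, the inferred preference for high-risk content surfaces the emerging markets article, in line with the high-risk investor example above.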
The sections that follow detail an approach to be taken for developing, maintaining, and integrating taxonomies as part of comprehensive content or knowledge management solutions.
Several steps must be addressed, in turn, to effectively develop an initial taxonomy. First, an initial categorisation structure based on current content must be developed, as part of a current state assessment. This is followed by development of the top levels of the taxonomy. Once this layer has been completed, additional future-state categories and values can be determined.
Current state assessment
In order to understand and structure content repositories and their content, current state assessments are being conducted in many major financial services organisations. The development of a taxonomy can be done in parallel with the current state assessment, as the discovery of content will help define portions of the taxonomy. Content discovery – just finding out what content is out there – often represents a major effort.
As part of the current state assessment, a set of common terms and definitions – a controlled or managed vocabulary – needs to be in place or established to classify information consistently across the enterprise. Without a managed vocabulary, an additional step is required to map similar concepts or content that may be present in several business units under different labels. A managed vocabulary provides a common language for content creators and information seekers, and also establishes associations between terms that may be used synonymously. At one large financial services client, a managed vocabulary was already in place for some areas of the organisation. It had been used to organise certain pieces of content, which aided the current state assessment effort as well as the subsequent taxonomy development.
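A managed vocabulary can be represented as a mapping from each preferred term to the synonyms that different business units use for it. The sketch below assumes that simple structure, and the terms shown are invented for illustration. `build_lookup` (a hypothetical helper) inverts the mapping and refuses ambiguous synonyms, and `normalise` (also hypothetical) maps existing labels to preferred terms while setting aside labels that still need a mapping decision.

```python
# Hypothetical managed vocabulary: preferred term -> labels used for it
# in different business units.
MANAGED_VOCABULARY = {
    "high risk": ["aggressive", "speculative", "high-risk"],
    "retirement planning": ["pensions", "retirement"],
}


def build_lookup(vocabulary):
    """Map every synonym (and each preferred term itself) to its preferred term."""
    lookup = {}
    for preferred, synonyms in vocabulary.items():
        for term in [preferred, *synonyms]:
            key = term.casefold()
            # A label may only ever point at one preferred term
            if lookup.get(key, preferred) != preferred:
                raise ValueError(f"{term!r} maps to both {lookup[key]!r} and {preferred!r}")
            lookup[key] = preferred
    return lookup


def normalise(labels, lookup):
    """Translate labels to preferred terms; unknown labels are returned for review."""
    mapped, unmapped = [], []
    for label in labels:
        preferred = lookup.get(label.strip().casefold())
        if preferred is None:
            unmapped.append(label)
        elif preferred not in mapped:
            mapped.append(preferred)
    return mapped, unmapped
```

For example, `normalise(["Aggressive", "Pensions", "Derivatives"], build_lookup(MANAGED_VOCABULARY))` returns `(["high risk", "retirement planning"], ["Derivatives"])`, leaving `"Derivatives"` for a content manager to map.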
The next step of the current state assessment is to identify the highest-value content repositories. In most cases, the 80-20 rule can be applied: 80 per cent of the relevant and important content can be found in fewer than 20 per cent of the repositories. The challenge of the current state assessment, of course, is that one does not necessarily know at the outset which repositories make up that 20 per cent. This 20 per cent is called the top tier, and the content analysis and the development of the taxonomy should focus solely on it. This makes the analysis much quicker and focuses it on high-impact areas. With one financial services client, for example, current state assessment and taxonomy development efforts were focused on just a couple of hundred of the thousands of repositories that had been inventoried. This was part of a several-month effort to determine what content exists, and to begin to classify content from the key databases and websites. Similarly, within Andersen Consulting, our recent taxonomy refinement efforts were concentrated on fewer than 20 per cent of the thousands of databases that make up the Andersen Consulting Knowledge Xchange Knowledge Management System. This current state assessment was aimed at providing a single logical repository of content for key areas of the organisation.
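The 80-20 selection of a top tier can be made concrete once each repository has some estimate of its value, for example from usage data or interview scores. In this sketch the value measure and the `top_tier` function are assumptions; it simply ranks repositories by estimated value and keeps adding them until the chosen share of total value is covered.

```python
def top_tier(repository_value, share=0.8):
    """Return the smallest set of highest-value repositories covering `share` of total value.

    repository_value: dict of repository name -> estimated value
    (the value measure, such as usage counts, is an assumption).
    """
    total = sum(repository_value.values())
    ranked = sorted(repository_value.items(), key=lambda kv: kv[1], reverse=True)
    tier, covered = [], 0
    for name, value in ranked:
        if covered >= share * total:
            break  # the chosen share of value is already covered
        tier.append(name)
        covered += value
    return tier
```

For instance, `top_tier({"research": 60, "products": 25, "archive": 10, "team-sites": 5})` returns `["research", "products"]`: two of the four repositories cover 85 per cent of the estimated value.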
There are several methodologies available to identify the top repositories of an organisation: direct interviews with business unit representatives; analysis of the activity within a repository; understanding the audience of the system; and analysis of the content itself. Interviews with business unit representatives provide the most in-depth understanding of the content critical to a core business, help define the areas the taxonomy needs to cover initially, and aid in the identification of existing taxonomies. Usage analysis or traffic volume data (if available) complements the information from interviews. With one of our clients, intranet home page traffic was analysed to draw conclusions and gain consensus on priority areas of the business and high-demand areas of knowledge.
The assessment effort and associated discussions will also yield additional benefits, such as the identification of key contacts, opportunities to collaborate and execute common content processes across the organisation, and content to draw on as the top levels of the taxonomy are developed.
Develop top layers of the taxonomy
With a managed or controlled vocabulary and the top tier of content understood, the top layers of the taxonomy can be developed. A business decision needs to be made as to whether a single enterprise-wide taxonomy is developed, or separate taxonomies are developed for different business areas. A taxonomy may be created to provide different views into the enterprise content; portions of this taxonomy may be fine-tuned for specific needs of a certain part of the organisation. With one large capital markets firm, we recently developed a single taxonomy with portions common for all business units and the flexibility to add categories and values that are specific to a business unit. While a taxonomy should be created to cover the entire scoped area, it is prudent to initially develop and review a first portion in order to ensure that the processes and buy-in are effectively in place.
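One way to picture a single taxonomy with a common portion and business-unit extensions is as a nested mapping that is merged per unit. The sketch below is hypothetical throughout: the top-level categories, the units, and the `taxonomy_view` function are invented for illustration, and a real content management system would hold this structure in its data model rather than in literals.

```python
import copy

# Hypothetical common portion: top-level category -> category -> values.
COMMON = {
    "Audience": {"Investor type": ["Retail", "Institutional"]},
    "Risk": {"Risk level": ["Low", "Medium", "High"]},
}

# Hypothetical business-unit extensions layered on the common portion.
EXTENSIONS = {
    "Capital markets": {"Risk": {"Instrument": ["Equity", "Fixed income", "Derivatives"]}},
    "Retail banking": {"Audience": {"Life stage": ["Student", "Family", "Retired"]}},
}


def taxonomy_view(unit):
    """Return the common taxonomy plus any categories and values specific to `unit`."""
    view = copy.deepcopy(COMMON)  # never modify the shared portion
    for top_level, categories in EXTENSIONS.get(unit, {}).items():
        target = view.setdefault(top_level, {})
        for category, values in categories.items():
            existing = target.setdefault(category, [])
            for value in values:
                if value not in existing:  # avoid duplicating common values
                    existing.append(value)
    return view
```

Calling `taxonomy_view("Capital markets")` returns the common categories plus an extra Instrument category under Risk, while the shared `COMMON` mapping is left unchanged for other units.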
The iterative development of the taxonomy is an art that can be aided by considering the following questions:
Analysing the types of content and how content is currently focused provides a base for the current-state taxonomy. Development of the top levels of the taxonomy should be carried out through discussions with business unit representatives and through analysis. Content and knowledge management experts, as well as people with a background in research or library science, should also be involved in the taxonomy development. This activity will also make use of the results from the current state assessment. Through these discussions and analyses, top-level topics and their relationships can be determined.
This portion of the taxonomy development effort is human-intensive. Most organisations have a diverse, custom set of source content, as well as a company-specific vocabulary that must be considered. However, industry-specific taxonomies should still be consulted when developing a company-specific top-level taxonomy. Additionally, once the highest level of the taxonomy has been defined through review of content and human analysis, it may be possible to use packaged products such as Semio Topic Library® to help build out the upper levels of the taxonomy. Semio Topic Library® can build out these areas using its proprietary knowledge bases and processing system as well as industry-standard categories. Such packaged products may aid in the rapid development of the upper levels of the taxonomy structure (1).
While each taxonomy is unique to a business, experience suggests that the highest level of a taxonomy typically has four to ten categories. This upper level should remain ‘stable’ over time, changing less than the categories and values beneath it.
Determine categories and values
After developing the top levels of the taxonomy, one must determine the lower levels – the categories and values – that make up the rest of the taxonomy structure. The categories are the lower levels of the taxonomy, and the values are the final choices within a category. These categories and values also represent relationships between content objects.
Software is emerging that can sift through content and extract key terms that represent sets of content. Software can then automatically create concepts by grouping like terms. Semio Builder®, for example, can be used to create this type of lower-level taxonomy using, in part, Semio’s patented approaches to extracting key phrases from content and developing databases of information for a taxonomy (2). Similarly, Verity’s Knowledge Organizer can be used to create categories from existing directories and metadata. Once categories are created, content can be automatically classified against these categories (3). The relationships developed by such tools must be fine-tuned and optimised to meet specific structural needs; content management experts must still examine the suggested categories and values to ensure that they make sense.
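The tools named above use their own techniques, some of them patented or proprietary, which are not reproduced here. As a much simpler stand-in, the sketch below suggests classifications by matching hand-written keyword rules against a document’s text; the rules, category names, and the `suggest_classification` function are hypothetical. As with the packaged tools, the suggestions still need review by content management experts.

```python
import re

# Hypothetical rules: (category, value) -> indicative keywords.
RULES = {
    ("Risk level", "High"): {"speculative", "leveraged", "volatile"},
    ("Risk level", "Low"): {"guaranteed", "deposit", "treasury"},
    ("Product", "Mortgage"): {"mortgage", "remortgage", "loan-to-value"},
}


def suggest_classification(text, rules, min_hits=1):
    """Suggest (category, value) pairs whose keywords appear in the text.

    Suggestions are ordered by number of keyword hits; a content manager
    is still expected to confirm or reject each one.
    """
    words = set(re.findall(r"[a-z][a-z\-]*", text.lower()))
    hits = []
    for (category, value), keywords in rules.items():
        n = len(words & keywords)
        if n >= min_hits:
            hits.append((n, category, value))
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return [(category, value) for _, category, value in hits]
```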
For each area of the taxonomy, a conscious decision should be made as to whether the field is required, whether multiple values will be accepted, and whether the values ‘all of the above’, ‘none/not applicable’, or ‘other’ should be provided. The process for managing the taxonomy areas and keyword values must also be defined, including the approach to handling documents categorised as ‘other’. (The ‘other’ choice would be selected when the specific value the contributor wished to use was not provided but, in the contributor’s opinion, should have been.)
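The per-category decisions described above (required or optional, single or multiple values, an ‘other’ choice) can be captured as a small rule object and checked when content is entered. This is a sketch under assumptions: the `Category` fields and the `validate` function are hypothetical names, and choices such as ‘all of the above’ or ‘none/not applicable’ are modelled simply as ordinary values in the list.

```python
from dataclasses import dataclass

OTHER = "other"


@dataclass
class Category:
    name: str
    values: list
    required: bool = False
    multiple: bool = False
    allow_other: bool = True  # offer 'other' when a contributor finds no suitable value


def validate(categories, classification):
    """Check a classification (category name -> list of chosen values).

    Returns (errors, needs_review); needs_review lists the categories in
    which 'other' was chosen, so a content manager can follow up.
    """
    errors, needs_review = [], []
    for cat in categories:
        chosen = classification.get(cat.name, [])
        if cat.required and not chosen:
            errors.append(f"{cat.name}: a value is required")
        if len(chosen) > 1 and not cat.multiple:
            errors.append(f"{cat.name}: only one value is allowed")
        allowed = set(cat.values) | ({OTHER} if cat.allow_other else set())
        for value in chosen:
            if value not in allowed:
                errors.append(f"{cat.name}: unknown value {value!r}")
        if cat.allow_other and OTHER in chosen:
            needs_review.append(cat.name)
    return errors, needs_review
```

A document classified as ‘other’ under a category passes validation but is returned in `needs_review`, which is where a process for working through ‘other’ items would pick it up.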
Classify content to test the taxonomy
Classifying content serves the initial purpose of testing the taxonomy, and also provides categorised content for use within the content management system.
In order to test the categories and values initially developed, representative content must be classified. It is important not only to classify current content, but also to consider content types to be developed in the future. The developers of the taxonomy structure should work with business professionals to classify many pieces of content. This helps those who will be classifying content to understand the taxonomy structure and classification approach, and it drives refinement of the taxonomy categories and values, including identification of areas in which the taxonomy is not mutually exclusive or collectively exhaustive.
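One way to look for areas that are not mutually exclusive or collectively exhaustive is to have several people classify the same test documents independently and compare the results. The sketch below assumes that set-up and a single-value category; the `mece_report` function and the trial record format are hypothetical.

```python
from collections import defaultdict

NO_FIT = {None, "other"}  # markers meaning "no suitable value was found"


def mece_report(trials):
    """Summarise a classification trial for one single-value category.

    trials: (document_id, reviewer, value) tuples from independent reviewers.
    Returns (gaps, overlaps):
      gaps: documents where a reviewer found no fitting value,
            a sign the values may not be collectively exhaustive
      overlaps: documents placed under different values by different
                reviewers, a sign the values may not be mutually exclusive
    """
    by_doc = defaultdict(set)
    for doc, _reviewer, value in trials:
        by_doc[doc].add(value)
    gaps = sorted(d for d, vals in by_doc.items() if vals & NO_FIT)
    overlaps = sorted(d for d, vals in by_doc.items() if len(vals - NO_FIT) > 1)
    return gaps, overlaps
```

Documents in either list point to categories and values that need refinement: gaps may call for a new value, and overlaps for sharper definitions of existing ones.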
With a tested initial taxonomy in place, the content management system can be developed to support classification of content through this taxonomy. This system may incorporate taxonomy creation and management software, as described previously.
Integration of the taxonomy with the content management system
The taxonomy, once developed and tested, must then be put to use in the content management system. It must be ‘integrated’ into the content management system and ‘passed through’ and ‘translated’ into the presentation layer (for instance, the portal interface). Figure 1 represents the relationship (details follow).
The taxonomy must be developed through the approach detailed previously. A simple taxonomy might be represented visually as shown in figure 2.
This taxonomy must then be integrated into the content management system; the data model for the content management system must be designed (or redesigned) to support the categories and values to be chosen from. In the design and development of this content management system, the metadata values to be captured must also be considered. Metadata are characteristics that are inherent to a piece of content but do not express natural relationships in the same valuable way that taxonomies do. For example, metadata captured for a piece of content might include the size of the item and the author; capturing the author is valuable, as a content manager or developer might want to view content in the content management system by author. However, the author’s name does not provide the same inherent relationship between content items as does, for example, intention. When developing the content management system to support the taxonomy, the following metadata areas should be considered.
With the content management system designed and developed to integrate the taxonomy and support metadata capture, content can be entered, and taxonomy and metadata values assigned.
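A minimal relational data model that keeps the taxonomy (categories, values, and a many-to-many classification table) separate from metadata (columns such as author and size on the content item) might look like the following. It uses Python’s built-in `sqlite3` module purely for illustration; the table and column names, the `items_with_value` query helper, and the sample rows in `demo_connection` are assumptions, not the schema of any particular content management system.

```python
import sqlite3

SCHEMA = """
CREATE TABLE category (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE category_value (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(id),
    name TEXT NOT NULL,
    UNIQUE (category_id, name)
);
-- Metadata (author, size) are stored as attributes of the item itself
CREATE TABLE content_item (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT,
    size_bytes INTEGER
);
-- Taxonomy classification: many-to-many between items and values
CREATE TABLE classification (
    content_id INTEGER NOT NULL REFERENCES content_item(id),
    value_id INTEGER NOT NULL REFERENCES category_value(id),
    PRIMARY KEY (content_id, value_id)
);
"""


def items_with_value(conn, category, value):
    """Titles of content items classified under the given category value."""
    rows = conn.execute(
        """SELECT c.title
           FROM content_item c
           JOIN classification cl ON cl.content_id = c.id
           JOIN category_value v ON v.id = cl.value_id
           JOIN category cat ON cat.id = v.category_id
           WHERE cat.name = ? AND v.name = ?
           ORDER BY c.title""",
        (category, value),
    )
    return [title for (title,) in rows]


def demo_connection():
    """Build an in-memory database with a few hypothetical items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO category (id, name) VALUES (1, 'Risk level')")
    conn.executemany(
        "INSERT INTO category_value (id, category_id, name) VALUES (?, 1, ?)",
        [(1, "Low"), (2, "High")],
    )
    conn.executemany(
        "INSERT INTO content_item (id, title, author, size_bytes) VALUES (?, ?, ?, ?)",
        [(1, "Growth fund factsheet", "A. Analyst", 20480),
         (2, "Savings account guide", "B. Writer", 10240)],
    )
    conn.executemany(
        "INSERT INTO classification (content_id, value_id) VALUES (?, ?)",
        [(1, 2), (2, 1)],
    )
    return conn
```

Here `items_with_value(demo_connection(), "Risk level", "High")` returns `["Growth fund factsheet"]`, while a metadata view such as content by author is a plain query filtering `content_item` on its `author` column, with no join through the taxonomy tables.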
An ideal interface provides the author of a document with an in-depth view of the existing taxonomy and suggestions for classifying the document under existing topics or starting a new category. In addition to item-by-item entry, content can be loaded in bulk, through a content management system or directly into a relational database, and classified in bulk using packaged products or custom code.
Classified content can then be used in the presentation layer. In some instances, the taxonomy will be ‘passed through’ or visible through the end-user interface. This aids in the navigation and retrieval of certain types of content as well as the searching of information based on the content. For example, users might navigate a portion of the site by choosing to view content by type or intention or risk level.
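Navigation by type, intention, or risk level is essentially faceted browsing: the user picks values, the list of content narrows, and the remaining choices are shown with counts. A minimal sketch, assuming each content item carries one value per facet; the sample items and the `browse` function are hypothetical.

```python
from collections import Counter

# Hypothetical content items with one value per facet.
ITEMS = {
    "a": {"type": "article", "risk": "high"},
    "b": {"type": "product", "risk": "high"},
    "c": {"type": "article", "risk": "low"},
}


def browse(items, selected):
    """Filter items by the facet values a user has selected.

    Returns (matching ids, {facet: Counter of values among the matches})
    so the interface can show how many items each remaining choice leads to.
    """
    matches = sorted(
        item_id for item_id, facets in items.items()
        if all(facets.get(f) == v for f, v in selected.items())
    )
    counts = {}
    for item_id in matches:
        for facet, value in items[item_id].items():
            if facet not in selected:
                counts.setdefault(facet, Counter())[value] += 1
    return matches, counts
```

Selecting type ‘article’ from the sample leaves items a and c, with one high-risk and one low-risk item offered as the next choice.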
In order to provide a personalised view of content to end users, the taxonomy is often used behind the scenes to translate the relationships among content types. For example, articles related to product or service information can be presented alongside that information, based on the relationships established through the classification of content. This presents content with a related type, intention, risk level, or audience ‘automatically’, based on the categorisation of the content and on business and presentation rules.
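Behind-the-scenes relating of content can be sketched as a rule that selects items sharing certain taxonomy values with the item being viewed. The categories to match on, the allowed content types, and the limit below stand in for business and presentation rules; all of them, together with the sample items and the `related_content` function, are assumptions for illustration.

```python
# Hypothetical classified content.
ITEMS = {
    "growth-fund": {"type": "product", "risk": "high", "audience": "retail"},
    "volatility-article": {"type": "article", "risk": "high", "audience": "retail"},
    "savings-article": {"type": "article", "risk": "low", "audience": "retail"},
    "tech-fund": {"type": "product", "risk": "high", "audience": "retail"},
}


def related_content(item_id, items, relate_on=("risk", "audience"),
                    show_types=("article",), limit=3):
    """Find items related to `item_id` through shared taxonomy values.

    relate_on: categories whose values must all match (assumed presentation rule)
    show_types: content types this page may show (assumed business rule)
    """
    source = items[item_id]
    related = [
        other_id for other_id, other in items.items()
        if other_id != item_id
        and other.get("type") in show_types
        and all(other.get(f) == source.get(f) for f in relate_on)
    ]
    return sorted(related)[:limit]
```

Viewing the growth fund page, the rule surfaces only the high-risk retail article; widening `show_types` to include products would also bring in the other high-risk fund.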
Taxonomy over time
The taxonomy must be refined over time; taxonomies are never ‘final’. Content managers, responsible for the content within a repository as well as the structure of the repository, must review submitted items, including the way they have been classified. As one of the processes they execute, content managers must also review items that have been flagged with, for example, the ‘other’ categorisation described previously. They must review these items to determine whether they are categorised correctly or whether the taxonomy needs to be modified to support the suggested new category. This enables the taxonomy to evolve incrementally over time. This and other content can then be reclassified appropriately. Semio Builder® and Semio Taxonomy® products, for example, provide the capability to automatically make incremental changes to the taxonomy based on new content (4).
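The review of items flagged as ‘other’ can be partly supported by counting the values contributors suggested. In this sketch, a suggested value is added to the category once it has been proposed a threshold number of times, and matching items are queued for reclassification. The threshold policy and the `review_other` function are assumptions; in practice the content manager makes the final decision.

```python
from collections import Counter


def review_other(flagged, values, threshold=3):
    """Process items classified as 'other' in one category.

    flagged: (content_id, suggested_value) pairs supplied by contributors
    values: the category's current values (extended in place)
    Returns content_id -> value for items that can now be reclassified;
    items not returned stay under 'other' for manual review.
    """
    counts = Counter(suggestion for _, suggestion in flagged)
    for suggestion, n in counts.items():
        # Assumed policy: promote a suggestion once enough contributors ask for it
        if suggestion not in values and n >= threshold:
            values.append(suggestion)
    # Covers both newly added values and values that already existed
    # (in which case the item was simply misclassified)
    return {content_id: s for content_id, s in flagged if s in values}
```

With three contributors suggesting ‘Derivatives’, that value is added and those items are reclassified; a single suggestion such as ‘Commodities’ stays under ‘other’ for a content manager to decide.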
From time to time, taxonomies need to be reviewed and may be restructured to support changes in content focus, changes in organisational structure, and so on. The scope of those changes may require few, many, or all of the steps noted previously to be executed again. Custom-developed or packaged products may be used to efficiently assign new categorisations to larger numbers of content items in accordance with a new taxonomy.
References
1. Automatic Taxonomy Building, copyright Semio Corporation (2000). www.semio.com
2. Automatic Taxonomy Building, copyright Semio Corporation (2000). www.semio.com
3. Extending Text Retrieval to Incorporate Automated Taxonomy Generation, Lynne Harvey (22 April 1999). Copyright Patricia Seybold Group.
4. Automatic Taxonomy Building, copyright Semio Corporation (2000). www.semio.com
© Andersen Consulting, 2000.
Adam Gersting is a manager with the Andersen Consulting eHuman Performance Line of Business Knowledge Management Group. He can be contacted at: firstname.lastname@example.org
Franck Brice is a manager in the Andersen Consulting Human Performance Applications Group. He can be contacted at: email@example.com
Scott Schaftlein is an analyst in the Andersen Consulting Human Performance Applications Group. He can be contacted at: firstname.lastname@example.org