Feature
posted 1 Sep 2000 in Volume 4 Issue 1
Knowledge Managed
The need for a structured method for managing knowledge is particularly pertinent in the financial sector, where the emphasis is on delivering a highly personalised service to individual customers. Adam Gersting, Franck Brice, and Scott Schaftlein describe how the use of taxonomies can provide a natural structure to content management systems and solutions.
In the financial services industry, we see considerable focus on managing and delivering content in a personalised fashion, to acquire, retain, and grow profitable relationships, and in a structured manner, to improve operational efficiency. As the imperative for managed information, knowledge, and content continues to increase, so does the need to make efficient and effective use of taxonomies. A taxonomy is a hierarchical vocabulary used to organise content based on natural relationships. Such hierarchical vocabularies are used to organise information in many areas, such as the taxonomy used to classify all plants and animals. This taxonomy provides a valuable natural structure for these types of information, or ‘content’, and is sound enough to stand the test of time, yet flexible enough to allow newly discovered plants or animals to be classified and integrated. Similar taxonomies are critical to financial services content, which may be classified by type of investor, investment objectives, risk, and so on. Taxonomy-driven solutions are critical regardless of the type of content being managed – learning or educational material, marketing material, financial product or service information, formalised knowledge, news, or articles.
In this article, we offer a point of view on the role of taxonomies in supporting the needs of financial services organisations; on approaches to developing, applying, and integrating taxonomies with knowledge and content management systems and solutions; and on maintaining taxonomies over time. Integrated throughout these sections are examples and recommended practices drawn from both our internal Andersen Consulting experience – in applying taxonomies to our own knowledge bases – and our knowledge and content management work with financial services clients.
Why the focus on taxonomies?
The large and growing amount of information stored in different formats in organisations makes it very difficult to know what content is available and to access that content. Financial services organisations have thousands of content repositories, electronic folders, disparate file systems and formats, and websites. This introduces challenges in finding and presenting the content. Search engines alone do not overcome these challenges, because they depend on structured, categorised content to return meaningful results.
Organisations are turning to portal technologies to give their employees and clients fewer points of access to knowledge capital and content. To be efficient and effective, such portals must sit on top of structured information. Organisations therefore need to better organise (structure) their information and to understand the relationships among the major topics represented by their content, so that they can efficiently manage and retrieve content and present it meaningfully through portals – whether the interface is a web browser, a PDA, or a WAP phone.
All of these conditions and needs are driving a focus on taxonomies. A taxonomy enables content to be classified and related. Once content is classified, it can be accessed and presented appropriately. For example, financial product information classified as being for high-risk investors can be found under that category in a content management system, and can be presented alongside related information, such as articles also classified for high-risk investors. This ability to present related information is key to personalisation, as users either state explicit preferences for certain categories of content or indicate preferences through their actions (for example, by accessing information classified under certain categories).
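The explicit and implicit preference signals described above can be combined in a few lines. The sketch below is a minimal illustration, assuming (hypothetically) that a user's stated interests and the categories of items they have opened are available as simple lists; the function name `preferred_categories` and its ranking rule (stated preferences first, then the most-accessed categories) are illustrative assumptions, not a method the article prescribes.

```python
from collections import Counter


def preferred_categories(explicit, accessed, top_n=3):
    """Rank taxonomy categories for one user (hypothetical rule).

    Categories the user chose explicitly come first, in the order given;
    they are followed by categories inferred from behaviour, most-accessed first.
    """
    ranked = list(explicit)
    for category, _count in Counter(accessed).most_common():
        if category not in ranked:
            ranked.append(category)
    return ranked[:top_n]


# Hypothetical user: explicitly asked for retirement-planning content, and has
# mostly opened items classified for high-risk investors.
explicit = ["Retirement planning"]
accessed = ["High risk", "High risk", "Equities", "High risk", "Equities", "Tax"]
print(preferred_categories(explicit, accessed))
```

Here the stated preference leads, and the two categories the user accessed most often fill the remaining places.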
The sections that follow detail an
approach to be taken for developing, maintaining, and integrating taxonomies as
part of comprehensive content or knowledge management solutions.
Taxonomy development
Approach introduction
Several steps must be addressed, in
turn, to effectively develop an initial taxonomy. First, an initial
categorisation structure based on current content must be developed, as part of
a current state assessment. This is followed by development of the top levels of
the taxonomy. Once this layer has been completed, additional future-state
categories and values can be determined.
Current state assessment
In order to understand
and structure content repositories and their content, current state assessments
are being conducted in many major financial services organisations. The
development of a taxonomy can be done in parallel with the current state
assessment, as the discovery of content will help define portions of the
taxonomy. Content discovery – just finding out what content is out there – often
represents a major effort.
As part of the current state assessment, a set of common terms and definitions – a controlled or managed vocabulary – needs to be in place, or be established, so that information is classified consistently across the enterprise. Without a managed vocabulary, an additional step is required: mapping similar concepts or content that may be present in several business units under different labels. A managed vocabulary provides a common language for content creators and information seekers, and also establishes associations between terms that may be used synonymously. At one large financial services client, a managed vocabulary was already in place for some areas of the organisation. It had been used to organise certain pieces of content, which aided the current state assessment as well as the subsequent taxonomy development.
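To make the synonym-mapping role of a managed vocabulary concrete, here is a minimal sketch, assuming a vocabulary held as a plain mapping from each preferred term to the labels that business units use for it. The terms and the helper names `build_synonym_index` and `normalise_label` are hypothetical.

```python
# Hypothetical managed vocabulary: preferred term -> labels used by business units.
MANAGED_VOCABULARY = {
    "high risk": ["aggressive", "speculative", "high-risk"],
    "retirement planning": ["pensions", "retirement"],
    "fixed income": ["bonds", "debt securities"],
}


def build_synonym_index(vocabulary):
    """Map every label (preferred term or synonym), lower-cased, to its preferred term."""
    index = {}
    for preferred, synonyms in vocabulary.items():
        index[preferred.lower()] = preferred
        for synonym in synonyms:
            index[synonym.lower()] = preferred
    return index


def normalise_label(label, index):
    """Return the preferred term for a label, or None if the vocabulary does not cover it."""
    return index.get(label.strip().lower())


index = build_synonym_index(MANAGED_VOCABULARY)
print(normalise_label("Speculative", index))  # high risk
print(normalise_label("Bonds", index))        # fixed income
print(normalise_label("Derivatives", index))  # None
```

A label that returns `None` is one the vocabulary does not yet cover, which is the mapping step the text says must be added when no managed vocabulary exists.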
The next step of the current state assessment is to identify the highest-value content repositories. In most cases, the 80-20 rule applies: 80 per cent of the relevant and important content can be found in fewer than 20 per cent of the repositories. The challenge of the current state assessment, of course, is that one does not necessarily know at the outset which 20 per cent that is. This 20 per cent is called the top tier, and the content analysis and taxonomy development should focus solely on it. This makes the analysis much quicker and focuses it on high-impact areas. With one financial services client, for example, current state assessment and taxonomy development efforts were focused on just a couple of hundred of the thousands of repositories that had been inventoried. This was part of a several-month effort to determine what content exists and to begin classifying content from the key databases and websites. Similarly, within Andersen Consulting, our recent taxonomy refinement efforts were concentrated on fewer than 20 per cent of the thousands of databases that make up the Andersen Consulting Knowledge Xchange Knowledge Management System. This current state assessment was aimed at providing a single logical repository of content for key areas of the organisation.
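Once each repository has some value score, the 80-20 selection of a top tier can be expressed as a cumulative cut-off. This sketch assumes such a score exists (here, hypothetical monthly access counts); the repository names, the scores, and the function `top_tier` are made up for illustration. In practice the score would come from interviews and usage analysis, which is why the top tier is not known at the outset.

```python
def top_tier(repository_values, share=0.8):
    """Return the smallest set of repositories, highest value first, whose
    combined value reaches `share` of the total across all repositories."""
    total = sum(repository_values.values())
    ranked = sorted(repository_values.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0
    for name, value in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += value
    return selected


# Hypothetical inventory: repository -> value score (here, monthly accesses).
inventory = {
    "product-catalogue": 5200,
    "research-notes": 2100,
    "team-site-a": 300,
    "team-site-b": 250,
    "branch-forms": 200,
    "archive-1998": 150,
    "hr-wiki": 150,
    "old-newsletters": 100,
    "project-x": 100,
    "temp-share": 50,
}
tier = top_tier(inventory)
print(tier, f"{len(tier)} of {len(inventory)} repositories")
```

With these made-up numbers, two of ten repositories hold about 85 per cent of the total value.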
Several methods are available to identify an organisation's top repositories: direct interviews with business unit representatives; analysis of the activity within a repository; understanding the audience of the system; and analysis of the content itself. Interviews with business unit representatives provide the most in-depth understanding of the content critical to a core business, help define which areas the taxonomy needs to cover initially, and aid in identifying existing taxonomies. Usage analysis or traffic volume data (if available) will complement the information from interviews. With one of our clients, intranet home page traffic was analysed to draw conclusions and gain consensus on priority areas of the business and high-demand areas of knowledge.
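Usage analysis of the kind mentioned above often starts as a simple tally. The sketch below assumes intranet log records have already been reduced to (page, business area) pairs; the records and the function name `views_by_area` are hypothetical, and real log formats would need their own parsing.

```python
from collections import Counter

# Hypothetical intranet log records: (page URL, business area that owns the page).
page_views = [
    ("/markets/equities/daily", "Capital markets"),
    ("/markets/fx/rates", "Capital markets"),
    ("/retail/mortgages/rates", "Retail banking"),
    ("/markets/equities/daily", "Capital markets"),
    ("/hr/holidays", "Human resources"),
    ("/retail/cards/offers", "Retail banking"),
    ("/markets/equities/research", "Capital markets"),
]


def views_by_area(records):
    """Count page views per business area, most visited first."""
    return Counter(area for _url, area in records).most_common()


for area, views in views_by_area(page_views):
    print(f"{area}: {views}")
```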
The assessment effort and associated discussions will also yield additional benefits, such as the identification of key contacts, opportunities to collaborate and to execute common content processes across the organisation, and content to draw on as the top levels of the taxonomy are developed.
Develop top layers of the taxonomy
With a managed or controlled vocabulary and the top tier of content understood, the top layers of the taxonomy can be developed. A business decision needs to be made as to whether a single enterprise-wide taxonomy or separate taxonomies for different business areas will be developed. A single taxonomy may be created to provide different views into the enterprise content, with portions of it fine-tuned for the specific needs of particular parts of the organisation. With one large capital markets firm, we recently developed a single taxonomy with portions common to all business units and the flexibility to add categories and values specific to a business unit. While a taxonomy should be created to cover the entire scoped area, it is prudent to develop and review a first portion initially, to ensure that the processes and buy-in are effectively in place.
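One way to picture a single taxonomy with a common portion and unit-specific extensions is as a shared mapping plus per-unit overlays. This is a hedged sketch under that assumption: the category names, the `UNIT_EXTENSIONS` structure, and `taxonomy_for` are hypothetical, not the model used with the firm described above.

```python
import copy

# Hypothetical portion common to all business units: category -> values.
COMMON = {
    "Audience": ["Retail investor", "Institutional investor"],
    "Risk": ["Low", "Medium", "High"],
}

# Hypothetical unit-specific additions: new categories, or extra values
# appended to a common category.
UNIT_EXTENSIONS = {
    "Capital markets": {"Asset class": ["Equities", "Fixed income", "FX"]},
    "Retail banking": {"Audience": ["Small business"], "Product": ["Mortgage", "Card"]},
}


def taxonomy_for(unit):
    """Return the common taxonomy merged with any extensions for `unit`.

    The common portion is deep-copied, so one unit's view never alters
    the shared definition or another unit's view.
    """
    view = copy.deepcopy(COMMON)
    for category, values in UNIT_EXTENSIONS.get(unit, {}).items():
        existing = view.setdefault(category, [])
        for value in values:
            if value not in existing:
                existing.append(value)
    return view


print(taxonomy_for("Retail banking"))
```

Keeping the common portion in one place is what lets every unit's view stay consistent while still allowing local additions.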
The iterative development of the taxonomy is an art that can be aided by considering the following questions:
Analysing the types of content and how content is currently organised provides a base for the current-state taxonomy. The top levels of the taxonomy should be developed through discussions with business unit representatives and through analysis. Content and knowledge management experts, as well as people with a background in research or library science, should also be involved in taxonomy development. This activity will also make use of the results of the current state assessment. Through these discussions and analyses, top-level topics and their relationships can be determined.
This portion of the taxonomy development effort is human-intensive. Most organisations have a diverse, custom set of source content, as well as a company-specific vocabulary that must be considered. However, industry-specific taxonomies should be consulted when developing a company-specific top-level taxonomy. Additionally, once the highest level of the taxonomy has been defined through content review and human analysis, it may be possible to use packaged products such as Semio Topic Library® to help build out the upper levels of the taxonomy. Semio Topic Library® can build out these areas using its proprietary knowledge bases and processing system, as well as industry-standard categories. Such packaged products may aid in the rapid development of the upper levels of the taxonomy structure (1).
While each taxonomy is unique to a business, experience suggests that the highest level of a taxonomy typically contains four to ten categories. This upper level should remain ‘stable’ over time, evolving or changing less than the categories and values beneath it.
Determine categories and values
After developing the top levels of the
taxonomy, one must determine the lower levels – the categories and values – that
make up the rest of the taxonomy structure. The categories are the lower levels
of the taxonomy, and the values are the final choices within a category. These
categories and values also represent relationships between content
objects.
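The distinction between top levels, categories, and values maps naturally onto a nested structure in which values are the leaves. The sketch below assumes a three-level taxonomy; all names, including `value_paths` and `is_valid_value`, are hypothetical. Content objects tagged with the same value path are related by that fact, which is the relationship the paragraph above describes.

```python
# Hypothetical taxonomy: top level -> categories -> values.
TAXONOMY = {
    "Product": {
        "Investment": ["Mutual fund", "Equity", "Bond"],
        "Lending": ["Mortgage", "Personal loan"],
    },
    "Investor profile": {
        "Risk tolerance": ["Low", "Medium", "High"],
        "Objective": ["Growth", "Income", "Preservation"],
    },
}


def value_paths(taxonomy):
    """List every selectable value as a full path, e.g. 'Product/Lending/Mortgage'."""
    return [
        f"{top}/{category}/{value}"
        for top, categories in taxonomy.items()
        for category, values in categories.items()
        for value in values
    ]


def is_valid_value(taxonomy, top, category, value):
    """True if `value` is one of the final choices offered under top/category."""
    return value in taxonomy.get(top, {}).get(category, [])


print(len(value_paths(TAXONOMY)))
print(is_valid_value(TAXONOMY, "Investor profile", "Risk tolerance", "High"))
```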
Software is emerging that can sift through content and extract key terms that represent sets of content. Such software can then automatically create concepts by grouping similar terms. Semio Builder®, for example, can be used to create this type of lower-level taxonomy using, in part, Semio’s patented approaches to extracting key phrases from content and developing databases of information for a taxonomy (2). Similarly, Verity’s Knowledge Organizer can be used to create categories from existing directories and metadata. Once categories are created, content can be automatically classified against them (3). The relationships developed by such tools must be fine-tuned and optimised to meet specific structural needs; content management experts must still examine the suggested categories and values to ensure that they make sense.
For each area of the taxonomy, a conscious decision should be made as to whether the field is required, whether multiple values will be accepted, and whether the values ‘all of the above’, ‘none/not applicable’, or ‘other’ should be provided. The process for managing the taxonomy areas and keyword values must also be defined, including the approach to working through documents categorised as ‘other’. (The ‘other’ choice would be selected when the specific value the contributor wished to use was not provided but, in the contributor’s opinion, should have been.)
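The per-field decisions listed above (required or not, single or multiple values, whether to offer ‘other’) can be captured as explicit field definitions and checked at entry time. A minimal sketch, assuming a hypothetical `TaxonomyField` record and hypothetical `validate` and `needs_review` helpers; ‘all of the above’ and ‘none/not applicable’ could be handled in the same way as ‘other’ but are omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class TaxonomyField:
    """Hypothetical definition of one taxonomy area and its entry rules."""
    name: str
    values: list
    required: bool = False
    multi_valued: bool = False
    allow_other: bool = False  # offer an 'Other' choice for missing values


def validate(field, chosen):
    """Return a list of problems with the values a contributor chose for `field`."""
    problems = []
    if field.required and not chosen:
        problems.append(f"{field.name}: a value is required")
    if not field.multi_valued and len(chosen) > 1:
        problems.append(f"{field.name}: only one value may be chosen")
    allowed = set(field.values) | ({"Other"} if field.allow_other else set())
    for value in chosen:
        if value not in allowed:
            problems.append(f"{field.name}: '{value}' is not a valid value")
    return problems


def needs_review(field, chosen):
    """Items classified as 'Other' go to a content manager's review queue."""
    return field.allow_other and "Other" in chosen


risk = TaxonomyField("Risk", ["Low", "Medium", "High"], required=True, allow_other=True)
print(validate(risk, []))               # ['Risk: a value is required']
print(validate(risk, ["Low", "High"]))  # ['Risk: only one value may be chosen']
print(needs_review(risk, ["Other"]))    # True
```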
Classify content to test the taxonomy
The step of classification of content
will serve the initial purpose of testing the taxonomy as well as providing
categorised content to be used within the content management system.
In order to test the categories and values initially developed, representative content must be classified. It is important not only to classify current content but also to consider content types to be developed in the future. The developers of the taxonomy structure should work with business professionals to classify many pieces of content. This will help those who will be classifying content to understand the taxonomy structure and classification approach, and will lead to refinement of the taxonomy’s categories and values, including the identification of areas in which the taxonomy is not mutually exclusive or collectively exhaustive.
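The ‘mutually exclusive’ and ‘collectively exhaustive’ checks can be made measurable during a classification trial. This sketch assumes, hypothetically, that reviewers record for each test item every value of a single-valued field that they felt fitted it; items for which only ‘other’ fits suggest a gap, and items with several fitting values suggest overlapping values. The field, the items, and `taxonomy_test_report` are illustrative.

```python
from collections import Counter


def taxonomy_test_report(trial):
    """Summarise a classification trial for one single-valued field.

    `trial` maps each test item to the set of values reviewers judged to fit.
    No fitting value besides 'Other' points to a gap (not collectively
    exhaustive); several fitting values point to an overlap (not mutually
    exclusive).
    """
    gaps = [item for item, fits in trial.items() if not fits - {"Other"}]
    overlaps = {item: fits - {"Other"} for item, fits in trial.items()
                if len(fits - {"Other"}) > 1}
    overlapping_values = Counter(tuple(sorted(fits)) for fits in overlaps.values())
    return {"gaps": gaps, "overlaps": list(overlaps), "overlapping_values": overlapping_values}


# Hypothetical trial for a 'Content type' field.
trial = {
    "fund-factsheet": {"Product information"},
    "market-commentary": {"News", "Research"},
    "rate-sheet": {"Product information"},
    "webinar-recording": {"Other"},
    "analyst-note": {"News", "Research"},
}
report = taxonomy_test_report(trial)
print(report["gaps"])
print(report["overlaps"])
print(report["overlapping_values"].most_common())
```

In this made-up trial, the report would prompt two refinements: a value for recorded events, and a sharper line between ‘News’ and ‘Research’.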
With a tested initial
taxonomy in place, the content management system can be developed to support
classification of content through this taxonomy. This system may incorporate
taxonomy creation and management software, as described previously.
Taxonomy integration
Integration of the taxonomy with the content management system
Once developed and tested, the taxonomy needs to be put to use in the content management system. It must be ‘integrated’ into the content management system and ‘passed through’ and ‘translated’ into the presentation layer (for instance, the portal interface). Figure 1 represents this relationship (details follow).
The taxonomy must be developed through the approach detailed previously. A simple taxonomy might be represented visually as shown in Figure 2.
This taxonomy must then be integrated into the content management system; the data model for the content management system must be designed (or redesigned) to support the categories and values to be chosen from. In designing and developing this content management system, the metadata values to be captured must also be considered. Metadata are characteristics that are inherent to the content but do not show natural relationships in the same valuable way that taxonomies do. For example, metadata captured for a piece of content might include the size of the item and its author. Capturing the author is valuable, as a content manager or developer might want to view content in the content management system by author. However, the author’s name does not provide the same inherent relationship between content items as, for example, intention does. When developing the content management system to support the taxonomy, the following metadata areas should be considered.
With the content management system designed and developed to integrate the taxonomy and support metadata capture, content can be entered, and taxonomy and metadata values assigned.
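A minimal sketch of such a data model, assuming a relational store (SQLite from the Python standard library is used here only because it is self-contained): categories and values sit in their own tables, each content item carries inherent metadata such as author and size as ordinary columns, and a link table records classifications. Table, column, and function names are hypothetical.

```python
import sqlite3

# Hypothetical schema: taxonomy values live in their own tables, so values can be
# added without altering the content table; author and size are plain metadata columns.
SCHEMA = """
CREATE TABLE category (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE taxonomy_value (
    id INTEGER PRIMARY KEY,
    category_id INTEGER NOT NULL REFERENCES category(id),
    name TEXT NOT NULL,
    UNIQUE (category_id, name)
);
CREATE TABLE content_item (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author TEXT,
    size_bytes INTEGER
);
CREATE TABLE classification (
    content_id INTEGER NOT NULL REFERENCES content_item(id),
    value_id INTEGER NOT NULL REFERENCES taxonomy_value(id),
    PRIMARY KEY (content_id, value_id)
);
"""


def add_value(conn, category, value):
    """Add a value (and its category, if new) to the taxonomy tables."""
    conn.execute("INSERT OR IGNORE INTO category (name) VALUES (?)", (category,))
    (category_id,) = conn.execute(
        "SELECT id FROM category WHERE name = ?", (category,)).fetchone()
    conn.execute("INSERT OR IGNORE INTO taxonomy_value (category_id, name) VALUES (?, ?)",
                 (category_id, value))


def classify(conn, content_id, category, value):
    """Assign an existing taxonomy value to a content item."""
    (value_id,) = conn.execute(
        "SELECT v.id FROM taxonomy_value v JOIN category c ON c.id = v.category_id "
        "WHERE c.name = ? AND v.name = ?", (category, value)).fetchone()
    conn.execute("INSERT INTO classification (content_id, value_id) VALUES (?, ?)",
                 (content_id, value_id))


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
add_value(conn, "Risk", "High")
add_value(conn, "Type", "Article")
cur = conn.execute("INSERT INTO content_item (title, author, size_bytes) VALUES (?, ?, ?)",
                   ("Emerging-market equity outlook", "J. Smith", 48213))
classify(conn, cur.lastrowid, "Risk", "High")
classify(conn, cur.lastrowid, "Type", "Article")

rows = conn.execute(
    "SELECT i.title, i.author, c.name, v.name FROM classification cl "
    "JOIN content_item i ON i.id = cl.content_id "
    "JOIN taxonomy_value v ON v.id = cl.value_id "
    "JOIN category c ON c.id = v.category_id ORDER BY c.name").fetchall()
print(rows)
```

Separating the taxonomy tables from the content table is one design choice among several; it keeps metadata such as author queryable while reserving the link table for the natural relationships the taxonomy expresses.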
An ideal interface provides the author of a document with an in-depth view of the existing taxonomy and suggestions for classifying the document under existing topics or starting a new category. In addition to entering content item by item, content can be loaded in bulk, through a content management system or directly into a relational database, and classified in bulk using packaged products or custom code.
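Bulk classification by custom code can be as simple as keyword rules applied to a batch of documents. The sketch below is an illustrative assumption, not the behaviour of any packaged product: the rules, documents, and `classify_bulk` are hypothetical, and documents that match no rule are returned separately for manual classification.

```python
import re

# Hypothetical keyword rules: a document mentioning any keyword in a set is
# tagged with the corresponding (category, value) pair.
RULES = [
    ({"derivative", "derivatives", "options", "leveraged"}, ("Risk", "High")),
    ({"treasury", "deposit", "savings"}, ("Risk", "Low")),
    ({"retirement", "pension", "pensions"}, ("Intention", "Retirement")),
]


def classify_bulk(documents, rules=RULES):
    """Tag a batch of documents with every rule they match.

    Returns ({doc_id: sorted tags}, [doc_ids matching no rule]).
    """
    tagged, unmatched = {}, []
    for doc_id, text in documents.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        tags = sorted({tag for keywords, tag in rules if words & keywords})
        if tags:
            tagged[doc_id] = tags
        else:
            unmatched.append(doc_id)
    return tagged, unmatched


batch = {
    "a1": "Leveraged options strategies for experienced traders",
    "a2": "Saving for retirement with a pension and a deposit account",
    "a3": "Quarterly staff newsletter",
}
tagged, unmatched = classify_bulk(batch)
print(tagged)
print(unmatched)
```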
Classified content can then be used in the presentation layer. In some instances, the taxonomy will be ‘passed through’, that is, made visible in the end-user interface. This aids the navigation and retrieval of certain types of content, as well as searching for information based on its classification. For example, users might navigate a portion of the site by choosing to view content by type, intention, or risk level.
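‘Passing through’ the taxonomy to the interface amounts to letting users filter on taxonomy values and showing how many items sit under each. A minimal sketch, assuming classified items are available as simple records; the items and the helpers `browse` and `facet_counts` are hypothetical.

```python
from collections import Counter

# Hypothetical classified content: each item carries its taxonomy values.
CONTENT = [
    {"title": "Guide to index funds", "Type": "Guide", "Risk": "Low"},
    {"title": "Options trading primer", "Type": "Guide", "Risk": "High"},
    {"title": "Tech sector outlook", "Type": "Article", "Risk": "High"},
    {"title": "Savings rates update", "Type": "Article", "Risk": "Low"},
]


def browse(items, **facets):
    """Titles of items whose classification matches every chosen facet value."""
    return [item["title"] for item in items
            if all(item.get(category) == value for category, value in facets.items())]


def facet_counts(items, category):
    """Number of items under each value of `category`, for a navigation menu."""
    return Counter(item[category] for item in items if category in item)


print(facet_counts(CONTENT, "Risk"))
print(browse(CONTENT, Risk="High"))
print(browse(CONTENT, Type="Article", Risk="Low"))
```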
To provide a personalised view of content to end users, the taxonomy is often used behind the scenes to translate the relationships among content types. Articles related to product or service information, for example, can be provided on the basis of the relationships established through the classification of content. In this way, content with a related type, intention, risk level, or audience is presented ‘automatically’, based on the categorisation of the content and on business and presentation rules.
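Behind-the-scenes relating of content can be sketched as ranking other items by the number of taxonomy values they share with the item being viewed, then applying a presentation rule. This assumes, hypothetically, that classifications are stored as sets of (category, value) pairs; the identifiers, the scoring rule, and the `wanted_type` filter standing in for a business or presentation rule are all illustrative.

```python
# Hypothetical classifications: content id -> set of (category, value) pairs.
CLASSIFIED = {
    "product:growth-fund": {("Risk", "High"), ("Intention", "Growth"), ("Audience", "Retail")},
    "article:emerging-markets": {("Risk", "High"), ("Intention", "Growth"), ("Type", "Article")},
    "article:tech-stocks": {("Risk", "High"), ("Type", "Article")},
    "article:bond-basics": {("Risk", "Low"), ("Intention", "Income"), ("Type", "Article")},
}


def related_content(content_id, classified, wanted_type=None, limit=3):
    """Rank other items by how many taxonomy values they share with `content_id`.

    `wanted_type`, if given, keeps only items classified with that Type value;
    it stands in for a business or presentation rule such as 'show articles'.
    """
    source = classified[content_id]
    scored = []
    for other_id, values in classified.items():
        if other_id == content_id:
            continue
        if wanted_type is not None and ("Type", wanted_type) not in values:
            continue
        shared = len(source & values)
        if shared:
            scored.append((shared, other_id))
    # Most shared values first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for _shared, other_id in scored[:limit]]


print(related_content("product:growth-fund", CLASSIFIED, wanted_type="Article"))
```

The growth fund page would then show the emerging-markets article first (same risk level and intention) and the technology article second (same risk level only).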
Taxonomy over time
The taxonomy must be refined over time; taxonomies are never ‘final’. Content managers, who are responsible for the content within a repository as well as the structure of the repository, must review submitted items, including the way those items have been classified. As one of the processes they execute, content managers must also review items that have been flagged with, for example, the ‘other’ categorisation described previously. They must review these items to determine whether they are categorised correctly or whether the taxonomy needs to be modified to support the suggested new category. This enables the taxonomy to evolve incrementally over time. These and other items can then be reclassified appropriately. Semio Builder® and Semio Taxonomy® products, for example, provide the capability to make incremental changes to the taxonomy automatically based on new content (4).
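The review of ‘other’ items can be partly systematised. The rules below are hypothetical, chosen only to illustrate the loop the paragraph describes (verify, extend the taxonomy where warranted, reclassify); they are not the automatic mechanism of the Semio products cited. Item identifiers, the synonym table, and the `min_requests` threshold are assumptions.

```python
from collections import Counter

# Hypothetical current taxonomy and items whose contributors chose 'other',
# each with the value the contributor suggested.
taxonomy = {"Product": ["Mortgage", "Card", "Savings"]}
flagged = [
    {"id": "doc-17", "category": "Product", "suggested": "Credit card"},
    {"id": "doc-22", "category": "Product", "suggested": "Insurance"},
    {"id": "doc-31", "category": "Product", "suggested": "Insurance"},
    {"id": "doc-40", "category": "Product", "suggested": "Pet loans"},
]
synonyms = {"Credit card": "Card"}  # hypothetical managed-vocabulary entries


def review_other(flagged, taxonomy, synonyms, min_requests=2):
    """Process 'other' items with simple, hypothetical review rules.

    - A suggestion that is a synonym of an existing value is mapped to it.
    - A new value suggested for at least `min_requests` items is added to the
      taxonomy (which is modified in place) and those items are reclassified.
    - Anything else stays flagged for a content manager to decide.
    Returns ({item id: new value}, [item ids still flagged]).
    """
    requests = Counter((item["category"], item["suggested"]) for item in flagged)
    reclassified, still_flagged = {}, []
    for item in flagged:
        category, suggestion = item["category"], item["suggested"]
        if suggestion in synonyms:
            reclassified[item["id"]] = synonyms[suggestion]
        elif requests[(category, suggestion)] >= min_requests:
            if suggestion not in taxonomy[category]:
                taxonomy[category].append(suggestion)
            reclassified[item["id"]] = suggestion
        else:
            still_flagged.append(item["id"])
    return reclassified, still_flagged


reclassified, still_flagged = review_other(flagged, taxonomy, synonyms)
print(reclassified)
print(still_flagged)
print(taxonomy)
```

Requiring several matching requests before adding a value is one way to keep the taxonomy from growing with one-off suggestions; the threshold itself is a judgement for the content managers.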
From time to time, taxonomies need to be reviewed and may be restructured to support changes in content focus, changes in organisational structure, and so on. Depending on the scope of those changes, a few, many, or all of the steps described previously may need to be executed again. Custom-developed or packaged products may be used to efficiently assign new categorisations to large numbers of content items in accordance with a new taxonomy.
References
1. Automatic Taxonomy Building, Semio Corporation (2000). www.semio.com
2. Automatic Taxonomy Building, Semio Corporation (2000). www.semio.com
3. Extending Text Retrieval to Incorporate Automated Taxonomy Generation, Lynne Harvey, Patricia Seybold Group (22 April 1999).
4. Automatic Taxonomy Building, Semio Corporation (2000). www.semio.com
© Andersen Consulting, 2000.
Adam Gersting is a manager with the Andersen Consulting eHuman Performance Line of Business Knowledge Management Group. He can be contacted at: adam.m.gersting@ac.com
Franck Brice is a manager in the Andersen Consulting Human Performance Applications Group. He can be contacted at: franck.brice@ac.com
Scott Schaftlein is an analyst in the Andersen Consulting Human Performance Applications Group. He can be contacted at: scott.x.schaftlein@ac.com






