posted 20 Mar 2001 in Volume 4 Issue 6

Conquering the language barrier: XML and knowledge management

XML has been hailed as the most powerful standardising technology since the development of the internet. Peter Stanbridge and Eric E. Cohen discuss the challenges and opportunities presented by the emergence of the eXtensible Markup Language and examine its potential impact on the field of knowledge management.

It has been proclaimed the “Technology of the Year” [1] and the most important catalyst for knowledge standardisation in years. Some call it the ‘great facilitator’: the most powerful standardising technology since the development of the internet, or of the PC, or even – if you have a great tolerance for hype – since common rail gauges made transcontinental railways possible in the 19th century! So what is this powerful tool for bridging knowledge and technology? It is XML, the eXtensible Markup Language.

In this article we look at the growing impact of XML from several angles:

  • at the potential value of XML in the context of knowledge management;
  • at the efforts of one specific group – the eXtensible Business Reporting Language (XBRL) consortium – to develop an XML-variant standard for sharing financial knowledge;
  • at how PricewaterhouseCoopers is building upon early XML/XBRL experiences to enhance its own knowledge flows.

Getting started – an XML primer

XML appears to be quickly gaining acceptance in international business circles, a trend led by major hardware and software companies, including IBM/Lotus, Microsoft, Sun Microsystems and Oracle, which are rushing to add XML capabilities to their products. So what exactly is it?

XML is a cross-platform, application-independent, machine-independent, operating system-independent, standardised information transfer language that permits both human and machine readability. XML effectively lets users create and represent their own ‘languages’ or vocabularies for sharing documents and data. By encasing agreed sets of terms within angle brackets, such as <purchase_order> [2] or <journal_entry>, users can capitalise on emerging XML-enabled tool sets to integrate systems that were previously very difficult to integrate. Publishers can now look at creating content just once and then republishing that content in multiple formats, such as hypertext mark-up language (HTML) for web browsers, summarised versions for personal digital assistants (PDAs) and cellular phone screens, portable document format (PDF) files, or even text-to-speech formats.

Unlike HTML, which just describes how content surrounded by tags should be rendered by a web browser, XML lets users communicate the actual context of the content. Human or machine agents can then make decisions on what to do with the content based on these ‘contextual’ tags.
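
The two ideas above (tags that carry context, and content created once but published in several forms) can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. This is only an illustration under stated assumptions: the child element names (supplier, item, total), the attribute names and the three functions are hypothetical and do not come from any published vocabulary.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical vocabulary: these tag and attribute names are illustrative only.
SOURCE = """
<purchase_order number="PO-1001">
  <supplier>Acme Paper Ltd</supplier>
  <item quantity="20">A4 paper (box)</item>
  <item quantity="5">Toner cartridge</item>
  <total currency="GBP">412.50</total>
</purchase_order>
"""

def to_html(order):
    """Render the order as an HTML fragment for a web browser."""
    rows = "".join(
        f"<li>{escape(item.get('quantity'))} x {escape(item.text)}</li>"
        for item in order.findall("item")
    )
    total = order.find("total")
    return (
        f"<h1>Order {escape(order.get('number'))}</h1>"
        f"<ul>{rows}</ul>"
        f"<p>Total: {escape(total.text)} {escape(total.get('currency'))}</p>"
    )

def to_summary(order):
    """Render a one-line summary suitable for a small screen."""
    total = order.find("total")
    count = len(order.findall("item"))
    return f"{order.get('number')}: {count} lines, {total.text} {total.get('currency')}"

def route(document):
    """Decide what to do with a document from its root tag, not its layout."""
    if document.tag == "purchase_order":
        return "purchasing"
    if document.tag == "journal_entry":
        return "general ledger"
    return "unclassified"

order = ET.fromstring(SOURCE)
print(route(order))
print(to_summary(order))
print(to_html(order))
```

The same source text feeds both renderings, and the routing decision depends only on what the content is declared to be.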

XML – one of a family of recommendations from the World Wide Web Consortium (W3C) [3] – is moving more rapidly than most of its predecessors from foundational technology to vital business tool. For example, four developments within just the past 12 months are helping move XML into the mainstream:

  • technical recommendations around XML have become firmer and more precise;
  • stakeholder communities have come together to agree on the use of these technical recommendations for their own specific focus area of KM or communications;
  • the developer community has started delivering XML-based software and service solutions to make the benefits available to a wider user base;
  • the business community at large has started to embrace XML.

Technical recommendations

The W3C began its work on XML in 1996 and issued its first XML recommendation in February 1998. This recommendation covered the basic rules for representing information within the angle brackets. It also covered the DTD (Document Type Definition), inherited from XML’s ancestor, the Standard Generalised Markup Language (SGML): a validation tool that can be used to ensure that all important components are included in the XML file, in the right quantity and in the right place. Many related specifications are now emerging, including tools for manipulating XML (such as XSLT, the Extensible Stylesheet Language Transformations, recommended in November 1999) and for providing greater validation, for example to ensure that values are of the proper data type (this is the proposed XML Schema, not yet a full recommendation at the time of writing).
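
To make the two kinds of validation concrete, here is a rough Python sketch. It is not a DTD or XML Schema processor (Python's standard library does not include a validating parser); it only imitates the kinds of checks described above. The content model for journal_entry, the child element names and the amount attribute are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a <journal_entry>: (child tag, min, max), in required order.
# A DTD would express this as a content model; this table is only a stand-in.
CONTENT_MODEL = [("date", 1, 1), ("description", 0, 1), ("line", 2, None)]

def check_structure(entry):
    """Return a list of problems: wrong order, or too few / too many children."""
    problems = []
    tags = [child.tag for child in entry]
    position = 0
    for tag, minimum, maximum in CONTENT_MODEL:
        count = 0
        while position < len(tags) and tags[position] == tag:
            count += 1
            position += 1
        if count < minimum:
            problems.append(f"expected at least {minimum} <{tag}>, found {count}")
        if maximum is not None and count > maximum:
            problems.append(f"expected at most {maximum} <{tag}>, found {count}")
    if position < len(tags):
        problems.append(f"unexpected <{tags[position]}> at position {position}")
    return problems

def check_amounts(entry):
    """Data-type check of the kind a DTD cannot express: amounts must be numeric."""
    problems = []
    for line in entry.findall("line"):
        try:
            float(line.get("amount", ""))
        except ValueError:
            problems.append(f"non-numeric amount: {line.get('amount')!r}")
    return problems

good = ET.fromstring(
    '<journal_entry><date>2001-03-20</date>'
    '<line amount="100.00"/><line amount="-100.00"/></journal_entry>'
)
bad = ET.fromstring(
    '<journal_entry><line amount="abc"/><date>2001-03-20</date></journal_entry>'
)
print(check_structure(good) + check_amounts(good))
print(check_structure(bad) + check_amounts(bad))
```

The first function covers 'right quantity, right place'; the second covers the data-type checking that the proposed XML Schema is intended to add.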

Stakeholder communities

Numerous international communities of interest are now coming together to ‘agree to agree’ on the specific sets of XML terms that will be exchanged ‘between the angle brackets’. One such community is XBRL.org [4], whose XBRL language has resulted from stakeholders in the financial and business reporting markets coming together to agree on sets of terms covering financial statements for various jurisdictions and industries, taxation and regulatory filings, performance measurement, and other existing and emerging business reports.

Developer delivery

Developers make technology available. Major vendors are now incorporating XML into their developer tools, and developers are using these tools to bring such technology to the end user. In the case of XML, this cycle has been rather protracted because (a) the major vendors were waiting for XML’s technical specifications to firm up and (b) the application developers were waiting for user communities to provide some structure to its use. These processes are now well underway.

The expected growth of XML is analogous to the story of HTML. Creating an HTML file in the mid-90s required dedicated HTML tools, but by the end of the decade HTML output had become an integral part of most strategic business information tools. Creating HTML is therefore no longer the exclusive domain of techno-geeks well versed in the art of vi, a UNIX text editor, or Hot Dog, an early HTML editor. On the contrary, it has become as simple as clicking ‘Save As HTML’ within any word processor, spreadsheet or presentation tool. Signs are that XML is following a similar path. For example, while the current version of Microsoft Office does already include XML, it only really uses it internally for maintaining styling and other such information when ‘round-tripping’ to HTML; accessing XML from within Excel or Access still has to be done programmatically. However, the next versions of Excel and Access are expected to make working with XML much simpler, moving users towards ‘Save As XML’-type functionality. (Meanwhile, the competing StarOffice from Sun Microsystems will reportedly store all of its data in XML format to make its Java-based, cross-platform data even easier to use.)

Market acceptance

Many organisations and government entities have already begun to examine the efficiencies and power that the application of XML might provide. As with the earlier emergence of the fax machine and e-mail, the value of XML to the first few adopter organisations has been limited. However, just as fax machines are now almost universally available within organisations and e-mail is effectively ubiquitous, so the growth of XML may skyrocket once critical mass is achieved. For example, although the latest versions of many enterprise databases and ERP products are already XML-ready, XML has yet to hit the desktop to the same extent. But once it does, its usage is expected to grow even more quickly than that of HTML. At the same time, variants like XBRL seem likely to profit from XML’s rapid overall growth. Indeed, while acceptance of XBRL began within larger corporates, it is probable that its benefits will quickly filter down to smaller organisations too.

XML and knowledge management

One of the principal strengths that its advocates attribute to XML is its ability to blur the differences between documents and data. With its SGML roots, XML has the functionality to make sure that documents are properly structured, while at the same time making it possible to extract data reliably from those documents. This is significant since, when content is broken down into simple files or streams of tagged text, knowledge discovery need no longer be limited by fields or records. Indeed, through related emerging specifications such as XSL, XML-based content could be instantly publishable in the exact format required by a given user at a given time.
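
As a small illustration of this blurring, the Python sketch below reads the same tagged text once as a document and once as data. The element and attribute names (report, para, period, amount, concept, currency) and the figures are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagged narrative: prose with data embedded inline.
REPORT = (
    "<report>"
    "<para>Revenue for <period>Q4 2000</period> was "
    "<amount currency='USD' concept='revenue'>1200000</amount>, while "
    "<amount currency='USD' concept='costs'>900000</amount> was spent.</para>"
    "</report>"
)

root = ET.fromstring(REPORT)

# Read it as a document: all text in reading order, tags ignored.
as_text = "".join(root.itertext())

# Read it as data: pull out every tagged amount, wherever it sits in the prose.
as_data = {
    amount.get("concept"): int(amount.text)
    for amount in root.iter("amount")
}

print(as_text)
print(as_data)
```

Nothing here depends on fields or records: the amounts are found by their tags, regardless of where they appear in the sentence.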

How does all this affect KM?

Knowledge acquisition: XML makes it easier to discover and reuse information. Through web and other file servers, proper collaboration tools and training, both internal intellectual property and information from outside can be more easily obtained and reused.

Knowledge distribution: XML is optimised for the internet. Using intranets, extranets and the internet, information can be more easily disseminated between disparate systems and users.

Knowledge storage and organisation: XML combines many of the best features of databases, text mining and OLAP tools (and some of the worst as well) and opens up access to information of all kinds within an organisation – a crucial KM consideration, since even the best knowledge is useless if users cannot find and access it.

Knowledge application: with emerging tools for end users, XML data can be easily accessed and analysed, opening new doors for strategic planning and analysis.

Applying XML – accounting and investment stakeholders ‘agree to agree’

One example of a group that has banded together to examine the use of XML in improving its processes for creating, publishing and sharing knowledge is the XBRL community. All stakeholders involved in providing information to the capital markets are invited to participate in the international XBRL consortium. This consortium is working on developing agreement on hierarchical vocabularies for various aspects of business reporting, starting with financial statements and then moving on to performance measurement, regulatory filings and other related areas.

The XBRL consortium published its version 1.0 specification and its first set of vocabularies – what it calls its taxonomies – in July 2000. The first taxonomy provided a way to represent every concept in a US financial statement for commercial and industrial businesses. In February 2001 the International Accounting Standards Committee made this first taxonomy available for internal review and many others are underway.

The primary need that the XBRL group identified was to come up with a way of reliably and consistently communicating financial concepts for automated analysis tools, while at the same time permitting human-readable concepts and completely flexible presentation. For example, various companies may wish to represent the basic concept ‘cash’ through different labels, such as ‘cash and cash equivalents’, ‘cash and short term investments’ or ‘cash and marketable securities’, while still allowing their investors to understand that ‘cash’ just means ‘cash’. The XBRL group therefore brought chartered accounting firms, software developers and analysts together to agree upon a machine term to represent the concept ‘cash’, while continuing to permit a variety of human-readable labels for presentation purposes.
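
The separation between an agreed machine term and free-form presentation labels can be sketched as follows. This is deliberately not real XBRL syntax: the balance_sheet and cash element names, the label attribute, the company names and the figures are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical filings: the element name "cash" stands in for an agreed
# machine term; the label attribute is each company's own wording.
FILINGS = {
    "Company A": '<balance_sheet><cash label="Cash and cash equivalents">500</cash></balance_sheet>',
    "Company B": '<balance_sheet><cash label="Cash and marketable securities">750</cash></balance_sheet>',
}

def read_cash(xml_text):
    """Return (label, value) for the shared 'cash' concept."""
    cash = ET.fromstring(xml_text).find("cash")
    return cash.get("label"), int(cash.text)

# An analysis tool compares on the machine term and ignores the labels...
values = {name: read_cash(text)[1] for name, text in FILINGS.items()}

# ...while a report for human readers keeps each company's preferred wording.
lines = [
    f"{name}: {read_cash(text)[0]} = {read_cash(text)[1]}"
    for name, text in FILINGS.items()
]

print(values)
print("\n".join(lines))
```

Both companies remain directly comparable on the shared term even though neither has changed the wording it presents to readers.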

Electronic Data Interchange (EDI) had previously been the effective standard for such automated data exchange, but XBRL also caters for needs not met by EDI by providing a data sharing method that is:

  • optimised for discovery, so that agents can find financial information easily;
  • independent of process, thereby permitting the use of XBRL in multiple contexts;
  • not just sent between two partners, so that it can be embedded into other documents, archived on the internet and made available at any time;
  • flexible and extensible, so that published documents can be shared internally at a deeper level of detail while still understood at the surface level by all;
  • designed for publishing, not just collaboration, recognising the need to communicate knowledge in multiple forms and not just move it from one EDI application to another;
  • designed so that agents – human or machine – can find and extract data consistently and cost-effectively (thereby bringing EDI-type benefits to smaller organisations).

Where does XML stand today?

The road towards XML has been rocky in places because:

  • XML is still maturing and many XML technical specifications are still in flux. At the time of writing, Microsoft Internet Explorer is the only web browser with extensive XML support. Indeed, decisions that seem right based on the technology available today may not look right in the light of software delivered months later – for example, the proposed XML Schema on which the XBRL taxonomies are based has already changed, and the version 1 release is still only at the candidate recommendation stage. To achieve its goals, the XBRL consortium had to extend the capabilities of XML, leading critics to decry the lack of conformance to XML ‘best practices’ – a difficult situation when everyone is still in the discovery stage together;
  • XML knowledge is now growing, but experts in XML are often stretched and under pressure within their own organisations to meet the challenges of, for example, e-procurement and other critical projects;
  • just because XML processes can help span the gap between technology and business doesn’t mean that human processes are ready to follow suit. The XML developer community is looking at modelling techniques less than a decade old (such as UML, the Unified Modelling Language) to try to help communicate its development, but it is a perennial challenge to get technologists, modellers and business people to speak the same language (especially when such efforts are effectively led by volunteers subject to significant time constraints).

Yet, when it is applied creatively, XML can undoubtedly be a key tool in enabling knowledge to flow within and between organisations, which is essential to successful KM.

Case study – applying XBRL/XML learnings within PricewaterhouseCoopers

Many of the technical issues around financial reporting tackled by the XBRL group are similar to those faced by architects of KM systems. In this final section, we summarise the preliminary results from our investigation within the Global Knowledge Management IT Standards group at PricewaterhouseCoopers into using an XBRL approach for general knowledge representation.

PricewaterhouseCoopers has in the past adopted a down-to-earth approach to developing knowledge repositories. Documents that have been quality assured and deemed to be important knowledge have been stored in databases, along with meta-data attributes that provide contextual information about their content. Additional tags provide valuable information on these documents’ creation history and workflow. Such combined attributes have been powerful enough to provide our internal users with sophisticated search and personalisation capabilities, and to enable portability of documents across applications, databases and intranet platforms throughout the organisation.

However, as we have developed key e-business strategies and recognised additional needs for knowledge representation, we have started to look much more closely at XML’s potential as a content publishing technology. In particular, the following needs have become apparent:

  • to generate publishable documents by aggregating knowledge from multiple sources;
  • to customise the publication and presentation of documents for different audiences;
  • to construct content from components both static (i.e. where content never changes in different instances) and dynamic (where the component describes a set of procedures for generating content);
  • to establish requirements for access control over the publishable components within documents;
  • to provide for both knowledge content and meta-data, but with both sharing the same architectural paradigm – for example, application programming is made more complex when meta-data is stored within a relational database and the knowledge content within word processing formats;
  • to include application logic within the content, including meta-data instructions on how to dynamically generate document content from databases or other applications;
  • to provide for much richer knowledge discovery and visualisation capabilities, so that users can ‘see’ their way through available knowledge based on topics and their relationships (‘topic map support’) [5];
  • to provide for seamless knowledge transfer and, in particular, for the transfer of knowledge across application boundaries;
  • to enable automatic support for defining additional document structures within our KM applications;
  • to enable automated extensibility within the KM applications of meta-data attributes such as those associated with specific industries or product sectors.
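
Several of the needs listed above (aggregation from components, static versus dynamic content, audience-specific publication and component-level access control) can be pictured with a small Python sketch. The component element, its kind, audience and source attributes, and the GENERATORS table are hypothetical; they do not describe any PricewaterhouseCoopers system.

```python
import xml.etree.ElementTree as ET

# Hypothetical component document. 'static' components carry their content;
# 'dynamic' components name a generator to call; 'audience' restricts access.
TEMPLATE = """
<document title="Client briefing">
  <component kind="static" audience="all">Overview of the engagement.</component>
  <component kind="dynamic" audience="all" source="headcount"/>
  <component kind="static" audience="internal">Fee notes for partners only.</component>
</document>
"""

# Stand-ins for queries against databases or other applications.
GENERATORS = {"headcount": lambda: "Team size: 12"}

def publish(template_xml, audience):
    """Assemble the components that the given audience is allowed to see."""
    root = ET.fromstring(template_xml)
    parts = [root.get("title")]
    for component in root.findall("component"):
        if component.get("audience") not in ("all", audience):
            continue  # access control at the component level
        if component.get("kind") == "dynamic":
            parts.append(GENERATORS[component.get("source")]())
        else:
            parts.append(component.text.strip())
    return parts

print(publish(TEMPLATE, "client"))
print(publish(TEMPLATE, "internal"))
```

The content and the instructions for producing it live in the same tagged structure, which is the shared architectural paradigm that the list calls for.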

While XML may seem an obvious generic technology to meet these requirements, the refined XBRL approach provides a particularly good architecture for knowledge representation, due to:

  • its use of taxonomies coded in XML Schema syntax as a data dictionary, ensuring extensibility of attributes to represent both document mark-up and meta-data;
  • the fact that the XBRL DTD is simple but designed to permit the creation of content structures of any complexity that do not need to be specified in advance;
  • validators, document parsers and applications that can be programmed to validate and process documents whose structures and content do not require pre-definition;
  • the ease and extensibility with which application-oriented meta-data can be described within the taxonomies – a typical example being the specification of a SOAP [6] packet designed to dynamically extract information from a remote system during the creation of a document;
  • the fact that it enables a content-focused approach to document creation.
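
The data-dictionary idea in the list above can be sketched as follows: every element is checked against a taxonomy, while the nesting of the document is left open. This is a loose illustration only, not the XBRL specification or XML Schema; the TAXONOMY table, the term names and the validate function are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: a data dictionary mapping each known term to a value
# type (None marks a grouping term with no value of its own).
TAXONOMY = {"assets": None, "cash": float, "receivables": float, "author": str}

def validate(element, taxonomy, path=""):
    """Check every element against the taxonomy, whatever the nesting."""
    here = f"{path}/{element.tag}"
    problems = []
    if element.tag not in taxonomy:
        problems.append(f"{here}: term not in taxonomy")
    else:
        value_type = taxonomy[element.tag]
        if value_type is not None:
            try:
                value_type((element.text or "").strip())
            except ValueError:
                problems.append(f"{here}: value is not a valid {value_type.__name__}")
    for child in element:
        problems.extend(validate(child, taxonomy, here))
    return problems

doc = ET.fromstring(
    "<assets><cash>500</cash><assets><receivables>n/a</receivables>"
    "<sector>retail</sector></assets></assets>"
)
print(validate(doc, TAXONOMY))

# Extending the vocabulary is a dictionary update, not a code change.
extended = {**TAXONOMY, "sector": str}
print(validate(doc, extended))
```

In this sketch, adding an industry-specific attribute such as sector requires no change to the validator itself, which is the kind of extensibility described above.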

While the approach adopted by the XBRL consortium has been of strong interest to PricewaterhouseCoopers in investigating knowledge representation standards, other approaches are also under consideration, particularly those based much more closely around the emerging XML Schema standard. Some of the teething problems identified in the XBRL-type approach will become less problematic under XML Schema:

  • the approach adopted by XBRL is often considered a ‘non-standard’ use of XML Schemas, requiring specialised or customised validator programs;
  • the representation of mixed content types is slightly unwieldy;
  • problems with validation occur when a document is aggregated from XML sources that do not conform to the XBRL standard.

While it is too early in our project to know exactly which approach we shall adopt in extending our knowledge representation capabilities to XML, we are certain that we will require an approach that is extremely flexible. The approach adopted by the XBRL group already meets most of these requirements, something we cannot achieve through the use of a DTD alone, which stipulates predetermined elements and report structure.

Conclusion

XML is still a developing standard, characterised by the many unknowns associated with new technology developments. In particular, understanding the human aspects of the change required, particularly user acceptance, will become an important success factor. XML will also alter the way content authors, editors and publishers work, and this will be a concern for all those wishing to implement a content management and publishing application based on the new technology [7].

In short, there are exciting challenges and opportunities ahead when applying XML in a KM context! [8]

References

1. InfoWorld (29 January 2001).
2. Such terms and angle brackets are commonly known as tags.
3. www.w3c.org
4. www.xbrl.org
5. Defined in ISO/IEC 13250.
6. SOAP: the Simple Object Access Protocol, a specification that enables XML-based structured instructions and data to be transported to and from remote systems.
7. Chet Ensign, SGML: The Billion Dollar Secret (Prentice Hall PTR, 1997) provides informative case studies that address, among other things, the human issues of this change.
8. In general, it might be better for organisations to consider an approach to XML that defines content based on key problem domain concepts, rather than on the content and structure of specific reports. In other words, any approach must enable extensibility of content elements and structure. While the XBRL approach to XML content meets most of these challenges, new developments in areas like XML Schemas are likely to provide still further means to represent flexible and complex knowledge structures.

Peter Stanbridge can be contacted at: peter.stanbridge@uk.pwcglobal.com
Eric E. Cohen can be contacted at: eric.e.cohen@us.pwcglobal.com


Copyright ©1994-2005 Ark Group Ltd All rights reserved. No part of this site or the publications described herein
may be reproduced in any form without the permission of Ark Conferences Ltd, Registered in England, No. 2931372.