posted 19 Jan 2006 in Volume 9 Issue 5
Trend tracker: Information access
By Chris Harris-Jones, research director, information management, Ovum
Information retrieval is a perennial problem. Information and knowledge need to be accessed in many different ways, depending on what needs to be done. One task might require retrieving all the information on a particular project, another everything on a given topic, and another everything by a particular author.
Retrieval needs to be based on the context of the current activity. It should be possible to retrieve all relevant information regardless of who saved it and where it was saved. That includes all types of documents, spreadsheets, diagrams, e-mails, instant messages, whiteboards, discussion groups and so on.
The problem is that information on computers is stored under a structure of hierarchical directories provided by the operating system and following a metaphor used in computing for more than 20 years. Some people may have a content-management system that provides easy retrieval of some types of information, although very few organisations have every single content item across the company stored in this way. There is usually still substantial content in hierarchical folders.
This filing cabinet metaphor worked reasonably well when everything went through a central point – usually a personal or group secretary. But these valuable people are disappearing fast and now just about all office-based workers have to create and store information themselves. This usually means that the only person who can find information efficiently is the person who stored it – although even that is not always the case.
Another significant problem with hierarchical storage is that information is usually filed in a directory according to the context in which it was created. If you later need that information in a completely different context, it can be very difficult to find.
Traditionally, numerical data has been stored in many different ways. In the early days we had stand-alone, indexed sequential files, then came hierarchical databases, which could store more and deliver it faster. Then we had relational databases, which became commercially viable in the 1980s and delivered a more interconnected structure to the data. Unfortunately the storage of unstructured information has become stuck in hierarchies. When information is used for multiple purposes, as it invariably is, hierarchies simply do not work.
The nearest we have come to moving away from the hierarchical model is found in some content-management systems. These often use the concept of a ‘heap’: content is simply stored wherever happens to be convenient. This apparent anarchy is resolved through the creation of indexes with pointers to the content. Search engines are then used to scour the index and retrieve the item. This is a big improvement on the traditional, hierarchical storage model.
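The heap-and-index idea can be illustrated with a minimal sketch. It is not taken from any particular content-management product: the class name HeapStore, its save and find methods, and the metadata fields project, topic and author are all hypothetical, chosen only to show content held in a flat store and retrieved purely through index pointers.

```python
import uuid
from collections import defaultdict


class HeapStore:
    """Toy content store (hypothetical): items sit in a flat 'heap' and
    are located only through indexes, never through a folder path."""

    def __init__(self):
        self._heap = {}                 # item id -> content; the id carries no meaning
        self._index = defaultdict(set)  # (field, value) -> ids of matching items

    def save(self, content, **metadata):
        """Store content wherever is convenient and index its metadata."""
        item_id = uuid.uuid4().hex
        self._heap[item_id] = content
        for field, value in metadata.items():
            self._index[(field, value.lower())].add(item_id)
        return item_id

    def find(self, **criteria):
        """Return the contents of items matching every field=value criterion."""
        if not criteria:
            return []
        id_sets = [self._index.get((field, value.lower()), set())
                   for field, value in criteria.items()]
        matches = set.intersection(*id_sets)
        return sorted(self._heap[item_id] for item_id in matches)


# Illustrative data only.
store = HeapStore()
store.save("Q3 budget spreadsheet", project="apollo", topic="finance", author="kim")
store.save("Kick-off meeting notes", project="apollo", topic="planning", author="lee")
store.save("Expenses policy e-mail", project="hr-handbook", topic="finance", author="kim")

print(store.find(project="apollo"))               # everything on one project
print(store.find(topic="finance", author="kim"))  # by topic and author together
```

The same item is reachable by project, by topic or by author, whoever saved it, because nobody ever chose a folder for it. A real system would also need full-text indexing, access control and persistent storage, all of which this sketch leaves out.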
For effective information management, we need to completely remove the user from storage of their data. The idea of allowing every user to decide exactly where to store information is highly inefficient as retrieval is then based on both their personal memory and context. We need operating systems that deliver some form of content management, out-of-the-box, to kill hierarchies forever.