
Feature

posted 1 Oct 2004 in Volume 8 Issue 2

Fuelling the search engine

When it came to implementing a search and retrieval strategy, the UK’s Department for Work and Pensions (DWP) was determined to create a user-friendly intranet search engine. Rebecca Cavalôt speaks to a librarian at the DWP about the challenge of keeping thousands of civil servants satisfied with innovative content-retrieval methods.

The Department for Work and Pensions (DWP), previously known as the Department of Social Security, is a UK government department that employs 130,000 civil servants. Its main priorities are to ensure the best start for all children and end child poverty within 20 years; to promote work as the best form of welfare for people of working age, while protecting the position of those in greatest need; to combat poverty and promote security and independence in retirement for today’s and future pensioners; to improve the rights of, and opportunities for, disabled people in a fair and inclusive society; and to modernise welfare delivery, improving accessibility, accuracy and value for money of services for both customers and employers.

The DWP intranet holds approximately 340,000 textual files, containing everything from benefit guidance for customer-facing staff, to internal HR, estates and finance information. The DWP planned to procure and optimise a new strategic search engine for the DWP intranet to replace a pre-existing tactical solution. With such a wide range of content, user-needs analysis was required for a variety of reasons. The DWP planned to refer to the analysis when writing the business-requirements reports, as well as when carrying out proof-of-concept testing and configuring and optimising the new system.

Searching for relevance

The concept of relevance – the suitability of search results for DWP users’ needs – had to be central to the search experience. Search technologies based on simple character-string matching, such as the UNIX grep command, are common. However, search engines go further by ordering the results of a search so that references to the most relevant documents appear high on the results list. The relevance ranking of results can be based on a number of calculations or combinations of calculations. Illustrative examples are shown below, roughly in order of increasing complexity:

  • The number of occurrences of search terms in each file;
  • The proportion of occurrences of search terms in each file relative to the file size;
  • The placement of search terms in each file (near the start of the file, near the end of the file, near the beginning of paragraphs);
  • The concentration of occurrences of search terms in areas of each file;
  • The existence of search terms in specific fields in each file (for instance, title tag or metadata tag, or even in the URL or filename of a file);
  • All of the above, using stemming and synonyms;
  • Bayesian analysis, support vector machines and page-link counting.
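
As a rough illustration of the simpler calculations in the list above, the Python sketch below computes an occurrence count, a count relative to document length, the position of the first occurrence and a title-field match, and then combines them into one score. The function names and the weights are assumptions made for this example; they are not taken from the DWP system or from any vendor's product.

```python
import re

def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance_signals(query, title, body):
    """Compute four simple ranking signals for one document."""
    terms = set(tokenize(query))
    words = tokenize(body)
    # Word positions in the body at which any search term occurs.
    hits = [i for i, word in enumerate(words) if word in terms]
    return {
        # Number of occurrences of the search terms.
        "count": len(hits),
        # Occurrences relative to the size of the document.
        "density": len(hits) / len(words) if words else 0.0,
        # 1.0 when the first occurrence is the first word, falling towards 0.
        "earliness": 1 - hits[0] / len(words) if hits else 0.0,
        # Whether a search term appears in a specific field (here, the title).
        "in_title": any(word in terms for word in tokenize(title)),
    }

def score(signals, weights=(1.0, 10.0, 1.0, 5.0)):
    """Combine the signals using hypothetical, untuned weights."""
    w_count, w_density, w_early, w_title = weights
    return (w_count * signals["count"]
            + w_density * signals["density"]
            + w_early * signals["earliness"]
            + w_title * signals["in_title"])
```

Sorting documents by this score in descending order would give a results list of the mechanistic kind that vendors describe. Stemming and synonym handling would be applied to the words before counting and are left out here.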

Vendors are happy to discuss the relative merits of the exotic algorithms that underpin the relevance ranking of their software. However, during the course of the needs analysis at the DWP, it became clear that this rather mechanistic approach did not fit with what users meant by relevance. Users were not really interested in how many times their search terms appeared in a document, or whether lots of their search terms appeared close together in the document. For them, relevance was much more personal and specific. Most users were frustrated by the search engine itself and found that their habits weren’t being taken into consideration. When asked what they wanted to achieve from search, users offered comments such as “Just the HR documents – stuff about staff, not about customers”, “glossaries so you can see what an acronym actually means” and “overviews, not all the nitty gritty stuff”. The list was endless.

Before the start of the project it appeared that the intranet was a ‘pie’ of information, covering subjects that needed to be bundled up into convenient portions in a categorisation system to support the search engine for the DWP users. However, comments such as those above suggested that users took it for granted that the search engine should retrieve documents on the correct subject, but that this wasn’t enough. Users felt that the search engine should be able to bring documents that met their specific criteria to their attention from within the wider results. From this feedback, the intranet appeared like a landscape with features, contours and characteristics that needed to be mapped and presented to users.

Conducting user research

The user research could be broken down into two main areas: reviewing existing information and carrying out new work.

Reviewing existing information

There were a surprising number of sources of information regarding users’ needs, including a previous information-needs analysis for one of the DWP’s agencies, carried out in 1999; 1,300 (anonymous) comments received by webmasters who were running intranet sites; 250 discussion-group comments regarding the existing search engine and the whereabouts of information on the intranet; and annual reports and lessons-learnt documents.

The previous needs analysis was useful because it outlined several different types of information needed by staff in the agency: individual case information; information to support casework; information for research and awareness (organisation awareness, specialist awareness, news and mainstream media awareness); information for training and self-development; and parliamentary information.

Anonymised comments received by webmasters covered access to guidance and bulletins; current awareness; local office (workplace) information; finding or contacting other members of staff; the previous search engine, as well as the existing one; organisational awareness; and HR issues.

Discussion group comments were categorised and underlying themes soon became apparent. The annual reports were constructive, as they provided an official high-level overview of what the DWP does and how it does it. The lessons learnt were also valuable as they described previous problems staff had encountered when trying to access information.

All these sources of information provided both general themes and specific instances of the information that staff needed in order to carry out their work. They also illustrated how staff currently retrieved information and how they wished to retrieve it in the future.

New research

As well as reviewing and re-using pre-existing information, the DWP decided to carry out new research into staff needs. A survey was carried out over the intranet that asked staff how regularly they used the existing search engine, what they thought of it and how they thought it could be improved. Although the information gathered was useful, particularly the statistical information, it was recognised that such a survey might not be entirely representative. The staff who responded were self-selecting, and it was likely that they were more technologically literate than the average employee.

In addition to the survey, 85 semi-structured interviews were carried out with staff from a representative sample across the DWP. These were preceded by brief unstructured interviews with staff in both customer-facing offices and the corporate centre. The preliminary interviews were useful for gaining an understanding of both the culture of different workplaces and the reasons why information retrieval had become such an essential issue. Two reasons for the latter came to light. The first was the increase in the number of public-sector initiatives being launched by the current government. The second was the increasingly customer-focused strategy of delivery: staff needed more areas of expertise so that customers were passed from one specialist to another less often.

The semi-structured interviews evolved over a short period into 29 closed questions, 12 open questions and five supplementary questions. Subjects covered by the questions were, in order:

  1. What do you do in your job?
  2. What information do you need?
  3. Do you need external information?
  4. What is external information?
  5. How do you find information now?
  6. Specific examples of past problems/difficulties in finding information/using the search engine;
  7. If you had a magic wand, what would happen? (this question was suggested by a librarian colleague at HM Customs and Excise);
  8. Here is a possible new functionality for the search engine... would it be useful?

The structure was designed to put users at ease at the beginning of the interview, giving them the confidence to talk freely by presenting them with a topic that they were familiar with. In addition, information about management responsibilities and whether the interviewees worked with others carrying out similar tasks was divulged in this part of the interview, along with technological literacy levels and the types of day-to-day time pressures users experienced. After some interviews, users showed us the information they used on the intranet at their own desks and informally discussed how they went about using the existing search engine and the problems they encountered.

Results from the research and solution requirements

The need to refine search results

The most essential requirement was for the user to be able to refine the search results so that only content that was relevant to them would be retrieved. Users expressed this single concept in a number of ways; the phrases they used included “restricting your search”, “refining” and “the search engine should do the work for you, putting it in categories”. When asked if it would be useful to restrict searches to specific types of documents, such as minutes of meetings or reports, users were enthusiastic.

The results suggested that a search solution should allow the DWP librarian administrator to set up pre-defined categories for restricting searches (a pre-coordinate taxonomy, based on ‘scavenged’ existing intranet structures, such as filepaths), and should also support post-coordinate classification carried out by the user at search time (restricting results according to different combinations of metadata values). An example of the latter is a user searching for ‘automated payments’ and choosing the metadata values ‘report’ for the DC.Type metatag and ‘the pension service’ for the StrategicArea metatag.
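
The post-coordinate case can be sketched in a few lines of Python. The sketch assumes that each search result is held as a dictionary with a `meta` dictionary of metatag values; that record layout and the `refine` function are hypothetical, while the DC.Type and StrategicArea metatag names and their values come from the example in the text.

```python
def refine(results, facets):
    """Keep results whose metadata matches every chosen facet value.

    `results` is a list of dicts with a "meta" dict (hypothetical layout);
    `facets` maps a metatag name to the value the user selected.
    Matching is case-insensitive; a missing metatag never matches.
    """
    def matches(doc):
        meta = doc.get("meta", {})
        return all(meta.get(tag, "").lower() == value.lower()
                   for tag, value in facets.items())
    return [doc for doc in results if matches(doc)]

# Results already returned for the query 'automated payments' (invented data).
results = [
    {"title": "Automated payments review",
     "meta": {"DC.Type": "report", "StrategicArea": "the pension service"}},
    {"title": "Automated payments steering group",
     "meta": {"DC.Type": "minutes", "StrategicArea": "the pension service"}},
    {"title": "Automated payments rollout",
     "meta": {"DC.Type": "report", "StrategicArea": "jobcentre plus"}},
]

chosen = {"DC.Type": "report", "StrategicArea": "the pension service"}
refined = refine(results, chosen)
```

Because the user picks the combination of values at search time, no category has to exist in advance for every combination. This is the practical difference from the pre-coordinate taxonomy.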

A data cleanse of the DWP intranet had previously been carried out, and the application of the e-Government Metadata Standards to content had been made compulsory.

The need for overview information

Some users, particularly middle-ranking staff working on projects, said they often needed to search for information that was completely new to them and therefore had unpredictable search needs. These staff would construct a search as best they could, but would be overwhelmed by very detailed ‘low-level’ information in the results. One user compared results from the existing intranet search engine to results from searching on the internet. Searching on the internet often brought the user to a homepage, and they could then navigate from there. It was acknowledged that a way of separating out or raising the relevance of homepages and index pages would be useful.

Refining search to specifically dated content

Users described reading bulletins on subjects that later came up in the workplace, and expressed a need to find those bulletins quickly. Their comments showed that some way of restricting a search to specific content and then ordering the results by date would be a valuable tool.
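
A minimal Python sketch of this requirement is given here, assuming each result carries a document type and a publication date. The field names `type` and `published` and the sample records are invented for illustration.

```python
from datetime import date

def newest_first(results, doc_type):
    """Keep results of one document type and order them newest first."""
    matching = [doc for doc in results if doc["type"] == doc_type]
    return sorted(matching, key=lambda doc: doc["published"], reverse=True)

# Invented sample results for a single search.
results = [
    {"title": "Bulletin 12", "type": "bulletin", "published": date(2004, 3, 1)},
    {"title": "Procedures guide", "type": "guide", "published": date(2004, 6, 1)},
    {"title": "Bulletin 19", "type": "bulletin", "published": date(2004, 8, 15)},
]

bulletins = newest_first(results, "bulletin")
```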

Context of search terms in results summary

Some users complained that in the existing results summaries the titles and descriptions of documents were either misleading or uninformative. Users might open a document only to find that, although it contained the search terms, the context was completely irrelevant. To remedy this, the summaries in the results lists produced by the search engine could either summarise the documents or show search terms and words on either side of them. This would help users choose which documents to view.
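
The second option, showing the search term with the words on either side of it, is often called a keyword-in-context summary. The Python sketch below is a simplified version that handles a single-word term and only its first occurrence; the function name `kwic` and the `width` parameter are assumptions for this example.

```python
import re

def kwic(text, term, width=3):
    """Return the first occurrence of `term` with up to `width` words
    of context on each side, or None if the term does not occur."""
    words = text.split()
    for i, word in enumerate(words):
        # Strip punctuation so that "payments," still matches "payments".
        if re.sub(r"\W+", "", word).lower() == term.lower():
            start = max(0, i - width)
            end = min(len(words), i + width + 1)
            prefix = "... " if start > 0 else ""
            suffix = " ..." if end < len(words) else ""
            return prefix + " ".join(words[start:end]) + suffix
    return None

snippet = kwic(
    "Claims for automated payments must be checked by the local office.",
    "payments",
    width=2,
)
# snippet == "... for automated payments must be ..."
```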

Highlighting search terms within results

Users were critical of the way the existing search engine returned lengthy documents in which it was difficult to find what they were looking for. The new search engine needed to highlight the relevant parts of the document and provide ‘jump to next’ and ‘jump to previous’ buttons. Knowledge of the standard Ctrl-F ‘find’ functionality was not widespread and, when it was discussed, users thought it an inconvenient intermediate step.

A ‘more like this’ functionality

Receiving results that were not quite what was required caused users to become frustrated. The idea that each search result could offer documents similar to itself was well received. Similarity could be based either on the text of the document (clustering or segmentation technology) or on the original document’s filepath (because web authors tend to put similar documents in similar places).
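
The filepath variant is the simpler of the two to illustrate. The Python sketch below ranks candidate documents by how many leading directory names they share with the chosen document. It is an assumed, simplified approach rather than a description of any particular product, and the paths are invented.

```python
def shared_depth(path_a, path_b):
    """Count the leading directory names that two filepaths have in common."""
    dirs_a = path_a.strip("/").split("/")[:-1]  # drop the filename
    dirs_b = path_b.strip("/").split("/")[:-1]
    depth = 0
    for a, b in zip(dirs_a, dirs_b):
        if a != b:
            break
        depth += 1
    return depth

def more_like_this(target, candidates, limit=3):
    """Return up to `limit` candidates that share at least one leading
    directory with `target`, closest matches first."""
    related = [c for c in candidates
               if c != target and shared_depth(target, c) > 0]
    related.sort(key=lambda c: shared_depth(target, c), reverse=True)
    return related[:limit]

candidates = [
    "/hr/leave/annual.htm",
    "/jcp/guidance/index.htm",
    "/jcp/guidance/benefits/is.htm",
]
similar = more_like_this("/jcp/guidance/benefits/jsa.htm", candidates)
```

A text-based approach would compare the words in the documents instead, which costs more to compute but does not depend on authors filing documents consistently.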

Identification with a specific agency

The interview question about external information brought to light the unexpected strength with which staff (customer-facing employees in particular) identified with their own particular agency, rather than the DWP itself. Agency staff often said that they did not need external information from the Inland Revenue or Department of Health, or even from another DWP agency. Many staff actually classed other DWP agencies as external to their own, viewing them as separate in the same way as entirely different government departments and ministries.

However, intranet sites for agencies do contain and use information from other DWP agencies. A decision was made that the categorisation supporting the search engine would identify internal or external, relevant or non-relevant, by channelling users into areas of information used by staff working in their agency. In practical terms this meant that some of the categories of content that were produced by Jobcentre Plus would also need to be referenced by categories under the Pension Service’s area of the categorisation system. Instead of a main category ‘guides produced by the Pension Service’ there would be the category ‘guides used by the Pension Service’, some of which would be produced by non-Pension Service staff. Furthermore, the overall DWP HR guidance would need to appear not just as a top-level category, but also throughout the categorisation system, under each agency heading. This was expressed in the high-level business requirements as a need for a polyhierarchical categorisation system – that is, different routes available to reach the same content.
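
One way to picture a polyhierarchical categorisation system is as a structure in which a category can sit under more than one parent. The Python sketch below uses an invented fragment of such a structure and lists every route to a given category. The category names follow the examples in the text, but the layout and the `routes_to` function are assumptions, and the structure is assumed to contain no cycles.

```python
# Each category maps to its child categories. "HR guidance" appears under
# the top level and again under each agency (invented fragment).
CATEGORIES = {
    "DWP intranet": ["HR guidance", "Jobcentre Plus", "The Pension Service"],
    "Jobcentre Plus": ["HR guidance", "Guides used by Jobcentre Plus"],
    "The Pension Service": ["HR guidance", "Guides used by the Pension Service"],
}

def routes_to(tree, target, root="DWP intranet"):
    """Return every path from `root` to `target` as a list of category names."""
    routes = []

    def walk(node, trail):
        trail = trail + [node]
        if node == target:
            routes.append(trail)
        for child in tree.get(node, []):
            walk(child, trail)

    walk(root, [])
    return routes

hr_routes = routes_to(CATEGORIES, "HR guidance")
```

Here `hr_routes` holds three routes, one directly from the top level and one through each agency, so staff who start from their own agency heading still reach the shared HR guidance.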

Non-organisation-based taxonomies

There is some value in the argument that categorisation systems or taxonomies should not be organisation-based. The units within organisations change name and amalgamate or split over time. In addition, categorising by organisation presupposes that the user knows which unit within their organisation deals with what.

However, research found that many customer-facing DWP staff do not think in terms of the DWP as a whole. Their tasks, and the information needed to carry them out (the official guides, the official bulletins) are fairly predictable, in type, if not in content. Having a high-level category mentioning their agency name allows such staff to focus on the category and its numerous sub-categories (which may then be based on document type, subject, organisational unit and so on), secure in the knowledge that they do not need to hunt around the categorisation system for information relevant to them that they might have missed.

Other staff who have more wide-ranging information needs are likely to be project staff and senior decision makers within the organisation. These staff can be expected to have a wider knowledge of the organisation as a whole, for example the unit that deals with procurement. After typing in a search term, they should know, or at least be able to make an educated guess at, which organisation-based categories in the results hold the information they need.

In summary, the initial categorisation system put in place to support the DWP search engine is not an immediately logical or purely subject-based taxonomy such as the Government Category List. The DWP intranet search categories are aimed at showing clearly defined sets of documents that segment the search results. The rules for these reflect the most common groupings of documents that users have expressed a need to restrict their searches to, for example the Benefit Guidance used by Jobcentre Plus, only the overview pages (home pages, index pages), just the glossaries of abbreviations used in the Pension Service, only the speeches by a particular minister, just the documents produced by the project-management specialists. Many of the top-level terms are organisation names, allowing users who identify strongly with these to focus on them initially.

It may seem extravagant to expend such effort trying to find out what users want, but for procuring and optimising an intranet search engine, it is an essential part of the process.

Costs of the needs analysis

Figure 1 shows time elapsed for the user-needs analysis as a proportion of the entire project time elapsed. Other project tasks were being carried out in this time, too. Figure 2 shows the cost of the user-needs analysis as a proportion of entire project cost. These illustrations show that even a relatively in-depth user-needs analysis such as this can be low cost. When thinking about costs, it may be useful to consider the difference between material work and information work.

Benefits of information work – information economics

Consider the following vendor-neutral rule for the search query that underpins the category ‘reference pages – glossaries etc’. The category covers any document whose:

  • URL CONTAINS gloss* OR
  • TITLE CONTAINS glossar* OR
  • URL CONTAINS abbrev* OR
  • TITLE CONTAINS abbrev* OR
  • TITLE CONTAINS THE PHRASE jargon buster

When a user carries out a search for PPG and then clicks on the symbol for the category called ‘reference pages – glossaries etc’, the above search is run in addition to the search for PPG. By creating this rule in advance and adding it to the categorisation system, the search-engine librarian is saving the effort of users so that they do not need to think up and type this complex query every time they wish to limit their searches to reference pages. One of the DWP librarian’s tasks is to provide hundreds of such search frames.
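
The rule and its combination with the user's own search can be sketched in Python as follows. The sketch assumes that the trailing wildcard (as in gloss*) can be treated as a case-insensitive substring match and that documents are held as dictionaries with `url`, `title` and `text` fields. Both assumptions, and the sample documents, are made for this example only.

```python
def is_reference_page(url, title):
    """The 'reference pages - glossaries etc' rule, with each wildcard
    term treated as a case-insensitive substring match."""
    url, title = url.lower(), title.lower()
    return ("gloss" in url
            or "glossar" in title
            or "abbrev" in url
            or "abbrev" in title
            or "jargon buster" in title)

def search_in_category(docs, query, category_rule):
    """Run the user's query AND the stored category rule together."""
    q = query.lower()
    return [doc for doc in docs
            if q in doc["text"].lower()
            and category_rule(doc["url"], doc["title"])]

# Invented sample documents.
docs = [
    {"url": "/tps/glossary.htm", "title": "Pension Service glossary",
     "text": "PPG: an abbreviation used in guidance."},
    {"url": "/tps/guide/ch4.htm", "title": "Chapter 4",
     "text": "Claims for PPG are handled as follows."},
    {"url": "/jcp/help/terms.htm", "title": "Jargon Buster",
     "text": "JSA: Jobseeker's Allowance."},
]

hits = search_in_category(docs, "PPG", is_reference_page)
```

The user types only PPG and clicks the category; the five-clause rule is written once by the librarian and reused by every user who selects that category.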

In Money for Nothing, Roger Bootle explains how information economics is different from material economics.[1] For example, if a carpenter makes a chair for a customer and then another customer wants to own a similar chair, all the material work (carving the wood and putting the components together, as opposed to thinking up the design of the chair) needs to be done all over again. However, if a librarian researches users’ needs and then creates a category for a search engine (information work) and it is subsequently needed by more than one user, no new work has to be done. Information work can be consumed by many people over and over again.

With 130,000 users, the costs of carrying out the work described here are very low compared to the benefits to users.

Reference

1. Bootle, R., Money for Nothing: Real Wealth, Financial Fantasies and the Economy of the Future (Nicholas Brealey Publishing, 2003)

Rebecca Cavalôt, deputy editor, Knowledge Management, rcavalot@ark-group.com


