posted 2 May 2006 in Volume 9 Issue 8
Semantically speaking
The ‘semantic web’ project could take the internet to another level of automation and interactivity. But some key challenges need to be overcome first.
Tim Berners-Lee, the scientist who invented the software and standards behind the worldwide web, is regarded as something of a guru pretty much everywhere he goes, as well as an all-round good guy.
He could have tried to ‘monetise’ his invention, for example, but chose to let everyone use it for free to provide a common global platform for communication and collaboration. Alternatively, he could have climbed on the dot-com bandwagon in the 1990s by founding or joining any number of half-baked internet companies, but instead chose to concentrate on the less lucrative work of the World Wide Web Consortium (W3C), the internet standards body.
Not that he is a pauper. But Berners-Lee’s hard work at the W3C – devising, developing and defending internet standards – helped to keep the web out of the predatory hands of software, hardware and other companies that would have liked to control it for their own profit.
In short, without his vision and altruism the internet might have developed very differently – and much more slowly. For Berners-Lee, however, it is far from finished. It could do much, much more, he believes. That is why, since the dot-com era, he has been working on a project called the semantic web, an extension of the internet in which information is given a well-defined meaning.
By combining the different meanings of different pieces of information, he believes that it will be possible for computers to understand and interpret the meaning of data, as well as where and how it should be presented.
It sounds like the kind of project that only a techie could be interested in, but Berners-Lee foresees it providing the kind of integration and automation, normally seen only in science-fiction films, that would benefit ordinary people’s lives.
The difference between the web of today and the semantic web that Berners-Lee foresees developing is that the web today is primarily constructed for people to read, not for computer programs to interpret and use. That is to say, while computers can display, for instance, somebody’s curriculum vitae or resume, they cannot understand what it is and what it is for, interpret the various elements within or understand their meanings and, as a result, reference them elsewhere.
In an article in Scientific American magazine, he cited an example where two people set up a hospital appointment for their elderly mother by using a ‘semantic web agent’ to find a doctor and hospital that fitted their various criteria, such as quality, available appointment times (in the schedules of both the children, as well as the hospital) and how well those appointment times fitted the two children’s own busy lifestyles.
The science bit
Two of the basic technologies for the semantic web are already in place: the Extensible Markup Language (XML) and the Resource Description Framework (RDF).
XML, of course, is the simple standard by which web pages, data and other content can be tagged and described. In XML, for example, the various elements of an electronic invoice can be described so that the name, the address and so on are each tagged accordingly. Therefore, when the invoice is sent from one organisation to another, the computer systems at the receiving end can automatically identify the information on the invoice without it having to be re-keyed. The payment process can then be automated and speeded up.
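As a rough sketch of the invoice example, the snippet below reads a tagged invoice using Python's standard XML parser. The tag names (`invoice`, `customer`, `name`, `address`, `total`) and the values are made up for illustration and do not follow any real invoicing standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical invoice markup: tag names and values are illustrative only.
INVOICE_XML = """
<invoice number="INV-001">
  <customer>
    <name>Example Ltd</name>
    <address>1 High Street, London</address>
  </customer>
  <total currency="GBP">120.50</total>
</invoice>
"""


def read_invoice(xml_text):
    """Pull the tagged fields out of the invoice so nothing has to be re-keyed."""
    root = ET.fromstring(xml_text.strip())
    total = root.find("total")
    return {
        "number": root.get("number"),
        "name": root.findtext("customer/name"),
        "address": root.findtext("customer/address"),
        "currency": total.get("currency"),
        "amount": float(total.text),
    }


invoice = read_invoice(INVOICE_XML)
print(invoice["name"], invoice["currency"], invoice["amount"])
```

The receiving system only works if both organisations have agreed on the same tag names beforehand, which is the gap RDF and ontologies are meant to address.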
RDF is an infrastructure standard that sets out how structured metadata – data that describes other information and data, such as XML documents – can be encoded, exchanged and re-used. In plain English, it describes a document’s properties, making assertions about meanings and values, in a computer-readable format.
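RDF assertions are commonly described as subject-predicate-object statements, or "triples". The sketch below models a few such statements about a curriculum vitae as plain Python tuples. It does not use a real RDF library or serialisation, and the URI and the `ex:` property names are hypothetical.

```python
# Each assertion is a (subject, predicate, object) triple.
# The URI and the "ex:" vocabulary are made up for illustration.
TRIPLES = {
    ("http://example.org/cv/alice", "ex:type", "ex:CurriculumVitae"),
    ("http://example.org/cv/alice", "ex:author", "Alice Example"),
    ("http://example.org/cv/alice", "ex:language", "en"),
}


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# Ask: who is the author of any described document?
authors = [o for _, _, o in match(TRIPLES, predicate="ex:author")]
print(authors)
```

Because every statement has the same three-part shape, a program can query the metadata without knowing in advance what kind of document it describes.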
A third element to draw it all together has also been developed: the Web Ontology Language (OWL), which provides a taxonomy in which all the metadata can be organised.
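To give a flavour of what a taxonomy lets a program infer, here is a minimal sketch of a class hierarchy with a subclass check. OWL itself is far richer than this; the class names are hypothetical and each class is assumed to have at most one parent.

```python
# Hypothetical hierarchy: each class maps to its single parent class.
SUBCLASS_OF = {
    "Cardiologist": "Doctor",
    "Doctor": "MedicalProfessional",
    "Nurse": "MedicalProfessional",
    "MedicalProfessional": "Person",
}


def ancestors(cls, hierarchy):
    """Walk up the hierarchy and list every class above cls, nearest first."""
    result = []
    seen = {cls}
    while cls in hierarchy:
        cls = hierarchy[cls]
        if cls in seen:  # guard against accidental cycles in the data
            break
        seen.add(cls)
        result.append(cls)
    return result


def is_a(cls, target, hierarchy):
    """True if cls is target or sits somewhere beneath it."""
    return cls == target or target in ancestors(cls, hierarchy)


print(ancestors("Cardiologist", SUBCLASS_OF))
print(is_a("Nurse", "Doctor", SUBCLASS_OF))
```

With such a hierarchy, a search for a "Doctor" could also return records tagged only as "Cardiologist", without anyone having stated that fact explicitly.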
Big problems
But the problem with the semantic web is simply this: after almost ten years of endeavour, it still doesn’t quite work. Indeed, one commentator went as far as to put it on a par with spray-on hair and the Sinclair C5 electric vehicle of the 1980s.
One reason, claim some, is that the premise underlying the venture is, quite simply, over-optimistic. "The people working on the semantic web greatly over-estimate the value of deductive reasoning," says consultant and author Clay Shirky. "The great populariser of this error was Arthur Conan Doyle."
In Conan Doyle’s Sherlock Holmes novels, the detective solves the mysteries by a process of cast-iron logic, putting together all the clues and eliminating extraneous details in order to work out, without a shadow of a doubt, ‘who done it’ – and why. The semantic web is intended to work on similar logical principles. But real life is rarely as logical as a Sherlock Holmes novel, says Shirky, and facts are easily coloured by different people’s perspectives, opinions and interests.
Nor will it necessarily make life any easier for everyday web users. Rather, it may introduce entirely new complexities. According to the W3C, this is how users would buy a book:
"You browse/query until you find a suitable offer to sell the book you want. You add information to the semantic web saying that you accept the offer, giving your details (ie: name and address and payment information). Of course, you add access control so only you and the seller can see it, store it in a place where the seller can easily get it… and notify the seller about it. You wait or query for confirmation that the seller received your acceptance and, perhaps later, for shipping information."
Alternatively, you could just go to online bookseller Amazon…
Other challenges associated with the semantic web are also brushed aside by the W3C with glib, sweeping statements about such complexities as, for example, merging databases. In the semantic web world, you simply record in RDF that the ‘person name’ in one database is the same as the ‘name’ in another and the system does the rest. Anyone who has tried merging or cleansing customer databases will know that there is far more to it than that.
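The gap between declaring two fields equivalent and actually merging the data can be shown with a small sketch. The field names, mapping and records below are all invented for illustration; the mapping plays the role of the RDF statement that ‘name’ means the same as ‘person name’.

```python
# Hypothetical equivalence: field names in database B mapped onto database A's.
SAME_AS = {"name": "person_name", "mail": "email"}

db_a = [{"person_name": "Jane Smith", "email": "jane@example.org"}]
db_b = [
    {"name": "J. Smith", "mail": "jane@example.org"},
    {"name": "Jane Smith", "mail": "jane@example.org"},
]


def rename_fields(record, mapping):
    """Rewrite a record's keys using the declared field equivalences."""
    return {mapping.get(key, key): value for key, value in record.items()}


def merge(a, b, mapping):
    """Align b's field names with a's, then drop exact duplicates only."""
    merged = list(a)
    for record in b:
        aligned = rename_fields(record, mapping)
        if aligned not in merged:
            merged.append(aligned)
    return merged


merged = merge(db_a, db_b, SAME_AS)
print([record["person_name"] for record in merged])
```

The exact duplicate disappears, but "J. Smith" and "Jane Smith" survive as two separate customers even though they share an email address. Deciding whether they are the same person is the cleansing work that the field mapping alone does not do.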
Another question is who decides how information and data are tagged. It is a perennial problem encountered by search engines that the descriptive tags of websites are frequently misleading. Yet the semantic web requires an honesty and trust that is absent on the internet today.
However, there have been some limited successes. The RDF standard has been included in a small number of products and some of its critics have, as a consequence, softened their stance.
"I’m seeing more potential in it, but only in directions different from the ones that were being touted for it originally. I’m seeing more use of RDF in commercial products, although none in content management systems, other than blogging tools," says consultant and technology writer Rob Buckley.
Here, RDF is used to define information on the weblog in a form that other weblogs can understand, so that the author can be identified. And if anyone else comments on your blog in theirs, mutual links can be set up between them – a system known as trackback.
In addition, some of the semantic web technologies have also been deployed in more serious applications, including data-mining tools. These enable users to more accurately identify and categorise information. "[But] it’s a hell of a ‘faff’ for minimal reward and it gets very complicated very quickly, so I don’t see it doing too much, at the moment, for the majority of web sites," concludes Buckley.
So far, therefore, the semantic web has provided a relatively small return for so much hard work. But for Berners-Lee, perhaps, it also reflects the challenge that every great artist faces after having completed their finest work. How does he follow up his success with the worldwide web with something equally groundbreaking or better?
The semantic web may yet be the answer, but despite almost a decade of development there is still much more work to be done.