[tt] NS: Birth pangs for the 'semantic web'
Premise Checker <checker at panix.com>
Fri Jun 6 19:34:29 UTC 2008
Birth pangs for the 'semantic web'
http://technology.newscientist.com/article.ns?id=mg19826585.800&print=true
31 May 2008
Jim Giles
By the time the web began to be widely used, around a decade ago,
its inventor was already working on a more ambitious plan. Tim
Berners-Lee was imagining the ultimate "mash-up": a web in which any
sort of data - from train timetables to scientific papers - could be
seamlessly combined. The days of trawling through results from
search engines would be over. Instead, browsers would navigate in
search of answers, not web pages.
Although many still doubt this so-called "semantic web" will take
off, the first concrete steps have recently been taken. Over the
past year, several large data sources, most notably Wikipedia, have
been converted into formats that make them easier to combine.
Software that integrates these data sources is also being developed.
The results are not yet user-friendly, but the semantic web, so long
in gestation, may finally be coming into being.
"This is a really important moment," says Tom Heath, a semantic web
specialist at Talis, a software firm in Birmingham, UK. "We're
taking the theory and ideas of the last 10 years and making the
vision a reality."
The fruits of recent work on building the semantic web are most
visible at DBpedia, a semantic version of Wikipedia. The regular,
web-page-based version of the online encyclopedia works fine when a
single page contains all the information you need, less well when
you want answers to broad or complex questions. To find all battles
that took place in the German region of Saxony, for example, a user
might search for the terms "battle" and "Saxony". That returns
nearly 1000 results, most of which are not directly relevant.
To solve this problem, the DBpedia team at the Free University of
Berlin and the University of Leipzig in Germany have developed
software that analyses the content of Wikipedia and reorganises it
into a vast list of statements about "things", such as people and
places. The entry for Saxony, for example, becomes a set of links
that connect the name with other entries in the database, such as
local landmarks and notable residents.
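As a rough illustration of what a "list of statements about things"
can look like, the sketch below stores each statement as a
(subject, predicate, object) triple and collects the links attached
to one entry. The predicate names and the small data set are
assumptions made for this example; they are not DBpedia's actual
vocabulary or software.

```python
from collections import defaultdict

# Illustrative statements only. The predicate names ("type", "capital",
# "locatedIn", ...) are invented for this sketch, not DBpedia's vocabulary.
STATEMENTS = [
    ("Saxony", "type", "Region"),
    ("Saxony", "country", "Germany"),
    ("Saxony", "capital", "Dresden"),
    ("Dresden", "type", "City"),
    ("Dresden", "locatedIn", "Saxony"),
]


def links_for(thing, statements):
    """Group every statement whose subject is `thing` by its predicate."""
    links = defaultdict(list)
    for subject, predicate, obj in statements:
        if subject == thing:
            links[predicate].append(obj)
    return dict(links)


def mentions_of(thing, statements):
    """Return (subject, predicate) pairs for statements that point at `thing`."""
    return [(s, p) for s, p, o in statements if o == thing]


saxony_links = links_for("Saxony", STATEMENTS)
saxony_mentions = mentions_of("Saxony", STATEMENTS)
```

Here `saxony_links` holds the outgoing links from the Saxony entry,
and `saxony_mentions` holds the entries elsewhere in the list that
link back to it.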
With the data in this format, users will ultimately be able to ask
questions rather than perform searches for suitably chosen phrases.
When asked to identify conflicts that took place in Saxony, DBpedia
identified four battles without turning up reams of irrelevant
pages. Queries can be extremely complex: one user asked for all
soccer players who wear the number 11 shirt, play for a club with a
stadium that seats over 40,000 people and were born in a country
whose population exceeds 10 million. DBpedia duly supplied a list of
10 players.
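The difference between asking a question and searching for phrases
can be sketched with a small pattern matcher over statements. This is
a minimal illustration under assumed data: the entity names, the
predicates and the helper functions `match` and `things_with` are
hypothetical, and this is not the interface DBpedia itself offers.

```python
# Illustrative data; entity and predicate names are assumptions for this sketch.
TRIPLES = [
    ("Battle_A", "type", "Battle"),
    ("Battle_A", "place", "Saxony"),
    ("Battle_B", "type", "Battle"),
    ("Battle_B", "place", "Lower Saxony"),
    ("Museum_C", "type", "Museum"),
    ("Museum_C", "place", "Saxony"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


def things_with(triples, **conditions):
    """Return the subjects that satisfy every predicate=value condition."""
    result = None
    for predicate, value in conditions.items():
        subjects = {s for s, _, _ in match(triples, predicate=predicate, obj=value)}
        result = subjects if result is None else result & subjects
    return sorted(result or [])


# "Which battles took place in Saxony?" expressed as two conditions.
battles_in_saxony = things_with(TRIPLES, type="Battle", place="Saxony")
```

Because each condition is matched against a specific field rather
than against free text, the museum and the battle in the similarly
named region are both excluded from `battles_in_saxony`.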
The system is still very much in the test phase: the two answers
mentioned above each contained an error, a battle that took place in
Lower Saxony and a player who now wears a number 3 shirt. But DBpedia
provides a
hint of what the semantic web would look like if all web pages were
tagged in a way that allowed computers to understand what kind of
information they contain.
This can be done according to a model known as the Resource
Description Framework (RDF), created by the World Wide Web
Consortium (W3C), the body which develops new web guidelines and
technologies under the direction of Berners-Lee. With RDF, any type
of data on websites can be assigned appropriate descriptive tags.
Once this is done, data from different sources can be combined in
such a way that more interesting things start to happen. Two members
of the DBpedia team, Christian Becker and Christian Bizer of the
Free University of Berlin, have developed Mobile DBpedia, a
cellphone application that takes a user's GPS position and displays
Wikipedia articles on places in the vicinity, as well as showing
them on a map. It also draws in information from any source that has
made its data available in an appropriate format. This includes
articles from Revyu, a semantic website that lets users post reviews
of, for example, restaurants and places of interest, along with
photos from Flickr that have been tagged with location coordinates.
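The kind of location-based combination described above can be
sketched as a filter over records from several sources, keeping those
within a given distance of the user's position. The records, field
names and the 2 km radius below are assumptions for illustration;
this is not the Mobile DBpedia implementation.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Hypothetical records from three different sources, already carrying
# coordinates in a shared format.
ARTICLES = [{"source": "encyclopedia", "title": "Place A", "lat": 52.52, "lon": 13.405}]
REVIEWS = [{"source": "reviews", "title": "Cafe B", "lat": 52.53, "lon": 13.405}]
PHOTOS = [{"source": "photos", "title": "Photo C", "lat": 53.52, "lon": 13.405}]


def nearby(lat, lon, sources, radius_km):
    """Merge all sources and return items within radius_km, nearest first."""
    found = []
    for records in sources:
        for item in records:
            distance = haversine_km(lat, lon, item["lat"], item["lon"])
            if distance <= radius_km:
                found.append((distance, item))
    found.sort(key=lambda pair: pair[0])
    return [item for _, item in found]


results = nearby(52.52, 13.405, [ARTICLES, REVIEWS, PHOTOS], radius_km=2.0)
```

The merge only works because every source exposes its coordinates
under the same field names, which is the role a shared data format
plays in the scenario the article describes.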
Last October the BBC started making data on its television and radio
shows available in a semantic format. "People outside the BBC can
now do interesting things with our content," says Tom Scott, a team
leader at BBC Audio and Music Interactive. The online synopsis of
each show is now tagged so that a suitably configured mash-up tool
can identify what the broadcast was about and who it featured. Once
the BBC makes its music data available, something Scott and his
colleagues are working on, it will be possible to develop an
application that lets the user know whenever the BBC airs a show
matching their musical tastes.
Some sites have managed to achieve similar aims, though only by
painstakingly collating disparate sources of information.
EveryBlock, which covers San Francisco, Chicago and New York, is an
example: it delivers news, blog posts and images linked to the
user's location. But this involves developers at the site taking
each database as it comes and working out how to integrate it into
their system. "A more semantic web would make it easier for us to
compile disparate sets of data," says Adrian Holovaty, the site's
founder.
A semantic web would also allow users to develop their own mash-ups,
rather than rely on the skills of Holovaty and other programmers.
Once users are able to ask browsers about things - people, places,
events - and the relationships between them, they will be able to,
for example, easily identify properties for sale that are close to
highly rated schools and hospitals, instead of just browsing
real-estate listings.
So is Berners-Lee's vision about to be realised? Despite some
progress, it's not yet clear that it will. For a start, at the
moment there is no easy way to search the semantic web. The
interface used to ask questions of DBpedia would frighten less
web-savvy users. There is other software that can be used to search
a broader range of online semantic information, but these tools are
not suited for the average internet user either.
As the amount of semantic data grows, better search tools will
probably spring up. But some web experts point to another problem
with the W3C's plans. They say that semantic web advocates have
spent too much time focusing on the technical aspects of their
schemes. These can require web developers and content creators to
invest considerable time learning new programming languages before
the data can be made available in a suitable format. As a result,
years of proselytising by experts have failed to convince content
creators to buy into the semantic web ideal. It's notable, for
example, that DBpedia was created by computer scientists, not by the
community of editors that keeps Wikipedia running.
"I was a big proponent of the semantic web a few years back," says
Timo Hannay, who runs the web publishing team at Nature Publishing
Group in London. Now Hannay and others say the languages developed
by the W3C to describe data semantically are just too complex for
many website owners.
Web developers and users may instead turn to simpler semantic
systems, such as the metadata tags already used to describe shared
bookmarks on sites like del.icio.us. While these tags do help
computers find the right data, they can limit the potential for
mash-ups. If a blogger only uses the tag "San Francisco" to describe
reviews of restaurants in that city, for example, that will not help
users wanting details of particular kinds of eatery in specific
neighbourhoods. "We're seeing the semantic web emerge," says Hannay.
"It's just messier than we'd hoped."
So while the semantic web seems to be coming to life, no one can say
for sure what it will grow into. In an article in Scientific
American in 2001, Berners-Lee imagined a world in which software
would be able to take over important but uninteresting tasks, such
as selecting a trusted doctor based at a convenient location. That
would require many more parts of the web, from online calendars to
doctors' surgeries, to go semantic. Right now, it's still a
futuristic ideal. Advocates of the idea may have kick-started the
creation of the semantic web, but it's still not certain that others
will follow their lead.
Weblinks
DBpedia
http://dbpedia.org/
World Wide Web Consortium
http://www.w3.org/
Revyu
http://revyu.com/
BBC's programmes database
http://www.bbc.co.uk/programmes/