WikiOnt: an ontology for describing and exchanging Wikipedia articles
by
Gassert, H. (Université de Fribourg / mediagonal ag)
60-6-002 (CERN, Main Building)
Description
WikiOnt aims to integrate Wikipedia (and by extension other MediaWiki-based sites) into the Semantic Web framework, making Wikipedia machine-processable and machine-understandable.
Introduction
The free-content online encyclopaedia Wikipedia contains approximately 1.5 million articles, more than 500,000 of which are published in English, and receives around 50 million hits a day. It has become one of the most important single knowledge sources on the Web. Wikipedia is currently used mainly by humans who search and browse through its HTML user interface, which is optimised for on-screen display. Web crawlers try to work with this rich body of content as well.
In contrast to web sites targeting online users, data offered in a machine-understandable format is free from presentation constraints: it can be processed, integrated, combined and mapped to different systems and vocabularies with ease. Unlike HTML, such data is much more useful to software than it is to humans, but it has the advantage of multiplying the potential of the information it encodes.
What is an Ontology?
At the moment, a documented database schema is available for MediaWiki-based sites, but it is sub-optimal for information exchange across sites. Semi-structured data (RDF/XML) can be self-describing and can carry its schema and semantics implicitly in the data, facilitating data exchange and integration. The current data set in Wikipedia is not generally machine-processable; making it so could open up Wikipedia to a broad range of use cases and data-consuming agents. One of these could be the addition of Wikipedia articles to search results, a goal that the 'big players' in the search engine game are aiming for as well.
One means of making Wikipedia machine-understandable is by creating a formal ontology. Ontologies are formal specifications of how to represent the entities in a specific domain as well as the various interrelations among them. In the Semantic Web, ontologies can be used to share and reuse knowledge via the Web and they can be seen as a means for knowledge management on a global scale. A specific Wikipedia ontology can be built to integrate Wikipedia into the Semantic Web framework and therefore to make Wikipedia machine-processable and -understandable. Through the use of RDF (Resource Description Framework, a W3C recommendation) and URIs, Wikipedia content could be identified, described, linked and combined with other data sources.
Uses of an Ontologised Wiki
Wikipedia URLs can be used to denote the subject of a document, or to annotate photos. In fact, Wikipedia URLs can become general URIs identifying concepts in the Semantic Web, enabling the Semantic Web community to leverage the structured knowledge collected and maintained by Wikipedia. In that sense, ontologising and 'RDFising' Wikipedia can build a bridge between these two highly productive communities and allow for various sorts of 'cross-pollination' between them.
RDF is a language for representing information about resources on the Web. So far, it has mainly been used for representing metadata about Web resources such as title, creator, and date, but as the border between data and metadata is blurring, expressing both the content and structure of an entire encyclopaedia becomes workable and desirable. RDF is particularly intended for software applications rather than being directly displayed to people, and provides a common graph-based data model so that information can be exchanged between applications without any loss of meaning.
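To make the graph-based model concrete, the following is a minimal sketch of how statements about a Wikipedia article might be written as subject-predicate-object triples and serialised in the N-Triples syntax. The `WO` namespace, the `linksTo` property and the photo URL are hypothetical placeholders chosen for illustration; they are not taken from the WikiOnt ontology itself. The Dublin Core `title` and `subject` properties are real.

```python
# Sketch only: RDF statements as (subject, predicate, object) triples,
# serialised as N-Triples, using nothing beyond the standard library.

DC = "http://purl.org/dc/elements/1.1/"      # Dublin Core element set
WIKI = "http://en.wikipedia.org/wiki/"       # Wikipedia article URLs used as URIs
WO = "http://example.org/wikiont#"           # hypothetical namespace (assumption)


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def to_ntriples(triples):
    """Serialise triples, one statement per line, each terminated by ' .'."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


article = uri(WIKI + "Semantic_Web")
triples = [
    # Metadata about the article itself.
    (article, uri(DC + "title"), literal("Semantic Web")),
    # A link between two articles (property name is hypothetical).
    (article, uri(WO + "linksTo"), uri(WIKI + "Resource_Description_Framework")),
    # A Wikipedia URL used as the subject annotation of a photo.
    (uri("http://example.org/photos/42.jpg"), uri(DC + "subject"), article),
]

print(to_ntriples(triples))
```

Because each statement is self-contained and every resource is named by a URI, triples from different sources can simply be concatenated into one graph, which is the property the exchange scenario above relies on.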
With a Wikipedia ontology that is extensible, non-proprietary and interoperable across the Internet, people could easily access Wikipedia from various software programs and reuse its data in different application scenarios.
Paper Layout
We propose to use an ontology (WikiOnt), written in the Web Ontology Language (OWL), to describe the schema of the Wikipedia / MediaWiki dataset. In this paper, we describe the main concepts and relations in our proposed ontology, derived from both the HTML rendering and the original relational data. We present methods for converting the instances into a format adhering to the ontology, alongside a PHP5 implementation of such a converter that relies heavily on regular expressions. After sketching out how the converted data can be integrated with other datasets such as WordNet or FOAF (http://www.foaf-project.org), we discuss our practical experiences and lessons learned in converting a large-scale interconnected knowledge base such as Wikipedia.
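As an illustration of the regular-expression approach mentioned above, here is a minimal sketch that extracts internal links and category assignments from raw wiki markup and turns them into triples. It is written in Python rather than PHP5 and is not the converter described in the paper. The `WO` namespace and the property names `linksTo` and `inCategory` are hypothetical, and the pattern covers only simple `[[Target]]`, `[[Target|label]]` and `[[Target#section|label]]` links.

```python
import re

WIKI = "http://en.wikipedia.org/wiki/"
WO = "http://example.org/wikiont#"  # hypothetical namespace (assumption)

# Matches [[Target]], [[Target|label]] and [[Target#section|label]].
# Group 1 captures the target title without section anchor or label.
LINK_RE = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def page_uri(title):
    """Build an article URI from a page title (spaces become underscores)."""
    return WIKI + title.strip().replace(" ", "_")


def extract_triples(title, wikitext):
    """Return (subject, predicate, object) triples for links and categories."""
    subject = page_uri(title)
    triples = []
    for match in LINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        if target.startswith("Category:"):
            triples.append((subject, WO + "inCategory", page_uri(target)))
        elif ":" not in target:
            # Simplification: any other prefixed target (Image:, interwiki
            # links, and also article titles containing a colon) is skipped.
            triples.append((subject, WO + "linksTo", page_uri(target)))
    return triples


sample = (
    "The [[Semantic Web]] builds on [[Resource Description Framework|RDF]] "
    "and [[Uniform Resource Identifier#Syntax|URIs]]. [[Category:Internet]]"
)
for triple in extract_triples("WikiOnt", sample):
    print(triple)
```

A regular expression keeps the sketch short, but wiki markup allows nesting (for example, links inside image captions) that a single pattern cannot handle, so a real converter would need additional passes or a proper parser for those cases.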