Archives, Library and Open Science events

The Rosetta Project: Building an Archive of ALL Documented Languages

by Laura Buszard-Welcher (The Rosetta Project / University of California, Berkeley)

60/6-002 (CERN)

Description
The Rosetta Project is a collaborative, community-based effort to create a large, public, online digital archive of language documentation. Because linguists predict that we may lose as much as 90% of the world's linguistic diversity in the next century (Krauss, 1992), the project focuses on preserving that diversity through education, dissemination of public information on languages, and support for the efforts of linguists and speech communities to document and describe endangered languages.
The archive, which is part of the U.S. National Science Foundation's National Science Digital Library (NSDL), currently houses over 70,000 text pages on over 2,500 languages. The first part of the presentation will provide an overview of the goals, history and activities of the Rosetta Project, including the creation of a microetched 'Rosetta Disk' (a new-generation microfiche with thousands of images from the Rosetta Project archive), the design of the Rosetta Web site, and most recently the development of 'Query Rooms', online discussion forums for speakers and learners of endangered languages. New project proposals will also be discussed, including DOCS (Digital Online Curation Services for Endangered Language Archives), which aims to bring online the unpublished documentation held in small language archives around the world.
The Rosetta Project is part of a broader effort in the linguistics community to define and disseminate best practices for digital language resources. As such, we collaborate on a number of projects, including E-MELD (Electronic Metastructure for Endangered Language Data), the Open Language Archives Community (OLAC), and GOLD (General Ontology for Linguistic Description). The second part of the presentation will briefly discuss each of these projects and their products, including the E-MELD 'School of Best Practice', the FIELD lexical database tool, the OLAC search portal, and GOLD, a semantic markup scheme that enables searches across disparate language resources.

References: Krauss, Michael. 1992. The World's Languages in Crisis. Language 68.1-42.

Project URLs:

Electronic Metastructure for Endangered Language Data (E-MELD) available at http://www.emeld.org.

Endangered Language Query Rooms available at http://rosettaproject.org:8080/emeldbase/.

General Ontology for Linguistic Description (GOLD) available at http://www.linguistics-ontology.org.

National Science Digital Library (NSDL) available at http://nsdl.org.

Open Language Archives Community (OLAC) available at http://www.language-archives.org.

The Rosetta Project available at http://www.rosettaproject.org/live.

A preview of the new Web site (currently under construction) is available at http://preview.rosettaproject.org.

Please register to attend by contacting Susanne.Schaefer@cern.ch.

For the full 2005 programme, see http://librarysciencetalks.web.cern.ch/librarysciencetalks/