White Area lectures

Martin Benjamin (EPFL), The Particles of Language: "The Dictionary" as elemental data for 7000 languages across time and space

by Dr Martin Benjamin (EPFL)

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Description

How can we document detailed data about all the world's languages in a consistent, unified source, in a way that can serve knowledge and technology needs for people and their machines around the globe? Dictionaries have historically presented selective information about words and their meanings within a language, or translation equivalents between languages, in idiosyncratic, incommensurable formats with little basis in data science. The Kamusi Project introduces a new approach, conceiving of language as a matrix of interrelated data elements. By documenting these elements within each language, and linking elements at conceptual and functional nodes across languages, Kamusi aims toward an elusive Big Data goal: "every word in every language." If successful, the results will run the gamut from preserving the human heritage embedded in endangered languages, to providing international vocabularies for students to succeed in science, to a Star Trek-like universal translator embedded in your smart watch. In this talk, the project's founder discusses the nefarious complexities working against the creation of a universal language data platform, and the systems Kamusi has designed to collect, codify, and deploy quantum-level linguistic data within one massive global dictionary.

Bio: Martin Benjamin is the founder and director of the Kamusi Project (http://kamusi.org), an international non-profit dedicated to producing dictionary and learning resources for languages worldwide. Now resident in Lausanne, he was born and raised in the United States. His PhD in Anthropology, from Yale University, examined international aid in rural Tanzania. He is a senior scientist at the Distributed Information Systems Laboratory (LSIR) at EPFL, where he is developing methods to assemble reliable data across languages.

Organised by

Maria Dimou