22-24 June 2011
University of Geneva
Europe/Zurich timezone

Tutorial 1. MarcXimiL: near-duplicate detection (and similarity analysis)

22 Jun 2011, 09:00
2h 30m
5183 (University of Geneva)




Dr Alain Borel (Ecole Polytechnique Fédérale de Lausanne (EPFL))
Mr Jan Krause (University of Geneva)


MarcXimiL is an open-source tool that works on MARCXML records and calculates similarity indices between them. After a short theoretical introduction, the tutorial will focus on how to install, configure and use the tool. It can be used to:

* prevent the creation of duplicates (similar records are shown during the validation process)
* identify duplicates in batch files before ingest
* find duplicates inside a collection
* suggest to users records similar to the one found by a query
* match related documents, e.g. preprints and articles
* and so on.

http://marcximil.sourceforge.net/
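
To give a rough idea of what a similarity index between two MARCXML records can look like, here is a minimal Python sketch. It is not MarcXimiL's code, algorithm or API: it assumes that only the title (field 245, subfield a) is compared, uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure, and the helper names (`make_record`, `title_of`, `similarity`) and the 0.8 threshold are hypothetical choices made for illustration.

```python
import difflib
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Namespace of the MARC 21 XML ("slim") schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical cut-off above which two records are flagged as likely duplicates.
THRESHOLD = 0.8


def make_record(title: str) -> str:
    """Build a tiny MARCXML record holding only a 245$a title (illustration only)."""
    return (
        '<record xmlns="http://www.loc.gov/MARC21/slim">'
        '<datafield tag="245" ind1="0" ind2="0">'
        f'<subfield code="a">{escape(title)}</subfield>'
        "</datafield></record>"
    )


def title_of(record_xml: str) -> str:
    """Return the lowercased 245$a title of one MARCXML record, or '' if absent."""
    root = ET.fromstring(record_xml)
    node = root.find(
        ".//marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    if node is None or node.text is None:
        return ""
    return node.text.strip().lower()


def similarity(rec_a: str, rec_b: str) -> float:
    """Similarity in [0, 1] between the titles of two records."""
    return difflib.SequenceMatcher(None, title_of(rec_a), title_of(rec_b)).ratio()


if __name__ == "__main__":
    a = make_record("Near duplicate detection in library catalogues")
    b = make_record("Near-duplicate detection in library catalogs")
    score = similarity(a, b)
    print(f"similarity = {score:.2f}, likely duplicate: {score >= THRESHOLD}")
```

A real deduplication workflow would typically combine several fields (authors, year, identifiers) and tune the threshold on known duplicates; the sketch only shows the shape of the computation.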
