Invenio Developer Forum 2015-09-28 Grobid

Europe/Zurich
31/S-023 (CERN)

Description

The Invenio Developer Forum takes place every Monday at 16:00 CET/CEST and can be accessed via teleconferencing at https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=ygjcGzMEk8re

    • 16:00 - 17:00
      Grobid 1h
      Speakers: Jacopo Notarstefano (Universita di Pisa & INFN (IT)), Jan Age Lavik (CERN)

      GROBID [1] is a machine learning library for extracting, parsing and re-structuring raw documents such as PDF into structured TEI-encoded documents with a particular focus on technical and scientific publications. 

      INSPIRE-HEP will use GROBID in its general ingestion workflow to (a) let catalogers work faster with less typing and (b) potentially serve as an automatic tool for bibliographic reference extraction. In its first iteration on INSPIRE Labs, we aim to provide an interface where catalogers can upload any PDF, get back the extracted metadata, and then push the results to the system.
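
      As a rough illustration of the "get back extracted metadata" step, the sketch below reads a title and author names out of a TEI-encoded header using only the Python standard library. The TEI fragment is hand-written for this example and is not real GROBID output; the function name `extract_metadata` and the exact element layout are assumptions, and actual GROBID results may be structured differently. Sending the PDF to GROBID and retrieving the TEI is left out on purpose.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI-encoded documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written TEI fragment for illustration only (assumption: the real
# output of the extraction service may use a different layout).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title level="a" type="main">An Example Paper Title</title>
      </titleStmt>
      <sourceDesc>
        <biblStruct>
          <analytic>
            <author>
              <persName>
                <forename type="first">Ada</forename>
                <surname>Lovelace</surname>
              </persName>
            </author>
            <author>
              <persName>
                <forename type="first">Alan</forename>
                <surname>Turing</surname>
              </persName>
            </author>
          </analytic>
        </biblStruct>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
</TEI>"""


def extract_metadata(tei_xml):
    """Return the title and author names found in a TEI header string.

    Hypothetical helper: it is not part of GROBID or invenio-grobid.
    """
    root = ET.fromstring(tei_xml)

    # First <title> under <titleStmt>; empty string if none is present.
    title = root.findtext(
        ".//tei:titleStmt/tei:title", default="", namespaces=TEI_NS
    )

    authors = []
    for pers in root.iterfind(".//tei:author/tei:persName", TEI_NS):
        # Collect forename parts in document order, then the surname.
        parts = [el.text.strip() for el in pers.findall("tei:forename", TEI_NS) if el.text]
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS).strip()
        if surname:
            parts.append(surname)
        if parts:
            authors.append(" ".join(parts))

    return {"title": title.strip(), "authors": authors}


if __name__ == "__main__":
    print(extract_metadata(SAMPLE_TEI))
```

      A cataloger-facing interface could show such a dictionary as pre-filled form fields for review before anything is pushed to the system.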

      This presentation will briefly show how GROBID is set up in our infrastructure and integrated into INSPIRE Labs via a specialized Invenio module [2]. We will also touch upon possible extensions of this tool and its future use cases.

      [1] http://grobid.readthedocs.org/en/latest/
      [2] https://github.com/inspirehep/invenio-grobid