Invenio User Group Workshop 2017

Heinz Maier-Leibnitz Zentrum (MLZ)

Institute of Advanced Study (IAS), Technische Universität München, Garching near Munich (Germany)


Dear Invenio developer or user,

We would like to announce the fourth Invenio User Group Workshop, to be hosted by the Heinz Maier-Leibnitz Zentrum (MLZ) on the Garching research campus from Tuesday, 21 March to Friday, 24 March 2017. This workshop is jointly organized by CERN and MLZ for JOIN2.

It is intended for Invenio administrators and will consist of a series of lectures, practical exercises, and discussions with Invenio developers. The goal is to enable better understanding of Invenio features and capabilities, to discuss specific needs, forthcoming features and developments, etc.

The Invenio User Group Workshop 2017 will address a wide range of topics related to practical aspects of running digital repository services. We welcome proposals for presentations especially on the following themes:

1. Invenio for libraries
Talks are invited on how Invenio addresses integrated library system needs such as acquisition, circulation, reporting, statistics, discovery tools, matching and merging tools.

2. Invenio in the Open Access world
Talks are invited to demonstrate how Invenio connects to the Open Access content publishing world including topics such as persistent identifiers for material, authors, grants, licensing issues, authentication (with ORCID), e-publishing, open access and open data publishing challenges (e.g. OpenAPC, OpenAire compatibility).

3. Invenio for service managers
Talks are invited to demonstrate the use of Invenio throughout the life cycle of services, from the initial installation and customisation, through maintenance, continuous improvement to user support and service monitoring practices.

4. Invenio for multimedia
Talks are invited to explain how Invenio can be used to collect multimedia content produced by institutional audiovisual services and how this material is disseminated to the general public.

5. Invenio for research data
Talks are invited on how Invenio operates with collections of datasets, focusing on the specificities of managing very large files from scientific experiments.

6. Proposals for Tutorials are also welcomed, with practical hands-on sessions aimed at developers or system managers.

Any other topics of interest as well as reports on your own experience with Invenio are most welcome.

Abstract submission and registration are open now!

Please also circulate this workshop link among colleagues who might be interested.

Looking forward to seeing you.


CERN and MLZ for JOIN2





Twitter hashtag: #IUGW2017

  • Abdoulaye Saliou Diallo
  • Aboubakr Badr
  • Alexander Wagner
  • Andrea Ciocchetti
  • Audun Bjørkøy
  • Bernhard Flechtker
  • Boudjamaa roudane
  • Carlos Fernando Gamboa
  • Catharina Wasner
  • Claudia Frick
  • Claudio Zambaldi
  • Connie Hesse
  • Corinna Brueckener
  • Dagmar Sitek
  • Dominik Schmitz
  • Esteban Gabancho
  • Ferran Jorba
  • Gianni Pante
  • Gudrun Friedburg
  • Guillaume Lastecoueres
  • Harris Tzovanakis
  • Igor Milhit
  • Irene Büttner
  • Iris Schmitz-Schug
  • Jaime García
  • Jana Sloukova
  • Javier Martin Montull
  • Johnny Mariéthoz
  • Jose Benito Gonzalez Lopez
  • Jürgen Neuhaus
  • Katrin Große
  • Kirsten Sachs
  • Lars Holm Nielsen
  • Laurin Wegelin
  • Louai Barake
  • Ludmila Marian
  • Nicolas Harraudeau
  • Roman Semenov
  • Rémi Ducceschi
  • Samuele Kaplun
  • Stefan Hesselbach
  • Tatiana Zaikina
  • Tibor Simko
  • Torsten Bronger
    • 08:15 09:00
      Registration 45m
    • 09:00 10:00
      Workshop: Kick-off
    • 10:00 10:30
      Coffee break 30m
    • 10:30 12:15
      Services round table
      • 10:30
        Invenio @ JINR 10m

        The JINR open access repository, JDS (JINR Document Server), built on the Invenio platform, has been in operation since 2009. It started with Invenio v.99 and has now been updated to v.1.2.2.
        JDS collections include published articles, books, theses, conference proceedings, audio and video materials, etc. Documents are ingested into JDS, and its content is updated, by several methods: submission by authors, harvesting, and (automatic) uploading. Further development of JDS is tied to the project “JINR corporate information system”, which aims to provide information support for the scientific research performed at JINR. Within the project we are creating an “Authority” collection, which is intended to be the core of this system.

        Speaker: Tatiana Zaikina (Joint Institute for Nuclear Research)
      • 10:40
        Invenio @ IAEA 10m
        Speaker: Jaime Garcia Llopis (CERN)
      • 10:50
        Invenio @ Universitat Autònoma de Barcelona 10m
        Speaker: Ferran Jorba (Universitat Autònoma de Barcelona)
      • 11:00
        Invenio ILS as SaaS @ TIND 10m
        Speaker: Audun Bjorkoy
      • 11:10
        Invenio @ JOIN2 10m
        Speaker: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg)
      • 11:20
        Invenio @ CERN Scientific Information Services (INSPIRE / HEPData / SCOAP3) 10m
        Speaker: Samuele Kaplun (CERN)
      • 11:30
        Invenio @ CERN IT (CDS, B2SHARE, Zenodo, OpenData, Analysis Preservation, OAIS Archival Store) 10m
        Speaker: Jose Benito Gonzalez Lopez (CERN)
    • 12:15 13:45
      Lunch break 1h 30m
    • 13:45 15:30
      Hands-on (Getting started): Installing and running v3
      • 13:45
        Getting started with v3 1h
        Speaker: Tibor Simko (CERN)
      • 14:45
        End-user tour of v3 45m
        Speaker: Tibor Simko (CERN)
    • 15:30 16:00
      Coffee break 30m
    • 16:00 18:00
      Tour 2h
    • 16:00 18:00
      Troubleshooting 2h
    • 09:00 10:30
      Legacy: Dumping data, ORCID and migration
      • 09:00
        Things you can do dumping your Invenio database into a flat file 15m

        Invenio database design and interfaces are optimized for fast end user
        search and retrieval. As administrators, we can add indexes at will
        and use them via web or API. However, many maintenance tasks are not
        well covered with those indexes.

        For most of those cases, reading the records sequentially is the
        optimal solution. However, if the database is large enough, reading
        them via Invenio API may take hours, while the system slows down and
        it may become unresponsive.

        In this presentation I'll show a small Python tool that uses the
        Invenio API and a SQLite database as a cache to keep an up-to-date
        flat file with your bibliographic records.

        We'll see how, with this flat file, it is much faster and easier to do
        tasks like generate specialised statistics, quality control, automatic
        record enrichment or cleaning, or even creating exotic indexes or

        Speaker: Ferran Jorba (Universitat Autònoma de Barcelona)
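The caching idea in the abstract above can be illustrated with a minimal sketch. It is not the tool presented in the talk: the function `fetch_changed_records`, the in-memory `FAKE_REPOSITORY`, the table layout and the tab-separated output are all assumptions made for illustration, and no real Invenio API call is used. The sketch only shows the general pattern of asking for records changed since the last run, storing them in SQLite, and regenerating the flat file from the cache.

```python
import sqlite3

# Hypothetical stand-in for an Invenio API call. A real tool would query
# the repository; here the "repository" is a list of
# (recid, modified, body) tuples.
FAKE_REPOSITORY = [
    (1, "2017-03-01T10:00:00", "<record><title>First</title></record>"),
    (2, "2017-03-02T09:30:00", "<record><title>Second</title></record>"),
    (3, "2017-03-05T16:45:00", "<record><title>Third</title></record>"),
]


def fetch_changed_records(since):
    """Return records whose modification stamp is later than `since`."""
    return [row for row in FAKE_REPOSITORY if row[1] > since]


def open_cache(path=":memory:"):
    """Create the cache table if needed and return the connection."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "recid INTEGER PRIMARY KEY, modified TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def refresh_cache(conn):
    """Fetch only records newer than the newest cached one; return the count."""
    (last_seen,) = conn.execute(
        "SELECT COALESCE(MAX(modified), '') FROM records"
    ).fetchone()
    changed = fetch_changed_records(last_seen)
    conn.executemany(
        "INSERT OR REPLACE INTO records (recid, modified, body) VALUES (?, ?, ?)",
        changed,
    )
    conn.commit()
    return len(changed)


def dump_flat_file(conn):
    """Render the cache as one record per line, ordered by record id."""
    rows = conn.execute("SELECT recid, body FROM records ORDER BY recid")
    return "".join(f"{recid}\t{body}\n" for recid, body in rows)
```

Calling `refresh_cache` a second time without changes in the source transfers nothing, so the expensive sequential read over the API happens only once and later runs touch only modified records.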
      • 09:15
        ORCID implementation in Invenio 1.1 15m

        We present an extension to the Invenio 1.1 software for semi-automatically harvesting the ORCID IDs of users and allowing them to upload publications to their respective ORCID profiles. This extension was created in the context of the Join2 initiative; however, it can easily be adapted to other Invenio instances because it is only loosely coupled with Invenio itself. It opens its own local webserver to handle the additional endpoints, and calls Invenio API functions and command line programs to interact with the database. We also present a recommended workflow for successfully harvesting ORCID IDs in an institution. The implementation is written in well-documented Python 2.6 and Go and will be published as Free Software.

        Speaker: Torsten Bronger (Forschungszentrum Jülich)
      • 09:30
        Migrating records from v1.2 to v3 15m
        Speaker: Esteban Gabancho (CERN)
      • 09:45
        INSPIRE live migration 15m
        Speaker: Samuele Kaplun (CERN)
      • 10:00
        CERN Document Server migration of 1.2M records 15m
        Speaker: Ludmila Marian (CERN)
    • 10:30 10:45
      Coffee break 15m
    • 10:45 12:15
      Legacy: Libraries and Open Access
      • 10:45
        Invenio as a library system 15m

        As a join2 partner, the DESY library already uses Invenio for its publication database and institutional repository. The next logical step is to also migrate the library catalogue from the currently used Aleph system to Invenio. The talk starts with a short introduction on how to migrate from Aleph. This includes the migration of bibliographic data and holdings as well as movement data, current loans, etc.

        The talk also outlines some of the new additions required to run Invenio as an ILS at DESY, based on the infrastructure that already exists. For example, DESY needs to interact with RFID-based self-service terminals and barcode-based library cards, and to handle external patrons who have no DESY account.

        Speaker: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg)
      • 11:00
        The usages of JOIN2 authority records 15m

        An important foundation of the common JOIN2 repository infrastructure of DESY, DKFZ, FZJ, GSI, MLZ and RWTH Aachen is a set of about 134 000 authority records for grants, projects, large-scale infrastructures, cooperations, journals, and different kinds of keys. All instances share these authorities.
        We will present how these authority data are used for different purposes, e.g. the recent and upcoming reporting obligations regarding our funding and the data export to OpenAIRE. Furthermore, we discuss this in relation to the German “Kerndatensatz Forschung”, which will be the new standard in the future.

        Speakers: Dr Robert Thiele (Deutsches Elektronen-Synchrotron DESY), Katrin Grosse (GSI Helmholtzzentrum für Schwerionenforschung GmbH)
      • 11:15
        Matching and merging 15m

        When harvesting information from different sources it is necessary to identify
        identical objects. If both have the same unique identifier like a DOI or a
        report-number this is trivial but unfortunately a rare case.

        Most of the time matching is mainly based on author and title information.
        However, titles may change significantly from preprint to publication and
        depending on the type of the publication (journal paper, conference contribution,
        thesis) even identical basic metadata would lead to separate records.

        In general a two-step process is needed:
        a) search for potential candidates. Here it is necessary to define a search query
        with a high efficiency. However, if the search is too fuzzy, the number of records
        returned is too large and matching becomes infeasible. Restricting the search to a
        limited scope of records is helpful.
        b) confirmation of the match. Depending on the strategy clear results can be
        treated automatically, whereas doubtful cases might be presented to a human for
        final decision. In both cases it is essential to have enough information.

        For a reliable match good quality of uniform metadata is essential and in many cases processing of content information like abstract, references or fulltext is needed.

        Once two records have been identified as equal or existing information receives an update, the information needs to be merged. There are obvious cases where one source always supersedes another, maybe some information comes only from one source. But to add e.g. an ORCID from one source to the author and affiliation from another source requires the identification of corresponding information.

        Experience from INSPIRE shows what is currently done (fields with controlled vocabulary), what is doable (fields where the content can be identified) and where merging is not feasible but one version simply overwrites another.

        What can be done automatically, which tools are needed, when is human intervention necessary? When is it worthwhile to overwrite (i.e. delete) manually curated, high quality information?

        Speaker: Kirsten Sachs (DESY)
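The two-step process described in the abstract above (candidate search, then confirmation) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used at INSPIRE: the record layout (`title`, `author`, `doi` keys), the shared-word pre-selection, the use of `difflib.SequenceMatcher` as the similarity measure, and the thresholds `accept` and `review` are all hypothetical choices.

```python
import re
from difflib import SequenceMatcher


def normalise(text):
    """Lower-case the text and keep only alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def find_candidates(incoming, records, min_shared_words=2):
    """Step a): cheap pre-selection by the number of shared title words."""
    wanted = set(normalise(incoming["title"]))
    return [
        rec for rec in records
        if len(wanted & set(normalise(rec["title"]))) >= min_shared_words
    ]


def score(incoming, candidate):
    """Similarity in [0, 1], averaged over title and author strings."""
    def ratio(field):
        a = " ".join(normalise(incoming[field]))
        b = " ".join(normalise(candidate[field]))
        return SequenceMatcher(None, a, b).ratio()
    return (ratio("title") + ratio("author")) / 2


def classify(incoming, records, accept=0.9, review=0.5):
    """Step b): return ("match" | "review" | "new", best candidate or None)."""
    # An identical unique identifier is the trivial (but rare) case.
    for rec in records:
        if incoming.get("doi") and incoming["doi"] == rec.get("doi"):
            return "match", rec
    candidates = find_candidates(incoming, records)
    if not candidates:
        return "new", None
    best = max(candidates, key=lambda rec: score(incoming, rec))
    best_score = score(incoming, best)
    if best_score >= accept:
        return "match", best       # clear result, handled automatically
    if best_score >= review:
        return "review", best      # doubtful case, presented to a human
    return "new", None
```

Raising `min_shared_words` narrows the candidate set and keeps the confirmation step cheap, at the cost of missing records whose titles changed substantially between preprint and publication.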
      • 11:30
        What is needed for effective open access workflows? 15m

        Institutions and funders are pushing open access forward with ever new guidelines and policies. Since institutional repositories are important maintainers of green open access, they should support easy and fast workflows for researchers and libraries to release publications. Based on the requirements specification of researchers, libraries and publishers, possible supporting software extensions are discussed. What does a typical workflow look like? What has to be considered by the researchers and by the editors in the library before releasing a green open access publication? Where and how can software support and improve existing workflows?

        Speaker: Dr Claudia Frick (Forschungszentrum Juelich)
      • 11:45
        Article Processing Charges and OpenAPC 15m

        The publication landscape is about to change. While it was largely dominated by subscription-based journals in the past, recent political decisions are pushing the publishing industry towards Open Access. In particular, the publication of the Finch report in 2012 put APC-based Gold Open Access models on the agenda almost everywhere. These models also require considerable adaptations of library workflows to handle payments, bills and centralized funds for publication fees. While APCs are sometimes handled in specialized systems (e.g. the first setups in Jülich), discussions started early on about handling them in local repositories, which would also hold the Open Access content resulting from these fees; e.g. the University of Regensburg uses ePrints for this purpose.

        Backed by the Open Data movement, libraries also saw an opportunity to exchange data about the fees paid. Thus, OpenAPC was born in 2014 on GitHub to facilitate this exchange and to aggregate large amounts of data for evaluation and comparison. When the repository holds the payment data, using OAI-PMH for delivery is a natural choice. Thus, join2 and the University of Regensburg developed an interchange format for APC data that allows easy and automatic delivery to OpenAPC.

        This talk outlines a working solution for APC management and hook up with OpenAPC based on Invenio as implemented in join2.

        Speaker: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg)
    • 12:15 13:45
      Lunch break 1h 30m
    • 13:45 15:15
      Hands-on (Customisations): Simple customisation of V3
      • 13:45
        Simple customisations of logo, facets, sort options, query parser and record templating 1h 30m
        Speakers: Harris Tzovanakis (National Technical Univ. of Athens (GR)), Javier Martin Montull (CERN)
    • 15:15 15:30
      Coffee break 15m
    • 15:30 17:00
      Hands-on (Customisations): Intermediate customisation of V3
      • 15:30
        Tutorial/tour: v3 data model and indexing 1h
        Speakers: Lars Holm Nielsen (CERN), Nicolas Harraudeau (CERN)
      • 16:30
        Tutorial: Enabling ORCID login in V3 30m
        Speaker: Samuele Kaplun (CERN)
    • 17:00 18:00
      Troubleshooting 1h
    • 09:00 10:30
      Service management
    • 10:30 10:50
      Coffee break 20m
    • 10:50 12:15
      Research data
      • 10:50
        Invenio as one Module within a Holistic Service Suite for Research Data Management 15m

        Research data management is a duty that a university or research institute cannot ignore any longer. But setting up a suitable infrastructure is cumbersome and still poorly supported by national or international infrastructures, in particular in Germany [1]. At the same time, monolithic IT solutions encompassing the whole data lifecycle as well as the entire university or research institute are not an option: there is far too much ongoing development, and far too many changes and disciplines are involved, in particular when looking into solutions that really support individual research units.
        There are some prominent projects funded by the EU, mainly ZENODO and EUDAT, that make use of the Invenio framework, mainly for publishing research data.
        Yet publishing is only one component of research data management. What about keeping data that is not to be published, long-term preservation, or linking publications to their underlying data? Various approaches and tools support different aspects of research data management and need to be combined into a holistic and adaptable service suite.
        This presentation shows how RWTH Aachen University makes use of Invenio, and in particular the JOIN2 infrastructure, as a module within this service suite. DOI minting, linkage between data records and towards authority files for people, institutes, and projects, and alternative storage facilities are some of the topics that will be addressed. Overall, we point out current achievements as well as open challenges.

        [1] Leistung aus Vielfalt: Empfehlungen zu Strukturen, Prozessen und Finanzierung des Forschungsdatenmanagements in Deutschland, Göttingen : Rat für Informationsinfrastrukturen, URN: urn:nbn:de:101:1-201606229098, 2016

        Speaker: Dominik Schmitz (RWTH Aachen University)
      • 11:05
        Dynamic metadata model for B2Share 15m

        Invenio 3 validates metadata format using JSON Schemas. This presentation will show how B2Share enables its users to create their own custom schemas and share them with other communities.

        Speaker: Nicolas Harraudeau (CERN)
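To make the idea of schema-validated metadata from the abstract above concrete, here is a small sketch. It is not B2Share or Invenio code: `COMMUNITY_SCHEMA` and its field names are hypothetical, and the hand-rolled `validate` function checks only the `required` and `type` keywords, standing in for a full JSON Schema validator.

```python
import json

# Hypothetical community-defined schema; the field names are illustrative only.
COMMUNITY_SCHEMA = json.loads("""
{
  "type": "object",
  "required": ["title", "creators"],
  "properties": {
    "title": {"type": "string"},
    "creators": {"type": "array"},
    "embargo_days": {"type": "integer"}
  }
}
""")

# Mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "object": dict, "array": list, "string": str,
    "integer": int, "number": (int, float), "boolean": bool,
}


def validate(record, schema):
    """Return a list of error messages; an empty list means the record passed.

    Only the `required` and `type` keywords are checked in this sketch.
    """
    if not isinstance(record, dict):
        return ["record is not an object"]
    errors = [f"missing required field: {name}"
              for name in schema.get("required", []) if name not in record]
    for name, rules in schema.get("properties", {}).items():
        if name not in record or "type" not in rules:
            continue
        expected = JSON_TYPES[rules["type"]]
        value = record[name]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = (isinstance(value, bool)
                          and rules["type"] in ("integer", "number"))
        if not isinstance(value, expected) or bool_as_number:
            errors.append(f"field {name} is not of type {rules['type']}")
    return errors
```

Because the schema is plain JSON data rather than code, a community can define, store and share it like any other document, which is the property the talk builds on.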
      • 11:20
        Caltech RDM by TIND 15m
        Speaker: Audun Bjorkoy
      • 11:35
        Two Petabytes in Invenio? CERN (Open) Data 15m
        Speaker: Tibor Simko (CERN)
      • 11:50
        Handling large files and versioning data sets 15m
        Speaker: Lars Holm Nielsen (CERN)
    • 12:15 13:45
      Lunch break 1h 30m
    • 13:45 14:45
      Workshop: Multimedia and record editor
      • 13:45
        A new record editor for Invenio 3 15m

        In this presentation, a new record editor will be presented. The current version, still under development, is available online. This editor uses JSON as its native data format, provides many configuration options and can handle very large JSON documents. An update on the development status and pointers on how to use it in your own installation will be provided.

        Speaker: Javier Martin Montull (CERN)
      • 14:00
        CERN Document Server Videos 15m
        Speaker: Ludmila Marian (CERN)
      • 14:15
        Machine Learning examples on Invenio 15m

        This talk will present the different Machine Learning tools that INSPIRE is developing and integrating in order to automate content selection and curation as much as possible in a subject-based repository.

        Speaker: Mr Samuele Kaplun (CERN)
      • 14:30
        CERN Archival Store: Invenio Archivematica integration 15m
        Speaker: Remi Ducceschi (Universite de Franche-Comte (FR))
    • 14:45 15:15
      Hands-on (Develop v3): Architecture
    • 15:15 15:30
      Coffee break 15m
    • 15:30 17:00
      Hands-on (Develop v3): Build and package
      • 15:30
        Develop a simple v3 module 1h
        Speakers: Harris Tzovanakis (National Technical Univ. of Athens (GR)), Remi Ducceschi (Universite de Franche-Comte (FR))
      • 16:30
        Package your Invenio v3 module 30m
        Speakers: Harris Tzovanakis (National Technical Univ. of Athens (GR)), Remi Ducceschi (Universite de Franche-Comte (FR))
    • 17:00 18:00
      Troubleshooting 1h
    • 19:00 22:00
      Workshop Dinner @ Gasthof Neuwirt 3h
    • 09:00 10:15
      Community: Processes and translations
    • 10:15 10:30
      Coffee break 15m
    • 10:30 12:15
      Community: Group Brainstorming and closing
      • 10:30
        Group Brainstorming: Community 1h
        Speakers: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg), Jose Benito Gonzalez Lopez (CERN)
      • 11:30
        Feedback on workshop 30m
        Speaker: Lars Holm Nielsen (CERN)
      • 12:00
        Workshop summary and closing 15m
        Speaker: Connie Hesse (Technische Universität München (TUM))
    • 12:15 13:15
      Lunch break 1h