Invenio User Group Workshop 2017

Name: Invenio User Group Workshop 2017
Start: 2017-03-21T08:00:00+01:00
End: 2017-03-24T15:00:00+01:00
Location: Heinz Maier-Leibnitz Zentrum (MLZ)

21 Mar 2017, 08:00 → 24 Mar 2017, 15:00 Europe/Berlin

Heinz Maier-Leibnitz Zentrum (MLZ)

Institute of Advanced Study (IAS), Technische Universität München, Garching near Munich (Germany) www.tum-ias.de

Description

INVITATION

Dear Invenio developer or user,

We would like to announce the fourth Invenio User Group Workshop, to be hosted by the Heinz Maier-Leibnitz Zentrum (MLZ) on the research campus of Garching from Tuesday, 21 March to Friday, 24 March 2017. This workshop is jointly organized by CERN and MLZ for JOIN².

It is intended for Invenio administrators and will consist of a series of lectures, practical exercises, and discussions with Invenio developers. The goal is to enable better understanding of Invenio features and capabilities, to discuss specific needs, forthcoming features and developments, etc.

The Invenio User Group Workshop 2017 will address a wide range of topics related to practical aspects of running digital repository services. We welcome proposals for presentations especially on the following themes:

1. Invenio for libraries
Talks are invited on how Invenio addresses integrated library system needs such as acquisition, circulation, reporting, statistics, discovery tools, matching and merging tools.

2. Invenio in the Open Access world
Talks are invited to demonstrate how Invenio connects to the Open Access content publishing world, including topics such as persistent identifiers for material, authors and grants, licensing issues, authentication (with ORCID), e-publishing, and open access and open data publishing challenges (e.g. OpenAPC, OpenAIRE compatibility).

3. Invenio for service managers
Talks are invited to demonstrate the use of Invenio throughout the life cycle of services, from the initial installation and customisation, through maintenance, continuous improvement to user support and service monitoring practices.

4. Invenio for multimedia
Talks are invited to explain how Invenio can be used to collect multimedia content produced by institutional audiovisual services and how this material is disseminated to the general public.

5. Invenio for research data
Talks are invited on how Invenio operates with collections of datasets, focusing on the specificities of managing very large files from scientific experiments.

6. Proposals for Tutorials are also welcomed, with practical hands-on sessions aimed at developers or system managers.

Any other topics of interest as well as reports on your own experience with Invenio are most welcome.

Abstract submission and registration are open now!

Please also circulate this workshop link among colleagues who might be interested.

Looking forward to seeing you.

Yours,

CERN and MLZ for JOIN²

Twitter hashtag: #IUGW2017

Email contact

iugw-2017@mlz-garching.de

Participants

44

Tuesday 21 March
- 08:15 → 09:00
  
  Registration 45m
- 09:00 → 10:00
  Workshop: Kick-off
  - 09:00
    
    Welcome 10m
    
    Speaker: Connie Hesse (Technische Universität München (TUM))
    
    Recording
  - 09:10
    
    Overview and Goals 10m
    
    Speaker: Jose Benito Gonzalez Lopez (CERN)
    
    2017-03-IUGW-Overview.pdf
    
    Recording
  - 09:20
    
    Invenio: State of the Union 20m
    
    Speaker: Lars Holm Nielsen (CERN)
    
    mar2017-state-of-the-union.pdf
    
    Recording
  - 09:40
    
    Invenio 3 - Call for action 20m
    
    Speaker: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg)
    
    Handout.pdf
    
    Recording
- 10:00 → 10:30
  
  Coffee break 30m
- 10:30 → 12:15
  Services round table
  - 10:30
    
    Invenio @ JINR 10m
    
    The JINR open access repository, JDS (JINR Document Server), built on the Invenio platform, has been in operation since 2009. It started with Invenio v0.99 and has since been updated to v1.2.2.
    JDS collections include published articles, books, theses, conference proceedings, audio and video materials, etc. Various methods of ingesting documents into JDS and updating its content are applied: submission by authors, harvesting, and (automatic) uploading. Further development of JDS is connected with the project “JINR corporate information system”, which aims to provide information support for the scientific research performed at JINR. Within the project we are creating an “Authority” collection, which is intended to be the core of this system.
    
    Speaker: Tatiana Zaikina (Joint Institute for Nuclear Research)
    
    Recording
    
    Zaikina_JDS_Workshop2017.pdf
  - 10:40
    
    Invenio @ IAEA 10m
    
    Speaker: Jaime Garcia Llopis (CERN)
    
    Recording
  - 10:50
    
    Invenio @ Universitat Autònoma de Barcelona 10m
    
    Speaker: Ferran Jorba (Universitat Autònoma de Barcelona)
    
    Invenio at UAB.pdf
    
    Recording
  - 11:00
    
    Invenio ILS as SaaS @ TIND 10m
    
    Speaker: Audun Bjorkoy
    
    Recording
  - 11:10
    
    Invenio @ JOIN2 10m
    
    Speaker: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg)
    
    Handout.pdf
    
    Recording
  - 11:20
    
    Invenio @ CERN Scientific Information Services (INSPIRE / HEPData / SCOAP3) 10m
    
    Speaker: Samuele Kaplun (CERN)
    
    INSPIRE, HEPData, SCOAP³.pdf
    
    Recording
  - 11:30
    
    Invenio @ CERN IT (CDS, B2SHARE, Zenodo, OpenData, Analysis Preservation, OAIS Archival Store) 10m
    
    Speaker: Jose Benito Gonzalez Lopez (CERN)
    
    2017-03-IUGW-DR-Services.pdf
    
    Recording
- 12:15 → 13:45
  
  Lunch break 1h 30m
- 13:45 → 15:30
  Hands-on (Getting started): Installing and running v3
  
  Recording
  - 13:45
    
    Getting started with v3 1h
    
    Speaker: Tibor Simko (CERN)
  - 14:45
    
    End-user tour of v3 45m
    
    Speaker: Tibor Simko (CERN)
- 15:30 → 16:00
  
  Coffee break 30m
- 16:00 → 18:00
  
  Tour 2h
- 16:00 → 18:00
  
  Troubleshooting 2h
Wednesday 22 March
- 09:00 → 10:30
  Legacy: Dumping data, ORCID and migration
  - 09:00
    
    Things you can do dumping your Invenio database into a flat file 15m
    
    Invenio database design and interfaces are optimized for fast end user
    search and retrieval. As administrators, we can add indexes at will
    and use them via web or API. However, many maintenance tasks are not
    well covered with those indexes.
    
    For most of those cases, reading the records sequentially is the
    optimal solution. However, if the database is large enough, reading
    them via the Invenio API may take hours, during which the system slows
    down and may become unresponsive.
    
    In this presentation I'll show a small Python tool that uses the Invenio
    API and an SQLite database as a cache to keep an up-to-date flat file
    with your bibliographic records.
    
    We'll see how, with this flat file, it is much faster and easier to do
    tasks like generating specialised statistics, quality control, automatic
    record enrichment or cleaning, or even creating exotic indexes or
    counters.
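    
    The caching idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the tool presented in the talk: `fetch_modified_records` is a hypothetical stand-in for whatever Invenio API call returns the records changed since a given timestamp, and the table layout and the JSON-lines output format are likewise invented for the example.
    
    ```python
    import json
    import sqlite3
    
    
    def fetch_modified_records(since):
        """Hypothetical stand-in for an Invenio API call (an assumption).
    
        Returns the records modified strictly after the ISO timestamp `since`.
        """
        sample = [
            {"recid": 1, "modified": "2017-03-01T10:00:00", "title": "First record"},
            {"recid": 2, "modified": "2017-03-20T09:30:00", "title": "Second record"},
        ]
        return [rec for rec in sample if rec["modified"] > since]
    
    
    def update_cache(conn):
        """Store records changed since the last run; return how many were stored."""
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "recid INTEGER PRIMARY KEY, modified TEXT NOT NULL, body TEXT NOT NULL)"
        )
        # ISO timestamps sort lexicographically, so MAX() yields the latest one.
        last = conn.execute(
            "SELECT COALESCE(MAX(modified), '') FROM records"
        ).fetchone()[0]
        changed = fetch_modified_records(last)
        for rec in changed:
            conn.execute(
                "INSERT OR REPLACE INTO records (recid, modified, body) VALUES (?, ?, ?)",
                (rec["recid"], rec["modified"], json.dumps(rec, sort_keys=True)),
            )
        conn.commit()
        return len(changed)
    
    
    def dump_flat_file(conn, path):
        """Write one JSON document per line, ordered by record id."""
        with open(path, "w", encoding="utf-8") as out:
            for (body,) in conn.execute("SELECT body FROM records ORDER BY recid"):
                out.write(body + "\n")
    ```
    
    With the sample data, a first call to `update_cache` stores two records and an immediate second call stores none, so only changed records ever need to be fetched from the live system; the flat file is then regenerated from the local cache alone.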
    
    Speaker: Ferran Jorba (Universitat Autònoma de Barcelona)
    
    Recording
    
    Things you can do dumping your Invenio database into a flat file.pdf
  - 09:15
    
    ORCID implementation in Invenio 1.1 15m
    
    We present an extension to the Invenio 1.1 software for semi-automatically harvesting ORCID IDs of users and allowing them to upload publications to their respective ORCID profile. This extension was created in the context of the Join2 initiative; however, it can easily be adapted to other Invenio instances because it is only loosely coupled with Invenio itself. It opens its own local webserver to handle the additional endpoints, and calls Invenio API functions and command line programs to interact with the database. We also present a recommended workflow for successfully harvesting ORCID IDs in an institution. The implementation is realised in well-documented Python 2.6 and Go and will be published as Free Software.
    
    Speaker: Torsten Bronger (Forschungszentrum Jülich)
    
    ORCID in Invenio 1.pdf
    
    Recording
  - 09:30
    
    Migrating records from v1.2 to v3 15m
    
    Speaker: Esteban Gabancho (CERN)
    
    migration_1_3.pdf
    
    Recording
  - 09:45
    
    INSPIRE live migration 15m
    
    Speaker: Samuele Kaplun (CERN)
    
    INSPIRE Live Migration.pdf
    
    INSPIRE Live Migration.pptx
    
    Recording
  - 10:00
    
    CERN Document Server migration of 1.2M records 15m
    
    Speaker: Ludmila Marian (CERN)
    
    IUGW_migration.pdf
    
    Recording
- 10:30 → 10:45
  
  Coffee break 15m
- 10:45 → 12:15
  Legacy: Libraries and Open Access
  - 10:45
    
    Invenio as a library system 15m
    
    As a join2 partner, the DESY library already uses Invenio for its publication database and institutional repository. The next logical step is to also migrate the library catalogue from the currently used Aleph system to Invenio. The talk starts with a short introduction to how to migrate from Aleph. This includes the migration of bibliographic data and holdings, but also of movement data, current loans, etc.
    
    The talk also outlines some of the new additions required to run Invenio as an ILS at DESY based on the already existing infrastructure. For example, it is necessary for DESY to interact with RFID-based self-service terminals, barcode-based library cards and external patrons who have no DESY account.
    
    Speaker: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg)
    
    invenio-as-a-library-system.pdf
    
    Recording
  - 11:00
    
    The usages of JOIN2 authority records 15m
    
    An important basis of the common JOIN2 repository infrastructure of DESY, DKFZ, FZJ, GSI, MLZ and RWTH Aachen is a set of about 134 000 authority records for grants, projects, large-scale infrastructures, cooperations, journals, and different kinds of keys. All instances share these authorities.
    We will present how these authority data are used for different purposes, e.g. the recent and upcoming reporting obligations with regard to our funding and the data export to OpenAIRE. Furthermore, we discuss this in relation to the German “Kerndatensatz Forschung”, which will be the new standard in the future.
    
    Speakers: Dr Robert Thiele (Deutsches Elektronen-Synchrotron DESY), Katrin Grosse (GSI Helmholtzzentrum für Schwerionenforschung GmbH)
    
    20170322-IUGW_Thiele_Grosse.pdf
    
    Recording
  - 11:15
    
    Matching and merging 15m
    
    When harvesting information from different sources it is necessary to identify
    identical objects. If both have the same unique identifier like a DOI or a
    report-number this is trivial but unfortunately a rare case.
    
    Most of the time matching is mainly based on author and title information.
    However, titles may change significantly from preprint to publication and
    depending on the type of the publication (journal paper, conference contribution,
    thesis) even identical basic metadata would lead to separate records.
    
    In general a two-step process is needed:
    a) search for potential candidates. Here it is necessary to define a search query
    with a high efficiency. However, if the search is too fuzzy, the number of records
    returned is too large and matching becomes infeasible. Restriction to a
    limited scope of records is helpful.
    b) confirmation of the match. Depending on the strategy clear results can be
    treated automatically, whereas doubtful cases might be presented to a human for
    final decision. In both cases it is essential to have enough information.
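    
    The two steps above can be sketched as follows. This is a minimal illustration only, not the procedure used at INSPIRE: the record layout (`authors`, `title`, `doi`), the surname-overlap candidate search, the title similarity measure from Python's `difflib`, and the two thresholds are all assumptions chosen for the example.
    
    ```python
    from difflib import SequenceMatcher
    
    
    def normalise(text):
        """Lower-case and collapse whitespace so trivial differences are ignored."""
        return " ".join(text.lower().split())
    
    
    def surnames(record):
        """Surnames taken from 'Surname, Initials' author strings (assumed format)."""
        return {normalise(author.split(",")[0]) for author in record["authors"]}
    
    
    def find_candidates(record, pool):
        """Step a): cheap search keeping records that share an author surname."""
        wanted = surnames(record)
        return [other for other in pool if wanted & surnames(other)]
    
    
    def classify(record, candidate, accept=0.9, review=0.6):
        """Step b): confirm the match, or hand doubtful cases to a human."""
        doi_a, doi_b = record.get("doi"), candidate.get("doi")
        if doi_a and doi_b:
            # A shared unique identifier settles the question directly.
            return "match" if doi_a.lower() == doi_b.lower() else "no match"
        score = SequenceMatcher(
            None, normalise(record["title"]), normalise(candidate["title"])
        ).ratio()
        if score >= accept:
            return "match"
        if score >= review:
            return "needs review"
        return "no match"
    ```
    
    The middle band between the two thresholds is where a curator would be asked to decide; where to set `accept` and `review` depends entirely on the quality of the metadata at hand.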
    
    For a reliable match good quality of uniform metadata is essential and in many cases processing of content information like abstract, references or fulltext is needed.
    
    Once two records have been identified as equal, or existing information receives an update, the information needs to be merged. There are obvious cases where one source always supersedes another, or where some information comes from only one source. But adding, e.g., an ORCID from one source to the author and affiliation from another source requires identifying the corresponding information.
    
    Experience from INSPIRE shows what is currently done (fields with controlled vocabulary), what is doable (fields where the content can be identified) and where merging is not feasible but one version simply overwrites another.
    
    What can be done automatically, which tools are needed, when is human intervention necessary? When is it worthwhile to overwrite (i.e. delete) manually curated, high quality information?
    
    Speaker: Kirsten Sachs (DESY)
    
    IUGW.pdf
    
    Recording
  - 11:30
    
    What is needed for effective open access workflows? 15m
    
    Institutions and funders are pushing forward open access with ever new guidelines and policies. Since institutional repositories are important maintainers of green open access, they should support easy and fast workflows for researchers and libraries to release publications. Based on the requirements specification of researchers, libraries and publishers, possible supporting software extensions are discussed. What does a typical workflow look like? What has to be considered by the researchers and by the editors in the library before releasing a green open access publication? Where and how can software support and improve existing workflows?
    
    Speaker: Dr Claudia Frick (Forschungszentrum Juelich)
    
    IUGW2017-Frick.pdf
    
    Recording
  - 11:45
    
    Article Processing Charges and OpenAPC 15m
    
    The publication landscape is about to change. While it was largely dominated by subscription-based journals in the past, recent political decisions are pushing the publishing industry towards Open Access. In particular, the publication of the Finch report in 2012 put APC-based Gold Open Access models on the agenda almost everywhere. These models also require considerable adaptations of library workflows to handle payments, bills and centralized funds for publication fees. While APCs are sometimes handled in specialized systems (e.g. first setups in Jülich), discussions started pretty early on about handling them in local repositories, which would also hold the Open Access content resulting from these fees; e.g. the University of Regensburg uses ePrints for this purpose.
    
    Backed by the Open Data movement, libraries also saw an opportunity to exchange data about the fees paid. Thus, OpenAPC.de was born in 2014 on GitHub to facilitate this exchange and to aggregate large amounts of data for evaluation and comparison. When the repository holds the payment data, using OAI-PMH to deliver it is the natural choice. Thus, join2 and the University of Regensburg developed an interchange format for APC data that allows easy and automatic delivery to OpenAPC.
    
    This talk outlines a working solution for APC management and hook up with OpenAPC based on Invenio as implemented in join2.
    
    Speaker: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg)
    
    article-processing-charges.pdf
    
    Recording
- 12:15 → 13:45
  
  Lunch break 1h 30m
- 13:45 → 15:15
  Hands-on (Customisations): Simple customisation of V3
  - 13:45
    
    Simple customisations of logo, facets, sort options, query parser and record templating 1h 30m
    
    Speakers: Harris Tzovanakis (National Technical Univ. of Athens (GR)), Javier Martin Montull (CERN)
    
    IUGW2017 - Customize
    
    IUGW2017-Customize.pdf
    
    Recording
- 15:15 → 15:30
  
  Coffee break 15m
- 15:30 → 17:00
  Hands-on (Customisations): Intermediate customisation of V3
  - 15:30
    
    Tutorial/tour: v3 data model and indexing 1h
    
    Speakers: Lars Holm Nielsen (CERN), Nicolas Harraudeau (CERN)
    
    datamodels tutorial IUGW2017.pdf
    
    Recording
  - 16:30
    
    Tutorial: Enabling ORCID login in V3 30m
    
    Speaker: Samuele Kaplun (CERN)
- 17:00 → 18:00
  
  Troubleshooting 1h
Thursday 23 March
- 09:00 → 10:30
  Service management
  - 09:00
    
    Services for Invenio v3 (Elasticsearch, PostgreSQL, ...) 15m
    
    Speaker: Guillaume Lastecoueres (TIND Technologies)
    
    Recording
  - 09:15
    
    Monitoring your V3 infrastructure 15m
    
    Speaker: Lars Holm Nielsen (CERN)
    
    mar2017-monitoring.pdf
    
    Recording
  - 09:30
    
    High availability for Invenio v3 15m
    
    Speaker: Esteban Gabancho (CERN)
    
    high_availability_i3.pdf
    
    Recording
  - 09:45
    
    Deploying Invenio v3 20m
    
    Speaker: Esteban Gabancho (CERN)
    
    deploying_i3.pdf
    
    Recording
  - 10:05
    
    Deploying with Docker 20m
    
    Speaker: Audun Bjorkoy
    
    Recording
- 10:30 → 10:50
  
  Coffee break 20m
- 10:50 → 12:15
  Research data
  - 10:50
    
    Invenio as one Module within a Holistic Service Suite for Research Data Management 15m
    
    Research data management is a duty that a university or research institute cannot ignore any longer. But setting up a suitable infrastructure is cumbersome and as yet ill-supported by national or international infrastructures, in particular in Germany [1]. At the same time, monolithic IT solutions encompassing the whole data lifecycle as well as the entire university or research institute are not an option, since there is far too much development and there are far too many changes and disciplines involved, in particular when looking into solutions that really support individual research units.
    Some prominent EU-funded projects, notably ZENODO (http://zenodo.org) and EUDAT (http://eudat.eu), make use of the Invenio framework, mainly for publishing research data.
    Yet publishing is only one component of research data management. What about keeping data that is not to be published, long-term preservation, or linking publications to their underlying data? Various approaches and tools support different aspects of research data management and need to be combined into a holistic and adaptable service suite.
    This presentation shows how RWTH Aachen University makes use of Invenio, and in particular the JOIN2 infrastructure, as a module within this service suite. DOI minting, linkage between data records and towards authority files for people, institutes, and projects, and alternative storage facilities are some of the topics that will be addressed. Overall, we point out current achievements as well as open challenges.
    
    [1] Leistung aus Vielfalt: Empfehlungen zu Strukturen, Prozessen und Finanzierung des Forschungsdatenmanagements in Deutschland, Göttingen : Rat für Informationsinfrastrukturen, URN: urn:nbn:de:101:1-201606229098, 2016
    
    Speaker: Dominik Schmitz (RWTH Aachen University)
    
    IUGW2017_Module4RDM_Schmitz.pdf
    
    Recording
  - 11:05
    
    Dynamic metadata model for B2Share 15m
    
    Invenio 3 validates metadata format using JSON Schemas. This presentation will show how B2Share enables its users to create their own custom schemas and share them with other communities.
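    
    As a rough illustration of what validating a record against a community-defined schema means, the sketch below checks a deliberately tiny, hand-written subset of JSON Schema (only the `required` and `type` keywords). The schema and property names are hypothetical and are not taken from B2Share; a real installation would rely on a complete JSON Schema validator rather than on code like this.
    
    ```python
    # Python types accepted for each JSON Schema type name (subset only).
    TYPE_MAP = {
        "string": str,
        "integer": int,
        "number": (int, float),
        "boolean": bool,
        "array": list,
        "object": dict,
    }
    
    
    def has_type(value, type_name):
        """True if `value` has the JSON type `type_name`."""
        # bool is a subclass of int in Python, but not a JSON number.
        if type_name in ("integer", "number") and isinstance(value, bool):
            return False
        return isinstance(value, TYPE_MAP[type_name])
    
    
    def validate(document, schema):
        """Return a list of error messages; an empty list means the check passed."""
        errors = []
        for name in schema.get("required", []):
            if name not in document:
                errors.append("missing required property: " + name)
        for name, rules in schema.get("properties", {}).items():
            if name in document and "type" in rules:
                if not has_type(document[name], rules["type"]):
                    errors.append(name + " is not of type " + rules["type"])
        return errors
    
    
    # Hypothetical schema a community might define for its own records.
    COMMUNITY_SCHEMA = {
        "required": ["title"],
        "properties": {
            "title": {"type": "string"},
            "sample_count": {"type": "integer"},
        },
    }
    ```
    
    Because the schema is plain data, a community can author and share it without changing any code; only the document passed to `validate` and the schema it is checked against vary.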
    
    Speaker: Nicolas Harraudeau (CERN)
    
    Dynamic metadata model for B2Share.pdf
    
    Recording
  - 11:20
    
    Caltech RDM by TIND 15m
    
    Speaker: Audun Bjorkoy
    
    Recording
  - 11:35
    
    Two Petabytes in Invenio? CERN (Open) Data 15m
    
    Speaker: Tibor Simko (CERN)
    
    cap-iugw-2017-03-23.pdf
    
    Recording
  - 11:50
    
    Handling large files and versioning data sets 15m
    
    Speaker: Lars Holm Nielsen (CERN)
    
    mar2017-files.pdf
    
    Recording
- 12:15 → 13:45
  
  Lunch break 1h 30m
- 13:45 → 14:45
  Workshop: Multimedia and record editor
  - 13:45
    
    A new record editor for Invenio 3 15m
    
    In this presentation, a new record editor will be presented. The current version under development can be found at https://github.com/inveniosoftware-contrib/ng2-json-editor. This editor uses JSON as its native data format, provides many configuration options and can handle very large JSON documents. An update on the development status and pointers on how to use it in your own installation will be provided.
    
    Speaker: Javier Martin Montull (CERN)
    
    IUGW-2017-record-editor.pdf
    
    Recording
  - 14:00
    
    CERN Document Server Videos 15m
    
    Speaker: Ludmila Marian (CERN)
    
    IUGW_videos.pdf
    
    Recording
  - 14:15
    
    Machine Learning examples on Invenio 15m
    
    This talk will present the different Machine Learning tools that INSPIRE is developing and integrating in order to automate, as much as possible, content selection and curation in a subject-based repository.
    
    Speaker: Mr Samuele Kaplun (CERN)
    
    Machine learning examples on Invenio.pdf
    
    Recording
  - 14:30
    
    CERN Archival Store: Invenio Archivematica integration 15m
    
    Speaker: Remi Ducceschi (Universite de Franche-Comte (FR))
    
    archival_store.pdf
    
    Recording
- 14:45 → 15:15
  Hands-on (Develop v3): Architecture
  - 14:45
    
    Architecture and module overview 30m
    
    Speaker: Lars Holm Nielsen (CERN)
    
    mar2017-architecture.pdf
    
    Recording
- 15:15 → 15:30
  
  Coffee break 15m
- 15:30 → 17:00
  Hands-on (Develop v3): Build and package
  - 15:30
    
    Develop a simple v3 module 1h
    
    Speakers: Harris Tzovanakis (National Technical Univ. of Athens (GR)), Remi Ducceschi (Universite de Franche-Comte (FR))
    
    IUGW2017 - Develop
    
    IUGW2017-Develop.pdf
  - 16:30
    
    Package your Invenio v3 module 30m
    
    Speakers: Harris Tzovanakis (National Technical Univ. of Athens (GR)), Remi Ducceschi (Universite de Franche-Comte (FR))
- 17:00 → 18:00
  
  Troubleshooting 1h
- 19:00 → 22:00
  
  Workshop Dinner @ Gasthof Neuwirt 3h
Friday 24 March
- 09:00 → 10:15
  Community: Processes and translations
  - 09:00
    
    Invenio community processes 15m
    
    Speaker: Lars Holm Nielsen (CERN)
    
    mar2017-community.pdf
    
    Recording
  - 09:15
    
    Hands-on: Translating Invenio v1 and v3 30m
    
    Speaker: Tibor Simko (CERN)
    
    Recording
  - 09:45
    
    Group Brainstorming: Community 30m
    
    Speakers: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg), Jose Benito Gonzalez Lopez (CERN)
    
    2017-03-IUGW-Brainstorming.pdf
    
    Recording
- 10:15 → 10:30
  
  Coffee break 15m
- 10:30 → 12:15
  Community: Group Brainstorming and closing
  - 10:30
    
    Group Brainstorming: Community 1h
    
    Speakers: Alexander Wagner (Deutsches Elektronensynchrotron DESY, Hamburg), Jose Benito Gonzalez Lopez (CERN)
    
    Recording
  - 11:30
    
    Feedback on workshop 30m
    
    Speaker: Lars Holm Nielsen (CERN)
    
    Recording
  - 12:00
    
    Workshop summary and closing 15m
    
    Speaker: Connie Hesse (Technische Universität München (TUM))
    
    Recording
- 12:15 → 13:15
  
  Lunch break 1h