
We are very excited to announce a workshop on Analysis Reproducibility, organised through the HEP Software Foundation (HSF) and IRIS-HEP.
What is analysis reproducibility?
No one does a data analysis only once. After the exploratory phase, computations that proved useful are formalized as reusable programs that convert input data into final results, and these programs are run over and over, with updates, as new corrections and considerations come to mind. This is a data analysis pipeline.
It is very important for a data analysis pipeline to be reproducible. After all, you want to draw conclusions about your data by running the pipeline under different conditions and seeing how the results change, but those conclusions would not be valid if running it under the same conditions also yielded different results! A clean workbench is an essential part of the scientific method, and your data analysis code is part of your scientific workbench.
In addition, scientific results need to be reproducible after your experiment is done. Ensuring reproducibility during your analysis simplifies the process of preserving your analysis for future research. (This training workshop was previously called "Training on Analysis Pipelines.")
Reproducibility is a concern for software developers as well, and many of the tools that have been developed for the software industry can be applied to data analysis.
This training event is for data analysts who are already familiar with analysis tools and concepts (e.g. C++, Python, event selection, limit setting) and who want to learn how to make their analysis pipelines robust using continuous integration and continuous deployment (CI/CD) and containerization (Podman, Docker, and Apptainer).
It will be taught by tutors who are experts in HEP software. Interactive hands-on sessions led by the tutors will be supported by a number of helpers to ensure that all participants can follow and understand the material.
Because the number of participants is limited, all participants are expected to attend the whole workshop.
This is a virtual event and no payment or travel is required for attending. Participants are required to have their own laptop for the workshop.
All workshop times are given in the US Eastern time zone.
Please contact the organizers (email us) if you have any questions.
What exactly will I learn?
Over four half-days we will cover the fundamentals of:
- Podman (free work-alike of Docker): Podman material
- Apptainer (formerly known as Singularity): Singularity/Apptainer material
- GitHub and GitLab CI/CD: GitHub material, GitLab material
- REANA: material
- Additional topics: lectures
Are there any prerequisites?
Yes!
- Familiarity with git (very important!)
  - Know how to create repositories
  - Know how to edit and push files
- An account with github.com, gitlab.com, or gitlab.cern.ch
- Some familiarity with the Linux command line
- Some familiarity with Python
Also, complete the "Setup: do this first" page, linked in the left sidebar, before the workshop.
Who is supporting this?
This event is supported by CERN and by the U.S. National Science Foundation under Cooperative Agreement OAC-1836650 (IRIS-HEP).
Who is teaching this thing?
This is a hands-on training consisting of live lectures by the instructors via Zoom. In addition, mentors will give participants individual attention and debugging assistance via chat tools. The people filling these roles are listed below.
Instructors:
- Podman (Docker):
  - TBD
- Apptainer (Singularity):
  - Marco Mambelli (Fermilab)
- GitHub CI/CD:
  - Andres Rios-Tascon (Princeton Univ)
- GitLab CI/CD:
  - Lera Lukashenko (University of Zurich)
- REANA:
  - Tibor Simko (CERN)
Mentors (on Slack):
- Marco Mambelli (Fermilab)
- Richa Sharma (Univ of Puerto Rico Mayaguez)
- Michel Hernandez Villanueva (Brookhaven National Lab)
- Alexander Moreno Briceño (Universidad Antonio Nariño)
- Roy Cruz Candelaria (Univ of Puerto Rico Mayaguez)
- Tibor Simko (CERN)
- Tetiana Mazurets (Univ of Puerto Rico Mayaguez)
- Emery Nibigira (University of Tennessee, Knoxville)
- Iliomar Rodriguez Ramos (Univ of Puerto Rico Mayaguez)
- Mateo Lisondo (Univ of Puerto Rico Mayaguez)