Training on Analysis Pipelines (Virtual)

Virtual

Alexander Moreno Briceño (Universidad Antonio Nariño), Holly Szumila-Vance, Jim Pivarski (Princeton University), Valeriia Lukashenko (University of Zurich (CH))
Description

What is an analysis pipeline?

No one does a data analysis once. After the exploratory phase, computations that were found to be useful are formalized as reusable programs that convert input data into final results, and these programs are run over and over, with updates, as new corrections and considerations come to mind. This is a data analysis pipeline.

It is very important for a data analysis pipeline to be reproducible. After all, you want to draw conclusions about your data by running the pipeline under different conditions and seeing how the results change, but they would not be valid conclusions if running it under the same conditions also yields different results! A clean workbench is an essential part of the scientific method, and your data analysis code is part of your scientific workbench.
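As a rough illustration of "same conditions, same results," here is a minimal Python sketch of a reproducibility check. Everything in it is hypothetical and not part of the workshop material: the `run_pipeline` and `fingerprint` functions, the selection cut, and the input values are invented for the example. It assumes the only source of run-to-run variation is a random number generator, which is fixed by an explicit seed.

```python
import hashlib
import json
import random


def run_pipeline(events, seed):
    """Toy pipeline: select events, then estimate a mean with a seeded bootstrap."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    selected = [e for e in events if e > 20.0]  # hypothetical event selection
    # Bootstrap resampling uses random numbers, so the result depends on the seed.
    resampled = [rng.choice(selected) for _ in selected]
    return {"n_selected": len(selected), "mean": sum(resampled) / len(resampled)}


def fingerprint(result):
    """Hash the result so two runs can be compared with a single string."""
    encoded = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


events = [12.5, 31.0, 47.2, 8.9, 25.4, 60.1]  # made-up input values
first = fingerprint(run_pipeline(events, seed=42))
second = fingerprint(run_pipeline(events, seed=42))
assert first == second, "same conditions must give the same result"
```

A check like this is the kind of test a continuous-integration job could run on every change, so that an accidental dependence on unseeded randomness or hidden state is caught early.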

In addition, scientific results need to be reproducible after your experiment is done. Ensuring reproducibility during your analysis simplifies the process of preserving your analysis for future research. (This training workshop was previously called "Analysis Preservation.")

Reproducibility is a concern for software developers as well, and many of the tools that have been developed for the software industry can be applied to data analysis.

This training event is for data analysts who are already familiar with analysis tools and concepts (e.g. C++, Python, event selection, limit setting) and who want to learn how to make their analysis pipelines robust using continuous testing (CI/CD) and containerization (Podman, Docker, and Apptainer).

What is the format of this workshop?

The main part of the workshop is asynchronous: you learn at your own pace from pre-recorded videos and training material. This means that you can profit from this workshop no matter your time zone! Throughout the workshop, we offer assistance via Slack.

On the first day, we offer a central kickoff session, help with the setup, and one live lecture. These take place in only one time zone, but don't worry if you cannot make it: they will be recorded, and you can still profit from the rest of the workshop.

On the last day, we will offer small-group mentoring sessions (different sessions for all time zones) to help you answer additional/advanced questions and apply your new knowledge to your own analysis. There will also be one more live lecture.

What exactly will I learn?

Are there any prerequisites?

Yes! 

  • Familiarity with git (very important!)
    • Know how to create repositories
    • Know how to edit and push files
    • You should have an account either with github.com, gitlab.com, or gitlab.cern.ch 
  • Some familiarity with the Linux command line
  • Some familiarity with Python


Also, see Setup: do this first on the left sidebar.

Who is supporting this?

This event is supported by CERN and U.S. National Science Foundation Cooperative Agreement OAC-1836650 (IRIS-HEP).

Who is teaching this thing?

This is a hands-on training that consists of asynchronous lectures, delivered by the instructors as video recordings. In addition, mentors will give participants individual attention and debugging assistance via chat tools. The people filling these roles are listed below.

Instructors: 

  • Podman (Docker):
    • Michel Hernandez Villanueva (Brookhaven)
  • Apptainer (Singularity):
    • Marco Mambelli (Fermilab)
  • GitHub CI/CD:
    • Andres Rios-Tascon (Princeton Univ)
  • GitLab CI/CD:
    • Guillermo Fidalgo (Univ of Puerto Rico Mayaguez) 

Mentors (on Slack): 

  • Lera Lukashenko (Nikhef)
  • Marco Mambelli (Fermilab)
  • Jim Pivarski (Princeton Univ)
  • Richa Sharma (Univ of Puerto Rico Mayaguez) 
  • Michel Hernandez Villanueva (Brookhaven)
  • Alexander Moreno Briceño (Universidad Antonio Nariño)
  • Roy Cruz Candelaria (Univ of Puerto Rico Mayaguez) 
Participants
  • abdullah burkan bereketoglu
  • Ahmed Abdelmotteleb
  • Alberto Belvedere
  • Alberto Lusiani
  • Aleeda Charly
  • Alexander Drabent
  • Alexander Heidelbach
  • Alexander Nicholas Jury
  • Alexandru Manea
  • Alfredo Castaneda
  • Aniket Raj
  • Aniol Lobo Salvia
  • Arthur Kraus
  • Bisnupriya Sahu
  • Bita Masomi
  • Christopher Dilks
  • Daniel Felea
  • David Martin Koch
  • Denys Klekots
  • Edgar Fernando Carrera Jarrin
  • Ehizojie Ali
  • Elena Sacchi
  • Emanuele Villa
  • Francesca Swystun
  • Franz Glessgen
  • Gabriela Hamilton
  • Gerhard Hejc
  • Hayden Richard Hollenbeck
  • Honor Hare
  • Jacob Smith
  • Jakob Nordin
  • Jan Tuzlic Offermann
  • Jim Pivarski
  • John Lawless
  • Julie Hogan
  • Karol Sowa
  • Kati Lassila-Perini
  • Khwaja Idrees Hassan
  • Lakshan Madhan
  • Lanxing Li
  • Lawrence Ng
  • Lekhika Malhotra
  • Levi Evans
  • Lorenzo Valente
  • Louisa Smieska
  • Luca Quaglia
  • Luis Fariña
  • Lukas Gülzow
  • M. Faraz Samavat
  • Manuel Gonzalez Berges
  • Manuel Sommerhalder
  • Marina Orta Terre
  • Matthew Bellis
  • Matthew Snape
  • Mohamed Ouchemhou
  • Moises Zeleny
  • Mukul Mhaskey
  • Nicola Rubini
  • Nilay Bostan
  • Paul Felix Kruper
  • Petar Stojkovic
  • Petri Kehusmaa
  • Prem Kumar
  • Rafey Hashmi
  • Raj Handique
  • Raktim Mukherjee
  • Renaud Amalric
  • Renzo Vizarreta
  • RISHABH MEHTA
  • Rithika Ganesan
  • Rolf Verberg
  • Ryan De Los Santos
  • Sergio Jaimes
  • Shashank Kumar
  • Shivani Lomte
  • Shuaixiang Zhang
  • Shuchong Ding
  • Si Hyun Jeon
  • Stephanie Kwan
  • Steven Boi
  • Tanishq Sharma
  • Thomas Van Laer
  • Valerii Kholoimov
  • Vitor Jose Shen
  • Vladislav Kuskov
  • Waleed Hussain
  • Wenxing Fang
  • Xiaodong Shi
  • Yamna Shaikh
  • Yao Yao
  • Ziou He
  • Sohila Eid