*REMOTE* CMS Open Data Workshop for Theorists at the LPC

Edgar Fernando Carrera Jarrin (Universidad San Francisco de Quito (EC)), Jesse Thaler (MIT), Kati Lassila-Perini (Helsinki Institute of Physics (FI)), Matthew Bellis (Cornell University/Siena College (US))

Feedback survey: https://forms.gle/m4Hmw8oCvSTR8Fj17

The CMS Open Data Workshop for Theorists will take place on September 30-October 2, 2020. It will be ZOOM ONLY.

In 2014, CMS released a significant amount of data through the CERN Open Data Portal, available to anyone to analyze as they saw fit. In 2017, a theory group at MIT published two peer-reviewed papers using these data, prompting renewed discussion of how to make these open datasets easier for non-CMS analysts to work with. The goal of this workshop is to lower the threshold for access to these data for theorists and phenomenologists. Attendees will be led through the steps of logging in to the CERN virtual machines, running the CMS analysis software, and performing the basics of an analysis. All exercises will be hands-on and participants should be prepared to dive into the data right away. Time will also be spent brainstorming with attendees about how the entire process of accessing and analyzing the data could be made more useful for the broader HEP community.

Please join us and register before the deadline of September 23rd.

Everyone has to follow Fermilab's code of conduct:


Organizing Committee:
Matthew Bellis (Siena College)
Edgar Carrera (U. San Francisco de Quito)
Kati Lassila-Perini (U. of Helsinki)
Jesse Thaler (MIT)

Local Organizing Committee:
Gabriele Benelli (Brown U.)
Christian Herwig (Fermilab)
Julie Hogan (Bethel U. and Brown U.)
Clemens Lange (CERN)
Andrew Melo (Vanderbilt U.)
Nada Mohamed (Siena College)
Stephen Mrenna (Fermilab)
Kevin Pedro (Fermilab)
Emanuele Usai (Brown U.)
David Yu (Brown U.)

LPC Events Committee:
Gabriele Benelli (Brown U., Co-Chair)
Kevin Pedro (Fermilab, Co-Chair)

LPC Coordinators: 
Cecilia Gerber (UIC)
Sergo Jindariani (Fermilab)

  • Abderrazaq El Abassi
  • Achim Geiser
  • Aditya Nath Mishra
  • Alba Soto-Ontoso
  • Alejo Rossia
  • Allan Jales
  • Ambar Rodriguez Alicea
  • Andrew Larkoski
  • Andrew Malone Melo
  • Anup Kumar Sikdar
  • Ayodele Ore
  • Bharadwaj Harikrishnan
  • Brian Omar Cruz Rodriguez
  • Brian Shuve
  • Camellia Bose
  • Cari Cesarotti
  • Charanjit Kaur
  • Christian Herwig
  • Clemencia Mora Herrera
  • Conett Huerta Escamilla
  • Daneng Yang
  • Daniel Ernani Martins Neto
  • Daniel Tapia Takaki
  • Diego Alberto Coloma Borja
  • Dimitri Bourilkov
  • Diogo Buarque Franzosi
  • Edgar Fernando Carrera Jarrin
  • Eduardo Brock
  • Eliza Melo Da Costa
  • Emanuele Usai
  • Fanqiang Meng
  • Flip Tanedo
  • Frederic Alexandre Dreyer
  • Gabriel Corrêa
  • Gabriele Benelli
  • Gustavo Gil Da Silveira
  • Haipeng An
  • Harri Waltari
  • Humberto Reyes-González
  • Jasim Afnan Predhanekar
  • Jesse Thaler
  • Johnathan Gargalionis
  • Jorge Andrés Medina Moreira
  • Jose Andres Monroy Montanez
  • Julie Hogan
  • K.C. Kong
  • Kajari Mazumdar
  • Keziban Kandemir
  • Konstantin Matchev
  • Kristin Marie Dona
  • Leonardo Cristella
  • Lizardo Valencia Palomo
  • Marguerite Belt Tonjes
  • Maria Vittoria Garzelli
  • Matthew Bellis
  • Miaoyuan Liu
  • Michael Chang Gordon
  • Miguel Cifuentes
  • Mohan Sundar B
  • Márcio Mateus Jr
  • Nada Sherif Hatem Mohamed
  • Nadeesha Wickramage
  • Nick Manganelli
  • Nishita Desai
  • Ohannes Kamer Köseyan
  • Orcun Kolay
  • Patricia Rebello Teles
  • Patrick Komiske
  • Philipp Englert
  • Prabhat Solanki
  • Pranav Sistla
  • Prasanth Shyamsundar
  • Rahmat Rahmat
  • Samuel May
  • Sandro Fonseca De Souza
  • Sanmay Ganguly
  • Scott Thomas
  • Sezen Sekmen
  • Shufang Su
  • Steve Mrenna
  • Suchita Kulkarni
  • Sudeshna Banerjee
  • Tamas Almos Vami
  • Thomas Gaehtgens
  • Timothy Raben
  • Tonatiuh Garcia Chavez
  • Tulio Kuhn
  • Xabier Feal
  • Xavier Coubez
  • Yoxara S. Villamizar
  • Zhen Liu
  • Zongjin Ong
    • 08:30 09:00
    • 09:00 09:45
      Live Presentation: Workshop Introduction 45m
      Speaker: Kati Lassila-Perini (Helsinki Institute of Physics (FI))
    • 09:45 10:00
      Break 15m
    • 10:00 10:45
      Live Hands-on lesson: Dataset scouting 45m

      This lesson is designed to teach you how to use the command-line to explore the directories where the data is stored. In this way, you can see what triggers were applied when the data was taken and what Monte Carlo samples are available for the run period you are interested in.

      You’ll also be shown how to do a first-order inspection of some of these datafiles, just to see what is stored in them.

      Speaker: Matthew Bellis (Cornell University/Siena College (US))
    • 10:45 12:00
      Live Hands-on lesson: Trigger manipulation 1h 15m

      In this lesson you will:

      learn what the CMS trigger system is
      learn how to select and understand triggers for your analysis
      learn how to obtain trigger prescales and acceptance bits
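
      As a minimal sketch of how an acceptance bit and a prescale might be used together, the snippet below keeps only events whose trigger bit is set and weights each one by the trigger's prescale. The event records, the field names (`accepted`, `prescale`) and the path name `HLT_Example` are hypothetical and are not the format produced by the CMS software. The sketch assumes that a prescale of N means roughly one in N triggering events was recorded.

```python
# Hypothetical per-event trigger records; in practice these would be read
# from the trigger information stored with the data, not hard-coded.
events = [
    {"HLT_Example": {"accepted": True, "prescale": 10}},
    {"HLT_Example": {"accepted": False, "prescale": 10}},
    {"HLT_Example": {"accepted": True, "prescale": 10}},
]


def prescale_weighted_count(events, path):
    """Sum prescale weights over events accepted by the given trigger path."""
    total = 0
    for event in events:
        info = event.get(path)
        if info is None or not info["accepted"]:
            continue  # trigger absent, or it did not fire for this event
        total += info["prescale"]  # one recorded event stands in for N
    return total


estimated = prescale_weighted_count(events, "HLT_Example")
print(estimated)
```

      Weighting by the prescale is only one possible convention; an analysis may instead restrict itself to unprescaled triggers and skip the weighting entirely.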

      Speaker: Edgar Fernando Carrera Jarrin (Universidad San Francisco de Quito (EC))
    • 12:00 12:15
      Break 15m
    • 12:15 13:00
      Async Demo: Physics objects 45m

      In this video tutorial we will introduce you to CMS particle flow and physics objects. Later, in the afternoon, you will dive into the code. Enjoy!

      A YouTube video link is attached, along with the slides presented in it.

      Speaker: Julie Hogan (Brown University, Bethel University (US))
    • 13:00 14:30
      Lunch 1h 30m
    • 14:30 16:00
      Async Hands-on lesson: Physics objects I 1h 30m

      When a physicist approaches an analysis using CMS data, they typically rely on the reconstruction algorithms developed by CMS to interpret detector signals into meaningful physics objects. In code, the result of these reconstruction algorithms takes the form of several C++ classes that will be introduced briefly in this lesson. The content of each C++ class reflects the nature of the physics object it represents.

      In this lesson we will study several fundamental particles: muons, electrons, photons, and tau leptons. The first three particles are special in CMS, because they are reconstructed as single “particle-flow candidates”. The Particle Flow algorithm combines detector signals from multiple CMS subdetector systems to categorize all energy deposits as muons, electrons, photons, neutral hadrons, or charged hadrons. Tau leptons are more complex because they are not stable and have several detector signatures that include muons, electrons, photons, and/or hadrons. In the next lesson we will approach even more complex objects such as jets and missing transverse energy.

      After exploring the code elements that are common to all CMS physics objects, we will look at muons, electrons, photons, and tau leptons in more detail to understand the options for identifying these particles in your analysis. The final episode will show how an analyzer can combine different identification elements into selection criteria.

      Speaker: Julie Hogan (Brown University, Bethel University (US))
    • 16:00 17:00
      The Future is Open: Adventures with Public Collider Data 1h

      Fermilab employees and users can access the Zoom link below (Services login required):


      Please note: you will need the passcode to enter the Zoom meeting.

      Anyone else can obtain the Zoom link on the day of the colloquium by emailing Barb Kronkow at kronkow@fnal.gov.

      In November 2014, the CMS experiment at the Large Hadron Collider made the unprecedented move of releasing research-grade particle physics data for unrestricted use. I am a theoretical particle physicist, and for the first time, I had access to real collision data from a cutting-edge experiment, as well as an opportunity to demonstrate the scientific value of public data access. Over the past six years, my research group has carried out a number of innovative analyses using the CMS Open Data. In this colloquium, I highlight some of our research successes as well as some of the challenges we faced using public collider data to explore physics in and beyond the Standard Model.

      Speaker: Jesse Thaler (MIT)
    • 08:30 10:00
      Live Hands-on lesson: Physics objects II - Jets and MET 1h 30m

      Jets and missing transverse energy (MET) are critical for CMS physics analyses. They are more complex than most of the objects we discussed in the previous lesson, because they are reconstructed using multiple particle-flow candidates. After all candidates have been built from the tracks and energy deposits in CMS, they can be “clustered” using a variety of algorithms into composite objects called “jets”. Missing transverse energy, in a sense, clusters all candidates in the entire detector: it is the negative vector sum of the transverse momenta of all candidates.
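
      The negative vector sum described above can be sketched in a few lines. This is an illustration only: the candidate list and its `(pt, phi)` layout are hypothetical, and the CMS software provides its own MET objects rather than asking analysts to compute them this way.

```python
import math

# Hypothetical candidates as (pt, phi) pairs, in GeV and radians.
candidates = [(30.0, 0.0), (20.0, math.pi / 2), (10.0, math.pi)]


def missing_transverse_momentum(candidates):
    """Return (magnitude, phi) of the negative vector sum of candidate pT."""
    # Project each candidate onto the transverse plane, sum, and flip the sign.
    px = -sum(pt * math.cos(phi) for pt, phi in candidates)
    py = -sum(pt * math.sin(phi) for pt, phi in candidates)
    return math.hypot(px, py), math.atan2(py, px)


met, met_phi = missing_transverse_momentum(candidates)
print(f"MET = {met:.2f} GeV, phi = {met_phi:.2f} rad")
```

      A perfectly balanced event would give a magnitude near zero; a sizeable value points to undetected particles or to mismeasurement.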

      In this lesson we will explore the basic utilities for jets and MET, how to identify jets that arise from interesting original particles such as bottom quarks, and possibly how to correct jets and MET for differences between data and simulation.

      Speakers: Farrah Simpson (Brown University (US)), Julie Hogan (Brown University, Bethel University (US)), Nikolas Pervan (Brown University (US))
    • 10:00 10:15
      Break 15m
    • 10:15 11:15
      Live Hands-on lesson: Pre-selection and skimming 1h

      In this lesson we will review the CMS data flow and summarize the selections that have been made up through the NanoAOD production. We will also go over how to produce your own set of NanoAOD files, though this will not be required to continue with the workshop.

      Speakers: Farrah Simpson (Brown University (US)), Julie Hogan (Brown University, Bethel University (US)), Nikolas Pervan (Brown University (US))
    • 11:15 11:45
      Live Hands-on lesson: Object ID and selection. 30m

      With physics objects prepared and NanoAOD files created, we are ready to begin thinking about an actual physics analysis!

      In the previous exercises, you learned how to access and store object information from an AOD file and convert the AOD file to NanoAOD. The Events tree within the NanoAOD files contains all the derived information required for many searches or measurements. We will study a search for the Higgs boson in the tau tau decay channel – you can go back to the pre-exercises to find the published paper.

      Speakers: Farrah Simpson (Brown University (US)), Julie Hogan (Brown University, Bethel University (US)), Nikolas Pervan (Brown University (US))
    • 11:45 12:00
      Break 15m
    • 12:00 13:00
      Live Hands-on lesson: Plotting and interpretation 1h

      This exercise will walk you through the process of making some basic plots once you have skimmed some data and produced ROOT files. We’ll make these plots with ROOT, calling it from Python through the PyROOT module. Note that you could choose other approaches to making plots, such as matplotlib, but the ROOT files need to be read with ROOT or uproot, so we’ll stick with ROOT for now.

      Once we’ve made the plots, we’ll spend some time examining them and trying to interpret the distributions so that we can develop some intuition for how things look.

      Speaker: Matthew Bellis (Cornell University/Siena College (US))
    • 13:00 14:30
      Lunch 1h 30m
    • 14:30 15:15
      Async Demo: Luminosity and data quality. 45m

      Welcome. In this lesson you will:

      learn what luminosity is and why it’s important
      learn how luminosity is measured
      learn how to calculate luminosity
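
      As a rough sketch of the bookkeeping behind a luminosity calculation, the snippet below sums the recorded luminosity over lumisections that pass a certification mask, then uses the standard relation N = σ × L to estimate an expected event count. All numbers, the run numbers, and the dictionary layouts are hypothetical; the mask is assumed to map a run number to inclusive lumisection ranges, and the lesson itself may use dedicated tools instead.

```python
# Hypothetical certification mask: run number -> inclusive lumisection ranges.
good_lumis = {"123456": [[1, 3], [10, 12]]}

# Hypothetical recorded luminosity per (run, lumisection), in inverse picobarns.
recorded = {
    (123456, 1): 0.5,
    (123456, 2): 0.5,
    (123456, 5): 0.4,    # lumisection not certified
    (123456, 10): 0.25,
    (654321, 1): 0.3,    # run not present in the mask
}


def is_certified(run, lumi, mask):
    """True if the lumisection falls inside any certified range for its run."""
    ranges = mask.get(str(run), [])
    return any(first <= lumi <= last for first, last in ranges)


def integrated_luminosity(recorded, mask):
    """Sum recorded luminosity over certified lumisections only."""
    return sum(
        value
        for (run, lumi), value in recorded.items()
        if is_certified(run, lumi, mask)
    )


total = integrated_luminosity(recorded, good_lumis)

# Expected number of events for a hypothetical 100 pb cross section,
# ignoring acceptance and efficiency.
cross_section_pb = 100.0
expected_events = cross_section_pb * total
print(total, expected_events)
```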

      Speaker: Thomas McCauley (University of Notre Dame (US))
    • 15:15 17:00
      Async Hands-on lesson: Efficiency Studies using the Tag and Probe Method 1h 45m
      Speakers: Allan Da Silva Jales (Universidade do Estado do Rio de Janeiro (BR)), Thomas Gaehtgens