7th Open Science Practitioners Forum

Europe/Zurich
Surveys
Open Science Community Engagement Feedback
Zoom Meeting ID
67317341543
Host
Merten Dahlkemper

Minutes

Welcome

Discussion

  • Zach raised the topic of Open Infrastructure meetings: HEPData, SWAN, Binder, collab etc. (cloud infrastructures that support Open Data), EOSC infrastructure. How broad should this be?
    • Could be split: inward-looking vs. outward-looking, or documents (INSPIRE, CDS, Indico) vs. data (HEPData etc.)
  • Giacomo endorses the idea to have a joint event with OSPO

Community Engagement

  • New Open Science website is currently being developed and will be published soon: beta testers needed
  • Community Ambassador network being launched: Get in touch if you want to share your excitement for Open Science
  • In-person OSPF should become a regular event: Feedback needed as to who would like to join
  • Indico survey for feedback

Discussion

  • ATLAS can share a lot of stories, but it should be as clear as possible what the ask is
  • For training it needs to be known who the target audience is

ATLAS Open Data

  • Full chain of tools for Open Data from no-code solutions to OD for research
  • Collecting projects on how OD is used
  • Regional usage patterns monitored via monit-grafana: https://monit-grafana.cern.ch/d/da06d76c-24f0-4d23-b51e-da08d36c4ece/welcome?orgId=93
  • In 2026 US use went down, and Brazil went up, also Africa got some uptake
  • OD Tutorial (https://indico.cern.ch/event/1564767/): tried to cover all audiences, won't be repeated, rather online tutorials for specific audiences in 2026
  • Big takeaway: We need to understand the audience
  • Upcoming white paper on event generation data
  • Proof of concept: agentic workflow to easily generate output from the Open Data based on natural-language input

Discussion

  • How do you collect the materials people use?
    • Reaching out to individuals, word of mouth, googling
    • Can be tricky to track, as often data is only downloaded once and then used regularly in education
  • What about the Masterclasses?
    • Masterclasses are very well known and widely used, so they are hard to replace.
    • Needs more discussion with IPPOG; they may be partially replaced
  • Why use atlasopenmagic rather than something from the open data portal?
    • Long term vision would be to have something universal for everyone

CMS Open Data

  • It was clear the data needs to be preserved, but who would use open data?
  • First use case by theorists: Had significant challenges with the file format
  • Eventually had the first OD workshop for theorists in 2020
  • So far in total 6 workshops: https://cms-opendata-guide.web.cern.ch/cmsOpenData/workshops/
  • Ratio of participants to registrants is relatively low
  • Workshops so far have received good feedback (>90% recommendation rate)
  • Data format has changed from AOD in the beginning to NANOAOD, which does not require CMS-specific software
  • Next workshop: 28-30 July 2026 at University of Notre Dame, USA, with a focus on educational use cases aimed at teachers

Discussion

  • ATLAS also has issues with no-shows; it may be worth considering a small participation fee so people are committed
  • What about the drop-out rate over the week as the material gets harder?
    • Decline was not super high
    • Had large registration numbers in the beginning, but then many didn't show up. However, actual participation seemed to be quite constant
    • Some wanted to get a certificate for university. For this, completing the exercises should be required
  • Which datasets do you use in the CMS workshops? And do participants actually understand an actual analysis?
    • Students should hopefully understand the whole analysis; at least that is what the material presents
    • Support is there via the OD Forum

LHCb

  • Ntupling Service has been available since its release in February; users no longer need to know the LHCb tupling software
  • Ntuples are created on Grid and can be downloaded by user
  • Access to 4 PB of Run 1 & 2 data
  • First-time users need some guidance: https://lhcb-opendata-guide.web.cern.ch/ntupling-service/
  • Currently working on implementing more examples, to be published by May
  • Only 2 people actively working on Open Data in LHCb
  • Several presentations at conferences etc.
  • Answering questions from users
  • So far 14 requests, various use cases (CP asymmetry searches, fitting algorithms, educational/learning ROOT)
  • So far 4 TB of OD has been produced
  • Will organize future LHCb OD events

Discussion

  • What's the background of the people placing the requests?
    • Mostly either theorists or high school students
  • Are there restrictions for LHCb members to use the OD? It might be interesting to create educational datasets
    • LHCb members also use the OD for educational datasets; for physics publications, LHCb members are not allowed to use OD
  • How is this advertised?
    • On the outreach website and conferences; not via Social Media
    • Could be worth putting in physics talks

ALICE

  • The data format was changed to AO2D after Run 2, so the Run 1 and Run 2 data have to be converted into this format to be shared as OD
  • Currently, old-format data (7.6 TB) is still in the OD Portal, but probably cannot be used anymore
  • 2015 Pb-Pb data (62 TB) has been released as educational data in March
  • In the coming years much more data will be released to meet the release targets; the converted Run 1 data is expected to be uploaded by the end of the year
  • O2OpenAccess is the ALICE software repository: https://github.com/AliceO2Group/O2OpenAccess
  • The same software is used internally and for open data
    • 16:00 16:05
      Welcome 5m
      Speaker: Clemens Lange (Paul Scherrer Institute (CH))
    • 16:05 16:35
      Open Science Community Engagement 30m

      Merten presents the new Community Hub for Open Science and the idea of a community ambassador network.
      15 minutes presentation + 15 minutes discussion

      Speaker: Dr Merten Dahlkemper (CERN)
    • 16:35 16:55
      ATLAS Open Data Communication 20m

      15 minutes presentation + 5 minutes discussion

      Speaker: Zach Marshall (Lawrence Berkeley National Lab. (US))
    • 16:55 17:15
      CMS Open Data Communication 20m

      15 minutes presentation + 5 minutes discussion

      Speakers: Matthew Bellis (Cornell University/Siena College (US)), Thomas McCauley (University of Notre Dame (US))
    • 17:15 17:35
      LHCb Open Data Communication 20m

      Developing guidelines for the new Ntupling service
      15 minutes presentation + 5 minutes discussion

      Speakers: Dillon Fitzgerald (University of Michigan (US)), Piet Nogga (University of Bonn (DE))
    • 17:35 17:55
      ALICE Open Data Communication 20m
      Speaker: Dr Adrian Sevcenco (Institute of Space Science subsidiary of INFLPR (RO))
    • 17:55 18:00
      Wrap up 5m
      Speaker: Clemens Lange (Paul Scherrer Institute (CH))