Open (and Closed) Data in the Age of AI

Start: 2026-04-23T09:00:00-05:00
End: 2026-04-24T15:00:00-05:00

23 Apr 2026, 09:00 → 24 Apr 2026, 15:00 US/Central

Description

This workshop explores what it means to build “cross-experiment, multi-modal” foundation models in a landscape where the relevant scientific data can be subject to a range of policies, from “open data” to experiment-restricted data (either proprietary or limited-distribution/raw), and where simulations are used as well. Both the data and the simulations come with a range of latencies and embargoes, differing expectations for initial use versus reuse or reinterpretation, and differing experiment governance structures. The workshop will address the technical, policy, and cyberinfrastructure questions that arise when pursuing such shared models.

Specific questions:

What does it actually mean in practice to build a “foundation model” across experiments with different detector designs, data formats, and physics goals?
What does it mean in practice to do pre-training on “diverse data” in a shared environment vs fine-tuning in a restricted (experiment) environment?
How are the models benchmarked? How are they validated when issues arise that span both the pre-training and the restricted fine-tuning?
What are the technical and cyberinfrastructure implications?
Who owns the resulting models, and what are the implications given different experiment governance structures and data ownership by international collaborations?
If initially trained on a set of current and archived data, how do these models evolve going forward as new data appears from new experiments/upgrades/detector configurations?
If industry is involved in parts of this process, how do we avoid issues related to vendor lock-in and/or retain the “public” expectation that underlies most of the government funding of fundamental science?

This event is sponsored in part by the National Science Foundation through grants OAC-2226378, OAC-2226379 and OAC-2226380 (FAIROS-HEP). Any opinions, findings, conclusions or recommendations expressed in this material are those of the developers and do not necessarily reflect the views of the National Science Foundation.

Thursday 23 April
- 09:00 → 09:15
  
  Introduction 15m
  
  Speakers: Peter Elmer (Princeton University (US)), Robert William Gardner Jr (University of Chicago (US))
- 09:15 → 09:30
  
  TREASURE Project 15m
  
  Speaker: Paolo Calafiura (Lawrence Berkeley National Lab. (US))
- 09:30 → 09:45
  
  American Science Cloud 15m
  
  Speaker: Oliver Gutsche (Fermi National Accelerator Lab. (US))
- 09:45 → 10:00
  
  ATLAS Open Data 15m
  
  Speaker: Zach Marshall (Lawrence Berkeley National Lab. (US))
- 10:00 → 10:15
  
  CMS Open Data 15m
  
  Speaker: Matthew Bellis (Cornell University/Siena College (US))
- 10:15 → 10:30
  
  Neutrinos Open Data 15m
  
  Speaker: Prof. Jianming Bian (University of California, Irvine (US))
- 10:30 → 11:00
  
  Coffee Break 30m
- 11:00 → 11:15
  
  FM4NPP 15m
  
  Speaker: Shuhang Li
- 11:15 → 12:30
  
  Discussion 1h 15m
- 12:30 → 13:30
  
  Lunch 1h
- 13:30 → 15:00
  
  Discussion 1h 30m
- 15:00 → 15:30
  
  Coffee Break 30m
- 15:30 → 17:00
  
  Discussion 1h 30m
- 19:00 → 21:00
  
  Workshop Dinner 2h
Friday 24 April
- 09:00 → 10:30
  
  Discussion 1h 30m
- 10:30 → 11:00
  
  Coffee Break 30m
- 11:00 → 12:30
  
  Summary Discussion 1h 30m
- 12:30 → 13:30
  
  Lunch 1h
- 13:30 → 15:00
  
  Small Group Working Sessions/Discussions 1h 30m