Statistical Analysis Software Meeting

Europe/Zurich
Jonas Rembser (CERN)
Zoom Meeting ID: 62468501683
Host: Jonas Rembser

Statistical Analysis Software Meeting — Minutes

This is an AI-generated summary from the meeting transcript.

Jonas Rembser (chair), Simon Cello (presenter)


1. Main Talk: Re-discovery of the Higgs boson with the original ATLAS workspaces and HS3

Speaker: Simon Cello (PhD student, Dortmund, working with Carsten Burgard / ATLAS)

Motivation

  • New physics may appear as small deviations of observables from their predicted values, so precision is essential. EFT reinterpretations are an important path forward.
  • Modern analyses are increasingly expected to publish full likelihoods, allowing external phenomenologists to reinterpret and combine results across experiments.
  • Simon's PhD goal: exploit publicly available full likelihoods to perform a novel EFT reinterpretation, including combinations across experiments. HS3 is the chosen format to enable this.

What is HS3?

  • HS3 = High Energy Physics Statistics Serialization Standard.
  • Developed to overcome limitations of pure HistFactory JSONs (e.g. binned-only datasets), HS3 is a richer, extensible JSON format.
  • The format is human-readable, easy to manipulate, and not tied to ROOT/RooFit, which should encourage external groups to publish models. The HL-LHC is also targeting HS3.
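
As a rough illustration of what such a JSON document looks like, the sketch below builds a tiny HS3-style model in Python and serialises it with the standard json module. The top-level component names (metadata, distributions, domains, parameter_points) and the gaussian_dist type are written from memory of the HS3 layout and should be checked against the specification; the model, all object names, the numbers, and the version string are invented for this example and were not shown in the talk.

```python
import json

# Illustrative HS3-style document: one Gaussian model for one observable.
# Component names follow the HS3 layout as an assumption; every object name
# and number below is made up for this sketch.
doc = {
    "metadata": {"hs3_version": "0.2"},  # placeholder version string
    "distributions": [
        {"name": "model", "type": "gaussian_dist", "x": "x", "mean": "mu", "sigma": "sigma"}
    ],
    "domains": [
        {
            "name": "default_domain",
            "type": "product_domain",
            "axes": [
                {"name": "x", "min": -10.0, "max": 10.0},
                {"name": "mu", "min": -5.0, "max": 5.0},
                {"name": "sigma", "min": 0.1, "max": 5.0},
            ],
        }
    ],
    "parameter_points": [
        {
            "name": "default_values",
            "parameters": [
                {"name": "mu", "value": 0.0},
                {"name": "sigma", "value": 1.0},
            ],
        }
    ],
}

# Serialise to text and read it back: plain JSON, no ROOT needed.
text = json.dumps(doc, indent=2)
reloaded = json.loads(text)


def names(document, component):
    """List the names of all objects in one list-valued component."""
    return [obj["name"] for obj in document.get(component, [])]


print(names(reloaded, "distributions"))
```

Because the document is ordinary JSON, it can be inspected or edited with any JSON tooling, which is the property the bullet above refers to.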

Validation strategy: round-tripping the ATLAS Higgs discovery workspaces

  • The ATLAS Higgs discovery workspaces were chosen because they are very complex and heterogeneous — a good stress test.
  • Round-trip: ROOT workspace → HS3 JSON (via RooJSONFactoryWSTool) → ROOT workspace again.
  • The conversion process surfaced several bugs in the current HS3 implementation in ROOT, which were fixed.
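
A minimal PyROOT sketch of such a round trip is given below, assuming a ROOT build that includes RooJSONFactoryWSTool. The file paths and the workspace name are placeholders chosen for illustration; the actual names used for the ATLAS workspaces were not given in the minutes, and the script Simon used may differ.

```python
def round_trip(root_path, ws_name, json_path, out_path):
    """ROOT workspace -> HS3 JSON -> ROOT workspace, via RooJSONFactoryWSTool."""
    import ROOT  # imported lazily so this module loads without ROOT installed

    in_file = ROOT.TFile.Open(root_path)
    ws = in_file.Get(ws_name)  # ws_name is an assumption; it depends on the file

    # Export the workspace content to an HS3 JSON file.
    ROOT.RooJSONFactoryWSTool(ws).exportJSON(json_path)

    # Import the JSON into a fresh, empty workspace and write it back to disk.
    ws_new = ROOT.RooWorkspace(ws_name)
    ROOT.RooJSONFactoryWSTool(ws_new).importJSON(json_path)
    ws_new.writeToFile(out_path)
    return ws_new
```

A call would look like round_trip("input.root", "combined", "model.json", "roundtrip.root"), where all four arguments are hypothetical. The fits are then repeated on both the original and the returned workspace to check closure.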

Results

  • Baseline (original workspaces): the signal-strength scan agrees well with the published result (a graphical overlay was used because not all mass points had numeric values stored). The expected p-value also agrees, with one discrepancy at 370 GeV caused by a workspace that is sensitive to initial conditions.
  • Round-trip workspaces: signal strength matches the baseline almost perfectly (same 370 GeV outlier as in the baseline). The expected p-value shows a small offset between roughly 200–400 GeV, but the contour matches the baseline well.
  • The residual discrepancy is likely linked to the need to introduce new Asimov datasets (to handle bins with unexpected yields). This was needed for both the baseline and the round-tripped workspaces.
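
A closure check of this kind can be sketched as a point-by-point comparison of two scans. The helper below is hypothetical and not the code used for the talk; the tolerances are arbitrary and the numbers are placeholders, not results from the analysis.

```python
import math


def compare_scans(baseline, round_trip, rel_tol=1e-3, abs_tol=1e-6):
    """Return the mass points where two scans disagree or where one is missing."""
    flagged = {}
    for mass in sorted(set(baseline) | set(round_trip)):
        a = baseline.get(mass)
        b = round_trip.get(mass)
        if a is None or b is None:
            flagged[mass] = (a, b)  # point present in only one scan
        elif not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            flagged[mass] = (a, b)  # values differ beyond the tolerance
    return flagged


# Placeholder numbers, NOT values from the talk: mass [GeV] -> fitted quantity.
baseline_scan = {110: 1.00, 125: 1.40, 300: 0.20}
round_trip_scan = {110: 1.00, 125: 1.4001, 300: 0.35}

print(compare_scans(baseline_scan, round_trip_scan))
```

Reporting only the flagged points keeps the focus on the few workspaces that need a closer look, rather than on the many that close.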

Conclusions

  • Minor inconsistencies remain, and a few workspaces require manual tweaks because they are sensitive to initial conditions, but only at masses well above the actual Higgs mass.
  • Signal strength and observed/expected significances agree very well with the published paper.
  • HS3 conversion appears to work correctly, and the format is flexible enough to capture the model's complexity.
  • Goal: release all ATLAS Higgs discovery workspaces in HS3 format together with a Jupyter notebook demonstrating how to reproduce the results.
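
For readers comparing p-values with significances, the conversion can be sketched as follows. The minutes do not spell out the conventions used in the talk, so the usual one-sided Gaussian convention, p = 1 - Phi(Z), and the asymptotic relation Z = sqrt(q0) with q0 = 2 * delta_nll are assumed here; the function names are illustrative.

```python
import math
from statistics import NormalDist


def significance_from_p(p_value):
    """One-sided Gaussian significance Z such that p = 1 - Phi(Z)."""
    # Phi^{-1}(1 - p) = -Phi^{-1}(p); the second form is more accurate for small p.
    return -NormalDist().inv_cdf(p_value)


def p_from_significance(z):
    """Inverse mapping, using erfc for numerical stability in the tail."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def asymptotic_significance(delta_nll):
    """Z = sqrt(q0) with q0 = 2 * delta_nll, clipped at zero."""
    return math.sqrt(2.0 * max(delta_nll, 0.0))


print(p_from_significance(5.0))
print(significance_from_p(p_from_significance(5.0)))
```

Here delta_nll stands for the difference in negative log-likelihood between the background-only fit and the best fit; how it is obtained from a given workspace depends on the fitting tool.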

Discussion

  • Open data timeline (Jonas → Carsten): Approval is in principle uncontroversial — it has so far been blocked by lack of person-power. With Simon's progress and full closure expected soon, an approval meeting is being scheduled, with a target of roughly 4 weeks.
  • XRooFit validation (Will): Will offered to collaborate with Simon to reproduce the minimisation in XRooFit, since XRooFit is what was used for the previous public notebook. Follow-up to happen on Mattermost. Simon noted he had used XRooFit previously but ran into issues; the validation plot shown was produced with the native ROOT minimiser, not QuickFit.
  • Single parameterised workspace (Will): Could one workspace, with a signal component parameterised in mass, replace the ~150 separate per-mass-point workspaces?
    • Carsten: For the Higgs discovery specifically this is hard, because the channel composition itself changes with the mass point — probably the worst-case application for this idea.
    • Alexander: Reframing helps — the "product" can be presented as the single workspace giving the paper's result, rather than the full mass scan, which makes the situation less problematic.
    • Jay: Independently of this analysis, a parameterised-signal workspace would be very attractive for SBI with binned templates, where one needs templates that are regenerated at fit time.
    • Will: A halfway solution could share background components across multiple RooSimultaneous objects in one workspace. Worth keeping in mind as a longer-term programme of work.
  • Workspace size (Jonas): ~5 MB per ROOT workspace, ~20 MB per JSON workspace, ~150 mass points. Alexander noted the JSONs should compress very well (likely to sub-GB total).
  • Fit timing (Jonas): The full p-value scan takes ~2 hours on 16 cores. QuickFit is at least roughly twice as fast as ROOT/RooFit. Will flagged that hyperparameter settings used by QuickFit need to be understood and translated for the XRooFit comparison.
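
The size estimate from the discussion can be worked through explicitly. The per-file sizes and the number of mass points are the approximate figures quoted above; the toy JSON is an invented stand-in, so its compression ratio says nothing quantitative about the real workspaces.

```python
import gzip
import json

N_MASS_POINTS = 150  # approximate, from the discussion
ROOT_MB_EACH = 5     # approximate size of one ROOT workspace
JSON_MB_EACH = 20    # approximate size of one JSON workspace

root_total_mb = N_MASS_POINTS * ROOT_MB_EACH  # 750 MB
json_total_mb = N_MASS_POINTS * JSON_MB_EACH  # 3000 MB, about 3 GB

# Compression factor needed to bring the JSON total below 1 GB (1000 MB).
min_ratio_for_sub_gb = json_total_mb / 1000

# Toy stand-in for a workspace JSON: many structurally identical entries.
toy = {
    "parameters": [
        {"name": f"alpha_{i}", "value": 0.0, "min": -5.0, "max": 5.0}
        for i in range(5000)
    ]
}
raw = json.dumps(toy).encode("utf-8")
packed = gzip.compress(raw)
toy_ratio = len(raw) / len(packed)

print(root_total_mb, json_total_mb, min_ratio_for_sub_gb)
print(f"toy compression ratio: {toy_ratio:.1f}")
```

In other words, the uncompressed JSON set is about 3 GB, and a compression factor above 3 is enough to reach the sub-GB total mentioned by Alexander.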

2. Round-table / AOB

Jonas — RooONNXFunction progress

  • Fixed the JAX benchmarking issue flagged by Mohamed on Mattermost: jax.jit compiles lazily, on the first call rather than when the function is wrapped, so the previous numbers included compilation time.
  • With this corrected, batched inference performance for both PyTorch and JAX is fast, as expected. The slides from the previous meeting will be updated.
  • Working towards a complete and fair comparison for CHEP, though the main point of the work is the interface, not raw performance.
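
The benchmarking pitfall can be illustrated without JAX itself. The sketch below times the first call separately from the later ones; LazilyCompiled is a hypothetical stand-in that simulates a one-off compilation cost with a sleep, and it is not the benchmark code Jonas used.

```python
import statistics
import time


def benchmark(fn, *args, repeats=5):
    """Time the first call separately from the steady-state calls."""
    start = time.perf_counter()
    fn(*args)
    first_call = time.perf_counter() - start  # includes any one-off setup cost

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return first_call, statistics.median(timings)


class LazilyCompiled:
    """Stand-in for a jitted function: pays a one-off cost on its first call."""

    def __init__(self, fn, compile_seconds=0.05):
        self._fn = fn
        self._compile_seconds = compile_seconds
        self._compiled = False

    def __call__(self, *args):
        if not self._compiled:
            time.sleep(self._compile_seconds)  # simulated compilation
            self._compiled = True
        return self._fn(*args)


model = LazilyCompiled(lambda xs: [2.0 * x + 1.0 for x in xs])
first, steady = benchmark(model, list(range(1000)))
print(f"first call: {first:.4f} s, steady state: {steady:.6f} s")
```

With real JAX code the same pattern applies, with one addition: JAX dispatches work asynchronously, so the output should be synchronised (for example with block_until_ready()) before the timer is stopped.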

Will — removeRange / binning API in RooRealVar

  • The current removeRange method does not actually remove the named range; it just resets its min/max to ±∞, which is confusing (hasRange still returns true afterwards).
  • Proposal: add a new removeBinning method that genuinely deletes the underlying binning object (which is what represents a named range). Jonas agreed.
  • Second proposal: add an extra bool to setRange to choose between shared and unshared binnings. By default, ranges are shared across clones of a RooRealVar, which is sometimes undesirable.
    • Jonas is open to it but wants to first check with Stefan Hageboeck about the original rationale (possibly memory-related). Jonas will follow up.
  • Will to open a merge request implementing removeBinning (assuming it does not already exist).
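
The semantics under discussion can be mimicked with a plain-Python toy. ToyVar is not RooRealVar and none of its methods are RooFit API; it only mirrors the behaviour described in these minutes. For brevity the toy puts the shared/unshared switch on clone(), whereas the proposal attaches it to setRange.

```python
import math


class ToyVar:
    """Plain-Python toy, NOT RooRealVar: named ranges stored as [min, max] lists."""

    def __init__(self, name, ranges=None):
        self.name = name
        self._ranges = {} if ranges is None else ranges

    def set_range(self, range_name, lo, hi):
        self._ranges[range_name] = [lo, hi]

    def has_range(self, range_name):
        return range_name in self._ranges

    def remove_range(self, range_name):
        # Behaviour described in the minutes: the entry stays, only its limits are reset.
        if range_name in self._ranges:
            self._ranges[range_name] = [-math.inf, math.inf]

    def remove_binning(self, range_name):
        # Proposed behaviour: the underlying object is really deleted.
        self._ranges.pop(range_name, None)

    def clone(self, shared=True):
        # shared=True mimics ranges shared across clones; False gives an independent copy.
        ranges = self._ranges if shared else {k: list(v) for k, v in self._ranges.items()}
        return ToyVar(self.name, ranges)


x = ToyVar("x")
x.set_range("signal", 120.0, 130.0)
shared = x.clone()
private = x.clone(shared=False)

x.remove_range("signal")
print(x.has_range("signal"))  # True: the range still exists, with infinite limits

x.remove_binning("signal")
print(x.has_range("signal"), shared.has_range("signal"), private.has_range("signal"))  # False False True
```

The last line shows both points at once: deletion really removes the range, and a shared clone sees that change while an unshared one keeps its own copy.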

Carsten — New group member

  • Stefan Albrecht, postdoc in Hamburg, is joining Carsten's group this week as part of the DEMOS project, working on statistics software development.
  • He has already been added to the relevant Mattermost channels (ROOT / RooFit) and will join future meetings.
  • His first task will be to complete the HS3 interface for Combine, which Carsten had started but paused after the ATLAS hackathon.

3. Next meetings

  • Next meeting: Wednesday 30 April 2026. The series keeps a strict two-week cadence; meetings falling on holidays will simply be skipped to keep the rhythm. Jay Sandesara will present an update on the HistFactory v2 / pyhf side. Jonas will use this as a forcing function to make progress on the ROOT side.
  • Future: an SBI presentation would be welcome. Contributions for slots after Jay's talk are needed — please get in touch with Jonas.

Timetable

  • 15:00–15:05: News and updates (5m)
  • 15:05–15:30: The 2012 ATLAS Higgs Discovery Workspaces in HS3 (25m). Speaker: Simon Cello (Technische Universitaet Dortmund (DE))
  • 15:30–16:00: Round table discussion (30m)