* How to reach the goal of recurrent mass production with timely schedules

* Run-based approach for data and MC processing - interaction with calibration procedure

* New tools

Machine Learning in Data Processing

Run-based approach (proposal being written by Valentin)

main issue comes from the large number of files
it should also allow an improvement in efficiency
run-wise means:
- data processing is a set of configurations
- produce for a given run all the files and merge them at a certain point
  - most likely at trigger level
  - merge by type: 1 file per run for data, 1 for atmospheric muons, 1 for neutrinos (or 2 if 2 different light propagators are used)
  - checks are done before merging
  - if things fail, no merging is done and the step is rerun
- query all the inputs before the simulation
  - raw data
  - calibration - which requires its own processing chain?
- take care of how event weighting and headers are treated
- iRODS upload at the final step, for fully successful runs
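
A minimal sketch of the merge logic described above, for discussion only: files are grouped by run and type, a group is merged only if all its checks pass, otherwise the step is flagged for rerun, and a run is offered for upload only if nothing in it failed. The names `FileRecord`, `plan_merges`, `runs_ready_for_upload` and the `ok` flag are hypothetical and not part of any existing tool or of the proposal.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class FileRecord:
    run: int    # run number
    kind: str   # e.g. "data", "muon", "neutrino" (illustrative labels)
    path: str   # location of the per-job file
    ok: bool    # outcome of the pre-merge checks (assumed to be known)


def plan_merges(files):
    """Group files by (run, kind) and split into mergeable and rerun sets."""
    groups = defaultdict(list)
    for f in files:
        groups[(f.run, f.kind)].append(f)

    to_merge = {}   # (run, kind) -> list of paths to merge into one file
    to_rerun = []   # (run, kind) groups where at least one check failed
    for key, members in sorted(groups.items()):
        if all(m.ok for m in members):
            to_merge[key] = [m.path for m in members]
        else:
            to_rerun.append(key)
    return to_merge, to_rerun


def runs_ready_for_upload(to_merge, to_rerun):
    """Only runs with no failed group are candidates for the final upload."""
    failed_runs = {run for run, _ in to_rerun}
    return sorted({run for run, _ in to_merge} - failed_runs)


files = [
    FileRecord(1, "data", "r1_data_a", True),
    FileRecord(1, "data", "r1_data_b", True),
    FileRecord(1, "muon", "r1_mu_a", True),
    FileRecord(2, "data", "r2_data_a", True),
    FileRecord(2, "neutrino", "r2_nu_a", False),
]
to_merge, to_rerun = plan_merges(files)
ready = runs_ready_for_upload(to_merge, to_rerun)
```

In this toy input, run 1 is complete and could be uploaded, while run 2 is held back until its neutrino step is rerun. Whether a partially failed run should block the whole upload is an assumption here and would need to be decided in the proposal.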
Is it possible to merge runs instead of files per run?
- it's a design choice, to be addressed when decisions are made.
Understand how bookkeeping should be done
incorporate all tests - to be agreed between DPDQ, Comp&Soft, Simulation and Analysis WG
Allow for at least 2 ways
- GRID (DIRAC?)
- Local (batch on cluster, nextflow?)
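
One possible way to keep the two execution modes behind a common interface, sketched only to illustrate the idea. The class names, the `--run` pipeline parameter and the JDL file naming are hypothetical; `nextflow run` and `dirac-wms-job-submit` are used as example commands on the assumption that these tools are the ones chosen, which the notes leave open. The code only builds the command lines and does not execute anything.

```python
from abc import ABC, abstractmethod
from typing import List


class Backend(ABC):
    """Common interface: turn a run number into a submission command."""

    @abstractmethod
    def command(self, run: int) -> List[str]:
        ...


class LocalBackend(Backend):
    """Local/batch execution, assuming a Nextflow pipeline script."""

    def __init__(self, pipeline: str):
        self.pipeline = pipeline  # hypothetical pipeline file name

    def command(self, run: int) -> List[str]:
        # "--run" is a hypothetical pipeline parameter, not a Nextflow option
        return ["nextflow", "run", self.pipeline, "--run", str(run)]


class GridBackend(Backend):
    """GRID execution, assuming DIRAC and one JDL file per run."""

    def __init__(self, jdl_template: str):
        self.jdl_template = jdl_template  # hypothetical naming scheme

    def command(self, run: int) -> List[str]:
        return ["dirac-wms-job-submit", self.jdl_template.format(run=run)]


BACKENDS = {"local": LocalBackend, "grid": GridBackend}

local_cmd = BACKENDS["local"]("process_run.nf").command(42)
grid_cmd = BACKENDS["grid"]("run_{run}.jdl").command(42)
```

The point of the sketch is that the run-wise configuration stays the same and only the backend changes.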

Action point

- Valentin is writing a proposal. Discuss it when ready. Comp&Soft Workshop to think about it, too.