• meeting to discuss a companion Python package for dataset access
  • we would like a utility package for accessing the dataset
  • Philip has an ML library for Collide-2V
    • Utilities for accessing data and examples of tasks
    • oriented towards PyTorch Lightning
    • https://github.com/pploner/foundation_model_testing/tree/main
  • Can also take inspiration from ColliderML
    • https://opendatadetector.github.io/ColliderML/library/overview.html
    • https://github.com/OpenDataDetector/ColliderML
  • discussion about the dataset structure
    • size is in the hundreds of TB
      • to be accessible from Hugging Face and EOS
    • the dataset consists of jagged (variable-length) arrays
    • it is nested:
      • each row is a collision event
      • each column is a feature
        • the number of entries in a feature can differ from row to row
  • we’ll need to provide a written specification
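
To make the jagged structure discussed above concrete, here is a minimal sketch in plain Python. It is not the dataset's actual storage format or the planned package API: the `JaggedColumn` class, the `from_events` helper, the flat-values-plus-offsets layout, and the `jet_pt` example values are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class JaggedColumn:
    """One feature column stored as flat values plus per-event offsets.

    Hypothetical layout for illustration only: event i owns
    values[offsets[i]:offsets[i + 1]].
    """

    values: List[float]
    offsets: List[int]  # length is number of events + 1

    def __len__(self) -> int:
        # Number of events (rows) in this column.
        return len(self.offsets) - 1

    def event(self, i: int) -> List[float]:
        # Return the entries belonging to event i (may be empty).
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.values[self.offsets[i]:self.offsets[i + 1]]

    def counts(self) -> List[int]:
        # Number of entries in each event.
        return [self.offsets[i + 1] - self.offsets[i] for i in range(len(self))]


def from_events(events: Sequence[Sequence[float]]) -> JaggedColumn:
    """Flatten a list of per-event lists into values + offsets."""
    values: List[float] = []
    offsets: List[int] = [0]
    for entries in events:
        values.extend(entries)
        offsets.append(len(values))
    return JaggedColumn(values, offsets)


# Three made-up collision events with 2, 0 and 3 entries for one feature.
jet_pt = from_events([[52.1, 33.4], [], [101.0, 47.5, 20.2]])
per_event_counts = jet_pt.counts()
last_event = jet_pt.event(2)
```

Keeping one flat buffer per feature with an offsets array is a common way to represent variable-length rows without padding; whether the companion package should expose something like this directly or rely on an existing jagged-array library is an open question for the written specification.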