- meeting to discuss dataset access companion Python package
- we would like a utility package for accessing the dataset
- Philip has an ML library for Collide-2V
- Can also take inspiration from ColliderML
- discussion about the dataset structure
- total size is in the hundreds of TB
- to be accessible from Hugging Face and EOS
- the dataset consists of jagged arrays
- it is nested:
  - each row is a collision event
  - each column is a feature
  - the number of entries can differ from row to row
- we’ll need to provide a written specification
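
To make the jagged layout in the notes above concrete, here is a minimal pure-Python sketch of one common way to represent a jagged column: a flat list of values plus an offsets list marking where each event starts and ends. This is only an illustration, not the dataset's actual storage format, which the written specification still has to define. The feature name `particle_pt`, the numbers, and the helper names `flatten_jagged` and `event_slice` are all made up for the example.

```python
from itertools import accumulate


def flatten_jagged(rows):
    """Pack variable-length rows into a flat value list plus offsets.

    Row i occupies values[offsets[i]:offsets[i + 1]].
    """
    values = [x for row in rows for x in row]
    offsets = [0, *accumulate(len(row) for row in rows)]
    return values, offsets


def event_slice(values, offsets, i):
    """Return the entries belonging to event (row) i."""
    return values[offsets[i]:offsets[i + 1]]


# Hypothetical feature column: one list per collision event,
# with a different number of entries in each event.
particle_pt = [[41.2, 17.5, 8.3], [], [92.0, 3.1]]

values, offsets = flatten_jagged(particle_pt)

# Number of entries per event, recovered from the offsets alone.
counts = [stop - start for start, stop in zip(offsets, offsets[1:])]

print(offsets)                          # [0, 3, 3, 5]
print(counts)                           # [3, 0, 2]
print(event_slice(values, offsets, 2))  # [92.0, 3.1]
```

A utility package could hide this bookkeeping behind a per-event accessor, so users never handle offsets directly.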