Introduction
-
Hardware
- Would also like to buy a box for reconstruction
- Would the spec for the simulation box meet these needs?
- Hadn’t foreseen a GPU, but one would be useful strategically and for HGCAL
- At the moment HGCAL has its needs covered internally via CMS resources
- ACTION: Include Andi, Moritz, Marco and Felice in the discussion with IT
- Do we want Intel CPUs in the suite of R&D machines?
- These will probably come with the analysis machine
Next meeting
- Agreed to cover HGCAL reconstruction in June
- Will decide on the date soon
Analysis Systems
- DAOS is an Intel storage system intended to replace the cluster filesystems in data centres
- SSD based, so it is the high-performance part of the storage hierarchy
- Can emulate a filesystem, but for highest performance it needs to be addressed as an object store
- Object granularity
- Too early to say what will be best (pages or clusters), or if one-size-fits-all is possible
- How to interface to the data management layer?
- Will add metadata to what is stored and this will have a namespace associated with it
- Too early to say exactly what the interface to the data management layer would be (and out of scope for us to tackle it right now)
- Will need to expose things at the correct level of granularity (unlikely the DM system wants to know about 10kB pages)
- Do plan to get away from the file notion as central, from the analysis side
- RNTuples are stored inside the current TFile objects, but this is a lightweight bootstrap
- For XRootD, the XCache layer would be worth looking at
- Snapshots of intermediate analysis
- Suggested to enable this behind the scenes (user doesn’t need to know)
- Where to store these results?
- Local SSD: very fast, but then not accessible to the rest of the analysis nodes (workload scheduling problem)
- ClusterFS: accessible to the whole cluster, but may be performance limited
- Spark has done interesting work on this (resilient distributed datasets)
- Parsl does this caching by hashing the Python code and the calling parameters, storing the intermediate results in files
- Separation of caching input data from the processed outputs would be advantageous
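To illustrate the caching idea discussed above (hash the Python code together with the calling parameters, then store the intermediate result in a file), here is a minimal sketch. It is not Parsl's implementation: the decorator name `cached`, the `CACHE_DIR` location, the `hits` counter and the example function `select_muons` are all hypothetical and chosen only for illustration. It assumes the arguments and results can be pickled and that pickling them gives stable bytes, which holds for simple values such as numbers, strings and lists.

```python
import functools
import hashlib
import inspect
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location. A temporary directory is used for this sketch;
# a real setup would have to choose between local SSD and the cluster filesystem.
CACHE_DIR = Path(tempfile.mkdtemp(prefix="analysis_cache_"))


def cached(func):
    """Store results on disk, keyed by the function's source and its arguments."""
    try:
        # Hashing the source means that editing the function invalidates old entries.
        source = inspect.getsource(func)
    except (OSError, TypeError):
        # Source is not always available (e.g. interactively defined code);
        # fall back to the qualified name, which does NOT detect code changes.
        source = func.__qualname__

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # Sort keyword arguments so that their order does not change the key.
        payload = pickle.dumps((source, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        path = CACHE_DIR / f"{key}.pkl"
        if path.exists():
            wrapper.hits += 1  # counter kept only to make cache reuse visible
            return pickle.loads(path.read_bytes())
        result = func(*args, **kwargs)
        path.write_bytes(pickle.dumps(result))
        return result

    wrapper.hits = 0
    return wrapper


@cached
def select_muons(pt_values, threshold):
    """Hypothetical analysis step: keep transverse momenta above a threshold."""
    return [pt for pt in pt_values if pt > threshold]
```

Calling `select_muons([10.0, 25.0, 40.0], threshold=20.0)` twice computes the selection once and reads it back from the file the second time; changing the threshold produces a different key and therefore a new entry. Because the cache is just a directory of files, the same sketch works on a local SSD or a cluster filesystem by pointing `CACHE_DIR` elsewhere, and it keeps processed outputs separate from any cache of the input data.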