EP R&D Software Working Group Meeting

Europe/Zurich
Vidyo


Software R&D Working Meeting Minutes

Introduction

  • Hardware

    • Would like to buy a box also for reconstruction
    • Would the spec for the simulation box meet these needs?
      • Hadn’t foreseen a GPU, but one would be useful strategically and for HGCAL
      • At the moment the HGCAL group has its needs covered internally via CMS resources
    • ACTION: Include Andi, Moritz, Marco and Felice in the discussion with IT
    • Do we want Intel CPUs in the suite of R&D machines?
      • Probably these will come in the Analysis machine
  • Next meeting

    • Agreed to cover HGCAL reconstruction in June
    • Will decide on the date soon

Analysis Systems

  • DAOS is an Intel storage system designed to replace the cluster filesystems in data centres
    • SSD based, so it targets the high-performance part of the storage hierarchy
    • Can emulate a filesystem, but for the highest performance it needs to be addressed as an object store (a toy contrast is sketched after this list)
  • Object granularity
    • Too early to say what will be best (pages or clusters), or whether one size fits all is possible
  • How to interface to the data management layer?
    • Will add metadata to what is stored and this will have a namespace associated with it
    • Too early to say exactly what the interface to the data management layer would be (and out of scope for us to tackle it right now)
      • Will need to expose things at the correct level of granularity (unlikely the DM system wants to know about 10kB pages)
    • Do plan to get away from the file notion as central, from the analysis side
    • RNTuples are stored inside the current TFile objects, but this is a lightweight bootstrap (a short PyROOT sketch after this list shows the RNTuple appearing as an ordinary key)
  • For Xrootd the XCache layer would be good to look at
    • Contact Andy Hanushevsky
  • Snapshots of intermediate analysis
    • Suggested to enable this behind the scenes (user doesn’t need to know)
    • Where to store these results?
      • Local SSD: very fast, but then not accessible to the rest of the analysis nodes (workload scheduling problem)
      • ClusterFS: accessible to the whole cluster, but may be performance limited
    • Spark has done interesting work on this (resilient distributed datasets)
    • Parsl does this caching by hashing the Python code and the calling parameters, storing the intermediate results in files (a minimal sketch follows after this list)
      • Separation of caching input data from the processed outputs would be advantageous
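
As a purely illustrative contrast between the two DAOS access modes noted above (POSIX-style filesystem emulation versus direct object addressing), the toy Python sketch below uses a hypothetical in-memory ObjectStore class; none of the names correspond to the real DAOS API.

```python
# Toy illustration only: contrasts key-based object addressing with
# path-based (POSIX-like) access. The real DAOS API is different; the
# ObjectStore class here is entirely hypothetical.
import os
import tempfile


class ObjectStore:
    """Minimal in-memory stand-in for an object store (hypothetical)."""

    def __init__(self):
        self._objects = {}

    def put(self, container, oid, payload):
        # Data is addressed directly by (container, object id), with no
        # directory traversal or file offsets involved.
        self._objects[(container, oid)] = payload

    def get(self, container, oid):
        return self._objects[(container, oid)]


store = ObjectStore()

# Object-store style access: address the payload by key.
store.put("analysis2020", "cluster-0042", b"...serialised cluster...")
payload = store.get("analysis2020", "cluster-0042")

# Filesystem-emulation style: the same payload reached through a path,
# which is convenient for existing tools but adds a translation layer.
root = tempfile.mkdtemp()
path = os.path.join(root, "analysis2020", "cluster-0042")
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "wb") as f:
    f.write(payload)
```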
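
The point that an RNTuple currently lives inside an ordinary TFile can be seen by listing the keys of such a file with PyROOT; the file name data.root and the ntuple name Events are placeholders, and a ROOT build with RNTuple support is assumed.

```python
# Sketch (assumes PyROOT is available and that data.root already contains
# an RNTuple, e.g. one written under the name "Events"; both names are
# placeholders).
import ROOT

f = ROOT.TFile.Open("data.root")
if not f or f.IsZombie():
    raise RuntimeError("could not open data.root")

# The RNTuple shows up as a regular key of the TFile: that key is the
# lightweight anchor which bootstraps access to the RNTuple data stored
# in the same file.
for key in f.GetListOfKeys():
    print(key.GetName(), key.GetClassName())

f.Close()
```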
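
For the Parsl memoisation mentioned in the last item, a minimal sketch (assuming Parsl is installed and using its bundled local-threads configuration) could look like this; Parsl itself handles the hashing of the app and its call arguments.

```python
# Sketch (assumes Parsl is installed; uses its bundled local-threads config).
# Parsl memoises app calls by hashing the app and its arguments, so a
# repeated call with the same inputs is served from the cache instead of
# being recomputed.
import parsl
from parsl import python_app
from parsl.configs.local_threads import config

parsl.load(config)


@python_app(cache=True)  # enable app-level caching / memoisation
def select_values(values, threshold):
    # Stand-in for an expensive intermediate analysis step (hypothetical).
    return [v for v in values if v > threshold]


data = list(range(100))
first = select_values(data, 42).result()    # computed
second = select_values(data, 42).result()   # returned from the cache
assert first == second

# Persisting cached results to files across separate runs is handled by
# Parsl's checkpointing support (not shown here).
```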