System performance modelling WG meeting

Europe/Zurich
600/R-001 (CERN)
  • Local: Andrea Sciabà, Markus Schulz
  • Remote: Chris Hollowell, David Lange, Jan Iven, Michele Michelotto, Oxana Smirnova, Shigeki Misawa, Gareth Row, Renaud Vernet

People reported their impressions of the cost model session at the HOW workshop.

David thinks the most interesting topic was the site cost estimation, also thanks to the diverse audience. Renaud received similar feedback; he was also promised input from RAL for the T1 survey. Andrea also thinks that the session was well received, although the input in terms of new goals for the WG was less than hoped for. Markus points out that following the discussions remotely was next to impossible due to sound issues.

The next discussion was on how to account for non-CPU resources, a topic the WLCG MB will charge the WG with investigating. This arises from the fact that funding agencies are increasingly inclined to pledge resources in forms other than CPU-only clusters. How, for example, should GPU resources be valued? If these resources are going to be part of the pledges, we need a way to measure their usefulness.

At the moment we do not have an answer to these questions: we should form a sub-group to do some brainstorming.

Gareth asks what percentage of experiment workloads would use accelerators. As of today, ALICE has a version of the TPC tracking that runs on GPUs, and LHCb also has GPU versions of their reconstruction; CMS is not using any GPUs in production. David asks what timescale we should consider. Markus says that the focus is clearly Run 4. David says that the experiments will use what they can get; one cannot define a priori what fraction of the work will be done on GPUs. Gareth argues that some number would be needed for capacity planning. Markus replies that this will change a lot over time, and that benchmarks will be completely different ten years from now.

Michele stresses that we need to measure the computing power of data centres using GPUs. Domenico already has a Docker version of some GPU-based workloads, but not yet a real benchmark, which will critically depend on how the experiments will use GPUs.
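For illustration only, a minimal sketch of the kind of raw-throughput measurement a GPU benchmark might start from (this is not Domenico's containerised workload nor the SPEC benchmark mentioned below; it is a generic CUDA example, and all kernel names and parameters are arbitrary):

    // Illustrative GPU FLOPS micro-benchmark sketch; not a WG deliverable.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void fma_loop(float *out, int iters) {
        float a = 1.0001f, b = 0.9999f, c = threadIdx.x * 1e-6f;
        for (int i = 0; i < iters; ++i)
            c = fmaf(a, c, b);               // one fused multiply-add = 2 FLOPs
        out[blockIdx.x * blockDim.x + threadIdx.x] = c;  // keep the result live
    }

    int main() {
        const int blocks = 1024, threads = 256, iters = 1 << 20;
        float *d_out;
        cudaMalloc(&d_out, blocks * threads * sizeof(float));

        // Time the kernel with CUDA events.
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        fma_loop<<<blocks, threads>>>(d_out, iters);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        double flops = 2.0 * (double)iters * blocks * threads;
        printf("%.2f GFLOPS (single precision)\n", flops / (ms * 1e-3) / 1e9);

        cudaFree(d_out);
        return 0;
    }

A real benchmark would run the experiments' own workloads rather than a synthetic kernel, which is exactly the point made above: its value depends on how the experiments actually use GPUs.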

Shigeki mentions that SPEC has a GPU benchmark that can be used to compare e.g. AMD and Nvidia GPUs, but the real question is what fraction of the FLOPs needs to be provided by GPUs. Markus adds that people in the US should be more motivated, as their funding agencies are pushing more strongly towards accelerator-based HPC.

Oxana asks whether any funding agency is willing to fund development on GPUs. David says that this is the case in the US, but the issue is the scale. Markus adds that Pete Elmer's software institute is being funded for that, DESY is kickstarting something similar, and CERN is planning it. This will be a crucial aspect in the future. David supports the idea of having workload-driven benchmarks.

To conclude, people interested in actively participating in a new "accelerator resources sub-working group" can send an email to the list and propose their ideas. Markus will provide a problem statement and schedule a brainstorming session. Chris volunteers, and will obtain some ATLAS ML workloads running at BNL to be used as reference workloads.

