System performance modelling WG meeting

Europe/Zurich
31/S-028 (CERN)

Participants

  • Local: Andrea Sciabà, Markus Schulz, Domenico Giordano
  • Remote: Andrea Sartirana, Oxana Smirnova, Gareth Roy, Johannes Elmsheuser, Renaud Vernet, Davide Costanzo, David Lange, Catherine Biscarat, Graeme Stewart, Michel Jouvin, Andrea Valassi

Discussion on task list

We go through the task list.

The task about building a glossary of terms is agreed without discussion.

The task about listing the most important workloads is also approved; Graeme warns that reconstruction jobs in ATLAS behave rather differently on data and on MC because of the different amount of conditions data read.

A longer discussion follows on the definition of workload properties. Johannes stresses the importance of defining memory usage precisely (PSS vs RSS vs VMEM, etc.), and Oxana the distinction between maximum memory as seen by the user and by the site. Andrea V. points out that memory consumption depends strongly on the number of threads/processes, so a study for different scenarios would be needed. Gareth adds that in this way one can also determine the optimal number of threads. Graeme suggests also looking into memory access efficiency, e.g. cache miss rates. Andrea Sc. adds IPC (instructions per cycle) as another interesting metric.
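To illustrate why the distinction matters: on Linux the memory metrics discussed above can be read directly from /proc. A minimal sketch (Linux-only; `smaps_rollup` assumes kernel >= 4.14; this is an illustration, not a WG deliverable):

```python
def memory_metrics(pid="self"):
    """Memory metrics for a process from /proc (Linux only), in kB."""
    metrics = {}
    # VmRSS (resident set size) and VmSize (virtual size) from /proc/<pid>/status
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmRSS:", "VmSize:")):
                key, value = line.split(":", 1)
                metrics[key] = int(value.split()[0])
    # Pss (proportional set size: shared pages divided among sharers)
    # from /proc/<pid>/smaps_rollup, available since kernel 4.14
    with open(f"/proc/{pid}/smaps_rollup") as f:
        for line in f:
            if line.startswith("Pss:"):
                metrics["Pss"] = int(line.split()[1])
    return metrics

print(memory_metrics())
```

For a single-process job PSS and RSS are close; for multi-process workloads sharing pages (e.g. after fork), PSS is the fairer per-process number, which is exactly the user-vs-site ambiguity raised above.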

Markus mentions that the work by Servesh and David S. would be highly relevant in this context, as they are studying CPU metrics and their time evolution for ATLAS jobs.

Andrea V. stresses the importance of having operative definitions of the metrics, i.e. specifying exactly how each one is calculated. Andrea Sc. adds that some metrics cannot be measured from the outside and need to be measured by the experiment software itself.

The importance of network latency and of IO-related metrics (iowait, bytes read and written, etc.) is also stressed.
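The IO-related metrics mentioned here are likewise exposed by the Linux kernel; a small sketch of how they could be sampled (Linux-only, illustrative only — per-process byte counters from /proc/&lt;pid&gt;/io, system-wide iowait from /proc/stat):

```python
def io_metrics(pid="self"):
    """Per-process IO counters (read/write bytes, syscalls) from /proc/<pid>/io."""
    with open(f"/proc/{pid}/io") as f:
        return {key: int(value) for key, value in
                (line.split(": ") for line in f)}

def iowait_fraction():
    """Fraction of total CPU time spent in iowait since boot, from /proc/stat."""
    with open("/proc/stat") as f:
        fields = [int(x) for x in f.readline().split()[1:]]
    # aggregate "cpu" line fields: user nice system idle iowait irq softirq ...
    return fields[4] / sum(fields)

print(io_metrics())
print(f"iowait fraction since boot: {iowait_fraction():.4f}")
```

Note that `read_bytes`/`write_bytes` count actual storage traffic, while `rchar`/`wchar` include reads satisfied by the page cache — another case where an operative definition is needed.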

Concerning vectorisation, it may be interesting to measure the level of vectorisation in the software, but it is not a property of the workload.

Domenico proposes to also look at KV as a benchmark to rescale CPU time measurements (since e.g. HS06 is known not to scale well for simulation) and to run it on more CPU models. He volunteers for packaging the experiment workflows. He asks whether it would be interesting to run them on VMs; Andrea Sc. thinks it is better to start on bare metal, as the results will be more consistent and easier to interpret.

Johannes reminds us that we can also learn a lot by looking at production jobs; this could be an additional task for the WG.

A new task is added, to provide packaged versions of the most important workloads, as already done by the HEPiX benchmarking working group.

On the task about a simple resource calculation model, Davide suggests implementing it as a notebook rather than a spreadsheet, as it is much easier to understand and maintain.
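As a flavour of what such a notebook might contain, here is a toy back-of-the-envelope calculation; every number and parameter name below is an illustrative placeholder, not a WG figure:

```python
# Toy resource estimate: sustained cores needed to process a yearly
# event sample. All values are made-up placeholders for illustration.
events_per_year = 1e9        # events to process in one year
cpu_sec_per_event = 20.0     # average CPU time per event [s]
cpu_efficiency = 0.8         # assumed CPU utilisation efficiency
seconds_per_year = 365 * 24 * 3600

cores_needed = (events_per_year * cpu_sec_per_event
                / (cpu_efficiency * seconds_per_year))
print(f"Sustained cores needed: {cores_needed:.0f}")
```

In a notebook, each input and assumption sits in a named, commented cell, which is what makes the model easier to review than spreadsheet formulas.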

People working on tasks can use dedicated Google documents for work in progress and move the information to the TWiki once it is consolidated.

There is interest in asking the experiments to present how they estimate their needs for HL-LHC, and in having a presentation from the HEPiX benchmarking people on how test workloads are packaged.

In the next few days people are invited to add themselves as volunteers for one or more tasks in the list.

Timetable

  • 17:00–17:50 Discussion on the task list (50m)
  • 17:50–18:00 AOB (10m)