System performance modelling WG kick-off meeting

Name: System performance modelling WG kick-off meeting
Start: 2017-11-09T17:00:00+01:00
End: 2017-11-09T18:00:00+01:00
Location: CERN

Thursday 9 Nov 2017, 17:00 → 18:00 Europe/Zurich

31/S-028 (CERN)

Participants

Local: Andrea Sciabà, Markus Schulz, Domenico Giordano, Andrea Valassi, David Lange, Daniele Bonacorsi
Remote: Gareth Roy, Graeme Stewart, Catherine Biscarat, Helge Meinhard, Michel Jouvin, Johannes Elmsheuser,
Concezio Bozzi, Eric Fede, Frank Würthwein, Yves Kemp

Discussion on mandate and goals

Markus goes through his draft for the motivations, the mandate and the goals of the WG.
The agenda has a link to a Google document that can be edited and commented on.

David remarks that we might end up with a suite of models, not just one.

Graeme suggests stating explicitly that we need to better understand tradeoffs (e.g. between disk and CPU) in workloads and software.

The "placement" of the WG is discussed. Although it is defined as a WLCG working group, it is also closely linked to HEPiX and the HSF, and it should have reporting channels in those contexts as well.

It is agreed that the scope may naturally extend beyond WLCG (for example, it might be of interest to Belle II or even SKA), but the initial focus will be on LHC experiments.

Frank asks whether the driving force behind the creation of the WG is the current prediction of a mismatch between the resources needed for HL-LHC and the available budget. Markus explains that it is also useful to understand how we use resources today and what the best match between resources and workloads is, which is already very relevant now.

Frank (speaking for himself) points out that experiments have in the past been reluctant to disclose detailed information about how they calculate their resource requirements. The WG should clarify how it is going to deal with this aspect and to what extent predictions are going to be made public.

Helge thinks that this should be clarified soon, and that we also need to understand why some experiments are not willing to show some information.

According to Markus, we do not actually need to know all the details, but we need to know first of all how many resources are needed to process a given number of events for the different job types. Ideally all experiments should agree on the same metrics and the same model principles, but without the need to disclose all details.

Johannes adds that this is not a problem for ATLAS, which has always been very open in discussing these issues, for example in the working group on workflow performance efficiency, in which IT people also participate.

Yves is worried that evaluating costs for different sites could lead to improper comparisons. Markus answers that the point is to help sites plan their purchases, not to assess the cost of any given site. Helge points out, though, that if we make the cost calculation easy to do, somebody will do it on their own and possibly draw the wrong conclusions.

Andrea V. reminds everyone that the ultimate goal is to get more computing with the same budget, and Michel adds that it is more important to calculate differences in cost between different solutions than the absolute cost value.

There is a discussion about metrics, which should be explicitly mentioned in the short-term goals. The idea is that any workload should be represented in a model using the values of a set of agreed metrics.
It is also understood that our predictions of what workloads will look like at HL-LHC are extremely uncertain.
Having a predictive model can also help to inform the evolution and tell us what we should focus on in terms of optimisation work.
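
As an illustration only of representing a workload by the values of a set of agreed metrics, a minimal sketch follows. The metric names (cpu_seconds_per_event, output_bytes_per_event), the linear scaling with the number of events, and all numbers are assumptions made for this example; the WG has not agreed on any metrics or model at this stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadMetrics:
    """Hypothetical per-event metrics for one job type (names are illustrative)."""
    job_type: str
    cpu_seconds_per_event: float
    output_bytes_per_event: float


def estimate_resources(metrics: WorkloadMetrics, n_events: int) -> dict:
    """Scale the per-event metrics linearly to a given number of events.

    Linear scaling is an assumption of this sketch, not an agreed model.
    """
    return {
        "cpu_core_hours": metrics.cpu_seconds_per_event * n_events / 3600.0,
        "disk_terabytes": metrics.output_bytes_per_event * n_events / 1e12,
    }


# Placeholder values, not measurements from any experiment.
simulation = WorkloadMetrics(
    job_type="simulation",
    cpu_seconds_per_event=100.0,
    output_bytes_per_event=1.0e6,
)
estimate = estimate_resources(simulation, n_events=1_000_000)
print(estimate)
```

With a common structure of this kind, experiments could in principle publish only the metric values per job type, without disclosing how they were derived.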

Because of some confusion over how certain terms are used (workflow vs. workload), it is agreed that we should add a glossary to our twiki to settle the terminology.

Frank asks if we will also measure the current performance impact of using different compiler flags. Markus thinks that at the beginning we should not go into this level of detail and should start with something simple.

Michel warns that we should not give the impression that we aim at making a model that can simply be run to find the definitive answer to all these questions.

Work organisation

Andrea S. presents a possible organisation of the work in terms of sub-groups, which stirs some discussion.

Helge argues that the proposed subgroups make a lot of sense but are heavily interrelated, and he proposes instead to work together on the most urgent things.

Frank suggests starting with some fact-finding and asking the experiments to provide some information on how they do resource estimation, in particular how they estimate needs for HL-LHC. This should be "politically" acceptable because all current estimates are necessarily very crude anyway.
Similarly, experiments could provide some insight into what their workloads do.

There is agreement on this idea and Andrea V. adds that we should initially concentrate on "low-level" elements, like workloads and metrics.

Helge proposes to have another meeting next week to conclude the discussion on how to get organised.
The frequency and time of subsequent meetings are not yet decided, but there will probably be fewer than one meeting per week and more than one per month.

- 17:00 → 17:25: Mandate and goals discussion (25m)
- 17:25 → 17:50: WG organisation (25m)
- 17:50 → 18:00: AOB (10m)