Markus presents his ideas on the new goals and timeline for the working group. From the discussion:

We need a one-stop page with the instructions for all reference workloads. Andrea did something of the sort last year for the summer students; he can update it.

We also need a catalogue of the results of our PrMon and Trident studies on the workloads; these results should be archived. Somebody should take on this task.

Domenico says that results, to be embedded in a model, must include a definition of the test environment. For benchmarking we already collect a lot of metadata about the machine, and we can reuse the same infrastructure. For more detailed analyses such as those with Trident, which are more complex, we must decide what information to keep and how to summarise it.

As a repository, Domenico suggests HDFS or CERNBox rather than GitLab, which should be used only for code.

Markus: we need to understand how the data (e.g. popularity data) is used and shift our centre of gravity. For example, DOMA makes assumptions about what future infrastructures will look like, and we should make statements about the cost of these scenarios. Data popularity is an open-ended story: we have data from data management tools like Rucio, but we don't have access profiles from the sites. A lot happens outside the official tools (e.g. in ATLAS), and we need to get an idea of what these activities are in order to model the full computing load. This can be done only at a few sites.

For site cost modeling, we need data from the HEPiX Technology Watch WG and to combine it with TCO calculations. Some of the costs do not change much (building, people, ...). For example, at CERN they are thinking about extending the lifetime of the hardware and replacing it less often. A lot of work is needed to get this time evolution right.
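As a rough illustration of the kind of TCO arithmetic being discussed (this is not the group's spreadsheet or Renaud's calculation; all numbers and parameter names are made-up placeholders), a minimal sketch of how the hardware lifetime assumption changes the annualised cost per server:

```python
# Minimal sketch of an annualised per-server TCO calculation.
# All figures below are illustrative placeholders, not real CERN or WLCG numbers.

def annual_tco(purchase_chf, lifetime_years, power_kw, chf_per_kwh,
               fixed_costs_chf_per_year):
    """Annualised total cost of ownership for one server."""
    hardware = purchase_chf / lifetime_years          # straight-line amortisation
    electricity = power_kw * 24 * 365 * chf_per_kwh   # running power cost
    return hardware + electricity + fixed_costs_chf_per_year

# Effect of extending the hardware lifetime (e.g. replacing less often):
for lifetime in (4, 6):
    cost = annual_tco(purchase_chf=8000, lifetime_years=lifetime,
                      power_kw=0.4, chf_per_kwh=0.12,
                      fixed_costs_chf_per_year=500)   # building, manpower share, ...
    print(f"{lifetime}-year lifetime: {cost:.0f} CHF/year")
```

Setting chf_per_kwh to zero would correspond to a site where electricity is effectively free, as mentioned later in the discussion.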

CERN is planning to change the hardware inside the trays, since some components almost never fail; this should not be too time-consuming. Progress in the power efficiency of CPUs has slowed down considerably.

Concerning resource requirements, Johannes mentions that ATLAS now has updated estimates: both the CPU and disk models have been revised.

Markus mentions the modeling tool developed at KIT to model their computing farm: it could be of more general interest, and its developers should be invited to give a presentation.

ACTION: everyone should take a look at the document and add their own ideas and comments!

Markus showed Renaud's slides on the TCO calculation. From the discussion:

About his own spreadsheet, Markus mentions that he made the manpower calculations more realistic. Sites are again invited to try it with their own data.

He mentions that Renaud's "holistic" approach of starting from the overall budget of the data centre would fail for CERN IT, because CERN IT takes care of much more than providing the resources.

Markus proposes using the workshop to push the idea that sites should use this tool.

Gareth mentions that for his site electricity is free; in that case one could use a typical cost figure for the UK, or in fact even zero.

Markus argues that including the cost of user support in the cost of resources is dangerous because it puts the site at a disadvantage compared to, e.g., cloud providers. We need to decide exactly what we mean when we talk about manpower costs.