
Analytics WG meeting

Europe/Zurich
31/S-023 (CERN)

Participants:

Dirk, Christian, Rainer, Bernd, Luca, Luca, Tony, Ulrich, Sebastien, Evangelos, Raul, Ilija, Maciej, Zbigniew, Manuel, Domenico, Pedro, Marek

Minutes & News:

 - New alias for SSH access to the Hadoop cluster; details are in the TWiki.
 - Evangelos prepared a prototype example of Hadoop MapReduce jobs in Java; it will be presented at an AWG meeting in January.
 - Luca is preparing a Spark example for Hadoop, also to be presented later.
 - TOTEM, in discussion with other experiments, has started a project to see whether they can run ROOT analysis on a Hadoop cluster. This could be of interest to the group.
 - Dirk is working out how to access the Hadoop cluster remotely from Mac (and Linux). A how-to will follow on the TWiki or mailing list.

Data management analytics - topics in ATLAS:

 - Some of the problems mentioned concerning access to and use of the IT Hadoop cluster may already be solved. Ilija will discuss this with Rainer.
 - Tony pointed out that there is an overlap between the analysis goals of CMS and ATLAS. They will have a discussion to see where they can work together.

Impala tests with accelerator data:

 - Dirk asked whether Parquet as a data format is sufficiently standardized to be used in frameworks other than Impala. This should be the case.
 - Parquet seems to work well for the presented use case, even though its column-store features are not used. However, there was no in-depth comparison with other storage formats.


How to Measure Job Performance: Testing correlation and covariance with R:

 - Tony mentioned that normalizing CPU/wall time could be useful for better predicting the runtime of CMS jobs. Further discussion with Christian to follow.
 - Ulrich pointed out that the analysis showed that the scaling with CPU factors currently used in LSF seems to work reasonably well.
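To make the correlation and normalization ideas above concrete, here is a minimal sketch. The talk itself used R; this is a Python approximation, not the speaker's code. The function names, the job records, and the assumption that normalized CPU time is raw CPU time multiplied by a per-host CPU factor (loosely modelled on LSF-style CPU-factor scaling) are all hypothetical.

```python
from statistics import mean


def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator (the default of R's cov())."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two values")
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient (the default method of R's cor())."""
    var_product = covariance(xs, xs) * covariance(ys, ys)
    if var_product == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance(xs, ys) / var_product ** 0.5


def cpu_efficiency(cpu_seconds, wall_seconds):
    """CPU time divided by wall-clock time for a single job."""
    return cpu_seconds / wall_seconds


def normalized_cpu(cpu_seconds, cpu_factor):
    """Hypothetical normalization: scale raw CPU time by the host's relative
    speed factor so that jobs run on different hardware become comparable."""
    return cpu_seconds * cpu_factor


if __name__ == "__main__":
    # Hypothetical job records: (raw CPU seconds, wall seconds, host CPU factor).
    jobs = [(3600, 4000, 1.0), (2400, 2700, 1.2), (1800, 2100, 1.4), (4500, 5200, 0.9)]
    raw = [c for c, _, _ in jobs]
    norm = [normalized_cpu(c, f) for c, _, f in jobs]
    wall = [w for _, w, _ in jobs]
    print("corr(raw CPU, wall)        =", round(correlation(raw, wall), 3))
    print("corr(normalized CPU, wall) =", round(correlation(norm, wall), 3))
    print("CPU/wall efficiencies:", [round(cpu_efficiency(c, w), 2) for c, w, _ in jobs])
```

On Python 3.10 or later, `statistics.covariance` and `statistics.correlation` provide the same two measures directly; the hand-written versions here keep the sketch usable on older interpreters.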

    • 14:00 14:05
      Minutes & News 5m
      Speaker: Dirk Duellmann (CERN)
    • 14:05 14:25
      Data management analytics - topics in ATLAS 20m
      Speaker: Ilija Vukotic (University of Chicago (US))
    • 14:45 15:05
      How to Measure Job Performance: Testing correlation and covariance with R 20m
      Speaker: Christian Nieke (Brunswick Technical University (DE))