Participants:

Dirk, Christian, Rainer, Bernd, Luca, Luca, Tony, Ulrich, Sebastien, Evangelos, Raul, Ilija, Maciej, Zbigniew, Manuel, Domenico, Pedro, Marek

Minutes & News:

 - New SSH alias for the Hadoop cluster; details are in the TWiki
 - Evangelos has prepared a prototype example of a Hadoop MapReduce job in Java; it will be presented at an AWG meeting in January (a sketch of such a job follows after this list)
 - Luca is preparing a Spark example for Hadoop, also to be presented later (see the second sketch below)
 - TOTEM, in discussion with other experiments, has started a project to investigate whether ROOT analysis can be run on a Hadoop cluster. This could be of wider interest.
 - Dirk is working out how to access the Hadoop cluster remotely from Mac (and Linux); a HowTo will follow on the TWiki and the mailing list
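
For reference, a minimal sketch of what a Hadoop MapReduce job in Java looks like (the classic word count, using the standard org.apache.hadoop.mapreduce API). This is not Evangelos' prototype, whose details are not in these minutes; class names and HDFS paths are placeholders:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Mapper: emits (word, 1) for every token in the input.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }
        // Reducer (also used as combiner): sums the counts per word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist yet)
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }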
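
Similarly, a minimal sketch of the same computation with Spark's Java API, to give a feel for the difference. This is an illustration under assumptions (Spark 2.x Java API, placeholder application name and paths), not Luca's actual example:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("spark word count");
            JavaSparkContext sc = new JavaSparkContext(conf);
            JavaRDD<String> lines = sc.textFile(args[0]);  // read from HDFS
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);             // sum counts per word
            counts.saveAsTextFile(args[1]);                // write to HDFS
            sc.stop();
        }
    }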

Data management analytics - topics in ATLAS:

 - Some of the problems mentioned concerning access to and use of the IT Hadoop cluster may already be solved; Ilija will follow up with Rainer.
 - Tony pointed out that the analysis goals of CMS and ATLAS overlap; the two groups will discuss where they can work together.

Impala tests with accelerator data:

 - Dirk asked whether Parquet as a data format is sufficiently standardized to be used in frameworks other than Impala; this should be the case (a sketch follows after this list).
 - Parquet seems to work well for the presented use case, even though its column-store features are not exploited. However, there has been no in-depth comparison with other storage formats.
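
As an illustration of the interoperability question above, a hypothetical sketch (not from the presentation) of reading a Parquet table written by another framework, e.g. Impala, through Spark's Java API; the HDFS paths are placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ParquetInterop {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("parquet interop").getOrCreate();
            // The schema travels with the Parquet files, so no external
            // schema definition is needed to read a table written elsewhere.
            Dataset<Row> df = spark.read().parquet("hdfs:///path/to/impala_table");
            df.printSchema();
            df.show(10);
            // Writing Parquet back out is equally framework-neutral.
            df.write().parquet("hdfs:///path/to/copy");
            spark.stop();
        }
    }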


How to Measure Job Performance: Testing correlation and covariance with R:

 - Tony mentioned that the normalization of CPU/wall time could be useful for predicting the runtime of CMS jobs more accurately; he will discuss this further with Christian (an illustration of the correlation check follows below).
 - Ulrich pointed out that the analysis showed that the currently used scaling with CPU factors in LSF seems to be working reasonably well.
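
The presented analysis was done in R; purely as an illustration of the kind of correlation check described, here is a minimal Pearson-correlation sketch in Java. All numbers are invented placeholders, not measurements from the talk:

    public class CpuWallCorrelation {
        // Pearson correlation coefficient of two equal-length samples.
        static double pearson(double[] x, double[] y) {
            int n = x.length;
            double mx = 0, my = 0;
            for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
            mx /= n; my /= n;
            double cov = 0, vx = 0, vy = 0;
            for (int i = 0; i < n; i++) {
                double dx = x[i] - mx, dy = y[i] - my;
                cov += dx * dy;
                vx += dx * dx;
                vy += dy * dy;
            }
            return cov / Math.sqrt(vx * vy);
        }

        public static void main(String[] args) {
            // Hypothetical job measurements in seconds: CPU time normalized
            // by the host's LSF CPU factor vs. observed wall time.
            double[] cpuTime  = {110, 230, 95, 400, 310};
            double[] wallTime = {120, 250, 100, 430, 330};
            System.out.printf("corr(cpu, wall) = %.3f%n", pearson(cpuTime, wallTime));
        }
    }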