Participants:

Dirk, Christian, Rainer, Bernd, Luca, Luca, Tony, Ulrich, Sebastien, Evangelos, Raul, Ilija, Maciej, Zbigniew, Manuel, Domenico, Pedro, Marek

Minutes & News:

 - New SSH alias for the Hadoop cluster; details are in the TWiki
 - Evangelos has prepared a prototype example of a Hadoop MapReduce job in Java; it will be presented at an AWG meeting in January (a sketch of such a job follows after this list)
 - Luca is preparing a Spark example for Hadoop, also to be presented later (see the second sketch below)
 - TOTEM, in discussion with other experiments, has started a project to investigate whether ROOT analysis can be run on a Hadoop cluster. This could be of wider interest.
 - Dirk is working out how to access the Hadoop cluster remotely from Mac (and Linux); a HowTo will follow on the TWiki and the mailing list
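
For reference, a minimal sketch of what a Hadoop MapReduce job in Java looks like (the classic word count, using the standard org.apache.hadoop.mapreduce API). This is not Evangelos' prototype, whose details are not in these minutes; class names and HDFS paths are placeholders:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCount {
        // Mapper: emits (word, 1) for every token in the input.
        public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
            private final static IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, ONE);
                }
            }
        }
        // Reducer (also used as combiner): sums the counts per word.
        public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                context.write(key, new IntWritable(sum));
            }
        }
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist yet)
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }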
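
Similarly, a minimal sketch of the same computation with Spark's Java API, to give a feel for the difference. This is an illustration under assumptions (Spark 2.x Java API, placeholder application name and paths), not Luca's actual example:

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import scala.Tuple2;

    public class SparkWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("spark word count");
            JavaSparkContext sc = new JavaSparkContext(conf);
            JavaRDD<String> lines = sc.textFile(args[0]);  // read from HDFS
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split("\\s+")).iterator())
                .mapToPair(word -> new Tuple2<>(word, 1))
                .reduceByKey((a, b) -> a + b);             // sum counts per word
            counts.saveAsTextFile(args[1]);                // write to HDFS
            sc.stop();
        }
    }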

Data management analytics - topics in ATLAS:

 - Some of the problems mentioned concerning access to and use of the IT Hadoop cluster may already be solved; Ilija will follow up with Rainer.
 - Tony pointed out that the analysis goals of CMS and ATLAS overlap; the two groups will discuss where they can work together.

Impala tests with accelerator data:

 - Dirk asked whether Parquet as a data format is sufficiently standardized to be used in frameworks other than Impala; this should be the case (a sketch follows after this list).
 - Parquet seems to work well for the presented use case, even though its column-store features are not exploited. However, there has been no in-depth comparison with other storage formats.
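
As an illustration of the interoperability question above, a hypothetical sketch (not from the presentation) of reading a Parquet table written by another framework, e.g. Impala, through Spark's Java API; the HDFS paths are placeholders:

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class ParquetInterop {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder().appName("parquet interop").getOrCreate();
            // The schema travels with the Parquet files, so no external
            // schema definition is needed to read a table written elsewhere.
            Dataset<Row> df = spark.read().parquet("hdfs:///path/to/impala_table");
            df.printSchema();
            df.show(10);
            // Writing Parquet back out is equally framework-neutral.
            df.write().parquet("hdfs:///path/to/copy");
            spark.stop();
        }
    }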


How to Measure Job Performance: Testing correlation and covariance with R:

 - Tony mentioned that the normalization of CPU/wall time could be useful for predicting the runtime of CMS jobs more accurately; he will discuss this further with Christian (an illustration of the correlation check follows below).
 - Ulrich pointed out that the analysis showed that the currently used scaling with CPU factors in LSF seems to be working reasonably well.
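
The presented analysis was done in R; purely as an illustration of the kind of correlation check described, here is a minimal Pearson-correlation sketch in Java. All numbers are invented placeholders, not measurements from the talk:

    public class CpuWallCorrelation {
        // Pearson correlation coefficient of two equal-length samples.
        static double pearson(double[] x, double[] y) {
            int n = x.length;
            double mx = 0, my = 0;
            for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
            mx /= n; my /= n;
            double cov = 0, vx = 0, vy = 0;
            for (int i = 0; i < n; i++) {
                double dx = x[i] - mx, dy = y[i] - my;
                cov += dx * dy;
                vx += dx * dx;
                vy += dy * dy;
            }
            return cov / Math.sqrt(vx * vy);
        }

        public static void main(String[] args) {
            // Hypothetical job measurements in seconds: CPU time normalized
            // by the host's LSF CPU factor vs. observed wall time.
            double[] cpuTime  = {110, 230, 95, 400, 310};
            double[] wallTime = {120, 250, 100, 430, 330};
            System.out.printf("corr(cpu, wall) = %.3f%n", pearson(cpuTime, wallTime));
        }
    }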