Analytics WG meeting
Attending:
Dirk, Christian, Luca, Evangelos, Rainer, Bernd, Manuel, Jerome, Domenico, Sebastian, Tony, Raul, Maria, Valentin, Daniele
Minutes:
- The TWiki is up. Everybody should have a look and add some information about their data sources.
- The experiments want to join our efforts. Today we had Tony from CMS; next week we will have a representative of ATLAS.
- Next week we will have some talks about analysis on Hadoop. Further talks are welcome.
CMS Analytics:
- Dirk asked whether the CMS data set would fit into the cluster. The data set should be ~100 GB, so this is no problem.
- The question of how to deal with confidential/private data was raised. Options:
  - Anonymise the data on the CMS side (map strings to IDs) and share it
  - Keep the critical part of the data in a separate folder, with access restricted to CMS representatives (by Unix user/group)
  - Set up a parallel installation for CMS, but share experience with Hadoop
- CMS will discuss and select the most appropriate way to go
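Illustration of the first option (string to ID mapping), as a minimal sketch only. The function name, the example values and the choice of consecutive integer IDs are assumptions made for illustration; nothing here was decided in the meeting. The mapping table would stay on the CMS side and only the IDs would be shared.

```python
from typing import Dict, Iterable, List


def anonymise(values: Iterable[str], mapping: Dict[str, int]) -> List[int]:
    """Replace each string by a stable integer ID, extending the mapping as needed."""
    ids = []
    for value in values:
        if value not in mapping:
            # Hypothetical scheme: consecutive IDs starting at 1.
            mapping[value] = len(mapping) + 1
        ids.append(mapping[value])
    return ids


# Made-up example values; the mapping itself is the confidential part.
mapping: Dict[str, int] = {}
users = ["alice", "bob", "alice", "carol"]
print(anonymise(users, mapping))  # prints [1, 2, 1, 3]
```

Reusing the same mapping across exports keeps the IDs consistent, so shared data sets can still be joined on the anonymised column.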
LanDB Data in Hadoop:
- CS will add locality information (basically a resolution of the "Room" attribute, for ease of use)
AWG Repository prototype:
- The access tool should stick to the minimal requirements for now:
  - Show available data sets and schemata
  - Extract time periods
  - Simple selection on attributes (>, <, =, etc.)
  - "Typical" extraction formats
- For advanced analysis on the Hadoop cluster:
  - Create a technology overview with examples/tutorials
  - In the end, complex analysis has to be implemented by the analyst
  - ...but we should exchange experience within the group
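Illustration of the minimal access-tool requirements (time period extraction plus simple attribute selection), as a rough sketch only. The record layout, the field names ("time", "host", "load") and the function name are hypothetical; the actual tool and its data model have not been defined.

```python
import operator
from datetime import datetime

# Supported comparison operators for simple attribute selection.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}


def select(records, start, end, conditions=()):
    """Return records with start <= record["time"] < end that meet all conditions.

    Each condition is a tuple (attribute, operator, value), e.g. ("load", ">", 0.5).
    """
    result = []
    for rec in records:
        if not (start <= rec["time"] < end):
            continue
        if all(OPS[op](rec[attr], value) for attr, op, value in conditions):
            result.append(rec)
    return result


# Made-up records standing in for a data set in the repository.
records = [
    {"time": datetime(2020, 1, 1), "host": "a", "load": 0.2},
    {"time": datetime(2020, 1, 2), "host": "b", "load": 0.9},
    {"time": datetime(2020, 1, 3), "host": "c", "load": 0.7},
]

hits = select(records, datetime(2020, 1, 2), datetime(2020, 1, 4), [("load", ">", 0.5)])
print([rec["host"] for rec in hits])  # prints ['b', 'c']
```

Anything beyond this kind of filtering would fall under the advanced analysis item and be implemented by the analyst.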