CMS Big Data Science Project

Name: CMS Big Data Science Project
Start: 2016-03-02T10:00:00-06:00
End: 2016-03-02T11:00:00-06:00
Location: No location set

Wednesday 2 Mar 2016, 10:00 → 11:00 US/Central

Description

Fermilab room: PPD/ Quarium-WH8SW - Wilson Hall 8th fl South West

CERN room:

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

If that is not possible, people can join the meeting by phone; call-in numbers are here:

The meeting id is hidden below the Videoconference Rooms link, but here it is again:


attendance:
- Matteo Cremonesi (FNAL), Cristina Mantilla (FNAL/Johns Hopkins), Saba Sehrish (FNAL), Jim Kowalkowski (FNAL), Jim Pivarski (Princeton), Alexey Svyatkovskiy (Princeton), Bo Jayatilaka (FNAL), Maria Girone (CERN/openlab), Ian Fisk (Simons Foundation), Volker Tresp (Siemens), Tobias Enrich (LMU), Jin Chang (FNAL), Ruth Pordes (FNAL)

Many thanks to JimK for writing notes!

First milestone in 4 weeks: load BACON ntuples into Hadoop+Spark and produce a plot

Overall goal and time schedule
- Realize a CMS analysis use case with industry big data technology
- Document the comparison to traditional analysis with the HEP-specific ROOT framework in a write-up by Fall 2016
- Start with Hadoop+Spark; once that is complete, possibly branch out

testing platforms
- Alexey has a 10-node test cluster at Princeton and will give access
- Ian is planning to have a test setup at Simons in New York and will also provide access
- Matteo and Cristina to figure out, together with Alexey and Ian, how to transfer larger quantities of BACON ntuple files to Princeton and New York

Milestones and meeting time schedule
- Meeting every two weeks in this time slot, Wednesdays at 10 AM CST, 5 PM CET
  - Next meeting: March 16
- First milestone in 4 weeks: load BACON ntuples into Hadoop+Spark and produce a plot
- Everyone: think about further milestones and parts of the project that need to be accomplished by Fall 2016

technical discussion
- discussion about content of BACON ntuples (flat or flat/flat) ➜ answer is flat (simple structure of classes)
- discussion of loading data from ROOT files or from pre-converted data in HDFS
- discussion about analysis in Python or Scala ➜ will start with Python for the interactive part; Scala will be used for slimming/skimming
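
For orientation, the slimming/skimming and plotting steps discussed above can be sketched in plain Python on a few flat records. This is only an illustration and not the project's code: the field names (`met`, `n_jets`, `lead_jet_pt`), the values, and the helper functions are all hypothetical, and no ROOT, Hadoop, or Spark call is involved. Skimming here means dropping whole events that fail a selection; slimming means dropping fields that are not needed.

```python
# Hypothetical flat events: one record per event, scalar fields only.
# Field names and values are made up for illustration.
events = [
    {"run": 1, "event": 1, "met": 12.5, "n_jets": 2, "lead_jet_pt": 45.0},
    {"run": 1, "event": 2, "met": 80.0, "n_jets": 3, "lead_jet_pt": 120.0},
    {"run": 1, "event": 3, "met": 33.0, "n_jets": 1, "lead_jet_pt": 30.0},
    {"run": 1, "event": 4, "met": 150.0, "n_jets": 4, "lead_jet_pt": 210.0},
    {"run": 1, "event": 5, "met": 55.0, "n_jets": 2, "lead_jet_pt": 60.0},
]


def skim(records, predicate):
    """Skimming: keep only the events that pass the selection."""
    return [r for r in records if predicate(r)]


def slim(records, keep_fields):
    """Slimming: keep only the listed fields of each event."""
    return [{k: r[k] for k in keep_fields} for r in records]


def histogram(values, lo, hi, n_bins):
    """Fixed-width binning; values outside [lo, hi) are ignored."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against floating-point rounding at the upper edge
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


selected = skim(events, lambda r: r["n_jets"] >= 2)  # event selection
slimmed = slim(selected, ["met"])                     # drop unused fields
counts = histogram([r["met"] for r in slimmed], 0.0, 200.0, 4)

# Crude text "plot" of the binned distribution
for i, c in enumerate(counts):
    print(f"[{i * 50:3d}, {(i + 1) * 50:3d}) {'#' * c}")
```

Because the records are flat, each step is a simple per-event filter or per-field projection, which is the kind of operation that a distributed framework can parallelize over many files.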
