CMS Big Data Science Project

Start: 2016-07-06T10:00:00-05:00
End: 2016-07-06T11:00:00-05:00

Wednesday 6 Jul 2016, 10:00 → 11:00 US/Central

Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))

Description

FNAL room: DIR/ Snake Pit-WH2NE - Wilson Hall 2nd fl North East

CERN room:

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

https://account.cern.ch/account/Externals/

If that is not possible, you can join the meeting by phone; call-in numbers are here:

http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone

The meeting id is hidden below the Videoconference Rooms link, but here it is again:

10502145


# Meeting 160706

attendance: Zbigniew, Luca, Alexey, Illa, Cristina, Bo, OLI, Matteo

agenda: <https://indico.cern.ch/event/549221/>

## News

* CHEP abstract was accepted as oral presentation (12+3)
* August 1st to August 5th: Alexey will come to FNAL.

## Notes

* quick introduction for Zbigniew and Luca
* discussion about AVRO, JSON and parquet
* Action item for OLI: send introduction information to Zbigniew and Luca
* Luca and Zbigniew:
    * running the CERN-IT Hadoop service
    * also in openlab
    * a fellow is coming in the fall to help with this and other use cases
* Princeton Workflow:
    * met with JimP last week to finalize the Princeton workflow
    1. filtering step is 95% complete and only needs to be checked
        * scale factors are not yet done
        * plan to convert all scale factors into JSON and use it as a common input format
        * b-tag scale factors are in a CSV file, which is easy to convert into JSON
    2. save all information in a parquet file; class and functions have been created
        * this step still needs to be completed
    3. histogramming: we are going to use histogrammar (it is in very good shape)
        * made some preliminary plots
        * the new histogrammar version has stack plots
        * intensify interaction with Alexey about histogramming
    * list of things to complete:
        1. complete the list of things that we are saving in the parquet format
        2. complete the scale factor treatment
        3. implement plots using histogrammar
    * goal:
        * did not reach the goal of having a full-scale test for this meeting
        * looks good for two weeks from now
* histogrammar
    * from the beginning, it was designed to aggregate data into bins using Spark actions
    * wrote tutorials; Cristina is following them
    * question about default error bars: talking with JimP, Alexey is doing some more development
    * started working on profiling and optimization of histogrammar
        * JimP used numpy
        * Alexey used Intel Python ➜ 10% improvement on top of the optimized version
* let's post information in the Slack channels so that everyone can read it
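
For the plan to convert CSV scale factors into JSON as a common input format, a minimal sketch could look like the following. The column names (`flavor`, `pt_min`, `pt_max`, `scale_factor`) and the example values are hypothetical placeholders; the actual b-tag scale-factor files and the JSON schema the group settles on may look quite different.

```python
import csv
import io
import json

# Hypothetical CSV content for illustration only; the real
# b-tag scale-factor files use a different, richer layout.
EXAMPLE_CSV = """flavor,pt_min,pt_max,scale_factor
b,30,50,0.95
b,50,100,0.97
c,30,50,0.90
"""


def scale_factors_csv_to_json(csv_text):
    """Convert CSV rows into a JSON string holding a list of records."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Numeric columns are parsed to float so the JSON carries numbers,
        # not strings.
        records.append({
            "flavor": row["flavor"],
            "pt_min": float(row["pt_min"]),
            "pt_max": float(row["pt_max"]),
            "scale_factor": float(row["scale_factor"]),
        })
    return json.dumps(records, indent=2)


json_text = scale_factors_csv_to_json(EXAMPLE_CSV)
print(json_text)
```

Keeping each bin as a self-describing record is only one possible choice; a nested layout keyed by flavor would work equally well if the downstream code prefers lookups over scans.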
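
The note that histogrammar aggregates data into bins can be illustrated with a plain-Python sketch of the underlying fill-and-merge pattern. This is not the histogrammar API: the function names (`empty_hist`, `fill`, `merge`), the binning range, and the example values are assumptions made for illustration. The same two functions have the shape expected by Spark's `RDD.aggregate(zeroValue, seqOp, combOp)`, which is presumably the kind of Spark action the note refers to.

```python
from functools import reduce


def empty_hist(num_bins):
    """Zero value: a histogram with all bin counts set to 0."""
    return [0] * num_bins


def fill(hist, value, low=0.0, high=100.0):
    """Sequential step: add one value to a partial histogram.

    Values outside [low, high) are dropped in this sketch.
    """
    num_bins = len(hist)
    if low <= value < high:
        index = int((value - low) / (high - low) * num_bins)
        hist[index] += 1
    return hist


def merge(left, right):
    """Combine step: add two partial histograms bin by bin."""
    return [a + b for a, b in zip(left, right)]


# Hypothetical data split into partitions, as a cluster would hold it.
partitions = [[5.0, 15.0, 95.0], [15.5, 42.0], [150.0, 99.9]]

# Each partition is filled independently, then the partial results are merged.
partials = [reduce(fill, part, empty_hist(10)) for part in partitions]
total = reduce(merge, partials, empty_hist(10))
print(total)
```

Because `merge` is associative and commutative, the partial histograms can be combined in any order, which is what makes this pattern suitable for distributed aggregation.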
