CMS Big Data Science Project

Europe/Berlin
513/R-068 (CERN)

Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
Description

FNAL room: Dark Side-WH6NW - Wilson Hall 2nd fl North East

CERN room: 513-R-68

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

If not possible, people can join the meeting by the phone, call-in numbers are here:

The meeting id is hidden below the Videoconference Rooms link, but here it is again:

  • 10502145

## 180110 - Meeting

Attendance: Bo, Dominick, Luca, Vagg, Viktor, Matteo, Siew Yan

* topics
    * presentation at the end of the week

* news
    * Luca
    * Vagg 
        * before Christmas, received analysis code from Dominick
        * not blocked anymore
        * will start scalability tests immediately
        * 2012 open data
            * bad results with Viktor's previous code
            * new code works fine
        * small problem with xrootd-connector
            * didn't accept a single file
            * will be resolved soon
            * will then be put on public GitHub, including guidelines on how to compile it
    * Viktor
        * try to send slides by tonight
        * will include slides from Vagg
        * need to change agenda names so that Viktor can submit slides
    * Matteo
        * CHEP conference: circulated the draft abstract; it was submitted to the conference and we are waiting for acceptance
            * abstract includes both the Intel and analysis thrusts
            * still need to understand if this is good for Andrew Melo
        * progress on analysis thrust, together with Siew Yan
            * new data tier from CMS: NanoAOD
            * goal is to have a data format that is flat and small, and that can serve 30-50% of analyses
            * avoids having to call CMSSW, but still a centrally produced data format
            * Siew Yan has already read NanoAOD and performed a test, following Melo's analysis use case and adapting it
            * Padova group goal is to read NanoAOD
            * Centrally produced samples are appearing
            * Matteo will run the code everywhere, interested in portability experience
                * need to move NanoAOD around if we want to test at different sites
                * order of TB

* CERN openlab user meeting
    * will try to chat with Claudio

* xrootd-connector
    * not grid authenticated for now, on the list

* ROOT output discussion
    * motivation: fits currently require ROOT; usually the combine package is used, which relies on RooFit
    * ROOT output is currently not supported
    * Viktor: possible, but would take time (and not in the context of the DEEP-EST project)
    * Not a big problem, can still do everything for CHEP, but in the future we need the capability to interface with the tools of the collaboration
    * Have a discussion in the future with Viktor, JimP, Vagg and others 

* code
    * 10 TB tests, multiple tests
    * configuration of the cluster will be sent to Intel, to use CoFluent
    * code currently has no support for running over multiple samples, but this can be implemented
    * data and MC, slightly different event content
    * for cluster tests, you want uniformity
    * Dominick will test the code on data and MC, will try to run on both samples
    * Vagg and Dominick will scale up

* access to both electrons and muons in the same IPython job
    * it's possible, but it's complicated
    * reason: SQL does not understand nested structures
    * Python does not have this issue, but you have the problem of double serialization
    * Viktor and Dominick will write a pure Python workflow to compare to the scale we have right now
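
The nested-structure point above can be illustrated with a toy sketch. This is not the project's code: the `Lepton` and `Event` classes, the field names, and the 20 GeV threshold are all hypothetical, chosen only to show why per-event collections of different lengths are awkward in a flat table and straightforward in plain Python.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Lepton:
    pt: float   # transverse momentum in GeV (hypothetical field)
    eta: float  # pseudorapidity (hypothetical field)


@dataclass
class Event:
    event_id: int
    # Each event carries variable-length collections: this is the nesting.
    electrons: List[Lepton] = field(default_factory=list)
    muons: List[Lepton] = field(default_factory=list)


def flatten(events: List[Event], collection: str) -> List[Tuple[int, float, float]]:
    """Explode one nested collection into flat (event_id, pt, eta) rows.

    A flat table needs one such table per collection, and combining
    electrons with muons then requires a join on event_id.
    """
    return [
        (ev.event_id, lep.pt, lep.eta)
        for ev in events
        for lep in getattr(ev, collection)
    ]


def select_emu(events: List[Event], min_pt: float = 20.0) -> List[int]:
    """Return ids of events with at least one electron and one muon above min_pt."""
    return [
        ev.event_id
        for ev in events
        if any(e.pt > min_pt for e in ev.electrons)
        and any(m.pt > min_pt for m in ev.muons)
    ]


# Toy events, invented for illustration only.
events = [
    Event(1, electrons=[Lepton(35.0, 0.4)], muons=[Lepton(28.0, -1.1)]),
    Event(2, electrons=[Lepton(50.0, 1.2), Lepton(12.0, -0.3)]),
    Event(3, muons=[Lepton(40.0, 0.1)]),
    Event(4, electrons=[Lepton(15.0, 2.0)], muons=[Lepton(22.0, 0.7)]),
]

electron_rows = flatten(events, "electrons")
muon_rows = flatten(events, "muons")
emu_event_ids = select_emu(events)
```

In the nested form both collections are reachable from the same event object, so the electron-muon selection is a single pass. In the flattened form the same selection needs two tables and a join on `event_id`.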

* collect GitHub links with different code examples and basis from people
    * Viktor, Dominick and Vagg, Matteo and Siew Yan, Melo
    * send them to Oli, who will make a webpage

* Dominick had problems with all the cuts for the analysis
    * AOD files didn't have all the information easily available, because not all muon branches are trivial; some are references
    * Viktor: it is possible, somebody has to trace the CMSSW code and rewrite it in Scala
    * NanoAOD does not have this problem
