CMS Big Data Science Project

Europe/Berlin
Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
Description

PPD/ Round Table-WH11SE - Wilson Hall 11th fl South East

CERN room:

 

Instructions to create a lightweight CERN account to join the meeting via Vidyo:

If that is not possible, participants can join the meeting by phone; the call-in numbers are here:

The meeting ID is somewhat hidden under the Videoconference Rooms link, so here it is again:

  • 10502145

## 170208 - Big Data Meeting

* Status reports
    * Saba
        * January: submitted a paper to the HP Big Data Computing workshop (Spark on NERSC)
        * Submitted a proposal for the same use case on CORI II at NERSC. It was rejected, but the NERSC team contacted Saba about using the use case to tune Spark on CORI II
            * Saba needs to send the queries and the HDF5 layout to NERSC next week
        * Plans to implement more use cases
            * long term: other experiments
            * for now: more sophisticated analysis queries
        * Saba wants to submit to Spark Summit East; the deadline will be sometime between March and May
        * More data is needed ➜ Matteo will send dataset lists for MINIAOD and help get started copying more files
* Discussion of plans (following the Google Doc linked to the agenda)
    * comments from Matteo
        * reading ROOT files directly from Spark is very promising
        * potential to produce plots directly from any ROOT ntuple
            * the schema is generated dynamically from the input
            * the plotting package Jim developed last year can be called directly, so there is no need to write out ntuple-like files
        * some things are still missing
            * Victor is working on some of the libraries that are currently missing
            * on top of MINIAOD, we run CMSSW code to recluster jets, etc.
                * running CMSSW from Python looks promising
    * comments from Luca
        * Victor is giving a presentation at the ROOT I/O workshop
            * tested on 1 TB of data from HDFS
        * in the coming weeks, the code that accesses ROOT files directly from Spark will be tuned
        * proposal: ask Victor to repeat his ROOT I/O talk next week
        * Reading directly from EOS into Spark is a work in progress
            * step 1, copying the files to HDFS, is working
            * step 2, reading directly from EOS, still needs work
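
The idea in Matteo's comments (a schema generated dynamically from the input, and plots filled straight from the data instead of from intermediate ntuple-like files) can be illustrated with a toy, pure-Python sketch. It does not use Spark or any ROOT reader; the record layout, the branch names (`run`, `met`, `jet_pt`), the Spark SQL-style type names, and the helpers `infer_type`, `infer_schema`, and `fill_histogram` are all hypothetical, chosen only for illustration.

```python
"""Toy sketch: infer a column schema from ntuple-like records and fill a
histogram directly from them, without writing intermediate files.
Hypothetical helpers; not spark-root, ROOT, or Spark APIs."""

# Spark SQL-style type names for Python scalars (an assumed mapping).
SCALAR_TYPES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_type(value):
    """Return a Spark SQL-style type name, or None if it cannot be known yet."""
    if isinstance(value, list):
        if not value:
            return None  # empty variable-length branch: element type unknown
        inner = infer_type(value[0])
        return None if inner is None else f"array<{inner}>"
    # Exact type lookup so that bool is not treated as int.
    name = SCALAR_TYPES.get(type(value))
    if name is None:
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return name


def infer_schema(records):
    """Build {branch: type} from the records themselves, not a fixed schema."""
    schema = {}
    for record in records:
        for branch, value in record.items():
            inferred = infer_type(value)
            previous = schema.get(branch)
            if inferred is None:
                schema.setdefault(branch, None)  # resolve from a later record
            elif previous is None:
                schema[branch] = inferred
            elif previous != inferred:
                raise TypeError(f"branch {branch!r}: {previous} vs {inferred}")
    return schema


def fill_histogram(records, branch, low, high, nbins):
    """Count values of one branch into nbins equal-width bins over [low, high)."""
    width = (high - low) / nbins
    counts = [0] * nbins
    for record in records:
        values = record[branch]
        for v in values if isinstance(values, list) else [values]:
            if low <= v < high:
                # min() guards against float rounding pushing v into bin nbins.
                counts[min(int((v - low) // width), nbins - 1)] += 1
    return counts


# Hypothetical events: one scalar per event plus a variable-length jet branch.
events = [
    {"run": 1, "met": 35.2, "jet_pt": [120.5, 45.0]},
    {"run": 1, "met": 12.8, "jet_pt": []},
    {"run": 2, "met": 60.1, "jet_pt": [80.3]},
]

print(infer_schema(events))  # {'run': 'bigint', 'met': 'double', 'jet_pt': 'array<double>'}
print(fill_histogram(events, "jet_pt", 0.0, 200.0, 4))  # [1, 1, 1, 0]
```

The real ROOT-to-Spark type mapping may differ in detail; the sketch only shows why neither a hand-written schema nor an intermediate ntuple is needed before plotting.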
