CMS Big Data Science Project

Europe/Berlin
600/R-002 (CERN)

Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
Description

FNAL room: Dark Side-WH6NW - Wilson Hall 2nd fl North East

CERN room: 600-R-002

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

If that is not possible, you can join the meeting by phone; call-in numbers are here:

The meeting id is hidden below the Videoconference Rooms link, but here it is again:

  • 10502145

* attendance: Luca, Vagg, Viktor, Andrew, Saba, JimP, Matteo, OLI

* news
    * ACAT talk feedback and proceedings
    * FNAL tutorial
        * Andrew Melo: tutorial for Spark in HEP Nov. 29 at Fermilab
            * targeted towards CMS users, introduction, simple analysis
            * ~3 hours 
            * using Vanderbilt resources
            * input: ROOT files on HDFS
            * using histogrammar & matplotlib
            * there will be recordings
    * CERN:
        * 28-29 November: tutorial/course
    * LHCb
        * Luca did tutorial on LHCb open data
        * Luca talked to Stefan Roiser
        * Luca suggested that a physicist give a talk ➜ Matteo will give the talk in early November
    * Another contact: TOTEM
        * already using Spark on a supercomputer
        * will talk when they come to CERN

* Updates/discussion
    * CHEP abstracts
        * What would we like to achieve by CHEP (July 2018)?
            * 1 PB test: abstract: Vagg
            * full analysis walkthrough: Andrew/Matteo
            * JimP: query systems
            * abstracts due: end of December
    * CMS publication committee will update procedures to publish technical results with CMS data (making it unnecessary to use open data)
    * Saba
        * baseline implementation of the Spark workflow using HDF5, MPI and Python at NERSC (supercomputing center)
            * input data is small
            * overall performance with MPI and Lustre was much better than what we saw with Spark
            * need a dataset of a couple of TB to make stronger statements
            * maybe the ROOT-to-NumPy library that Igor uses to convert data into NoSQL database schemas in batch mode could be used here
                * in batch mode
                * or for a reader comparison with HDF5
                * Saba, JimP and Igor will talk
        * in one of the next meetings we could talk about the SciDAC-4 big data project
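
For readers unfamiliar with the MPI-style baseline Saba described, the general "split the input across ranks, fill partial histograms, sum them" pattern can be sketched as below. This is only an illustration and not Saba's code: it simulates the ranks in a single Python process, uses a plain list in place of an HDF5 dataset, and all function names are hypothetical.

```python
# Sketch only: simulates the MPI "split, compute, reduce" pattern in one
# process. All names are hypothetical; a real job would use an MPI library
# and read each rank's slice from an HDF5 file instead of a Python list.

def rank_slice(n_items, n_ranks, rank):
    """Return the [start, stop) range of items owned by one rank."""
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def partial_histogram(values, n_bins, lo, hi):
    """Fill a fixed-width histogram; values outside [lo, hi) are dropped."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # min() guards against rounding pushing a value into bin n_bins.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def reduce_sum(partials):
    """Element-wise sum of per-rank histograms (the role of an MPI reduce)."""
    return [sum(bins) for bins in zip(*partials)]


def histogram_in_ranks(values, n_ranks, n_bins, lo, hi):
    """Histogram `values` by giving each simulated rank one contiguous slice."""
    partials = []
    for rank in range(n_ranks):
        start, stop = rank_slice(len(values), n_ranks, rank)
        partials.append(partial_histogram(values[start:stop], n_bins, lo, hi))
    return reduce_sum(partials)
```

Because each rank only touches its own contiguous slice and the final step is a sum, the result does not depend on the number of ranks, which makes it easy to check a parallel run against a single-process one.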

* AOB
    * CMS open data was used in a publication by theorists; they converted it into ASCII!!!

    • 16:00 - 16:10
      News 10m
      Speakers: Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
    • 16:15 - 16:40
      Discussion 25m