CMS Big Data Science Project

Europe/Berlin
600/R-001 (CERN)

Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
Description

FNAL: WH6NW

CERN: 600-R-001

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

If that is not possible, you can join the meeting by phone; call-in numbers are here:

The meeting ID is hidden below the Videoconference Rooms link, but here it is again:

  • 10502145

News

Notes

Attendance: Bo, Illia, Marco, Luca, Kacper, Viktor, Paul, Matteo, JimP, Oli

* news
    * Marco Zanetti: introduction
        * the Padova group has a group of computer scientists used to working
          on HEP topics
            * redirect these resources toward work related to big data and
              data science in general
        * started looking at what is on the market, already talked with the
          DIANA team
            * tried out whatever was available
        * Paul Lujan (postdoc) will dedicate some time to this topic
        * Padova people should join Slack and the Google groups
        * goal is to deploy a full analysis workflow
        * Matteo talked about the two thrusts; Marco is more interested in the
          2nd thrust
            * Matteo will do an introduction
        * Marco has Spark infrastructure at Padova and is happy to share this
          resource for tests and development
            * MESOS running underneath
            * the opportunity is welcome; think about where to run the tests
              and discuss offline
    * Jim mentions FemtoCode
        * DianaHEP meeting update this Monday
        * would like to test at Padova
        * this is not a Spark project, but Padova would fit because Padova
          allows installing software (through MESOS)

* Action item on Matteo
    * transfer Panda samples to CERN for Viktor to run on them
    * took 10 days to set up everything (no space, no permission)
    * copy directly to HDFS; there was only a temporary Kerberos problem
    * Bo is helping
    * we can do this within days now
    * a few TB of files

* abstracts
    * will contact Illia if something special needs to be done for
      openlab/Intel related to abstracts

* CERN
    * Vagg was away last week and is away this week; will report when back
      next Monday

* Short term planning for analysis thrust
    * bottleneck was transferring files
    * want to look at full Panda production sample set
    * then we will write the analysis with the new tools
    * Python
    * to make this project really useful for analysis, let's look at the fit

* data reduction thrust
    * start with Scala, plan to eventually go to Python (far off)
    * For the analysis thrust, performance is critical; Python functions would
      have to get compiled, which is more complicated
    * easier on the physics part
    * write code as soon as Vagg is back
    * some data is already there

* feedback from Intel team
    * analysis could be reproduced with the simulation framework
    * they have basic building blocks
    * will do this when there is an established workflow, close to the real
      thing
    * no general suggestions or comments, because the team needs more details
      and access to logs

* move from open data to CMS data
    * permission for Vagg will come, but not soon

    • 16:00 - 16:10
      News (10m)
      Speakers: Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
    • 16:10 - 17:00
      Discussion (50m)