
CMS Big Data Science Project

US/Central
Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
Description

PPD/ Round Table-WH11SE - Wilson Hall 11th fl South East

CERN room:

 

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

If that is not possible, people can join the meeting by phone; call-in numbers are here:

The meeting ID is hidden below the Videoconference Rooms link, but here it is again:

  • 10502145

## 21st Big Data Meeting

* SuperComputing
    * Poster session (2 hours)
    * 10 people stopped by
    * visitors: one from Cray, eBay, particle physicists, a few students
    * people were interested in seeing the performance difference between reading from Lustre and from HDFS
        * comment from Alexey: we have Lustre at Princeton, available for comparisons
    * eBay was interested in the configuration; they are hitting some scaling issues
    * questions/comments
        * Why did we convert from ROOT to HDF5?
        * H5Spark from NERSC did not work for us
    * spoke with a person from LBNL who is working on a future version of HDF5 and is interested in implementing our queries; Saba could just give him our headers, not the data itself
        * LBNL has its own system that does what Spark does
        * interested in working with us
        * careful: we work on CMS data, LBNL is ATLAS
    * company: Two Sigma
        * Spark extension for time series data
        * Saba was supposed to look into it
        * Wes McKinney (Arrow, pandas) works at Two Sigma

* Alexey is participating in Spark Summit East, Boston
    * talking about Histogrammer and Princeton efforts in research track
    * February 7-9; the deadline has passed
* Saba: should consider reporting at the next Spark Summit
    * San Francisco
    * call will open in January

* Future data reading:
    * reading ROOT files from Java
    * right now, you can read simple types, fixed-size arrays of variable dimensions, arrays where one dimension is variable-length and the others are fixed, and structs specifying the list of leaves
    * BaconProd files are not mapped correctly yet
        * working on this
    * root4j is in Maven Central
    * spark-root is in git and will move to Maven Central with the next release, which will include STL types

* new data
    * 2016 data will be used
    * re-reco is ready to be used
    * MC is being produced
    * all should be available by the end of the year

* next meeting in January
    * focus on CHEP proceedings

    • 15:00 - 15:05
      News 5m
      Speakers: Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))