CMS Big Data Science Project

Name: CMS Big Data Science Project
Start: 2017-10-18T16:00:00+02:00
End: 2017-10-18T17:30:00+02:00
Location: CERN

Wednesday 18 Oct 2017, 16:00 → 17:30 Europe/Berlin

600/R-002 (CERN)

Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))

Description

FNAL room: Dark Side-WH6NW - Wilson Hall 2nd fl North East

CERN room: 600-R-002

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

https://account.cern.ch/account/Externals/

If that is not possible, people can join the meeting by phone; the call-in numbers are here:

http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone

The meeting ID is listed under the Videoconference Rooms link, but here it is again:

10502145

* attendance: Luca, Vagg, Viktor, Andrew, Saba, JimP, Matteo, OLI

* News
  * ACAT talk feedback and proceedings
  * FNAL tutorial
    * Andrew Melo: tutorial on Spark in HEP, Nov. 29 at Fermilab
      * targeted at CMS users: introduction and a simple analysis
      * ~3 hours
      * using Vanderbilt resources
      * input: ROOT files on HDFS
      * using histogrammar & matplotlib
      * there will be recordings
  * CERN:
    * 28-29 November: tutorial/course
  * LHCb
    * Luca gave a tutorial on LHCb open data
    * Luca talked to Stefan Roser
    * Luca suggested that a physicist give a talk ➜ Matteo will give the talk in early November
  * Another contact: TOTEM
    * already using Spark on a supercomputer
    * will talk when they come to CERN

* Updates/discussion
  * CHEP abstracts
    * What would we like to achieve by CHEP (July 2018)?
    * 1 PB test: abstract: Vagg
    * full analysis walkthrough: Andrew/Matteo
    * JimP: query systems
    * abstracts due: end of December
    * the CMS publication committee will update its procedures for publishing technical results with CMS data (making it unnecessary to use open data)
  * Saba
    * baseline implementation of the Spark workflow using HDF5, MPI and Python at NERSC (supercomputing center)
    * input data is small
    * overall performance with MPI and Lustre was much better than what we saw with Spark
    * need a dataset of a couple of TB to make firmer statements
    * maybe the ROOT-to-NumPy library used by Igor to convert data into NoSQL database schemas in batch mode could be used here
      * batch mode
      * or a reader comparison against HDF5
    * Saba, JimP and Igor to talk
  * in one of the next meetings we could talk about the SciDAC-4 big data project

* AOB
  * CMS open data was used in a publication by theorists; they converted it into ASCII!

- 16:00 → 16:10
  
  News 10m
  
  Speakers: Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
  
  ACAT 2017 proceedings
- 16:15 → 16:40
  
  Discussion 25m