CMS Big Data Science Project

Name: CMS Big Data Science Project
Start: 2017-03-22T16:00:00+01:00
End: 2017-03-22T17:00:00+01:00

Wednesday 22 Mar 2017, 16:00 → 17:00 Europe/Berlin

Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))

Description

PPD/ Quarium-WH8SW - Wilson Hall 8th fl South West

Instructions to create a light-weight CERN account to join the meeting via Vidyo:

https://account.cern.ch/account/Externals/

If that is not possible, you can join the meeting by phone; call-in numbers are here:

http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone

The meeting id is hidden below the Videoconference Rooms link, but here it is again:

10502145


## 170322 - Big Data Meeting

* attendance: JimP, JimK, Matteo, Saba, Kacper, Luca, Vagg, Bo, Illia

* news
* ACAT abstract submission deadline: 29 April 2017
* https://indico.cern.ch/event/567550/abstracts/
* JimP wants to submit abstracts about FemtoCode
* IEEE Big Data December 2017
* Call for papers is open and will remain open for most of the year
* CERN Openlab workshops with aim of identifying potential areas of work
for the next three-year phase of CERN openlab (phase VI)
* Data Center Technologies and Infrastructures, March 1, agenda:
http://tinyurl.com/l4ru23o
* Compute Platforms and Software, March 23, agenda:
http://tinyurl.com/l3uddp6
* Machine learning and data analytics, April 27, agenda not yet
available
* We will be asked to give a talk about the Intel Big Data project
* Community White Paper (CWP): A Roadmap for HEP Software and Computing R&D
for the 2020s
* main page: http://hepsoftwarefoundation.org/activities/cwp.html
* working groups:
http://hepsoftwarefoundation.org/cwp/cwp-working-groups.html
* Data Analysis and Interpretation WG: google docs:
http://tinyurl.com/jsxytph
* Planned workshops
* CWP discussions at HEP Analysis Ecosystem Retreat, May 22-24
(agenda: http://tinyurl.com/lctgf3f)
* Machine Learning WG: google docs: http://tinyurl.com/m559fmy
* Planned workshops
* a CWP session during the IML topical workshop at CERN, March
20 - 22, 2017
* a CWP session (TBC) during DS@HEP 2017, FNAL May 8 - 12, 2017
* followed by two days of tutorials at Fermilab
(Monday-Tuesday) about machine learning
* hosted by Maurizio Pierini
* possibly entirely CMS (not confirmed)
* HATS in May as well, not including Spark because of
complicated access
* subscribe to the google groups to stay informed and maybe get
involved
* Monthly Intel/Openlab meeting -> Vagg is going to join the meeting and
report

* progress reports
* Meetings with Vagg
* hasn't contacted Matteo yet
* will be organized for next week when Matteo is at CERN
* JimP will go through spark-root library with Victor and Matteo
* meeting with EOS people happened
* First goal: accessing ROOT files from EOS directly from Spark
* Matteo:
* Thrust 1: full analysis
* use Panda ntuples to do analysis up to plots
* Thrust 2: data reduction
* from open data to ntuple
* does not need the full analysis use case
* sit down and decide on a selection that makes sense from the physics
point of view
* Panda ntuple
* previously produced at CERN, now produced at MIT; production at CERN
should be fine
* Panda is a couple of TB
* can share infrastructure between the two thrusts
* Victor:
* spark-root
* bugs are being worked on
* IBM is starting to use it
* JimP
* Histogrammar: some bug fixes and new features: KPMG (consulting
company)
* added some visualization for categorical data
* SparkR already provides the parallelism that Histogrammar would
give you
* Luca
* comments on the google doc, will discuss next week at CERN
* Illia
* Intel would like to start running a simulator to optimize the
performance
* need metrics to feed simulator
* Next week, Victor will show his metrics implementation and the results
from running on the Intel cluster that CERN-IT had access to
* Saba
* one paper, focused on HDF5 and data layout with results on NERSC
Edison, was submitted in January to the International Parallel
and Distributed Processing Symposium, IPDPS 2017
* workshop: High Performance Computation
* camera ready version of paper was uploaded last night
* submitted abstract to Grace Hopper -> Focus is on comparing MPI
and Spark
* nothing concrete, presentation not until October
* Saba wants to have a summer student from UIC

* ROOT files from EOS read in Spark
* 2 solutions
* access data in Spark via a local mount point of EOS (POSIX filesystem
to EOS, FUSE mount); there are limitations
* solution should already exist and is not the preferred solution
* connect to EOS directly (without FUSE mount)
* two possibilities
* write new Java code, following spark-root's resurrection of the old
ROOT-JAVA implementation
* or use C++ code in Java (JNI) (did I get this right?)
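
To make the two access options above concrete, here is a minimal Python sketch of how the same EOS file would be addressed in each case. It only builds the path strings; it does not open anything. The mount point and the server name are assumptions chosen for illustration and do not come from the minutes.

```python
from pathlib import PurePosixPath

# Assumed values for illustration only; neither appears in the minutes.
FUSE_MOUNT = "/eos"                 # hypothetical local FUSE mount point
XROOTD_HOST = "eospublic.cern.ch"   # hypothetical EOS endpoint


def fuse_path(eos_path: str, mount: str = FUSE_MOUNT) -> str:
    """Option 1: local file URI that Spark would read through a FUSE mount."""
    # Raises ValueError if the path is not under /eos.
    relative = PurePosixPath(eos_path).relative_to("/eos")
    return "file://" + str(PurePosixPath(mount) / relative)


def xrootd_url(eos_path: str, host: str = XROOTD_HOST) -> str:
    """Option 2: XRootD URL that addresses the EOS server directly (no mount)."""
    # The absolute path follows the host, which gives the usual double slash.
    return f"root://{host}/{eos_path}"


if __name__ == "__main__":
    path = "/eos/some/dataset/file.root"  # hypothetical file
    print(fuse_path(path))
    print(xrootd_url(path))
```

With option 1 any reader that understands local files works unchanged, which is why that solution should already exist. Option 2 needs a reader that speaks the remote protocol, which is where the new Java code or the JNI route comes in.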

* plan:
* have data reduction facility physics code ready to be used by next
meeting
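
As a rough illustration of what the data reduction physics code has to do, the Python sketch below filters events with a selection and keeps only a subset of branches. The branch names and cut values are invented placeholders; the actual selection is still to be decided, as noted in the progress reports.

```python
from typing import Callable, Dict, Iterable, Iterator, Sequence

Event = Dict[str, float]


def reduce_events(
    events: Iterable[Event],
    selection: Callable[[Event], bool],
    keep: Sequence[str],
) -> Iterator[Event]:
    """Keep events that pass the selection, with only the requested branches."""
    for event in events:
        if selection(event):
            yield {name: event[name] for name in keep}


def example_selection(event: Event) -> bool:
    """Hypothetical cut; the real selection has not been decided yet."""
    return event["met"] > 200.0 and event["n_jets"] >= 1


if __name__ == "__main__":
    # Toy events with invented branch names and values.
    events = [
        {"met": 250.0, "n_jets": 2, "lead_jet_pt": 310.0, "n_vertices": 18},
        {"met": 40.0, "n_jets": 3, "lead_jet_pt": 95.0, "n_vertices": 22},
        {"met": 220.0, "n_jets": 0, "lead_jet_pt": 0.0, "n_vertices": 15},
    ]
    ntuple = list(reduce_events(events, example_selection, keep=["met", "lead_jet_pt"]))
    print(ntuple)
```

In Spark the same two steps would map onto a filter followed by a column projection, so the selection logic can be developed and checked on a small sample first.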

* next meeting
* April 5th


- 16:00 → 16:05
  
  News 5m
  
  Speakers: Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
  
- 16:05 → 17:00
  
  Discussion 55m
