PPD/ Quarium-WH8SW - Wilson Hall 8th fl South West
Instructions to create a light-weight CERN account to join the meeting via Vidyo:
If that is not possible, people can join the meeting by phone; call-in numbers are here:
The meeting id is hidden below the Videoconference Rooms link, but here it is again:
## 170322 - Big Data Meeting
* attendance: JimP, JimK, Matteo, Saba, Kacper, Luca, Vagg, Bo, Illia
* news
* ACAT abstract submission deadline: 29 April 2017
* https://indico.cern.ch/event/567550/abstracts/
* JimP wants to submit abstracts about FemtoCode
* IEEE Big Data December 2017
* Call for papers is open and will remain open for most of the year
* CERN openlab workshops aimed at identifying potential areas of work
for the next three-year phase of CERN openlab (phase VI)
* Data Center Technologies and Infrastructures, March 1, agenda:
http://tinyurl.com/l4ru23o
* Compute Platforms and Software, March 23, agenda:
http://tinyurl.com/l3uddp6
* Machine learning and data analytics, April 27, agenda not yet
available
* We will be asked to give a talk about the Intel Big Data project
* Community White Paper (CWP): A Roadmap for HEP Software and Computing R&D
for the 2020s
* main page: http://hepsoftwarefoundation.org/activities/cwp.html
* working groups:
http://hepsoftwarefoundation.org/cwp/cwp-working-groups.html
* Data Analysis and Interpretation WG: google docs:
http://tinyurl.com/jsxytph
* Planned workshops
* CWP discussions at HEP Analysis Ecosystem Retreat, May 22-24
(agenda: http://tinyurl.com/lctgf3f)
* Machine Learning WG: google docs: http://tinyurl.com/m559fmy
* Planned workshops
* a CWP session during the IML topical workshop at CERN, March
20 - 22, 2017
* a CWP session (TBC) during DS@HEP 2017, FNAL May 8 - 12, 2017
* followed by two days of tutorials at Fermilab
(Monday-Tuesday) about machine learning
* hosted by Maurizio Pierini
* possibly CMS-only (not sure)
* HATS in May as well, not including Spark because of
complicated access
* subscribe to the google groups to stay informed and maybe get
involved
* Monthly Intel/Openlab meeting -> Vagg is going to join the meeting and
report
* progress reports
* Meetings with Vagg
* hasn't contacted Matteo yet
* will be organized for next week when Matteo is at CERN
* JimP will go through spark-root library with Victor and Matteo
* meeting with EOS people happened
* First goal: accessing ROOT files from EOS directly from Spark
* Matteo:
* Thrust 1: full analysis
* use Panda ntuples to do analysis up to plots
* Thrust 2: data reduction
* from open data to ntuple
* does not need the full analysis use case
* sit down and decide on a selection that makes sense from the physics
point of view
* Panda ntuple
* previously produced at CERN, now produced at MIT; CERN
production should be fine
* Panda is a couple of TB
* can share infrastructure between the two thrusts
* Victor:
* spark-root
* bugs are being worked on
* IBM is starting to use it
* JimP
* Histogrammar: some bug fixes and new features: KPMG (consulting
company)
* added some visualization for categorical data
* SparkR is already getting the parallelism that Histogrammar will
get you
* Luca
* comments on the google doc, will discuss next week at CERN
* Illia
* Intel would like to start running a simulator to optimize the
performance
* need metrics to feed simulator
* next week, Victor will show his metrics implementation and results from
running on the Intel cluster that CERN-IT had access to
* Saba
* one paper was submitted in January, focused on HDF5 and data
layout with results on NERSC Edison, for the International Parallel
and Distributed Processing Symposium, IPDPS 2017
* workshop: High Performance Computation
* camera-ready version of the paper was uploaded last night
* submitted abstract to Grace Hopper -> Focus is on comparing MPI
and Spark
* nothing concrete, presentation not until October
* Saba wants to have a summer student from UIC
* ROOT files from EOS read in Spark
* 2 solutions
* access data in Spark via a local mount point of EOS (POSIX filesystem
to EOS, FUSE mount); there are limitations
* solution should already exist and is not the preferred solution
* connect to EOS directly (without FUSE mount)
* two possibilities
* write new Java code following spark-root's resurrection of the old
ROOT-JAVA implementation
* or use C++ code in Java (JNI) (did I get this right?)
* plan:
* have data reduction facility physics code ready to be used by next
meeting
* next meeting
* April 5th
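
To make the two EOS access routes from the "ROOT files from EOS read in Spark" item concrete, here is a minimal sketch of how one logical EOS path would be addressed in each case: as a plain POSIX path through a FUSE mount, or as an XRootD-style `root://` URL for direct access. The host name `eos.example.org`, the mount point `/mnt/eos`, the file path, and both helper functions are hypothetical and only illustrate the idea; nothing here comes from the actual setup discussed in the meeting.

```python
import posixpath

# Logical EOS paths are assumed to start with this prefix.
EOS_PREFIX = "/eos"


def fuse_path(eos_path, mount_point="/eos"):
    """Option 1: the file as seen through a local FUSE mount (POSIX access).

    mount_point is a hypothetical local directory where EOS is mounted.
    """
    if not eos_path.startswith(EOS_PREFIX + "/"):
        raise ValueError("expected a path under /eos/, got: " + eos_path)
    # Strip the logical prefix and re-root the remainder at the mount point.
    relative = eos_path[len(EOS_PREFIX):].lstrip("/")
    return posixpath.join(mount_point, relative)


def xrootd_url(eos_path, host):
    """Option 2: direct access, addressed as root://<host>//<absolute path>.

    host is a hypothetical EOS endpoint; no connection is opened here.
    """
    if not eos_path.startswith("/"):
        raise ValueError("expected an absolute path, got: " + eos_path)
    # The absolute path keeps its leading slash, giving the double slash
    # that follows the host in XRootD URLs.
    return "root://" + host + "/" + eos_path


if __name__ == "__main__":
    path = "/eos/demo/ntuple.root"  # hypothetical file
    print(fuse_path(path, mount_point="/mnt/eos"))
    print(xrootd_url(path, host="eos.example.org"))
```

With the FUSE route, Spark would see an ordinary local file and need no EOS-specific code; with the direct route, the reader itself (new Java code or C++ via JNI, per the notes above) has to understand the `root://` address.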