CMS Big Data Science Project

Name: CMS Big Data Science Project
Start: 2017-04-19T16:00:00+02:00
End: 2017-04-19T17:15:00+02:00
Location: CERN

Wednesday 19 Apr 2017, 16:00 → 17:15 Europe/Berlin

600/R-001 (CERN)

Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))

Description

FNAL: WH6NW

CERN: 600-R-001

Instructions to create a lightweight CERN account to join the meeting via Vidyo:

https://account.cern.ch/account/Externals/

If that is not possible, you can join the meeting by phone; call-in numbers are here:

http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone

The meeting ID is hidden below the Videoconference Rooms link, but here it is again:

10502145

News

Dates and Events
- CERN Openlab workshop on Machine Learning and Data Analytics, April 27, CERN
  - https://indico.cern.ch/event/627852/
- DS@HEP at FNAL, May 8-12, FNAL
  - https://indico.fnal.gov/conferenceDisplay.py?confId=13497
  - Matteo will give a talk
- HEP Analysis Ecosystem Workshop, May 22-24, Amsterdam
  - https://indico.cern.ch/event/613842/timetable/
- “Database Futures” workshop at CERN on May 29th-30th
  - https://indico.cern.ch/event/615499/
  - to discuss possible future needs in the database area for Run3+4. Today we see mostly relational and non-relational database models.
  - New trends are Cloud Computing, Big Data, proactive & predictive performance analysis, …
Abstracts:
- Database Futures: http://tinyurl.com/mebdh9p
- ACAT 2017: http://tinyurl.com/mwsnj8b

Notes

attendance: Bo, Illia, Marco, Luca, Kacper, Viktor, Paul, Matteo, JimP, OLI

* news
* Marco Zanetti: introduction
* the Padova group has a group of computer scientists who have worked on HEP topics
* convert these resources into something related to big data and data science in general
* started looking at what was on the market, already talked with the DIANA team
* played with whatever was there
* Paul Lujan (PostDoc) will dedicate some time to this topic
* Padova people should join Slack and Google Groups
* goal is to deploy a full analysis workflow
* Matteo talked about the two thrusts; Marco is more interested in the 2nd thrust
* Matteo will do an introduction
* Marco has infrastructure with Spark at Padova, happy to share this resource for tests and development
* Mesos running underneath
* welcome the opportunity, think about where to run the tests, discuss offline
* Jim mentions FemtoCode
* DianaHEP meeting update this Monday
* would like to test at Padova
* this is not a Spark project, but Padova would fit because Padova allows software to be installed (through Mesos)

* Action item on Matteo
* transfer Panda samples to CERN for Viktor to run on them
* took 10 days to set up everything (no space, no permission)
* copy directly to HDFS, was only a temporary Kerberos problem
* Bo is helping
* We can do this within days now
* few TB of files

* abstracts
* will contact Illia if something special needs to be done for openlab/Intel related to abstracts

* CERN
* Vagg was away last week and is away this week; will report when he is back next Monday

* Short term planning for analysis thrust
* bottleneck was transferring files
* want to look at full Panda production sample set
* then we will write the analysis with the new tools
* Python
* to make this project really useful for analysis, let's look at the fit
* data reduction thrust
* start with Scala, plan to eventually go to Python (far off)
* For the analysis thrust, performance is critical; Python functions would have to be compiled, which is more complicated
* easier on the physics part
* write code as soon as Vagg is back
* some data is already there

* feedback from Intel team
* analysis could be reproduced with the simulation framework
* they have basic building blocks
* will do this when there is an established workflow, close to the real thing
* no general suggestions or comments, because the team needs more details and access to logs

* move from open data to CMS data
* permission for Vagg will come, but not soon

- 16:00 → 16:10
  
  News 10m
  
  Speakers: Matteo Cremonesi (Fermi National Accelerator Lab. (US)), Oliver Gutsche (Fermi National Accelerator Lab. (US))
- 16:10 → 17:00
  
  Discussion 50m