CMS Big Data Science Project
FNAL: WH6NW
CERN: 600-R-001
Instructions to create a light-weight CERN account to join the meeting via Vidyo:
If that is not possible, you can join the meeting by phone; call-in numbers are here:
The meeting id is hidden below the Videoconference Rooms link, but here it is again:
- 10502145
News
- Dates and Events
- CERN Openlab workshop on Machine Learning and Data Analytics, April 27, CERN
- DS@HEP at FNAL, May 8-12, FNAL
- https://indico.fnal.gov/conferenceDisplay.py?confId=13497
- Matteo will give a talk
- HEP Analysis Ecosystem Workshop, May 22-24, Amsterdam
- “Database Futures” workshop at CERN on May 29th-30th
- https://indico.cern.ch/event/615499/
- to discuss possible future needs in the database area for Run3+4. Today we see mostly relational and non-relational database models.
- New trends are Cloud Computing, Big Data, proactive & predictive performance analysis, …
- Abstracts:
- Database Futures: http://tinyurl.com/mebdh9p
- ACAT 2017: http://tinyurl.com/mwsnj8b
Notes
attendance: Bo, Illia, Marco, Luca, Kacper, Viktor, Paul, Matteo, JimP, OLI
* news
* Marco Zanetti: introduction
* Padova group has a group of computer scientists used to working on HEP
topics
* convert these resources into something related to big data and
data science in general
* started looking at what was on the market, already talked with the DIANA
team
* played with whatever was there
* Paul Lujan (postdoc) will dedicate some time to this topic
* Padova people should join Slack, Google Groups
* goal is to deploy a full analysis workflow
* Matteo talked about the two thrusts, Marco is more interested in the
2nd thrust
* Matteo will do an introduction
* Marco has infrastructure with Spark at Padova, happy to share this
resource for tests and development
* MESOS running underneath
* welcome the opportunity, think about where to run the tests,
discuss offline
* Jim mentions FemtoCode
* DianaHEP meeting update this Monday
* would like to test at Padova
* this is not a Spark project, but Padova would fit because Padova
allows software to be installed (through MESOS)
* Action item on Matteo
* transfer Panda samples to CERN for Viktor to run on them
* took 10 days to set up everything (no space, no permission)
* copy directly to HDFS; there was only a temporary Kerberos problem
* Bo is helping
* We can do this within days now
* few TB of files
* abstracts
* will contact Illia if something special needs to be done for
openlab/Intel related to abstracts
* CERN
* Vagg was away last week and is away this week, will report when back next Monday
* Short term planning for analysis thrust
* bottleneck was transferring files
* want to look at full Panda production sample set
* then we will write the analysis with the new tools
* Python
* to make this project really useful for analysis, let's look at the fit
* data reduction thrust
* start with Scala, plan to eventually go to Python (far off)
* For the analysis thrust, performance is critical; Python functions would
have to be compiled, which is more complicated
* easier on the physics part
* write code as soon as Vagg is back
* some data is already there
* feedback from Intel team
* analysis could be reproduced with the simulation framework
* they have basic building blocks
* will do this when there is an established workflow, close to the real
thing
* no general suggestions or comments, because the team needs more details and
access to logs
* move from open data to CMS data
* permission for Vagg will come, but not soon