DIR/ Black Hole-WH2NW - Wilson Hall 2nd fl North West (TBC)
CERN room:
Instructions to create a light-weight CERN account to join the meeting via Vidyo:
If that is not possible, people can join the meeting by phone; call-in numbers are here:
The meeting id is hidden below the Videoconference Rooms link, but here it is again:
### Roundtable
* Saba
* started talking to NERSC about the skimming code and the HDF5 part (a rough HDF5 skim sketch follows this list)
* will present today at the NERSC meeting about HDF5 and queries
* hoping for feedback and suggestions on running this workflow at NERSC
* results from running the code at NERSC are available; not sure yet whether they can be improved
* JimP
* working with Jin and Igor
* query system based on CouchBase and JimP's way of accessing data column-wise (a toy column-cache sketch follows this list)
* heavy caching
* stopped developing the query language a month ago; now concentrating on the implementation
* CouchBase implementation progressing; all data loaded into CouchBase, 1E6 queries per second
* JimK concentrating on the cache hit rate and how high it can go
* KNL will overcome the CPU-to-RAM limitation
* 7 GHz reached if the data is in the cache (the MCDRAM of the KNL)
* without that, limited to 1 GHz
* in parallel, developing a direct ROOT reader
* can start with data read directly from files in storage (EOS)
* Luca/Kacper
* Intel has given access to a test cluster for February
* the idea is also to test CMS big data there
* copied Victor's data to the Intel cluster
* no progress on accessing ROOT files directly from EOS; a fellow starting in March will work on this
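
A rough sketch of the kind of HDF5 skim/query step mentioned in Saba's item. The file name, dataset paths, and the 25 GeV cut are invented placeholders, not the actual NERSC layout:

```python
# Hypothetical HDF5 skim: read event columns, apply a selection, write the surviving rows.
# File name, dataset paths and the cut value are placeholders for illustration only.
import h5py

with h5py.File("events.h5", "r") as fin, h5py.File("skim.h5", "w") as fout:
    pt = fin["events/muon_pt"][:]        # load one column fully into memory
    eta = fin["events/muon_eta"][:]
    mask = pt > 25.0                     # column-wise selection ("query")
    fout.create_dataset("skim/muon_pt", data=pt[mask])
    fout.create_dataset("skim/muon_eta", data=eta[mask])
    print("kept", int(mask.sum()), "of", mask.size, "rows")
```
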
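To illustrate the column-wise access plus heavy caching idea in JimP's item, here is a toy stand-in (not his implementation); a plain dict plays the role of CouchBase, and the column names and event counts are invented:

```python
# Toy column store with a cache: columns are fetched from the backing store once,
# then every later query is served from the cache. A dict stands in for CouchBase.
import numpy as np

backing_store = {                                   # stand-in for the CouchBase bucket
    "muon_pt": np.random.exponential(20.0, 1_000_000),
    "muon_eta": np.random.uniform(-2.5, 2.5, 1_000_000),
}
cache = {}

def get_column(name):
    """Return a column, pulling it into the cache on first access."""
    if name not in cache:
        cache[name] = backing_store[name]           # the "slow" fetch happens only once
    return cache[name]

def count_passing(pt_cut):
    """Count events above a pT cut; after warm-up this runs entirely out of the cache."""
    return int((get_column("muon_pt") > pt_cut).sum())

print(count_passing(25.0))   # first call populates the cache
print(count_passing(40.0))   # subsequent calls are cache hits
```
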
### Spark and ROOT files presentation by Victor
* Spark builds the schema before reading the data; this imposes the constraint that all data types must be known prior to reading (see the sketch after this list).
* Need to plan tests on Intel Lab Cluster ➜ email thread
* We want to concentrate on Python and Jupyter ➜ Python is important
* let's get the histogrammer into the stack
* Python + Jupyter + histogrammer + PyROOT
* run Jupyter from lxplus and analytix; both have access to ROOT through CVMFS, installed by the SWAN project
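
A minimal PySpark sketch of the schema point above, assuming the spark-root package is on the classpath and that its data source name is `org.dianahep.sparkroot`; file paths and branch names are placeholders:

```python
# Spark fixes the DataFrame schema before any rows are read: spark-root derives it from
# the TTree branches, and for other sources it can be declared explicitly up front.
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, FloatType

spark = SparkSession.builder.appName("schema-demo").getOrCreate()

# Schema taken from the ROOT file itself (placeholder path).
df = spark.read.format("org.dianahep.sparkroot").load("hdfs:///data/events.root")
df.printSchema()                       # all column types are known before reading rows

# For comparison: a schema declared a priori for a plain-text source.
explicit = StructType([
    StructField("muon_pt", FloatType(), True),
    StructField("muon_eta", FloatType(), True),
])
spark.read.schema(explicit).json("hdfs:///data/events.json").printSchema()
```
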
### Action items
* Thrust 1: prepare instructions for spark-root + Python/Jupyter + histogrammer (see the starter sketch after this list)
* run python script
* use Jupyter notebook
* Thrust 2: data reduction facility: prepare instructions for spark-root + Scala (see the reduction-flow sketch after this list)
* use scala script
* decide on output format
* Intel test cluster
* Victor has a plan for what to test
* Need to get Matteo in the loop
* email thread?
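
A possible starting point for the Thrust 1 instructions: a PySpark + PyROOT sketch that reads a ROOT file through spark-root and fills a histogram on the driver. The path, branch name, binning, and the `org.dianahep.sparkroot` data source name are assumptions, and TH1F only stands in for whichever histogrammer is chosen; the same cell works in a script or a Jupyter notebook.

```python
# Thrust 1 sketch: spark-root read + a histogram filled with PyROOT on the driver.
# Path, branch name and binning are placeholders; TH1F is only one possible histogrammer.
from pyspark.sql import SparkSession
import ROOT

spark = SparkSession.builder.appName("thrust1-sketch").getOrCreate()
df = spark.read.format("org.dianahep.sparkroot").load("hdfs:///data/events.root")

hist = ROOT.TH1F("muon_pt", "Muon pT;pT [GeV];events", 100, 0.0, 200.0)
for row in df.select("muon_pt").collect():   # bring one column back to the driver
    hist.Fill(row["muon_pt"])
hist.SaveAs("muon_pt.root")
```
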
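Thrust 2 calls for a Scala script; purely to illustrate the reduction flow (read with spark-root, keep a few columns, apply a cut, write out), here is the same sequence sketched in PySpark, with Parquet as a placeholder until the output format is decided:

```python
# Thrust 2 flow sketch (the facility itself is meant to use Scala): read ROOT via
# spark-root, keep only the needed columns, apply a placeholder selection, write a
# reduced copy. Paths, column names, the cut and the output format are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("thrust2-sketch").getOrCreate()

df = spark.read.format("org.dianahep.sparkroot").load("hdfs:///data/events.root")
reduced = df.select("muon_pt", "muon_eta").where(df["muon_pt"] > 25.0)
reduced.write.mode("overwrite").parquet("hdfs:///data/reduced.parquet")  # output format TBD
```
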