ROOT I/O Meeting

Europe/Zurich
VIDYO

Attendees

Zhe: I am implementing parallel unzipping based on TBB. No results yet.

Philippe: Are you using the ThreadExecutor?

Zhe: No, not yet.

Danilo: See https://root.cern.ch/gitweb/?p=root.git;a=commitdiff;h=00f1d5a15132c1328c6d12511adc4347104ba982 for an example of how to switch from raw TBB to the executor.
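Because each basket buffer is compressed independently, unzipping is embarrassingly parallel, and the backend (raw TBB, ROOT::TThreadExecutor, or anything else) can be hidden behind a single Map-style call. The sketch below is only an illustration of that idea, not the implementation being discussed: `Map` and `RleDecode` are hypothetical names, a toy run-length decoder stands in for a real codec such as zlib, and std::async is used as the backend so the example runs without ROOT or TBB.

```cpp
#include <cstddef>
#include <future>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a real codec such as zlib: decodes a buffer made
// of (count, byte) pairs, e.g. {3, 'a', 2, 'b'} -> "aaabb".
std::string RleDecode(const std::string &in)
{
   if (in.size() % 2 != 0)
      throw std::runtime_error("truncated RLE buffer");
   std::string out;
   for (std::size_t i = 0; i < in.size(); i += 2)
      out.append(static_cast<unsigned char>(in[i]), in[i + 1]);
   return out;
}

// Hypothetical executor-style Map: applies func to every input concurrently
// and returns the results in input order. This sketch starts one task per
// input with std::async; a real executor would reuse a thread pool instead.
template <typename F, typename T>
auto Map(F func, const std::vector<T> &inputs)
{
   using R = decltype(func(inputs.front()));
   std::vector<std::future<R>> futures;
   futures.reserve(inputs.size());
   for (const auto &in : inputs)
      futures.push_back(std::async(std::launch::async, [&func, &in] { return func(in); }));

   // get() blocks until each task finishes and rethrows any exception it raised.
   std::vector<R> results;
   results.reserve(inputs.size());
   for (auto &f : futures)
      results.push_back(f.get());
   return results;
}
```

Calling `Map(RleDecode, buffers)` decodes all buffers concurrently. The calling code never mentions the threading backend, so moving from raw TBB to an executor only changes the body of `Map`, which is the kind of switch the linked commit illustrates.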

Viktor: CERN IT has access to an Intel Spark cluster in the UK with 14 machines with 18 cores and 3 TB of memory. I used 6 public datasets to try various Spark configurations and compare their performance. It is difficult to monitor what is going on: we can get the CPU time but no information on hardware I/O. I expect to have something next week.

Danilo: Indeed, Spark is poor at giving information on the bottlenecks. If you are working with the Spark API, we should discuss. We have submitted a Google Summer of Code project to improve the monitoring information available.

Viktor: I am not using the REST API but a jar file I included in the payload.

Jim: You should also communicate with the Spark developers via their old-fashioned mailing list.

Jim: See https://docs.google.com/document/d/1xHrKwRbpxnvUawYUDqRner1UHysrjfhWMoyHBuwUZWA/edit#heading=h.g6a3tql7drms for more details on the context of the work I have been doing in the last few weeks.

Philippe: What is your plan for C++ integration?

Jim: The parsing and JIT-compilation code is written in Python and is (now) large enough that porting (reimplementing) it in C++ would be costly. So I am considering having the C++ interface call back into Python to execute the femtocode strings.

Danilo: You might want to look at TPython.
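The callback approach Jim describes can be pictured as a thin C++ facade that owns no parsing or compilation logic and forwards each femtocode string to an evaluator supplied by the Python side. This is a hypothetical sketch: `FemtocodeBridge`, `SetEvaluator` and `Run` are invented names, the `double` return type is an assumption, and how the Python function gets registered (for example through PyROOT) is left open. TPython, as Danilo suggests, is the alternative in which C++ drives the Python interpreter directly.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical C++ facade: parsing and JIT compilation of femtocode stay in
// Python, which registers a callback that evaluates one expression string.
class FemtocodeBridge {
public:
   using Evaluator = std::function<double(const std::string &)>;

   // Intended to be called once from the Python side; the binding mechanism
   // is an assumption and not shown here.
   void SetEvaluator(Evaluator eval) { fEval = std::move(eval); }

   // C++ users call this; the expression is forwarded unchanged to Python.
   double Run(const std::string &expr) const
   {
      if (!fEval)
         throw std::logic_error("no Python evaluator registered");
      return fEval(expr);
   }

private:
   Evaluator fEval; // empty until the Python side registers a callback
};
```

The design keeps the C++ surface small and stable: the large Python code base is reused as-is, and only the registration point has to cross the language boundary.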

Guilherme: The changes improving the Vc external have been merged. Next is the integration of VecCore.

Danilo: I have been focusing on TDataFrame. Next, I will implement the output part of TDataFrame.

Philippe: Q about constructors ..

Next meeting in 2 weeks.

 
