Please contact luca.canali@cern.ch if you have specific topics you want to discuss, so that we can better organize the discussion and time for Q&A.
-
Additional questions and follow-up from the morning's computing seminar.
-
Discussion of integrating Spark with Python, covering performance and usability, including ideas on further use of the Arrow integration to pass data from Spark to the Coffea framework developed at FNAL (Lindsey Gray, Andrew Melo, CMS)
-
Drill-down on the performance of Spark + Parquet in the context of speeding up data extraction for the Spark-based framework developed for the NXCALS project. Several optimizations have been tested or are in the pipeline so far (including sorting by timestamp, partitioning, and splitting into multiple files). There is interest in understanding the roadmap and current work in this area in the Spark and open source communities, which could help with further tuning of the platform (Jakub Wozniak, BE-CO).
-
Interest in a discussion of Spark Structured Streaming: its evolution in Spark 3, integration with Kafka, and a possible Kafka client upgrade to 2.0 (from the IT-CM monitoring team)
-
Possible interest from the team working on Kubernetes and Kubeflow (Ricardo Brito Da Rocha, IT-CM)
-
Possible interest from the team working on SWAN, integrating Spark and Jupyter, and developing distributed processing for ROOT with Spark (Enric Tejedor Saavedra, EP-SFT)