GSoC 2017 - Big Data Tools for Physics Analysis
Participants: Krishnan, Danilo, Prasanth, Enric
# Status of tasks
* Task information for JS display
- Via Spark listeners, we can obtain the same information we get from the Spark REST API, plus some extra information about the RDDs of our application and the dependencies between stages. However, this still does not let us link tasks with user code.
- Given the information at our disposal, we will start with a display that shows an event timeline for tasks, where the x axis represents time (refreshed as the application runs) and the y axis corresponds to the executors. Tasks will be represented as rectangles, much like in the Spark UI. We will also draw vertical lines to mark the start and end of each stage, according to the tasks shown in the display.
- ACTION: Krishnan will implement a first prototype of the display described above.
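The display data described above can be modeled independently of any JS rendering. A minimal sketch, assuming a simple in-memory task record (the `Task` tuple, `timeline_rows`, and `stage_boundaries` names are ours for illustration, not from the actual sparkmonitor code):

```python
from collections import namedtuple

# One record per task, as a listener might accumulate it.
Task = namedtuple("Task", "task_id stage_id executor start end")  # times in ms

def timeline_rows(tasks):
    """Group tasks by executor: one display row per executor (the y axis)."""
    rows = {}
    for t in tasks:
        rows.setdefault(t.executor, []).append((t.start, t.end, t.task_id))
    return rows

def stage_boundaries(tasks):
    """Vertical marker lines: a stage spans from its earliest task start
    to its latest task end, as seen among the displayed tasks."""
    bounds = {}
    for t in tasks:
        lo, hi = bounds.get(t.stage_id, (t.start, t.end))
        bounds[t.stage_id] = (min(lo, t.start), max(hi, t.end))
    return bounds

tasks = [
    Task(0, 0, "executor-1", 0.0, 40.0),
    Task(1, 0, "executor-2", 5.0, 35.0),
    Task(2, 1, "executor-1", 45.0, 80.0),
]
print(stage_boundaries(tasks))  # {0: (0.0, 40.0), 1: (45.0, 80.0)}
```

The JS frontend would then draw one rectangle per `(start, end)` pair in each executor row and one vertical line per stage boundary.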
* Automatic detection of Spark jobs in a cell
- Krishnan has implemented a Python listener that automatically places a JS display in the output of the cell that triggered a Spark job.
- ACTION: Krishnan will implement the listener in both Python and Scala, since the Python-only version listens to all the possible event types and not only the job creation events.
- ACTION: Krishnan will check if the listener can be configured by setting the spark.extraListeners property on a Python SparkConf object, just as it can be done via an argument of pyspark. Ideally, in SWAN we would create a SparkConf object with some default configuration that the user can extend; part of the default configuration would be the registration of the listener. No extra call from the user should be needed.
- ACTION: Krishnan will make sure that the display is always placed in the right cell also for special cases (kernel restarted, cell deleted).
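The default-configuration idea could be sketched as below. This is a hypothetical illustration: the listener class path and the `default_conf_pairs` helper are our assumptions, not the actual sparkmonitor identifiers.

```python
# Assumed listener class path -- the real sparkmonitor name may differ.
LISTENER_CLASS = "sparkmonitor.listener.JupyterSparkMonitorListener"

def default_conf_pairs(user_pairs=None):
    """Merge user-supplied Spark properties on top of SWAN defaults,
    appending to (never replacing) the spark.extraListeners entry."""
    conf = {"spark.extraListeners": LISTENER_CLASS}
    for key, value in (user_pairs or {}).items():
        if key == "spark.extraListeners":
            # spark.extraListeners takes a comma-separated list of classes.
            conf[key] = conf[key] + "," + value
        else:
            conf[key] = value
    return conf
```

With pyspark, the resulting dict could be applied via `SparkConf().setAll(conf.items())`; the user extends the returned conf without any extra registration call, which is the behavior we want in SWAN.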
* Synchronous execution - Communication frontend-kernel
- The control channel only contains messages such as "abort" for the kernel.
- ACTION: Krishnan will try to use the control channel to send "stop job" messages from the frontend to the kernel. Using lower-level Jupyter primitives, he will implement a solution that inspects only the control channel, not the shell channel.
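For reference, any message sent on the control channel must follow the standard Jupyter wire format. Below is a minimal sketch that builds such a message as a plain dict; the `stop_job_request` type and its `job_group` content are our invention for this project, not part of the Jupyter protocol.

```python
import uuid
from datetime import datetime, timezone

def make_control_msg(session_id, msg_type, content):
    """Build a Jupyter-protocol message dict, shaped as the frontend
    would send it on the control channel (see the Jupyter messaging spec)."""
    return {
        "header": {
            "msg_id": uuid.uuid4().hex,
            "session": session_id,
            "username": "swan",
            "date": datetime.now(timezone.utc).isoformat(),
            "msg_type": msg_type,
            "version": "5.0",
        },
        "parent_header": {},
        "metadata": {},
        "content": content,
    }

# Hypothetical custom message asking the kernel to cancel a Spark job group:
msg = make_control_msg("sess-1", "stop_job_request", {"job_group": "cell-42"})
```

On the kernel side, a handler registered for this message type could call `SparkContext.cancelJobGroup` without going through the busy shell channel, which is the whole point of using the control channel here.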
* Code
- Krishnan placed his code at this repo:
https://github.com/krishnan-r/sparkmonitor
# Jupyter Spark roadmap
- The Jupyter people did not reply to our e-mail; we will continue on our own for now.