4–8 Nov 2019
Adelaide Convention Centre
Australia/Adelaide timezone

Featherweight Communication Between The Tasks For Spark

5 Nov 2019, 15:30
1h
Hall F (Adelaide Convention Centre)

Poster Track 5 – Software Development Posters

Speaker

Mr YI WANG

Description

ABSTRACT

Apache Spark is a widely used framework for big data analysis. A Spark application is divided into jobs, each of which is triggered by an action on an RDD. The DAGScheduler then divides each job into stages, and each stage into tasks. A task is a unit of work within a stage and corresponds to one RDD partition.

The task is the smallest unit of execution in a Spark application. However, the current Spark framework offers no communication between tasks. This article discusses why Spark should be extended with an API that provides featherweight communication between tasks. This API does not break Spark's existing communication model and is portable to standard Spark installations. Finally, we give some examples showing how it addresses specific situations in the field of high energy physics.
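
As a rough illustration of the execution model described above, the following plain-Python sketch models a single stage: the data is split into partitions, one task runs per partition, and each task sees only its own partition. It is not Spark code and not the API proposed in this work. The names split_into_partitions, run_task, and run_stage are hypothetical, and a thread pool merely stands in for Spark executors.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Split a list into contiguous partitions of roughly equal size."""
    size, rem = divmod(len(data), num_partitions)
    parts, start = [], 0
    for i in range(num_partitions):
        # The first `rem` partitions take one extra element each.
        end = start + size + (1 if i < rem else 0)
        parts.append(data[start:end])
        start = end
    return parts


def run_task(partition):
    """One task: works on its own partition only and returns a partial result.

    The task has no handle on any other task, so it cannot exchange
    intermediate values with its peers.
    """
    return sum(x * x for x in partition)


def run_stage(data, num_partitions):
    """Run one task per partition and combine the partial results.

    Only this driver-side function ever sees all the partial results.
    """
    partitions = split_into_partitions(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(run_task, partitions))
    return partitions, partials, sum(partials)
```

In this toy model the partial results meet only in run_stage, which plays the role of the driver. A task that needed an intermediate value from another task would have no way to obtain it, which is the gap the abstract refers to.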

Keywords
Spark, featherweight communication, task, high energy physics

Consider for promotion No

Primary author
