Integrating Network Awareness in ATLAS Distributed Computing Using the ANSE Project

14 Apr 2015, 15:15
15m
B250 (B250)

oral presentation
Track4: Middleware, software development and tools, experiment frameworks, tools for distributed computing
Track 4 Session

Speaker

Dr Alexei Klimentov (Brookhaven National Laboratory (US))

Description

The networking infrastructure is a crucial contributor to the success of the massively scaled global computing system that serves the analysis needs of the LHC experiments. The experiments have exploited excellent high-bandwidth networking in adapting their computing models for the most efficient utilization of resources. Advanced networking technologies now becoming available, such as software-defined networking, hold the potential to leverage the network further to optimize workflows and dataflows through proactive control of the network fabric by high-level applications such as experiment workload management and data management systems. End-to-end monitoring of networks using perfSONAR, combined with data flow performance metrics, further allows applications to adapt based on real-time conditions. We will describe efforts underway in ATLAS to integrate network awareness at the application level, particularly in workload management, building upon components of the ANSE (Advanced Network Services for Experiments) project. We will show how knowledge of network conditions, both historical and current, is used to optimize PanDA and other systems for ATLAS, and describe how software control of end-to-end network paths can augment ATLAS's ability to utilize its distributed resources effectively.

Primary author

Kaushik De (University of Texas at Arlington (US))

Co-authors

Artem Petrosyan (Joint Inst. for Nuclear Research (RU))
Jorge Batista (U)
Dr Shawn Mc Kee (University of Michigan (US))
