1–5 Sept 2014
Faculty of Civil Engineering
Europe/Prague timezone

Planning for distributed workflows: constraint based co-scheduling of computational jobs and data placement in distributed environments.

4 Sept 2014, 15:25
25m
C217 (Faculty of Civil Engineering)

Faculty of Civil Engineering, Czech Technical University in Prague, Thakurova 7/2077, Prague 166 29, Czech Republic
Oral
Computing Technology for Physics Research

Speaker

Mr Dzmitry Makatun (Nuclear Physics Institute (CZ))

Description

When data-intensive applications run on distributed computational resources, access to remotely stored data can incur long I/O overheads. Latency and bandwidth can become the major limiting factors for overall computation performance and reduce the application's CPU/WallTime ratio through excessive I/O wait. Further optimization of data management may therefore require making data available "closer" to the computational task, thereby reducing the overhead of long-distance data access on the Grid. This capability is in high demand in data-intensive computational fields such as those of the HENP communities.

In previous collaborative work between BNL and NPI/ASCR, we addressed the problem of efficient data transfer and cache management in a Grid environment. That work considered the optimization of moving data to N sites when the data may be located at M locations. However, the co-scheduling of data placement and processing was not (yet) approached, as the hard problem was decomposed into simpler tasks.

Leveraging the knowledge of our previous research, we propose a constraint-programming-based planner that schedules computational jobs and data placement (transfers) in a distributed environment in order to optimize resource utilization and reduce the overall processing completion time. The optimization is achieved by ensuring that none of the resources (network links, data storage and CPUs) is over-saturated at any moment in time, and that either (a) the data is pre-placed at the site where the job runs or (b) the jobs are scheduled where the data is already present. Such an approach would eliminate idle CPU cycles and would have wide application in the community. In this talk, we will present the theoretical model behind our planner. We will further present the results of simulations based on input data extracted from the log files of the batch and data-management systems of the STAR experiment's computing facility.
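
As a rough illustration of the co-scheduling idea only, the following toy sketch picks a site for each job so that no CPU or network link is used by two tasks at once. It is not the authors' planner and does not use a constraint-programming solver: it simply enumerates all job-to-site assignments and schedules jobs in the given order. The function name `plan`, the job tuple layout, one CPU slot per site, and one transfer at a time per link are all assumptions made for this example.

```python
from itertools import product


def plan(jobs, sites, bandwidth):
    """Toy co-scheduler (illustrative assumption, not the planner from the talk).

    jobs:      list of (name, data_site, size_gb, cpu_hours)
    sites:     list of site names; each site is assumed to have one CPU slot
    bandwidth: dict mapping (src, dst) -> GB per hour; one transfer at a time per link
    Returns (makespan_hours, {job_name: site}) or None if no assignment is feasible.
    """
    best = None
    for choice in product(sites, repeat=len(jobs)):
        cpu_free = {s: 0.0 for s in sites}            # when each CPU becomes free
        link_free = {link: 0.0 for link in bandwidth}  # when each link becomes free
        feasible = True
        for (name, src, size, hours), dst in zip(jobs, choice):
            ready = 0.0  # time at which the input data is available at dst
            if src != dst:
                link = (src, dst)
                if link not in bandwidth:
                    feasible = False
                    break
                # Serialize transfers on a link so it is never over-saturated.
                ready = link_free[link] + size / bandwidth[link]
                link_free[link] = ready
            # The job starts once its data has arrived and the CPU is free.
            start = max(ready, cpu_free[dst])
            cpu_free[dst] = start + hours
        if not feasible:
            continue
        makespan = max(cpu_free.values())
        if best is None or makespan < best[0]:
            best = (makespan, dict(zip((j[0] for j in jobs), choice)))
    return best


# Two jobs whose input files both sit at site A; site B is reachable at 10 GB/h.
example_jobs = [("j1", "A", 10, 2), ("j2", "A", 10, 2)]
example_sites = ["A", "B"]
example_bandwidth = {("A", "B"): 10, ("B", "A"): 10}
result = plan(example_jobs, example_sites, example_bandwidth)
```

In this example, running both jobs at A takes 4 hours, while transferring one input file to B during the first job shortens the total to 3 hours. Exhaustive enumeration grows exponentially with the number of jobs, which is one reason a dedicated constraint-based planner is attractive at realistic scale.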

Primary authors

Mr Dzmitry Makatun (Nuclear Physics Institute (CZ))
Dr Hana Rudova (Masaryk University (CZ))
Dr Jerome Lauret (Brookhaven National Laboratory)
Michal Sumbera (Nuclear Physics Institute (CZ))
