11–15 Mar 2024
Charles B. Wang Center, Stony Brook University
US/Eastern timezone

AI-driven HPC Workflows Execution with Adaptivity and Asynchronicity in Mind

14 Mar 2024, 16:10
30m
Charles B. Wang Center, Stony Brook University

100 Circle Rd, Stony Brook, NY 11794
Poster
Track 1: Computing Technology for Physics Research
Poster session with coffee break

Speaker

Ozgur Ozan Kilic (Brookhaven National Laboratory)

Description

As machine learning is increasingly integrated into scientific computing and as computations demand the scale of high-performance computing (HPC) infrastructures, scientific workflows are becoming more heterogeneous. In this setting, adaptability is key to accelerating scientific discovery through efficient workflow execution. Increasing resource utilization, reducing makespan, and minimizing cost all require adaptive and asynchronous execution of the heterogeneous tasks within a workflow, so middleware that schedules and executes such workflows must support both. We investigate the advantages, prerequisites, and characteristics of a novel workflow execution middleware. The proposed middleware dynamically adjusts the resources allocated to each task type based on historical execution data and executes tasks asynchronously. We analyze how different degrees of asynchronicity affect workflow performance. Finally, we demonstrate the gains in performance and resource utilization by executing a real-world workflow (XYZ) at scale with our execution middleware.
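
The idea of sizing allocations per task type from historical execution data and then running tasks asynchronously can be illustrated with a minimal Python sketch. This is not the authors' middleware. The names `allocate_slots`, `run_task`, and `run_workflow`, the proportional-to-mean-runtime policy, and the use of `asyncio.sleep` as a stand-in for real work are all assumptions made purely for illustration.

```python
import asyncio
import time
from statistics import mean


def allocate_slots(history, total_slots):
    """Split total_slots across task types in proportion to mean past runtime.

    Hypothetical policy: every task type gets one slot, and the remaining
    slots are shared out with largest-remainder rounding.
    """
    means = {kind: mean(runs) for kind, runs in history.items()}
    spare = total_slots - len(means)
    if spare < 0:
        raise ValueError("need at least one slot per task type")
    total = sum(means.values())
    shares = {kind: spare * m / total for kind, m in means.items()}
    slots = {kind: 1 + int(share) for kind, share in shares.items()}
    leftover = total_slots - sum(slots.values())
    by_remainder = sorted(
        shares, key=lambda k: shares[k] - int(shares[k]), reverse=True
    )
    for kind in by_remainder[:leftover]:
        slots[kind] += 1
    return slots


async def run_task(kind, duration, limits, history):
    """Run one task under its type's concurrency limit and record its runtime."""
    async with limits[kind]:
        start = time.perf_counter()
        await asyncio.sleep(duration)  # stand-in for the real task
        history[kind].append(time.perf_counter() - start)
    return kind


async def run_workflow(tasks, history, total_slots):
    """Allocate slots from history, then run all tasks asynchronously."""
    slots = allocate_slots(history, total_slots)
    limits = {kind: asyncio.Semaphore(n) for kind, n in slots.items()}
    results = await asyncio.gather(
        *(run_task(kind, duration, limits, history) for kind, duration in tasks)
    )
    return slots, results
```

Because each run appends its measured runtimes to `history`, a later call to `run_workflow` would compute a different allocation. That feedback loop is the adaptive part of this sketch. The task types do not wait on one another, only on their own slot limits.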

References

https://link.springer.com/chapter/10.1007/978-3-031-43943-8_2

Significance

This work will enable more suitable execution models for AI/ML-coupled scientific workflows. With adaptive and asynchronous execution, highly heterogeneous physics workflows can increase resource utilization and reduce the cost of execution. Hence, this will enable faster and cheaper scientific discoveries.

Primary author

Ozgur Ozan Kilic (Brookhaven National Laboratory)

Co-authors

Matteo Turilli (Rutgers University)
Prof. Shantenu Jha (Rutgers University)
