Speaker
Description
With the increasing integration of machine learning (ML) and the growing need for the scale of high-performance computing (HPC) infrastructures, scientific workflows are becoming increasingly heterogeneous. In this evolving landscape, adaptability has emerged as a key factor in accelerating scientific discovery through efficient workflow execution. To increase resource utilization, reduce makespan, and minimize cost, adaptive and asynchronous execution of the heterogeneous tasks within a scientific workflow is essential. Consequently, middleware that schedules and executes heterogeneous workflows must support adaptive and asynchronous execution. We investigate the advantages, prerequisites, and characteristics of a novel workflow execution middleware that dynamically adjusts the resources allocated to each task type based on historical execution data and executes tasks asynchronously. Through a comprehensive analysis, we show how different degrees of asynchronicity affect workflow performance. Finally, we demonstrate the performance and resource-utilization benefits of our middleware by executing a real-world workflow (XYZ) at scale.
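The description does not specify the middleware's allocation policy or API. Purely as an illustration of "adjusting resources per task type from historical execution data" combined with asynchronous execution, the Python sketch below assumes one simple policy: split a fixed worker pool across task types in proportion to each type's estimated remaining work (pending tasks × mean historical runtime), then run all tasks concurrently with per-type concurrency caps and no barrier between task types. All names (`AdaptiveAllocator`, `run_async`, the `simulation`/`training` task types, and the runtimes) are hypothetical; this is not the authors' implementation.

```python
import asyncio
import math
from collections import defaultdict
from statistics import mean


class AdaptiveAllocator:
    """Hypothetical per-task-type allocator (illustrative, not the authors' middleware).

    Splits a fixed worker pool across task types in proportion to each type's
    estimated remaining work: pending task count x mean historical runtime.
    """

    def __init__(self, total_workers: int, default_runtime: float = 1.0):
        self.total_workers = total_workers
        self.default_runtime = default_runtime  # estimate used before any history exists
        self.history: dict[str, list[float]] = defaultdict(list)

    def record(self, task_type: str, runtime: float) -> None:
        self.history[task_type].append(runtime)

    def estimated_runtime(self, task_type: str) -> float:
        runs = self.history.get(task_type)
        return mean(runs) if runs else self.default_runtime

    def allocate(self, pending: dict[str, int]) -> dict[str, int]:
        alloc = {t: 0 for t in pending}
        active = [t for t, n in pending.items() if n > 0]
        if not active:
            return alloc
        if len(active) > self.total_workers:
            raise ValueError("more active task types than workers")
        # Every active type gets one worker so that no type starves.
        for t in active:
            alloc[t] = 1
        spare = self.total_workers - len(active)
        work = {t: pending[t] * self.estimated_runtime(t) for t in active}
        total = sum(work.values())
        if total <= 0:  # no usable runtime data: fall back to equal weights
            work = {t: 1.0 for t in active}
            total = float(len(active))
        shares = {t: spare * work[t] / total for t in active}
        for t in active:
            alloc[t] += math.floor(shares[t])
        # Largest-remainder rounding hands out the workers lost to flooring.
        leftover = self.total_workers - sum(alloc.values())
        by_remainder = sorted(
            active, key=lambda t: shares[t] - math.floor(shares[t]), reverse=True
        )
        for t in by_remainder[:leftover]:
            alloc[t] += 1
        return alloc


async def run_async(allocator: AdaptiveAllocator, tasks):
    """Run (task_type, coroutine_factory) pairs concurrently.

    Each type's concurrency is capped by its allocation; types never wait on
    each other, and measured runtimes feed back into the allocator's history.
    """
    pending: dict[str, int] = defaultdict(int)
    for task_type, _ in tasks:
        pending[task_type] += 1
    alloc = allocator.allocate(dict(pending))
    limits = {t: asyncio.Semaphore(n) for t, n in alloc.items() if n > 0}
    loop = asyncio.get_running_loop()

    async def run_one(task_type, factory):
        async with limits[task_type]:
            start = loop.time()  # time execution only, not queueing
            result = await factory()
            allocator.record(task_type, loop.time() - start)
            return result

    return await asyncio.gather(*(run_one(t, f) for t, f in tasks))


allocator = AdaptiveAllocator(total_workers=8)
allocator.record("simulation", 4.0)  # hypothetical historical runtimes
allocator.record("training", 1.0)
print(allocator.allocate({"simulation": 10, "training": 10}))
# → {'simulation': 6, 'training': 2}
```

In this sketch the allocation is recomputed each time `run_async` is called, so it adapts between batches as the recorded history changes; a production middleware would more likely rebalance resources while tasks are still running.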
Significance
This work will enable execution models better suited to AI/ML-coupled scientific workflows. With adaptive and asynchronous execution, highly heterogeneous physics workflows can achieve higher resource utilization at a lower execution cost, enabling faster and cheaper scientific discovery.
References
https://link.springer.com/chapter/10.1007/978-3-031-43943-8_2