Description
High energy physics analyses are growing in complexity and often involve running a large number of different tools. This calls for multi-step data processing in which each step requires different resources and depends on the steps before it. A tool that automates these diverse steps efficiently is therefore both important and useful.
Using the Production and Distributed Analysis (PanDA) system and the intelligent Data Delivery Service (iDDS), we provide a platform that coordinates sequences of tasks as a workflow, executing the tasks in a specified order and under predefined conditions so that the whole sequence is automated. In this presentation, we describe our efforts, beginning with an overview of the platform's architecture. We then describe a user-friendly interface in which workflows are written in Python and tasks are defined as Python functions. Next, we detail how Python functions are transformed into tasks and how those tasks are scheduled onto distributed heterogeneous resources, coupled with a messaging-based asynchronous result-processing mechanism. Finally, we present a practical example showing how this platform converts machine learning hyperparameter optimization for an ATLAS ttH analysis into a distributed workflow.
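To make the idea of "tasks as Python functions" with asynchronous result processing more concrete, here is a minimal local sketch. It does not use the PanDA or iDDS API: the `task` decorator, the `results` queue (standing in for a message broker), the thread pool (standing in for distributed resources), and the toy `evaluate` objective are all hypothetical names chosen for illustration only.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a message broker that carries task results.
results = queue.Queue()

# Hypothetical stand-in for distributed, heterogeneous resources.
executor = ThreadPoolExecutor(max_workers=4)


def task(func):
    """Hypothetical decorator: calling the function submits it as a task."""
    def submit(*args, **kwargs):
        def run():
            # Publish the result as a message instead of returning it
            # directly to the caller.
            results.put({
                "task": func.__name__,
                "args": args,
                "result": func(*args, **kwargs),
            })
        return executor.submit(run)
    submit.__name__ = func.__name__
    return submit


@task
def evaluate(learning_rate, depth):
    # Toy objective standing in for one training run of a hyperparameter scan.
    return (learning_rate - 0.1) ** 2 + (depth - 3) ** 2


def workflow():
    """Submit one task per hyperparameter point, then consume result messages."""
    grid = [(lr, d) for lr in (0.01, 0.1, 1.0) for d in (2, 3, 4)]
    for lr, d in grid:
        evaluate(lr, d)

    best = None
    for _ in grid:
        # Results arrive in completion order, not submission order.
        msg = results.get(timeout=10)
        if best is None or msg["result"] < best["result"]:
            best = msg
    return best


best = workflow()
executor.shutdown()
print(best["args"], best["result"])
```

In this sketch the workflow never blocks on an individual task; it only reacts to result messages as they arrive, which is the general pattern the abstract refers to as messaging-based asynchronous result processing.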