Conference on Computing in High Energy and Nuclear Physics

Name: Conference on Computing in High Energy and Nuclear Physics
Start: 2024-10-19T08:00:00+02:00
End: 2024-10-25T18:30:00+02:00

19–25 Oct 2024

Europe/Zurich timezone

Contact Program Chairs

chep2024-pc@cern.ch

Event Workflow Management System - A SaaS Solution for Massively Divisible and Distributed Workflows

WED 17

23 Oct 2024, 15:18

57m

Exhibition Hall

Poster Track 4 - Distributed Computing Poster session

Speaker: Ric Evans (Wisconsin IceCube Particle Astrophysics Center)

How does one take a workload consisting of millions or billions of tasks and group it into tens of thousands of jobs? Partitioning the workload into a workflow of long-running jobs minimizes the use of scheduler resources; however, smaller, more fine-grained jobs allow more efficient use of computing resources. When the runtime of a task averages a minute or less, scheduling overhead can cause severe scaling challenges. Employing jobs that run for several hours, each with a large input file comprising a bundle of tasks, is effective in ideal situations. However, given the heterogeneity of available distributed resources and limited control over task-job matching, job runtimes can vary widely.
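The bundling trade-off described above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the task count, the 30-second task runtime, and the per-slot slowdown factors are made-up numbers, and `bundle_tasks` and `job_runtimes` are hypothetical helper names, not part of EWMS or HTCondor.

```python
def bundle_tasks(task_runtimes, n_jobs):
    """Split tasks into n_jobs contiguous bundles of near-equal task count."""
    size, extra = divmod(len(task_runtimes), n_jobs)
    bundles, start = [], 0
    for j in range(n_jobs):
        # The first `extra` bundles take one additional task each.
        end = start + size + (1 if j < extra else 0)
        bundles.append(task_runtimes[start:end])
        start = end
    return bundles


def job_runtimes(bundles, slowdowns):
    """Wall time of each job: sum of its task runtimes times the slot's slowdown."""
    return [sum(bundle) * slow for bundle, slow in zip(bundles, slowdowns)]


# Assumed workload: 1200 tasks of 30 s each, bundled into 4 jobs.
tasks = [30.0] * 1200
# Assumed heterogeneity: two nominal slots, one 2x slower, one 4x slower.
slowdowns = [1.0, 1.0, 2.0, 4.0]

bundles = bundle_tasks(tasks, n_jobs=4)
runtimes = job_runtimes(bundles, slowdowns)
makespan = max(runtimes)  # the workflow finishes when the slowest job does
```

Every bundle holds the same amount of work, yet the slowest slot determines when the whole workflow finishes, while the faster slots sit idle after completing their fixed share.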
The Event Workflow Management System (EWMS) augments HTCondor to address this issue. EWMS implements a pilot-based paradigm in which each worker, running inside an HTCondor execution point, connects to a message broker and executes many individual fine-grained tasks. This adaptive design increases task throughput while incorporating additional fail-safe features. In addition, EWMS manages workflow scheduling, enables real-time worker scaling, and exposes a public-facing interface for users. Here, we outline the EWMS technique, detail science driver workflows from the IceCube experiment, and provide system usage metrics.
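The pilot-based pattern can be sketched in a few lines. This is not the EWMS implementation: an in-process `queue.Queue` stands in for the message broker, threads stand in for workers on HTCondor execution points, and `pilot_worker` and `run_pilots` are hypothetical names chosen for illustration.

```python
import queue
import threading


def pilot_worker(worker_id, task_queue, results, lock, process):
    """Pull tasks one at a time until the queue is empty, then exit."""
    while True:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            return  # no work left; the pilot shuts down
        output = process(task)
        with lock:
            results.append((worker_id, task, output))
        task_queue.task_done()


def run_pilots(tasks, n_workers, process):
    """Enqueue all tasks, start n_workers pilots, and wait for them to finish."""
    task_queue = queue.Queue()  # stand-in for a real message broker
    for task in tasks:
        task_queue.put(task)
    results, lock = [], threading.Lock()
    workers = [
        threading.Thread(
            target=pilot_worker,
            args=(i, task_queue, results, lock, process),
        )
        for i in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


# Assumed toy workload: square the integers 0..99 using 4 pilots.
results = run_pilots(range(100), n_workers=4, process=lambda x: x * x)
```

Because each pilot takes its next task only when it finishes the previous one, a slow worker simply processes fewer tasks instead of holding up a fixed bundle.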

Author: Ric Evans (Wisconsin IceCube Particle Astrophysics Center)

Co-authors: Benedikt Riedel (University of Wisconsin-Madison); Brian Aydemir (Morgridge Institute for Research); Brian Paul Bockelman (University of Wisconsin Madison (US)); David Schultz (University of Wisconsin-Madison); Miron Livny
