Description
High Energy Physics (HEP) computing at CERN has long relied on interactive SSH environments, shared software stacks and large-scale batch systems. As workloads increasingly adopt containerized and accelerator-driven execution models, a key requirement is to keep the user interface consistent while moving to modern orchestration platforms.
This contribution presents the computing platform developed for the Next Generation Triggers (NGT) project, which unifies traditional HEP workflows with a centralized pool of accelerator resources across on-premises and external infrastructures. The platform is built around a large centralized Kubernetes environment hosting GPUs and other accelerator technologies from multiple vendors, with heterogeneous interconnects including InfiniBand and RoCEv2. Users can access these resources interactively through SSH, notebooks, VS Code and standard Kubernetes interfaces, or submit batch-style jobs governed by scheduling and quotas, with support for MPI.
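As a rough illustration of what access through standard Kubernetes interfaces could look like, the sketch below builds a minimal Pod manifest that requests one accelerator. It is not taken from the NGT platform. The helper `gpu_pod_manifest`, the pod name, the image path and the command are hypothetical. The resource name `nvidia.com/gpu` is the one advertised by the NVIDIA device plugin and is only an assumption here, since the platform hosts accelerators from several vendors and each exposes its own resource name.

```python
import json


def gpu_pod_manifest(name, image, command, gpus=1, gpu_resource="nvidia.com/gpu"):
    """Build a minimal Kubernetes Pod manifest that requests accelerators.

    Hypothetical helper for illustration only. `gpu_resource` is an
    assumption: it must match the extended resource name exposed by
    the device plugin installed on the target cluster.
    """
    if gpus < 1:
        raise ValueError("gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # Run the command once and do not restart the container afterwards.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "command": list(command),
                    # Extended resources such as GPUs are requested under
                    # `limits`; Kubernetes sets the request to the same value.
                    "resources": {"limits": {gpu_resource: gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifest = gpu_pod_manifest(
        name="ngt-example",                             # hypothetical pod name
        image="registry.example.org/ngt/train:latest",  # hypothetical image
        command=["python", "train.py"],                 # hypothetical entry point
    )
    print(json.dumps(manifest, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed manifest could be piped to `kubectl apply -f -` on a cluster where the user has a namespace and a GPU quota.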
We also introduce the MLOps stack developed for NGT, including automated model training and inference pipelines, integration with GitLab CI and GitHub Actions, and a comprehensive monitoring system for workload-level observability, resource utilization and energy reporting. The platform demonstrates how cloud-native tooling can sustain familiar HEP development practices while enabling scalable and accelerator-efficient computing for future trigger and analysis applications.