Transformer architectures have rapidly become the state-of-the-art approach for machine-learning models across many scientific domains, offering unprecedented performance on complex, high-dimensional tasks. Their adoption within the ATLAS experiment, beginning with flavour tagging, has opened new opportunities but also introduced substantial challenges regarding large-scale training, infrastructure integration, and deployment within the established ATLAS software and production systems. In this talk, we will provide an overview of how we train some of the transformer-based models in ATLAS, covering dataset preparation as well as hardware-specific constraints. We will also outline the latest developments in deployment workflows and their integration into the ATLAS offline production system.
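To make the subject of the abstract concrete, below is a minimal, illustrative sketch of the kind of model it refers to: a transformer encoder that classifies a jet from a variable-length set of per-track features, in the spirit of transformer-based flavour tagging. This is not the actual ATLAS model; the class name `JetTransformerTagger`, all feature counts, layer sizes, and class labels are hypothetical placeholders chosen for the example.

```python
# Illustrative sketch only (not the ATLAS production model): a transformer
# encoder over a padded set of per-track features, pooled into a jet-level
# flavour classification. All dimensions here are hypothetical.
import torch
import torch.nn as nn


class JetTransformerTagger(nn.Module):
    def __init__(self, n_track_feats=16, d_model=64, n_heads=4,
                 n_layers=2, n_classes=3):
        super().__init__()
        # Embed per-track features into the model dimension.
        self.embed = nn.Linear(n_track_feats, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, dim_feedforward=4 * d_model,
            batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Classify the jet from the mean over unpadded track slots.
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tracks, pad_mask):
        # tracks:   (batch, max_tracks, n_track_feats)
        # pad_mask: (batch, max_tracks), True where the slot is padding
        x = self.encoder(self.embed(tracks), src_key_padding_mask=pad_mask)
        # Zero out padded slots before averaging over real tracks.
        valid = (~pad_mask).unsqueeze(-1).float()
        pooled = (x * valid).sum(dim=1) / valid.sum(dim=1).clamp(min=1.0)
        return self.head(pooled)  # logits over hypothetical flavour classes


# Toy usage: a batch of 8 jets, each padded to 30 track slots.
model = JetTransformerTagger()
tracks = torch.randn(8, 30, 16)
pad_mask = torch.arange(30).expand(8, 30) >= torch.randint(5, 30, (8, 1))
logits = model(tracks, pad_mask)
print(logits.shape)  # torch.Size([8, 3])
```

The padding mask is the key design point in this sketch: jets contain varying numbers of tracks, so batching requires padding, and the mask keeps both the attention layers and the pooling step from attending to or averaging over the padded slots.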