Description
Machine learning models used in real-time and resource-constrained environments, such as hardware triggers, online reconstruction pipelines, and FPGA/GPU inference systems, must satisfy strict latency, memory, and numerical precision requirements. Achieving these targets typically requires extensive tuning of training schedules, quantization settings, sparsity levels, and architectural parameters. In current workflows, this optimisation process is often manual and difficult to reproduce, especially when multiple objectives must be balanced simultaneously.
To address this challenge, we introduce a new hyperparameter optimisation platform within the PQuantML library, developed as part of the Next-Generation Trigger project. The platform provides an integrated framework for the automated exploration of compression parameters and fine-tuning strategies, built on Optuna for adaptive sampling and MLflow for experiment tracking. Users define search spaces and evaluation metrics through configuration files, enabling large-scale optimisation experiments without modifying model code. We demonstrate the framework on representative convolutional and classifier models used in real-time ML studies, showing how automated optimisation systematically identifies configurations that meet HL-LHC latency and resource budgets while maintaining high physics performance. This capability strengthens PQuantML as a toolchain for preparing deployable ML models and provides a reproducible workflow for tuning models designed for the NGT and online computing systems.
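The abstract describes configuration-defined search spaces over compression parameters (quantization precision, sparsity, training settings). As a minimal, self-contained sketch of that idea, the loop below draws trials from such a search space and keeps the best one. All names are hypothetical: a plain random sampler stands in for Optuna's adaptive samplers, and a synthetic score stands in for real latency/accuracy measurements; PQuantML's actual schema and objective are not shown here.

```python
import math
import random

# Hypothetical search space of the kind a config file might declare.
SEARCH_SPACE = {
    "quant_bits": [4, 6, 8],        # quantization precision (categorical)
    "sparsity": (0.5, 0.9),         # target pruning fraction (uniform range)
    "learning_rate": (1e-4, 1e-2),  # fine-tuning LR (log-uniform range)
}

def sample_trial(rng):
    """Draw one candidate configuration from the search space."""
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "quant_bits": rng.choice(SEARCH_SPACE["quant_bits"]),
        "sparsity": rng.uniform(*SEARCH_SPACE["sparsity"]),
        "learning_rate": math.exp(rng.uniform(math.log(lo), math.log(hi))),
    }

def evaluate(cfg):
    """Stand-in objective: trade an accuracy proxy against a resource proxy."""
    accuracy_proxy = cfg["quant_bits"] / 8 - 0.2 * cfg["sparsity"]
    resource_proxy = cfg["quant_bits"] * (1 - cfg["sparsity"])
    return accuracy_proxy - 0.05 * resource_proxy  # scalarised score

def run_search(n_trials=50, seed=0):
    """Random-search loop; a real setup would delegate this to Optuna."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_trial(rng)
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

In a real run, each trial's configuration and score would additionally be logged to an experiment tracker such as MLflow so that the search remains reproducible and auditable.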