Description
Complex accelerators require control systems that can handle dynamic, nonlinear environments. Traditional control methods are poorly suited to this setting because they struggle to adapt to such uncertainties, whereas reinforcement learning algorithms are adaptable and generalizable. We present a reinforcement learning pipeline that can effectively handle the dynamics of a complex accelerator, and we demonstrate its capabilities on multiple environments, including the Spallation Neutron Source (SNS) and the Beam Test Facility (BTF) at Oak Ridge National Laboratory (ORNL). Because the time available to train an online algorithm such as reinforcement learning on a real accelerator is limited, we use a virtual twin accelerator (VIRAC) developed by ORNL to pretrain the policy and show that it converges in the virtual environment. We then test the adaptability of the pretrained RL model by deploying it on the real accelerator and comparing the results. Using our Scientific Optimization and Controls Toolkit (SOCT) and open-source standards such as Gymnasium, we formulate and solve a MEBT orbit correction problem at the SNS and an emittance maximization problem at the BTF. We show how Twin Delayed Deep Deterministic Policy Gradient (TD3) solves these optimization environments in the virtual accelerator, and how the resulting policy transfers to the real accelerator for inference and model retraining. In doing so, we demonstrate how reinforcement learning can serve as a control system for complex accelerators, and we provide a model pipeline that can be adapted to new accelerator control problems.
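As a concrete illustration of the pipeline described above, the sketch below builds a toy Gymnasium environment standing in for the virtual accelerator, pretrains a TD3 policy on it, and reloads the saved policy against a second environment instance standing in for the real machine. This is a minimal sketch, not the authors' SOCT code: the VirtualOrbitEnv class, its linear orbit-response model, the reward shaping, and the use of the stable-baselines3 TD3 implementation are all illustrative assumptions.

import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import TD3


class VirtualOrbitEnv(gym.Env):
    """Toy MEBT-style orbit correction (illustrative, not VIRAC):
    actions are corrector kicks, observations are BPM readings,
    reward is the negative RMS orbit error."""

    def __init__(self, n_correctors=4, n_bpms=6):
        super().__init__()
        rng = np.random.default_rng(0)
        # Fixed linear orbit-response matrix standing in for beam physics.
        self.response = rng.normal(size=(n_bpms, n_correctors)).astype(np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(n_correctors,), dtype=np.float32)
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(n_bpms,), dtype=np.float32)
        self._kicks = np.zeros(n_correctors, dtype=np.float32)
        self._offset = np.zeros(n_bpms, dtype=np.float32)
        self._steps = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        # Random uncorrected orbit at the start of each episode.
        self._offset = self.np_random.normal(scale=1.0, size=self.response.shape[0]).astype(np.float32)
        self._kicks[:] = 0.0
        self._steps = 0
        return self._bpms(), {}

    def _bpms(self):
        # BPM readings = uncorrected orbit plus corrector response.
        return self._offset + self.response @ self._kicks

    def step(self, action):
        # Incremental corrector adjustments, clipped to magnet limits.
        self._kicks = np.clip(self._kicks + 0.1 * action, -2.0, 2.0).astype(np.float32)
        self._steps += 1
        obs = self._bpms()
        rms = float(np.sqrt(np.mean(obs ** 2)))
        reward = -rms                    # flatter orbit -> higher reward
        terminated = rms < 0.05          # orbit considered corrected
        truncated = self._steps >= 200   # episode step limit
        return obs, reward, terminated, truncated, {}


# Pretrain the policy on the virtual twin ...
virtual_env = VirtualOrbitEnv()
model = TD3("MlpPolicy", virtual_env, verbose=0)
model.learn(total_timesteps=20_000)
model.save("td3_orbit_pretrained")

# ... then reload it against the "real" machine for inference.
real_env = VirtualOrbitEnv()  # would be a Gym wrapper around the live machine
model = TD3.load("td3_orbit_pretrained", env=real_env)
obs, _ = real_env.reset()
for _ in range(50):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = real_env.step(action)
    if terminated or truncated:
        break

In the actual pipeline, the second environment would wrap the accelerator's control system rather than another simulation, and the loaded policy could either run purely in inference mode or continue training on live machine data, as the abstract describes.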