Reinforcement learning algorithms have demonstrated remarkable performance in fast feedback control applications. At the Karlsruhe Research Accelerator (KARA), the storage ring of the Karlsruhe Institute of Technology (KIT), a fast and adaptive longitudinal feedback system based on reinforcement learning is being considered to stabilize the complex longitudinal dynamics of the electron beam. To control the fast dynamics of the micro-bunching sub-structures, the latency of the reinforcement learning inference in the feedback loop should not exceed 100 µs. To support the required continuous learning and to provide fast feedback signals to the RF accelerator system, we developed an ultra-fast reinforcement learning framework running on a Xilinx Zynq UltraScale+ (US+) device. In this contribution, we present the reinforcement learning framework and its hardware implementation on the Zynq. As a proof of concept, two examples of fast control will be presented: balancing a cartpole and balancing a pendulum. The first is based on Policy Gradient (PG) and the second on Deep Deterministic Policy Gradient (DDPG). A detailed comparison between the proposed framework and a TensorFlow implementation on a standard PC will be presented. The comparison shows a dramatic reduction in the time per step during continuous learning, for both inference and training. The framework is designed to cover several reinforcement learning algorithms, including Deep Q-Network (DQN), Advantage Actor Critic (A2C), and Asynchronous Advantage Actor Critic (A3C).