Particle therapy using protons or heavy ions is a relatively new cancer treatment modality that has gained increasing popularity over the last decade, owing to its potential for reducing undesired dose to nearby healthy tissue compared with conventional radiotherapy. However, current clinical treatment planning based on computed tomography suffers from range uncertainties caused by the inaccurate conversion of Hounsfield units (HU) to relative stopping power (RSP). Proton computed tomography (pCT) offers an alternative imaging technique that promises accurate pre-treatment imaging of patients for treatment planning, reducing the uncertainties in dose distribution calculations. In contrast to the X-rays used in conventional CT, protons do not travel on a straight line through patient and detector due to interactions with the traversed matter, and therefore require a reconstruction of the taken path prior to image reconstruction.
We propose a novel track-following technique based on deep reinforcement learning (RL) for recovering proton traces inside the digital tracking calorimeter (DTC), formalizing the task at hand as a Markov decision process (MDP) on a graph whose nodes describe proton hit centroids. We aim to learn a deterministic policy, parametrized by a deep neural network, that optimizes the physical plausibility of sequential transitions between nodes.
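To make the MDP-on-a-graph formulation concrete, the following is a minimal sketch of deterministic, greedy track following over candidate hit centroids. All names and the linear scorer are illustrative stand-ins (the actual method uses a deep neural network and a learned policy); only the overall structure (state = current track segment, action = choice of next node, deterministic argmax policy) reflects the described approach.

```python
import numpy as np

def features(prev, cur, cand):
    """Hypothetical transition features between consecutive hit centroids:
    alignment with the incoming direction and the step length."""
    v_in, v_out = cur - prev, cand - cur
    cos_angle = v_in @ v_out / (np.linalg.norm(v_in) * np.linalg.norm(v_out) + 1e-12)
    return np.array([cos_angle, np.linalg.norm(v_out)])

def policy(feat, weights):
    """Deterministic policy: a tiny linear scorer standing in for the deep
    network; the action is the argmax over candidate transitions."""
    return feat @ weights

def follow_track(seed_prev, seed_cur, layers, weights):
    """Greedy track following: starting from a two-hit seed, pick the
    highest-scoring candidate node in each successive detector layer."""
    track = [seed_prev, seed_cur]
    for candidates in layers:
        scores = [policy(features(track[-2], track[-1], c), weights)
                  for c in candidates]
        track.append(candidates[int(np.argmax(scores))])
    return track
```

With weights that favor small deflections, the policy extends a seed along the most collinear chain of hits, layer by layer, without any per-track iterative optimization at inference time.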
In a proof-of-principle study on Monte Carlo simulated data, we show that a model of elastic nuclear interactions provides a sufficient dense reward function for optimizing proton traces in homogeneous detector configurations without knowledge of the ground truth. Moreover, with reinforcement learning we can currently reconstruct trajectories originating from a variety of phantoms and particle densities with accuracies in the range of 50-98\%, while relocating the optimization effort to an initial training phase and thus avoiding recursive or iterative optimization of proton tracks during inference. At present, this approach is limited to homogeneous detectors and lacks the ability to efficiently trace protons across the tracking layers. Finally, we still rely on ground-truth seeding to find initial track seeds, in order to avoid unwanted behavior of the reinforcement learning approach.
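One way such a physics-based dense reward can be sketched is as the Gaussian log-likelihood of the deflection angle between consecutive track segments, so that physically plausible (small) scattering angles score higher without any ground-truth labels. The angular width below uses the standard Highland approximation for multiple Coulomb scattering; the specific parameter values and function names are illustrative assumptions, not the paper's actual reward definition.

```python
import numpy as np

def scattering_sigma(momentum_mev, beta, step_cm, rad_length_cm):
    """Highland approximation for the RMS scattering angle (rad) of a
    proton traversing a slab of material (illustrative parameters)."""
    t = step_cm / rad_length_cm  # thickness in radiation lengths
    return (13.6 / (beta * momentum_mev)) * np.sqrt(t) * (1 + 0.038 * np.log(t))

def dense_reward(prev, cur, nxt, sigma):
    """Dense reward for a transition cur -> nxt given the incoming segment
    prev -> cur: Gaussian log-likelihood of the deflection angle, so small,
    physically plausible deflections are rewarded. No ground truth needed."""
    v1, v2 = cur - prev, nxt - cur
    cos_a = np.clip(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)), -1.0, 1.0)
    angle = np.arccos(cos_a)
    return -0.5 * (angle / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))
```

Because the reward depends only on the geometry of candidate transitions and a scattering model, it can be evaluated densely at every step of an episode, which is what allows training without ground-truth track labels.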