In recent years the use of machine learning techniques within data-intensive sciences in general, and high-energy physics in particular, has increased rapidly, driven in part by the availability of large datasets on which such algorithms can be trained, as well as by suitable hardware, such as graphics and tensor processing units, which greatly accelerates their training and execution. Within the HEP domain, the development of these techniques has so far relied on resources external to the primary computing infrastructure of the WLCG. In this paper we present an integration of hardware-accelerated workloads into the Grid through the declaration of dedicated queues with access to hardware accelerators and the use of Linux container images holding a modern data science software stack. A frequent use case in the development of machine learning algorithms is the optimization of neural networks through the tuning of their hyperparameters. This often requires training and comparing a large number of network variations, which for some optimization schemes can be performed in parallel -- a workload well suited to grid computing. An example of such a hyperparameter scan on Grid resources for the case of Flavor Tagging within ATLAS is presented.
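The hyperparameter scan described above is embarrassingly parallel: each network variant can be trained and evaluated independently of the others. A minimal Python sketch of such a scan, with a toy objective standing in for the actual network training (the objective, grid, and parameter names are illustrative assumptions, not the ATLAS flavor-tagging setup; on the Grid each point would typically run as an independent job):

```python
# Minimal sketch of an embarrassingly parallel hyperparameter scan.
# The objective function and hyperparameter grid are illustrative
# placeholders; on the Grid each grid point would typically be
# submitted as a separate job rather than run in a local pool.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def evaluate(params):
    """Stand-in for training one network variant and returning its score."""
    lr, layers = params
    # Toy objective: best at lr=0.01 with 3 hidden layers.
    return -((lr - 0.01) ** 2 + 0.001 * (layers - 3) ** 2), params

# Cartesian product of candidate learning rates and layer counts.
grid = list(product([0.1, 0.01, 0.001], [2, 3, 4]))

# Evaluate all grid points concurrently and keep the best result.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(evaluate, grid))

best_score, best_params = max(results)
print(best_params)  # -> (0.01, 3)
```

Because the evaluations share no state, the same pattern maps directly onto grid submission: each `(lr, layers)` point becomes one job, and the comparison of scores happens after all jobs have returned.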