Felice Pantaleo (CERN), Julien Leduc
Data analyses based on the evaluation of likelihood functions are commonly used in the high energy physics community for fitting statistical models to data samples. These procedures require repeated evaluations of the likelihood function and can be very time-consuming, so fast evaluation is particularly important. This paper describes a parallel implementation in which the evaluation of the negative log-likelihood function for data analysis methods runs cooperatively on heterogeneous computational devices (i.e., CPU and GPU) belonging to a single computational node, or on several homogeneous nodes connected by a network. The implementation splits the workload needed for the evaluation of the function into balanced sub-workloads that are executed in parallel, one on each computational device. The CPU parallelization is implemented using OpenMP, while the GPU implementation is based on CUDA. The parallelization over several nodes is based on MPI. The performance of these implementations is compared for different configurations and different hardware systems. The tests are based on real data analyses carried out by the high energy physics community, taken from the RooFit and RooStats packages.
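The split-and-reduce idea described above can be illustrated with a minimal sketch. It is not the implementation presented in this paper: the Gaussian model, the function names (`split_by_weight`, `partial_nll`, `balanced_nll`) and the `weights` argument, which stands in for the relative throughput of each device, are all assumptions made for illustration. The sub-workloads are evaluated one after another here, where a real implementation would dispatch them concurrently to CPU threads, a GPU, or remote nodes.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Normalized Gaussian density, used here as a stand-in model."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def split_by_weight(n_events, weights):
    """Return contiguous (start, stop) ranges sized proportionally to weights.

    The weights are hypothetical per-device throughput estimates; a faster
    device receives a proportionally larger share of the events.
    """
    total = sum(weights)
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        bounds.append(round(n_events * acc / total))
    bounds[-1] = n_events  # make sure every event is assigned exactly once
    return [(bounds[i], bounds[i + 1]) for i in range(len(weights))]


def partial_nll(data, start, stop, mu, sigma):
    """Negative log-likelihood contribution of events in [start, stop)."""
    return -sum(math.log(gaussian_pdf(x, mu, sigma)) for x in data[start:stop])


def balanced_nll(data, weights, mu, sigma):
    """Split the events, evaluate each sub-workload, and reduce the results.

    Each partial sum would run on its own device in a parallel setting;
    only the final reduction needs the results from all devices.
    """
    ranges = split_by_weight(len(data), weights)
    partials = [partial_nll(data, a, b, mu, sigma) for a, b in ranges]
    return math.fsum(partials)
```

Because the negative log-likelihood is a sum over independent events, the partial sums can be computed in any order and combined with a single reduction, which is what makes this kind of workload splitting possible.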