Hydra

A library for data analysis in multithreaded platforms

A. Augusto Alves Jr ( University of Cincinnati )
CERN Geneve, 31 October 2016
Presented in the DIANA Project Meeting.

Outline


  • Design, strategies and goals
  • Core Functionalities
    • Functors
    • Data containers
    • Generic function evaluation
  • Examples
    • Multidimensional numerical integration
    • Multidimensional random number generation
    • Phase-Space Monte Carlo
    • Interface to Minuit2
  • Summary and prospects

Hydra


Hydra is a header only templated C++ library to perform data analysis on massively parallel platforms.

hydra-fig

  • It is implemented on top of the C++ Standard Library and a variadic version of the Thrust library.
  • Hydra runs on Linux systems and can perform calculations using OpenMPCuda and TBB enabled devices.
  • It is focused on portability, usability, performance and precision.

Design


The main design features are:

  • The library is structured using static polymorphism.
  • Type safety is enforced at compile time in order to avoid the production of invalid or performance degrading code.
  • No function pointers or virtual functions at all: stack behavior is known at compile time.
  • No need to write explicit device code.
  • Clean and concise semantics. Interfaces are designed to be easy to use correctly and hard to use incorrectly.
  • Common C++ idioms: container iterators, named parameter semantics, RAII, CRTP…

The same source files written using Hydra components and standard C++ compiles on GPU or CPU just exchanging the extension from .cu to .cpp.

Functionalities currently implemented


  • Generation of phase-space Monte Carlo Samples with any number of particles in the final states. Weighed and unweighted samples are supported.
  • Sampling of multidimensional PDFs.
  • Data fitting of binned and unbinned multidimensional data sets.
  • Evaluation of multidimensional functions over heterogeneous data sets.
  • Numerical integration of multidimensional functions using Monte Carlo-based methods: flat or self-adaptive (Vegas-like)

Functors


In Hydra most of the calculations are performed using function objects. Hydra add features, type information and interfaces to generic functors using the CRTP idiom.

//Gaussian with two parameters: mean and sigma
struct Gauss:public BaseFunctor<Gauss,double, 2>
{
  //client need to implement the Evaluate(T* ) method for homogeneous data.
  template<typename T>
  __host__ __device__
  inline double Evaluate(T* x){/* implementation */}

  //or heterogeneous data. In this case <T> represents a variadic tuple.
  template<typename T>
  __host__ __device__
  inline double Evaluate(T x){/* implementation */}
}
  • For all functors deriving from hydra::BaseFunctor<Func, ReturnType, NParams>, if the calculation is expensive or the functor will be called many times, the results of the first call can be cached.

Arithmetic and composition with functors


Hydra provides a lot “syntax sugar” to deal with function objects.

  • All the basic arithmetic operators are overloaded. Composition is also possible. If A, B and C are Hydra functors, the code below is completely legal.
 //basic arithmetic operations
 auto plus_functor = A + B;  auto minus_functor = A - B;
 auto multiply_functor = A * B;  auto divide_functor = A/B; 

 //composition of basic operations
 auto any_functor = (A - B)*(A + B)*(A/C);
 
 // C(A,B) is represented by:
 auto composed_functor = compose(C, A, B)
  • The functors resulting from arithmetic operations and composition can be cached as well.
  • No intrinsic limit on the number of functors participating on arithmetic or composition mathematical expressions.

Support for C++11 lambdas


Lambda functions are a very precious C++ resources that allow the implementation of new functionalities on-the-fly. These closures can hold state, capture variables defined in the enclosing scope etc…

  • Lambda are fully supported in Hydra.
  • In the client code one can define a lambda function at any point and convert it into a Hydra functor using hydra::wrap_lambda():
double two = 2.0;
//define a simple lambda and capture "two" by value
auto my_lambda = [] __host__ __device__(double* x)
    { return two*sin(x[0]); };

//convert is into a Hydra functor
auto my_lamba_wrapped  = wrap_lambda(my_lambda);

CUDA 8.0 supports lambda functions on device and host code.

Data containers


  • Hydra algorithms can operate over any iterable C++ container. Hydra provides hydra::PointVector.
  • hydra::PointVector is an iterable collection of hydra::Point objects: hydra::Point represents multidimensional data points with coordinates and value, error of the coordinates and error of the value.
  • hydra::Point uses conditional base class members and methods injection in order to save memory and stack size.
 //two dimensional data set
 PointVector<device, double, 2> data_d(1e6);

 // fill data_d and get data from device and fill a ROOT 2D histogram
 PointVector<host> data_h(data_d);

 TH2D hist("hist", "my histogram", 100, min, max); 
for(auto point : data_h )
{
   std::cout << point << std::endl; // stream the points to std::out
   hist.Fill(point.GetCoordinate(0), point.GetCoordinate(1));
   }

Generic function evaluation


Functors can be evaluated over large data sets using the template function hydra::Eval.

  • hydra::Eval will return a vector with the results.
  • hydra::Range specifies the input ranges
//allocate a vector [nentries] real numbers in the device (note the _d subscript )
RealVector_d angles_d(nentries);
// Generate angles uniformly in the range
// generation will be performed in the device
Generator.Uniform(0 , 2*PI, angles_d.begin(), angles_d.end() );

// A lambda to calculate sin(x)
auto sinL = [] __host__ __device__(double* x){ return sin(x[0]);};
auto sinW = wrap_lambda(sinL);

// A lambda to calculate cos(x)
auto cosL = [] __host__ __device__(double*  x){ return cos(x[0]);};
auto cosW = wrap_lambda(cosL);

// Evaluation
auto functors = thrust::make_tuple( sinW, cosW);
auto range    = make_range( angles_d.begin(), angles_d.end());
auto result   = Eval( functors, range);

Multidimensional PDF sampling


The template class hydra::Random manages the multidimensional function sampling in Hydra.

  • Generic PDF sampling using accept-reject method.
  • hydra::Random provides a number of basic 1D distributions: Uniform, Breit-Wigner…
  • Can be configured via template parameters (policies) to use different random number generators.
  • Methods take the target's container iterators as input and perform the generation on the host or device backend.
    //Random object with current time count as seed.
    Random<thrust::random::default_random_engine>
    Generator( std::chrono::system_clock::now().time_since_epoch().count() );
    //1D host buffer
    hydra::mc_host_vector<double>   data_h(nentries);
    //uniform
    Generator.Uniform(-5.0, 5.0, data_h.begin(), data_h.end());
    //gaussian
    Generator.Gauss(0.0, 1.0, data_h.begin(), data_h.end());
    //exponential
    Generator.Exp(1.0, data_h.begin(), data_h.end());
    //breit-wigner
    Generator.BreitWigner(2.0, 0.2, data_h.begin(), data_h.end());

Multidimensional PDF sampling


//------------------------------
// Two gaussians hit-and-miss
//------------------------------
// Gaussian 1
std::array<double, 2>  means1  ={2.0, 2.0 };
std::array<double, 2>  sigmas1 ={1.5, 0.5 };
Gauss<2> Gaussian1(means1, sigmas1 );

// Gaussian 2
std::array<double, 2>  means2  ={-2.0, -2.0 };
std::array<double, 2>  sigmas2 ={0.5, 1.5 };
Gauss<2> Gaussian2(means2, sigmas2 );

//add the pdfs
auto Gaussians = Gaussian1 + Gaussian2;

//2D range
std::array<double, 2>  min  ={-5.0, -5.0 };
std::array<double, 2>  max  ={ 5.0,  5.0 };

auto data_d = Generator.Sample<device>(Gaussians,min, max, ntrials );
auto data_h = Generator.Sample<host>(Gaussians,min, max, ntrials );

Multidimensional PDF sampling


HIST_2D_gaussians_DEVICE

HIST_2D_gaussians_HOST

  • Same algorithm and functor invoked on the device and on the host.
  • Iterable container is filled on the memory space of the execution backend.
  • To perform 10 million trials takes:
    • GeForce Titan-Z : 50 ms
    • Intel i7 4 cores @ 3.60GHz : 600 ms

Phase-Space Monte Carlo


Hydra supports the production of phase-space Monte Carlo samples. The generation is managed by the class hydra::PhaseSpace and the storage is managed by the specialized container hydra::Events.

  • Policies to configure the underlying random number engine and the backend used for the calculation.
  • No limitation on the number of final states.
  • Support the generation of sequential decays.
  • hydra::Events is iterable and therefore is fully compatible with C++11 iteration over range semantics.
  • Generation of weighted and unweighted samples.

The Hydra phase-space generator supersedes the library MCBooster, from the same developer.

Phase-Space Monte Carlo


// B0 -> J/psi K pi
Vector4R B0(5.27961, 0.0, 0.0, 0.0);
std::vector<double> 
massesB0{3.096916, 0.493677, 0.13957018 };

// PhaseSpace object for B0-> K pi J/psi
PhaseSpace<3> phsp(B0.mass(), massesB0);
// container
Events<3, device> Evts_d(10e7);
// generate...
phsp.Generate(B0, Evts_d.begin(), Evts_d.end() );

//copy events to the host
Events<3, host> Evts_h(Evts_d);

for(auto event: Evts_h)
{ /*fill histogram*/}

PHSP

Time to generate 10M events:

  • GeForce Titan-Z: 20 ms
  • Intel i7 4 cores @ 3.60 GHz: 413 ms

Multidimensional numerical integration


Hydra provides two MC based methods for multidimensional numerical integration: “Plain Monte Carlo” and self-adaptive Vegas-like algorithm (importance sampling).

  • Hydra implementations follow closely the corresponding GSL algorithms.
  • Methods can be configured via template parameters (policies) to call the integrand on the host or on the device and use different random number engines.
  • Both methods use RAII to acquire, initialize and release the resources.
  • Example of Vegas usage:
//Vegas-state hold the resources for performing the integration
VegasState<1> state =  VegasState<1>( min, max);
// tune the algorithm
state->SetAlpha(1.75);
state->SetIterations(5);
state->SetMaxError(1e-3);
//10,000 maximum function calls in maximum 5 iterations
Vegas<1> vegas( state, 10000);

Interface to Minuit2 and fitting


Hydra implements an interface to Minuit2 that parallelizes the FCN calculation. This accelerates dramatically the calculation over large datasets.

  • The fit parameters are represented by the class hydra::Parameter and managed by the class hydra::UserParameters.
  • hydra::UserParameters has the same semantics of Minuit2::MnUserParameters.
  • Any positive definite Hydra-functor can be converted into PDF.
  • The PDFs are normalized on-the-fly.
  • The estimator is a policy in the FCN.
  • Data is passed via iterators. Any iterable container can be used. The developer advises to use hydra::PointVector.

The FCN provided by Hydra can be used directly in Minuit2.

Interface to Minuit2 and data fitting


Parameter definition syntax is very intuitive:

// 1) using the Named Parameter Idiom
Parameter  mean  = Parameter::Create().Name("Mean").Value(3.0).Error(0.001).Limits( 1.0, 4.0);
Parameter  sigma = Parameter::Create().Name("Sigma").Value(0.5).Error(0.001).Limits(0.1, 1.5);

or

// 2) using unnamed parameter idiom
Parameter mean("Mean", 3.0, 0.001, 1.0, 4.0) ;
Parameter sigma("Sigma", 0.5, 0.001, 0.1, 1.5) ;

Pass the parameters to Hydra and Minuit2:

hydra::UserParameters upar;
upar.AddParameter(&mean);
upar.AddParameter(&sigma);

//Printing the registered parameters
upar.PrintParameters();

Interface to Minuit2 and data fitting


Defining the functor:

struct Gauss:public BaseFunctor<Gauss,double, 2>
{
Gauss(Parameter const& mean, Parameter const& sigma, GUInt_t position=0 ):
  BaseFunctor<Gauss,double,2>(),
  fPosition(position),
  fM(mean),
  fS(sigma)
  { RegistryParameters({&fM, &fS}); /* registry the parameters here */ }

// Note: Copy ctor and operator= definitions are mmited for brevity

  template<typename T>
    __host__ __device__    inline double Evaluate(T* x)
    {    return exp(-((x[fPosition] - fM) * (x[fPosition] - fM)) / (2 * fS * fS));}

    GUInt_t  fPosition;
    Parameter fM;
    Parameter fS;
};
// Associate the functor to a integrator and make a pdf.
auto Gauss_PDF   = make_pdf(Gauss(mean, sigma), integrator);

Interface to Minuit2 and data fitting


Complete example:

  • Model = Gaussian1 + Gaussian2 + Exponential
  • Unbinned maximum likelihood extended fit
  • Data: 10 Million events.
//Allocate memory to store data 
PointVector<device, double, 1> data_d(nentries);

//Fill data container...

//Get the FCN
auto modelFCN = make_loglikehood_fcn(model, data_d.begin(), data_d.end() );
//Strategy
MnStrategy strategy(1);
//Create the Migrad minimizer
MnMigrad migrad(modelFCN, upar.GetState(),  strategy);
//Perform the fit
FunctionMinimum minimum = migrad();

Interface to Minuit2 and data fitting


Fit

  • Black dots: Data points
  • Red line: Fit result
  • Blue shadowed area: events sampled from fit result.

Time spent performing the fit:

  • GeForce Titan-Z: 1.6 s
  • Intel i7 4 cores @ 3.60 GHz: 20 s

Testing


OpenMP backends (Virtual Machines):

CUDA

  • Fedora 23: kernel-4.7.7, gcc-5.3.1, cuda-8.0, arch Kepler (Titan Z)
  • Fedora 23: kernel-4.7.7, gcc-5.3.1, cuda-8.0, arch Quadro (K2200)
  • Fedora 23: kernel-4.7.7, gcc-5.3.1, cuda-8.0, arch Maxwell (GTX970)
  • Fedora 23: kernel-4.7.7, gcc-5.3.1, cuda-8.0, arch Maxwell (GTX980M)
  • SUSE: kernel-4.8.0(?) and gcc-5.4.0, cuda-8.0, arch Pascal (GTX1060)

Summary


Please, visit the page of the project, give a try, report bugs, make suggestions and if you like it give a star. Thanks !