A. Augusto Alves Jr ( University of Cincinnati )
CERN Geneve, 31 October 2016
aalvesju@cern.ch |
Hydra is a header only templated C++ library to perform data analysis on massively parallel platforms. |
The main design features are:
The same source files written using Hydra components and standard C++ compiles on GPU or CPU just exchanging the extension from .cu to .cpp.
In Hydra most of the calculations are performed using function objects. Hydra add features, type information and interfaces to generic functors using the CRTP idiom.
//Gaussian with two parameters: mean and sigma
struct Gauss:public BaseFunctor<Gauss,double, 2>
{
//client need to implement the Evaluate(T* ) method for homogeneous data.
template<typename T>
__host__ __device__
inline double Evaluate(T* x){/* implementation */}
//or heterogeneous data. In this case <T> represents a variadic tuple.
template<typename T>
__host__ __device__
inline double Evaluate(T x){/* implementation */}
}
hydra::BaseFunctor<Func, ReturnType, NParams>
,
if the calculation is expensive or the functor will be called many times, the results of the
first call can be cached.
Hydra provides a lot “syntax sugar” to deal with function objects.
A
, B
and C
are Hydra functors, the code below is
completely legal.
//basic arithmetic operations
auto plus_functor = A + B; auto minus_functor = A - B;
auto multiply_functor = A * B; auto divide_functor = A/B;
//composition of basic operations
auto any_functor = (A - B)*(A + B)*(A/C);
// C(A,B) is represented by:
auto composed_functor = compose(C, A, B)
Lambda functions are a very precious C++ resources that allow the implementation of new functionalities on-the-fly. These closures can hold state, capture variables defined in the enclosing scope etc…
hydra::wrap_lambda()
:
double two = 2.0;
//define a simple lambda and capture "two" by value
auto my_lambda = [] __host__ __device__(double* x)
{ return two*sin(x[0]); };
//convert is into a Hydra functor
auto my_lamba_wrapped = wrap_lambda(my_lambda);
CUDA 8.0 supports lambda functions on device and host code.
hydra::PointVector
.
hydra::PointVector
is an iterable collection of hydra::Point
objects:
hydra::Point
represents multidimensional data points with coordinates and value,
error of the coordinates and error of the value.
hydra::Point
uses conditional base class members and methods injection in order to
save memory and stack size.
//two dimensional data set
PointVector<device, double, 2> data_d(1e6);
// fill data_d and get data from device and fill a ROOT 2D histogram
PointVector<host> data_h(data_d);
TH2D hist("hist", "my histogram", 100, min, max);
for(auto point : data_h )
{
std::cout << point << std::endl; // stream the points to std::out
hist.Fill(point.GetCoordinate(0), point.GetCoordinate(1));
}
Functors can be evaluated over large data sets using the template function hydra::Eval
.
hydra::Eval
will return a vector with the results.
hydra::Range
specifies the input ranges
//allocate a vector [nentries] real numbers in the device (note the _d subscript )
RealVector_d angles_d(nentries);
// Generate angles uniformly in the range
// generation will be performed in the device
Generator.Uniform(0 , 2*PI, angles_d.begin(), angles_d.end() );
// A lambda to calculate sin(x)
auto sinL = [] __host__ __device__(double* x){ return sin(x[0]);};
auto sinW = wrap_lambda(sinL);
// A lambda to calculate cos(x)
auto cosL = [] __host__ __device__(double* x){ return cos(x[0]);};
auto cosW = wrap_lambda(cosL);
// Evaluation
auto functors = thrust::make_tuple( sinW, cosW);
auto range = make_range( angles_d.begin(), angles_d.end());
auto result = Eval( functors, range);
The template class hydra::Random
manages the multidimensional function sampling in Hydra.
hydra::Random
provides a number of basic 1D distributions: Uniform, Breit-Wigner…
//Random object with current time count as seed.
Random<thrust::random::default_random_engine>
Generator( std::chrono::system_clock::now().time_since_epoch().count() );
//1D host buffer
hydra::mc_host_vector<double> data_h(nentries);
//uniform
Generator.Uniform(-5.0, 5.0, data_h.begin(), data_h.end());
//gaussian
Generator.Gauss(0.0, 1.0, data_h.begin(), data_h.end());
//exponential
Generator.Exp(1.0, data_h.begin(), data_h.end());
//breit-wigner
Generator.BreitWigner(2.0, 0.2, data_h.begin(), data_h.end());
//------------------------------
// Two gaussians hit-and-miss
//------------------------------
// Gaussian 1
std::array<double, 2> means1 ={2.0, 2.0 };
std::array<double, 2> sigmas1 ={1.5, 0.5 };
Gauss<2> Gaussian1(means1, sigmas1 );
// Gaussian 2
std::array<double, 2> means2 ={-2.0, -2.0 };
std::array<double, 2> sigmas2 ={0.5, 1.5 };
Gauss<2> Gaussian2(means2, sigmas2 );
//add the pdfs
auto Gaussians = Gaussian1 + Gaussian2;
//2D range
std::array<double, 2> min ={-5.0, -5.0 };
std::array<double, 2> max ={ 5.0, 5.0 };
auto data_d = Generator.Sample<device>(Gaussians,min, max, ntrials );
auto data_h = Generator.Sample<host>(Gaussians,min, max, ntrials );
|
|
Hydra supports the production of phase-space Monte Carlo samples. The generation is managed by
the class hydra::PhaseSpace
and the storage is managed by the specialized container hydra::Events
.
hydra::Events
is iterable and therefore is fully compatible with C++11 iteration over range semantics.
The Hydra phase-space generator supersedes the library MCBooster, from the same developer.
|
|
Time to generate 10M events:
Hydra provides two MC based methods for multidimensional numerical integration: “Plain Monte Carlo” and self-adaptive Vegas-like algorithm (importance sampling).
//Vegas-state hold the resources for performing the integration
VegasState<1> state = VegasState<1>( min, max);
// tune the algorithm
state->SetAlpha(1.75);
state->SetIterations(5);
state->SetMaxError(1e-3);
//10,000 maximum function calls in maximum 5 iterations
Vegas<1> vegas( state, 10000);
Hydra implements an interface to Minuit2 that parallelizes the FCN calculation. This accelerates dramatically the calculation over large datasets.
hydra::Parameter
and managed by the
class hydra::UserParameters
.
hydra::UserParameters
has the same semantics of Minuit2::MnUserParameters
.
hydra::PointVector
.
The FCN provided by Hydra can be used directly in Minuit2.
Parameter definition syntax is very intuitive:
// 1) using the Named Parameter Idiom
Parameter mean = Parameter::Create().Name("Mean").Value(3.0).Error(0.001).Limits( 1.0, 4.0);
Parameter sigma = Parameter::Create().Name("Sigma").Value(0.5).Error(0.001).Limits(0.1, 1.5);
or
// 2) using unnamed parameter idiom
Parameter mean("Mean", 3.0, 0.001, 1.0, 4.0) ;
Parameter sigma("Sigma", 0.5, 0.001, 0.1, 1.5) ;
Pass the parameters to Hydra and Minuit2:
hydra::UserParameters upar;
upar.AddParameter(&mean);
upar.AddParameter(&sigma);
//Printing the registered parameters
upar.PrintParameters();
Defining the functor:
struct Gauss:public BaseFunctor<Gauss,double, 2>
{
Gauss(Parameter const& mean, Parameter const& sigma, GUInt_t position=0 ):
BaseFunctor<Gauss,double,2>(),
fPosition(position),
fM(mean),
fS(sigma)
{ RegistryParameters({&fM, &fS}); /* registry the parameters here */ }
// Note: Copy ctor and operator= definitions are mmited for brevity
template<typename T>
__host__ __device__ inline double Evaluate(T* x)
{ return exp(-((x[fPosition] - fM) * (x[fPosition] - fM)) / (2 * fS * fS));}
GUInt_t fPosition;
Parameter fM;
Parameter fS;
};
// Associate the functor to a integrator and make a pdf.
auto Gauss_PDF = make_pdf(Gauss(mean, sigma), integrator);
Complete example:
//Allocate memory to store data
PointVector<device, double, 1> data_d(nentries);
//Fill data container...
//Get the FCN
auto modelFCN = make_loglikehood_fcn(model, data_d.begin(), data_d.end() );
//Strategy
MnStrategy strategy(1);
//Create the Migrad minimizer
MnMigrad migrad(modelFCN, upar.GetState(), strategy);
//Perform the fit
FunctionMinimum minimum = migrad();
|
|
Time spent performing the fit:
OpenMP backends (Virtual Machines):
CUDA
Hydra development is supported by NSF. The project is hosted on GitHub:
https://github.com/MultithreadCorner/Hydra
https://multithreadcorner.github.io/Hydra/
https://github.com/AAAlvesJr/Hydra
The package includes a suite of examples.
The next version will expand the range of options for data fitting, and implement convolution and histograming-related functionalities.
Some algorithms are being re-optimized as well.
Please, visit the page of the project, give a try, report bugs, make suggestions and if you like it give a star. Thanks !