A. Augusto Alves Jr
University of Cincinnati
aalvesju@cern.ch |
Hydra is a header only templated C++ library to perform data analysis on massively parallel platforms. |
|
The source files written using Hydra components and standard C++ compiles for GPU, CPU, Xeon Phi…
In Hydra, all the calculations are performed using function objects. Hydra adds features, type information and interfaces to generic functors using the CRTP idiom.
//Gaussian with two parameters: mean and sigma
struct Gauss:public BaseFunctor<Gauss,double, 2>
{
//client need to implement the Evaluate(T* ) method for homogeneous data.
template<typename T>
__host__ __device__
inline double Evaluate(T* x){/* implementation */}
//or heterogeneous data. In this case <T> represents a variadic tuple.
template<typename T>
__host__ __device__
inline double Evaluate(T x){/* implementation */}
}
hydra::BaseFunctor<Func, ReturnType, NParams>
,
if the calculation is expensive or the functor will be called many times, the results of the
first call can be cached.
Hydra provides a lot “syntax sugar” to deal with function objects.
A
, B
and C
are Hydra functors, the code below is
completely legal.
//basic arithmetic operations
auto plus_functor = A + B; auto minus_functor = A - B;
auto multiply_functor = A * B; auto divide_functor = A/B;
//composition of basic operations
auto any_functor = (A - B)*(A + B)*(A/C);
// C(A,B) is represented by:
auto composed_functor = compose(C, A, B)
hydra::wrap_lambda()
:
double two = 2.0;
//define a simple lambda and capture "two" by value
auto my_lambda = [] __host__ __device__(double* x)
{ return two*sin(x[0]); };
//convert is into a Hydra functor
auto my_lamba_wrapped = wrap_lambda(my_lambda);
CUDA 8.0 supports lambda functions in device and host code.
hydra::PointVector
is an iterable collection of hydra::Point
objects:
hydra::Point
represents multidimensional data points with coordinates and value,
error of the coordinates and error of the value(usage mandatory for data fitting).
//two dimensional data set
PointVector<device, double, 2> data_d(1e6);
// fill data_d and get data from device and fill a ROOT 2D histogram
PointVector<host> data_h(data_d);
TH2D hist("hist", "my histogram", 100, min, max);
for(auto point : data_h )
{
std::cout << point << std::endl; // stream the points to std::out
hist.Fill(point.GetCoordinate(0), point.GetCoordinate(1));
}
This class allows to store in memory a table with columns of different types.
The table can be accessed line-by-line or element-by-element, in a given column.
It is basicaly a SoA generic container with limitations to the type and number of columns.
Interface mimics std::vector and thrust::host(device)_vector
//each line is modeled as a tuple
typedef thrust::tuple<unsigned int, float, double> line_t;
//prototype like it was a vector of lines
typedef thrust::host_vector<line_t> prototype_h;
typedef thrust::device_vector<line_t> prototype_d;
//the table
typedef hydra::experimental::multivector<prototype_h> table_h;
typedef hydra::experimental::multivector<prototype_d> table_d;
//so use that...
table_d table;
table.resize(10);
table.push_back(thrust::make_tuple( 1, 2.0 , 3.0));
Users can manage collections of objects using the macro
_DeclareStorable(class_name, args...)
.
Objects will get stored as a
hydra::multivector
and individual objects can be accessed without cache misses and stored without padding.
Example:
class Vector4R
{
public:
// ctors and public class methods etc
private:
GReal_t v[4]; //<< four-vector components are stored here
_DeclareStorable(Vector4R, v[0], v[1] , v[2], v[3])
};
typedef thrust::host_vector<Vector4R::args_type> prototype_t;
//--------------------------^^^^^^^^^^^^^^^^^^^--------------
typedef multivector<prototype_t> vector_particles;
//_______________________________________________________
//exemple of usage: an array of daughter particle vectors.
vector_particles fDaughters;
fDaughters.push_back( Vector4R(5.279, 0, 0, 0) );
Deployed in several classes and algorithms already:
Functors can be evaluated over large data sets using the template function hydra::eval
.
hydra::eval(Iterarors ...itors)
will return a vector of tuples with the results.
//allocate a vector [nentries] real numbers in the device
RealVector_d angles_d(nentries);
// Generate angles uniformly in the range generation will be performed in the device
Generator.Uniform(0 , 2*PI, angles_d.begin(), angles_d.end() );
// A lambda to calculate sin(x) and cos(x)
auto sinL = [] __host__ __device__(double* x){ return sin(x[0]);};
auto sinW = wrap_lambda(sinL);
auto cosL = [] __host__ __device__(double* x){ return cos(x[0]);};
auto cosW = wrap_lambda(cosL);
// Evaluation
auto functors = thrust::make_tuple( sinW, cosW);
auto range = make_range( angles_d.begin(), angles_d.end());
auto result = eval( functors, range);
The template class hydra::Random
manages the multidimensional PDF sampling in Hydra.
hydra::Random
also provides some basic 1D distributions: Uniform, Gaussian, Breit-Wigner…
//Random object with current time count as seed.
Random<thrust::random::default_random_engine>
Generator( std::chrono::system_clock::now().time_since_epoch().count() );
//1D host buffer
hydra::mc_host_vector<double> data_h(nentries);
//Uniform
Generator.Uniform(-5.0, 5.0, data_h.begin(), data_h.end());
//Gaussian
Generator.Gauss(0.0, 1.0, data_h.begin(), data_h.end());
//Exponential
Generator.Exp(1.0, data_h.begin(), data_h.end());
//Breit-Wigner
Generator.BreitWigner(2.0, 0.2, data_h.begin(), data_h.end());
//------------------------------
// Two gaussians hit-and-miss
//------------------------------
// Gaussian 1
std::array<double, 2> means1 ={2.0, 2.0 };
std::array<double, 2> sigmas1 ={1.5, 0.5 };
Gauss<2> Gaussian1(means1, sigmas1 );
// Gaussian 2
std::array<double, 2> means2 ={-2.0, -2.0 };
std::array<double, 2> sigmas2 ={0.5, 1.5 };
Gauss<2> Gaussian2(means2, sigmas2 );
//add the pdfs
auto Gaussians = Gaussian1 + Gaussian2;
//2D range
std::array<double, 2> min ={-5.0, -5.0 };
std::array<double, 2> max ={ 5.0, 5.0 };
auto data_d = Generator.Sample<device>(Gaussians,min, max, ntrials );
auto data_h = Generator.Sample<host>(Gaussians,min, max, ntrials );
|
|
Hydra supports the production of phase-space Monte Carlo samples. The generation is managed by
the class hydra::PhaseSpace
and the storage is managed by the specialized container hydra::Events
.
hydra::Events
is iterable and therefore is fully compatible with C++11 iteration over range semantics.
The Hydra phase-space generator supersedes the library MCBooster, from the same developer.
|
|
Time to generate 10M events:
Hydra implements some popular methods for one- and multi-dimensional numerical integration:
The upcoming release implements 4 quadrature methods:
General strategy is to parallelize the function calls.
Hydra implements an interface to Minuit2 that parallelizes the FCN calculation. This accelerates dramatically the calculation over large datasets.
hydra::Parameter
and managed by the
class hydra::UserParameters
.
hydra::UserParameters
has the same semantics of Minuit2::MnUserParameters
.
hydra::PointVector
.
The FCN provided by Hydra should be used directly in Minuit2.
Parameter definition syntax is very intuitive:
// 1) using the Named Parameter Idiom
Parameter mean = Parameter::Create().Value(3.0).Name("Mean").Error(0.001).Limits( 1.0, 4.0);
Parameter sigma = Parameter::Create().Name("Sigma").Value(0.5).Error(0.001).Limits(0.1, 1.5);
or
// 2) using unnamed parameter idiom
Parameter mean("Mean", 3.0, 0.001, 1.0, 4.0) ;
Parameter sigma("Sigma", 0.5, 0.001, 0.1, 1.5) ;
Pass the parameters to Hydra and Minuit2:
hydra::UserParameters upar;
upar.AddParameter(&mean);
upar.AddParameter(&sigma);
//Printing the registered parameters
upar.PrintParameters();
Defining the functor (current implementation):
struct Gauss:public BaseFunctor<Gauss,double, 2>
{
Gauss(Parameter const& mean, Parameter const& sigma, GUInt_t position=0 ):
BaseFunctor<Gauss,double,2>()
fPosition(position),
fM(mean),//<--- not necessary anymore
fS(sigma)//<--- to declare a member for each parameter
{RegistryParameters({&fM, &fS}); /* <-- explicit registration not necessary */ }
// Note: Copy ctor and operator= definitions are mmited for brevity
template<typename T>
__host__ __device__ inline double Evaluate(T* x)
{ return exp(-((x[fPosition] - fM) * (x[fPosition] - fM)) / (2 * fS * fS));}
GUInt_t fPosition;
Parameter fM;//<--- not necessary anymore
Parameter fS;//<--- not necessary anymore
};
Defining the functor (new implementation):
struct Gauss:public BaseFunctor<Gauss,double, 2>
{
Gauss(Parameter const& mean, Parameter const& sigma, unsigned position=0 ):
BaseFunctor<Gauss,double,2>({mean, sigma}), //registry parameters here
fPosition(position)
{ }
// Note: Copy ctor and operator= definitions are mmited for brevity
template<typename T>
__host__ __device__ inline double Evaluate(T* x)
{
// _par[i] <--- access the ith parameter
double y = exp(-((x[fPosition]-_par[0])*(x[fPosition]-_par[0]))/(2*_par[1]*_par[1]))
return y ;
}
unsigned fPosition;
};
Complete example:
//Allocate memory to store data
PointVector<device, double, 1> data_d(nentries);
//Fill data container...
//Get the FCN
auto modelFCN = make_loglikehood_fcn(model, data_d.begin(), data_d.end() );
//Strategy
MnStrategy strategy(1);
//Create the Migrad minimizer
MnMigrad migrad(modelFCN, upar.GetState(), strategy);
//Perform the fit
FunctionMinimum minimum = migrad();
|
|
Time spent performing the fit:
OpenMP and TBB backends (x86_64 Virtual Machines):
CUDA
At design/implementation phase :
Testing phase (to be in the next release):
Hydra development is supported by NSF. The project is hosted on GitHub:
https://github.com/MultithreadCorner/Hydra
https://multithreadcorner.github.io/Hydra/
https://github.com/AAAlvesJr/Hydra
The package includes a suite of examples.
Please, visit the page of the project, give a try, report bugs, make suggestions and if you like it give a star. Thanks !