This exercise will get you started on using TMVA.
TMVA is the Toolkit for Multivariate Data Analysis with ROOT (more information http://tmva.sourceforge.net/) and it is a powerful tool to perform multivariate analysis.
This is a short introduction to the standard functionality with some objectives at the end.
from ROOT import TH1D,TH2D,TFile,TTree,TCanvas,TCut,TMVA
Let's read in a ROOT file which contains a signal and background tree with some variables var1, var2, var3, var4
.
fin = TFile("toy_sigbkg.root")
signal = fin.Get("TreeS")
background = fin.Get("TreeB")
We create a canvas and histograms to plot two of the variables for signal (red) and background (blue).
c = TCanvas("c", "c", 400, 400)
sigvar1 = TH1D("sigvar1","",20,-5,5)
bkgvar1 = TH1D("bkgvar1","",20,-5,5)
signal.Draw("var1>>sigvar1")
background.Draw("var1>>bkgvar1","","SAME")
# draw the signal events in red
sigvar1.SetLineColor(2)
sigvar1.Scale(1/sigvar1.Integral())
# draw the background events in blue
bkgvar1.SetLineColor(4)
bkgvar1.Scale(1/bkgvar1.Integral())
c.Draw()
sigvar2 = TH1D("sigvar2","",20,-5,5)
bkgvar2 = TH1D("bkgvar2","",20,-5,5)
signal.Draw("var2>>sigvar2")
background.Draw("var2>>bkgvar2","","SAME")
# draw the signal events in red and normalize
sigvar2.SetLineColor(2)
sigvar2.Scale(1/sigvar2.Integral())
# draw the background events in blue
bkgvar2.SetLineColor(4)
bkgvar2.Scale(1/bkgvar2.Integral())
c.Draw()
There is some clear separation between signal and background, but it is not easy to cut on any value.
And that is not all, the variables can be also correlated, lets plot them also in a 2D plot.
# draw an empty 2D histogram for the axes
hsig = TH2D("histosig","",1,-5,5,1,-5,5)
hbkg = TH2D("histobkg","",1,-5,5,1,-5,5)
sigCut = TCut("")
bgCut = TCut("")
signal.SetMarkerColor(2)
signal.Draw("var1:var2>>hsig") #,sigCut,"same")
background.SetMarkerColor(4)
background.Draw("var1:var2>>hbkg","","SAME") #,bgCut,"same")
c.Draw()
There is clearly a stron correlation for the background events.
You can see, that just by looking at the two variables, it is hard to separate signal from background, there is no clear cut to apply. So maybe we should use a multivariate technique to combine them in order to find a better cut value.
We will use TMVA for this, first we need to create the TMVA factory, the basic component of TMVA for the training of multivariate methods.
TMVA.Tools.Instance()
# note that it seems to be mandatory to have an
# output file, just passing None to TMVA::Factory(..)
# does not work. Make sure you don't overwrite an
# existing file.
fout = TFile("TMVAResults.root","RECREATE")
factory = TMVA.Factory("TMVAClassification", fout,
":".join([
"!V",
"!Silent",
"!Color",
"!DrawProgressBar",
"Transformations=I;D;P;G,D",
"AnalysisType=Classification"]
))
Add the variables and trees to the factory. TMVA will handle them completely automatically.
factory.AddVariable("var1","F")
factory.AddVariable("var2","F")
factory.AddVariable("var3","F")
factory.AddVariable("var4","F")
factory.AddSignalTree(signal)
factory.AddBackgroundTree(background)
A preparation function defines how many events should be used for training and how the sample is split, what cuts should be applied and the normalization method.
factory.PrepareTrainingAndTestTree(sigCut,bgCut,
"nTrain_Signal=1000:nTrain_Background=1000:SplitMode=Random:NormMode=NumEvents:!V" )
Now we book the multivariate method to train. In this case we are training a BDT and we specify the settings:
methodBDT = factory.BookMethod(TMVA.Types.kBDT, "BDT",
":".join([
"!H",
"!V",
"NTrees=850",
"nEventsMin=150",
"MaxDepth=3",
"BoostType=AdaBoost",
"AdaBoostBeta=0.5",
"SeparationType=GiniIndex",
"nCuts=20",
"PruneMethod=NoPruning",
]))
Everything is setup, now train, test and evaluate all methods (the BDT in this case).
factory.TrainAllMethods()
factory.TestAllMethods()
factory.EvaluateAllMethods()
Great we have fully trained, tested and evaluated a BDT, with just a few lines of code. We got already a lot of useful and interesting output and TMVA has a GUI to look at the results. However you can also manually read and plot everything.
Let's plot the results of the training, we need a new instance and the TMVA Reader for this:
from ROOT import TH1D,TH2D,TFile,TTree,TCanvas,TCut,TMVA
TMVA.Tools.Instance()
reader = TMVA.Reader()
import array
var1= array.array('f',[0])
var2= array.array('f',[0])
var3= array.array('f',[0])
var4= array.array('f',[0])
reader.AddVariable("var1",var1)
reader.AddVariable("var2",var2)
reader.AddVariable("var3",var3)
reader.AddVariable("var4",var4)
reader.BookMVA("BDT","weights/TMVAClassification_BDT.weights.xml")
First, we want to plot the output distributions for signal and backround, so lets create two histograms and fill them.
hsig_bdt = TH1D("hsig_bdt","",20,-1,1)
hbkg_bdt = TH1D("hbgk_bdt","",20,-1,1)
for i in signal:
var1[0] = signal.var1
var2[0] = signal.var2
var3[0] = signal.var3
var4[0] = signal.var4
hsig_bdt.Fill(reader.EvaluateMVA("BDT"))
for i in background:
var1[0] = background.var1
var2[0] = background.var2
var3[0] = background.var3
var4[0] = background.var4
hbkg_bdt.Fill(reader.EvaluateMVA("BDT"))
Give them color and draw:
hbkg_bdt.SetLineColor(4)
hsig_bdt.SetLineColor(2)
hbkg_bdt.Draw()
hsig_bdt.Draw("SAME")
c.Draw()
Ok, this way we can much better separate signal from background and define a cut value! To better understand what is going and the correlations of the two variables, we can plot the BDT output dependent on the two variables in a 2D plot
# create a new 2D histogram with fine binning
histo2 = TH2D("histo2","",200,-5,5,200,-5,5)
# loop over the bins of a 2D histogram
for i in range(1,histo2.GetNbinsX() + 1):
for j in range(1,histo2.GetNbinsY() + 1):
# find the bin center coordinates
var1[0] = histo2.GetXaxis().GetBinCenter(i)
var2[0] = histo2.GetYaxis().GetBinCenter(j)
# calculate the value of the classifier
# function at the given coordinate
bdtOutput = reader.EvaluateMVA("BDT")
# set the bin content equal to the classifier output
histo2.SetBinContent(i,j,bdtOutput)
c2 = TCanvas("c2", "c2", 800, 400)
c2.Divide(2,1)
c2.cd(1)
histo2.Draw("colz")
c2.Draw()
Ok, now you should know enough to be able to play with TMVA and understand what you are doing.
cp -r $ROOTSYS/tutorials/tmva mytmva
TMVAClassification.C
Use[XXX]
for the different MVA methods.Likelihood
, MLP
and BDT
.root TMVAClassification.C
TMVA::TMVAGui("TMVA.root")
should open the GUI with the file again, however this depends a bit on the ROOT version.var1
and var2
as in Exercise3?