Independent for each channel and pTV bin: how different is the correction in each channel and pTV bin? Motivation:
Didn’t get good closure when done inclusively; different phase spaces
More freedom in the fit to pull the associated systematic uncertainties, which are also per channel/pTV bin
Fewer constraints; not propagated to different channels/bins
Impact of uncertainties on signal extraction is small
Because of mjj window cut in SRs
Main effect is on V+HF CR; lower mjj cut? Yes, p17 simplified description: V+HF CR: 50 GeV < mjj < 250 GeV
One shape unc. per pTV bin/channel included in the final fit
Different samples for 2016 and 2017+2018; mismodelling in both? No, correction only for 2017+2018; well-known issue of MADGRAPH5aMC@NLO v2.3.3 with the FxFx merging scheme. For 2016, an LO sample is used.
What is the cause of the issue in MADGRAPH5aMC@NLO v2.3.3? Not sure; possibly merging of soft jets
How much V+LF background is still left in the SR? There is a dedicated V+LF CR; does that mean the contamination is quite substantial? No. Also, because the b-tag score is used in the DNN, V+LF usually sits to the left of the distribution in the SR
Fitted distribution in V+LF CR: pTV (see p22)
P25-31 20 normalisation factors (NFs): why split in lepton flavours? Should expect the same modelling for ele and mu channels…
Main reason: improves goodness of fit; account for different phase spaces
Postfit values are the same within unc as expected
Constrain Wenu+HF and Wmunu+HF NFs from 1L; how is this propagated to 0L?
Taking the average in 0L; done by simultaneously fitting the 2 NFs in the 3 regions of 0L and 1L
Extrapolation unc. from 1L to 0L (i.e. a nuisance parameter with a prior in 0L)? No; but experimental uncertainties (lepton efficiency / MET trigger) provide degrees of freedom that the fit could use but does not, i.e. there are no related pulls in the fit
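The averaging described above can be illustrated with a toy simultaneous fit (all yields and region names below are hypothetical, not numbers from the analysis): the two lepton-flavour NFs are constrained by the 1L e/mu regions and enter the 0L region only through their average.

```python
from scipy.optimize import minimize

# Toy prefit W+HF yields and pseudo-data in three regions (hypothetical numbers).
pred = {"1L_e": 100.0, "1L_mu": 110.0, "0L": 200.0}
obs = {"1L_e": 115.0, "1L_mu": 120.0, "0L": 225.0}

def chi2(nf):
    """Simple chi2: two NFs, with 0L sensitive only to their average."""
    nf_e, nf_mu = nf
    exp = {
        "1L_e": nf_e * pred["1L_e"],
        "1L_mu": nf_mu * pred["1L_mu"],
        "0L": 0.5 * (nf_e + nf_mu) * pred["0L"],  # average of the two NFs in 0L
    }
    return sum((obs[r] - exp[r]) ** 2 / obs[r] for r in obs)

res = minimize(chi2, x0=[1.0, 1.0])
nf_e, nf_mu = res.x
```

Fitting all three regions at once lets 0L pull on both NFs symmetrically, which is the practical meaning of "taking the average in 0L".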
p23 Plots show only 59.8 fb-1, from fit to full Run-2 dataset? Yes, the fit is done in all the regions (~80) shown on p18/19 times 3 years; i.e. each region exists separately for 2016, 2017, and 2018 and the fit is performed simultaneously across all the regions.
Plots are just examples; considering making all the regions and years public
So you also have year-dependent NFs? Yes. Are they consistent across the different years? Yes.
Motivation for splitting by year?
Different detector performance
Object calibrations provided individually per year? Yes.
Some simulated samples are different for the years (e.g. V+jets)
P23 HF DNN
One output node per process? Yes
But then fit all together? Yes
In addition to ttbar dominated bin in HF DNN still have dedicated ttbar CR? Yes
How much is the gain of having both? Not much.
This is 0L; 1L is even more dominated by ttbar, but we wanted the treatment harmonised between channels
Why have separate regions for ele/mu channels in 1/2L? Improvement in goodness of fit. Also, different triggers, reconstruction etc., so having them separate seems more correct. But indeed we neither expect nor see any differences.
P24 How are the “large prior uncertainties” for the category migration derived? No formal derivation; chosen as large as possible without letting the yields go negative at the 400 GeV bin boundary. Also done so as to mimic flat priors.
Why are those constraints needed? You have control regions; are their constraints not enough to give a proper pTV dependence? Did consider having NFs per pTV bin, but found discontinuities at the bin boundaries.
In ATLAS, this is achieved through pTV shape unc within each bin?
How avoid that two regions constrain a third etc? There are two linear pTV shape migration unc for four regions (see p25)
And for the 250/400 GeV boundary? The CRs are not split there. But the SR is, so some extrapolation unc. might be needed here…
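A minimal sketch of what one such linear pTV shape variation within a region could look like (bin edges, slope, and function names here are hypothetical): events at the low pTV edge are weighted down and events at the high edge weighted up, with the overall normalisation restored so that only the migration across the bin changes.

```python
import numpy as np

def linear_shape_weights(ptv, lo, hi, delta):
    """Up-variation weights for a linear pTV shape NP.

    ptv is mapped to [-1, 1] across the region [lo, hi]; the weight is
    1 + delta * x, then renormalised so the total yield is unchanged
    (pure shape effect, no normalisation component).
    """
    x = 2.0 * (np.clip(ptv, lo, hi) - lo) / (hi - lo) - 1.0
    w = 1.0 + delta * x
    return w * len(ptv) / w.sum()

# Toy events in a hypothetical 150-250 GeV region, 10% slope.
ptv = np.random.default_rng(0).uniform(150.0, 250.0, 10_000)
w_up = linear_shape_weights(ptv, 150.0, 250.0, 0.1)
```

The down variation would use `-delta`; with such NPs defined per region, two linear shapes can cover migrations across four pTV regions as described above.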
P29 You correlate W+HF and Z+HF (per lepton sub-channel).
Why? Why not Z and W across 0/2 and 0/1L? Account for missing efficiency measurement of tagging algorithm; split in e/mu to account for phase space differences
Do you have extrapolation unc considering differences between W and Z? No.
Request: can you publish a pull plot for the most highly ranked NPs? Will be discussed.
Did you try an STXS-like diboson measurement? I.e. you only showed the inclusive mu cross-check, but did you also try POI decorrelation in diboson? Yes; did not find the trend found for VH.
P34 On the scale variations:
Decorrelated between processes, does that mean between V+jets flavour components? Yes, separately for V+LF, V+c etc
Correlated across all regions? Yes; also years.
NFs are not included here? No, considered in data stat.
Large impact from MC stat. Did you consider heavy-flavour filtered samples or truth tagging technique?
LO V+jets samples are filtered; this is more difficult for NLO
NLO samples are sliced in jet multiplicity etc
Also last round with LO samples, MC stat was the leading unc
This round also affected by reweighting technique
Truth tagging is being considered (detector note on GNN TT: CMS-DP-2022-051)
Why not split 1L into 0 and 1 add. jets? No gain in expected sensitivity.
In the resolved analysis, yes. For each of the SRs on p7 there is a low- and a high-dR CR.
In the boosted analysis, we fit the large-R jet mass in the top CR.
That the pTV distribution is smooth only comes from the shape unc you apply? No, the CRs are also split in pTV, the same as the SR.
What you call 2 and 3J regions is the same as 0/1 add. jets in CMS? Yes. Why did you not do the nJet split in the STXS? At the time, the priority was the pTV STXS; but indeed all ingredients are there to do it.
P26 replying to a comment during the talk: CMS does have the postfit expected numbers on sigmaxBR. ⇒ would be nice to also make the significance public
How do you derive templates for nJet migrations? How do you model the CR-to-SR extrapolations and estimate the size?
Brief recap of strategy:
For every process we consider a list of variations: scale variations, PDF etc, but especially also alternative generator
So we study for each the effect on e.g. the 2/3J ratio; sum all the effects in quadrature → extrapolation unc.
Serves as a prior for a normalisation NP in the fit applied to the “weaker” region; e.g. for pTV the acceptance NP is applied to the high-pTV region, assuming the common NF is driven by the low-pTV region
On the W+jets example, how do the nominal and alternative generator compare to data in the CR? Does their difference capture any data/nominal-MC difference?
Sherpa does a very good job describing the data; MG is not the most up to date, but it is what we have. There’s no smoking gun telling us to get rid of MG.
The approach where we decorrelate the effects, i.e. have different NPs for nJet, pTV, CR/SR etc, avoids strong constraints from mismodelling in data and gives more degrees of freedom in the fit
For the main V+jets modelling, what is the correlation with the signal strength? Do you have any pulls? Don’t have correlations public, but on p39 in backup there’s the ranking plots. On the WH signal strength, the shape unc on W+jets encompassing everything except the pTV shape is the highest ranked NP and it is a bit pulled.
And you don’t pay in sensitivity because you still have some constraining power? No, we do pay in sensitivity. But it is the price we are willing to pay to avoid random pulls on other NPs and to be able to describe the data accurately
The “W+jets R_BDT Generator” NP is all the shape and acc variations taken together? This is only the shape variation on the BDT score from comparing to the alternative generator.
Generally, the comparison to alternative generators gives the largest uncertainties, both in terms of shape and acc effects.
The SR selection is rather loose and close to the CR definition. Is it really necessary to have extrapolation unc? Did you see pulls on those NPs in the fit?
We think it is necessary, we do not want to (more or less blindly) trust our nominal MC
For sure we checked and investigated all the pulls, and there may be some. Once you give this degree of freedom to the fit, it is not straightforward to tell whether a pull indicates a particular mismodelling or whether the fit uses it to fix something else. Many CRs are not entirely pure in a single bkg, so there are also correlations.
Offline comment: in the ranking plots on p39 one can see several CR->SR extrapolations and some of them are being pulled
On the BDT reweighting method p12: are you applying a regression to get a smoother ratio? The approach itself gives a smoother variation. When just propagating the alternative generator through the analysis, the shape is not very smooth because the alternative sample size is limited, see blue points in the plot. The reweighting of the nominal generator results in the smoother distribution, see the black points. The weights are obtained from training two BDTs to separate the nominal and the alternative generators; the ratio of the resulting BDT scores then serves as weight.
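The reweighting described above can be sketched with a classifier-based density-ratio estimate (a common variant of the BDT-score-ratio idea; the toy samples, feature, and classifier settings below are hypothetical, not those of the analysis):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)

# Toy stand-ins: a large "nominal" sample and a smaller "alternative"
# generator sample, here one kinematic feature each.
nominal = rng.normal(0.0, 1.0, size=(20_000, 1))
alternative = rng.normal(0.3, 1.1, size=(5_000, 1))

# Train a BDT to separate the two samples. Via the density-ratio trick,
# p/(1-p) is proportional to alt/nominal, so it serves as a per-event
# weight that morphs the nominal sample toward the alternative shape.
X = np.vstack([nominal, alternative])
y = np.concatenate([np.zeros(len(nominal)), np.ones(len(alternative))])
clf = GradientBoostingClassifier(max_depth=3, n_estimators=100).fit(X, y)

p = clf.predict_proba(nominal)[:, 1]
w = p / (1.0 - p)
w *= len(nominal) / w.sum()  # keep the nominal normalisation fixed
```

Because the weights come from a smooth learned function evaluated on the large nominal sample, the resulting variation is much less affected by the limited alternative-sample size than propagating the alternative sample directly.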
We primarily smooth experimental unc.; especially those that change the events in a region, e.g. jet energy scale and resolution unc.
Sample sizes of the nominal and alternative generator - roughly the same? An advantage of MG is that it doesn’t have negative weights (LO generator). Sherpa is our main generator, used for all the analyses in ATLAS. Also, we have continually requested extensions since 2017, so the number of events is huge. Still, the MG statistics are not terrible; compare the error bands on p12.
P10 ttbar in 2L is fully data-driven; why use MC templates elsewhere?
In boosted, because there is no ttbar at high pTV
In the other resolved channels, because we don’t have corresponding CRs. The advantage of the 2L top e-mu CR is that it is kinematically identical to the SR; we checked and any extrapolation is at the %-level. In 0/1L, it is difficult to get such a pure CR that is not kinematically different.
P7 resolved-boosted transition: did you study whether a prioritisation strategy like the one in CMS can gain you anything? For the current combination, choosing the pTV boundary for the transition meant minimal changes to two existing analyses; furthermore, it simplifies the use of truth tagging a lot. And using truth tagging to reduce the MC stat unc. is quite important. The strategy may not be optimal, but it still works pretty well: we reach 1 sigma sensitivity for pTV > 400 GeV.
ATLAS and CMS have similar sensitivity, even though ATLAS is using only a BDT and CMS a DNN. In CMS, the BDT didn’t perform very well. Are you planning to use a DNN in future?
Can’t answer. ;) But we think there are not so many additional event features left to exploit that a major difference in performance would be expected.
CMS uses a BDT in the boosted regime; ATLAS uses the large-R jet mass which is the most sensitive discriminating variable.