• No split in years? Yes
  • In the CRs you only fit the normalisation?
    • In the resolved analysis, yes. For each of the SRs on p7 there is a low-dR and a high-dR CR.
    • In the boosted analysis, we fit the large-R jet mass in the top CR.
  • Does the smoothness of the pTV distribution come only from the shape unc. you apply? No, the CRs are also split in pTV, the same as the SRs.
  • What you call the 2J and 3J regions is the same as the 0/1-additional-jet split in CMS? Yes. Why did you not do the nJet split in the STXS? At the time, the priority was the pTV STXS; but indeed all the ingredients are there to do it.
  • P26, replying to a comment during the talk: CMS does have the postfit expected numbers on sigma×BR. ⇒ It would be nice to also make the significance public.
  • How do you derive templates for nJet migrations? How do you model the CR-to-SR extrapolations and estimate the size?
    • Brief recap of strategy: 
      • For every process we consider a list of variations: scale variations, PDF variations, etc., but especially also an alternative generator.
      • For each variation we study the effect on e.g. the 2J/3J ratio; all effects are summed in quadrature → extrapolation unc.
      • This serves as a prior for a normalisation NP in the fit, applied to the “weaker” region; e.g. for pTV the acceptance NP is applied to the high-pTV region, assuming the low-pTV region drives the common NF.
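The quadrature sum behind the extrapolation prior can be sketched as follows. Everything here is illustrative: the variation names and the relative shifts are hypothetical placeholders, not the analysis values.

```python
import math

# Hypothetical relative shifts of the 2J/3J acceptance ratio under each
# variation w.r.t. the nominal generator (illustrative numbers only).
shifts = {
    "muR/muF scale":  0.04,
    "PDF":            0.01,
    "alt. generator": 0.07,
}

# Sum in quadrature -> width of the Gaussian prior on the extrapolation NP
# that is applied to the "weaker" region in the fit.
extrapolation_unc = math.sqrt(sum(s ** 2 for s in shifts.values()))
print(f"extrapolation unc.: {extrapolation_unc:.1%}")  # 8.1% for these inputs
```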
  • On the W+jets example, how do the nominal and alternative generators compare to data in the CR? Does their difference capture any data/nominal-MC difference?
    • Sherpa does a very good job describing the data; MG is not the most up to date, but it is what we have. There’s no smoking gun telling us to get rid of MG.
    • The approach where we decorrelate the effects, i.e. have different NPs for nJet, pTV, CR/SR, etc., avoids strong constraints from mismodelling in data and gives more degrees of freedom in the fit.
  • For the main V+jets modelling, what is the correlation with the signal strength? Do you have any pulls? We don't have the correlations public, but p39 in the backup has the ranking plots. For the WH signal strength, the shape unc. on W+jets encompassing everything except the pTV shape is the highest-ranked NP, and it is a bit pulled.
    • And you don’t pay in sensitivity because you still have some constraining power? No, we do pay in sensitivity. But it is the price we are willing to pay to avoid random pulls on other NPs and to be able to describe the data accurately.
    • The “W+jets R_BDT Generator” NP is all the shape and acc variations taken together? This is only the shape variation on the BDT score from comparing to the alternative generator. 
  • Generally, the comparison to alternative generators gives the largest uncertainties, both in terms of shape and acc effects.
  • The SR selection is rather loose and close to the CR definition. Is it really necessary to have extrapolation unc? Did you see pulls on those NPs in the fit?
    • We think it is necessary; we do not want to (more or less blindly) trust our nominal MC.
    • For sure we checked and investigated all the pulls, and there may be some. Once you give this degree of freedom to the fit, it is not straightforward to tell whether a pull indicates a particular mismodelling to address or whether the fit uses it to fix something else. Many CRs are not entirely pure in only one bkg, so there are also correlations.
    • Offline comment: in the ranking plots on p39 one can see several CR→SR extrapolation NPs, and some of them are pulled.
  • On the BDT reweighting method, p12: are you applying a regression to get a smoother ratio? The approach itself gives a smoother variation. When just propagating the alternative generator through the analysis, the shape is not very smooth because the alternative sample size is limited (see the blue points in the plot). Reweighting the nominal generator results in the smoother distribution (see the black points). The weights are obtained by training two BDTs to separate the nominal and alternative generators; the ratio of the resulting BDT scores then serves as the weight.
    • We primarily smooth experimental unc., especially those that change the events in a region, e.g. jet energy scale and resolution uncertainties.
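The reweighting idea above can be sketched with a toy density-ratio example. Everything here is an assumption for illustration: a simple logistic classifier stands in for the BDTs, and two Gaussian samples stand in for the nominal and alternative generators; the score ratio s/(1−s), corrected for the relative sample sizes, approximates the per-event weight.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1D feature (stand-in for the analysis observables):
# "nominal" vs "alternative" generator samples with slightly shifted shapes.
nominal = rng.normal(0.0, 1.0, size=20000)
alternative = rng.normal(0.3, 1.0, size=5000)

# Stand-in for the BDT: a logistic classifier fit by gradient descent
# to separate the two samples (label 0 = nominal, 1 = alternative).
X = np.concatenate([nominal, alternative])
y = np.concatenate([np.zeros(nominal.size), np.ones(alternative.size)])
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * X + b)))
    w -= 0.1 * np.mean((p - y) * X)
    b -= 0.1 * np.mean(p - y)

# Per-event weight for nominal events: ratio of class scores, corrected
# for the relative sample sizes -> approximates p_alt(x) / p_nom(x).
s = 1.0 / (1.0 + np.exp(-(w * nominal + b)))
weights = (s / (1.0 - s)) * (nominal.size / alternative.size)

# The reweighted nominal sample mimics the alternative generator's shape
# (its mean shifts towards 0.3) while keeping the larger nominal statistics.
print(np.average(nominal, weights=weights))
```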
  • Sample sizes of the nominal and alternative generators - roughly the same? The advantage of MG is that it doesn’t have negative weights (LO generator). Sherpa is our main generator, used for all the analyses in ATLAS; we have also constantly requested extensions since 2017, so the number of events is huge. Still, the MG statistics are not terrible; you can compare the error bands on p12.
  • P10 ttbar in 2L is fully data-driven; why use MC templates elsewhere?
    • In boosted, because there is no ttbar at high pTV
    • In the other resolved channels, because we don’t have corresponding CRs. The advantage of the 2L top e-mu CR is that it is kinematically identical to the SR; we checked and any extrapolation is at the %-level. In 0/1L, it is difficult to get such a pure CR that is not kinematically different. 
  • P7, resolved-boosted transition: did you study whether a prioritisation strategy like the one in CMS can gain you anything? For the current combination, choosing the pTV boundary for the transition meant minimal changes to the two existing analyses; furthermore, it simplifies the usage of truth tagging a lot, and using truth tagging to reduce the MC stat. unc. is quite important. The strategy may not be optimal, but it still works pretty well: we got 1 sigma sensitivity for pTV > 400 GeV.
  • ATLAS and CMS have similar sensitivity, even though ATLAS is using only a BDT and CMS a DNN. In CMS, the BDT didn’t perform very well. Are you planning to use a DNN in future? 
    • Can’t answer. ;) But we think there are not so many additional event features to exploit that a major difference in performance would be expected.
    • CMS uses a BDT in the boosted regime; ATLAS uses the large-R jet mass which is the most sensitive discriminating variable.