August 2, 2021
BOOST Conference
"This text is being provided in a rough draft format.
Communication Access Realtime Translation (CART) is provided in
order to facilitate communication accessibility and may not be a
totally verbatim record of the proceedings."
>> Hey, Andrew, can you hear me still?
>> Yep.
>> As usual, we're getting exponential increase in
participants. So maybe wait a minute.
>> I think we can get started slowly. And people can join
during the introduction.
Just so that we don't start running late already.
Okay.
>> I'll start recording.
>> Okay. You should be able to -- okay.
I guess I can't see everyone now. Good. Okay. So to
everyone who is already connected and those joining us, as I go
through these slides, welcome enthusiastically to BOOST 2021.
It is online again this year. I look forward to seeing
everyone in person again, hopefully next year. But we hope that
even though we're online, we have a very enjoyable few days of
thinking about physics ahead of us.
Okay. The conference is online this year and we've changed
the format a little bit. It's quite different, actually, than
previous BOOSTs. The major change is that all the plenary talks
have been prerecorded and they're all already available and they
have been, most of them, for almost a week now.
You can watch them at your leisure. The plenary talks will
have already been watched by you, the participants, and the live
periods of the conference are devoted to discussion sessions on
the results themselves.
But we won't go through the prerecorded videos at length.
Instead, we'll have small reminders of the most salient aspects of
the different results that we're going to talk about.
This is a very good pro tip, if you haven't watched the
videos yet. You can watch them at higher speed in a video
player. 2X speed is maybe a little fast for me. But even 1.25
or 1.5 might mean you can watch a couple of extra videos and
participate more effectively in the discussion.
I want to emphasize that the format probably won't work if
people haven't watched the videos, so we encourage you to take a
look and put discussion points into the Google doc ahead of time.
We have 29 excellent talks which cover a really wide variety
of subjects on CDS already.
There is experimental representation from ATLAS, CMS, LHCb,
and STAR this year.
We have, in the end, over 7 hours of content. It's hard to
watch it at the last minute. Hopefully, you've already seen some
of the videos.
I mentioned the schedule that we plan for this week. Today we have three live summary talks: one from the theory side, one from the experimental side, and then the recent summary on ML for jets.
And then we'll have our first discussion sessions, one related to ML results and pileup mitigation and one on jet tagging.
The next few days we will have discussion sessions about
various topics. On Thursday, the end of the day, we'll close the
conference out with a panel discussion.
We have a great panel, and you can put questions or topics
that you would like to discuss in the Google doc and I think the
idea is we want to talk about where we want to BOOST to next.
Either in the next year or in the future as we think about
future colliders, the FCC, the EIC and things like this.
I guess it's also up to the panel to come up with the BOOST
catch phrase for this year. But I think that task is in good
hands.
We wanted to make a couple of comments about the videos and
some of the things that we've been trying to make sure were done
for the conference. All of the prerecorded videos have been
captioned. You can enable the captions (they're not turned on by default in the video player) by clicking this button and turning on the English subtitles slider.
The live event will also be captioned. We have one of our
awesome captioners already connected. You may see the subtitles
on your screen. You may need to turn them on with the live
transcript in Zoom. It's also possible to follow the transcript
in a separate link and you can click that.
One of the reasons that we changed the format is to try to
make things more accessible. We have a lot of collaborators
around the world and we are in a very, sort of, Europe, US
centric time zone with the live sessions. So it was our hope,
partially, for people who are in Asia or Australia or otherwise
just can't connect for full days of Zoom meetings, the
prerecorded talks might give them a chance to see some of the
things that we're talking about and asynchronously participate in
the discussion. Which is not the best, but maybe it's better
than other formats and it's something we wanted to try.
Okay.
I've mentioned that we are collecting points for discussion
in a Google doc. So I won't repeat myself here. You can find
the link to it on the Indico page. You have to be signed in to
see the page.
If you don't leave your name in the Google doc, the chairs
in the sessions can't call on you. We encourage you to leave
your name but it's not mandatory. During the discussion
sessions, we don't want to be too formal about them. We
encourage you to turn your video on if you want. You don't have
to. Some people think it's nice and some people can't. And
that's okay, too.
Yes, the Zoom information is on Indico. I assume that you
found it if you're listening to me now. If you want to ask a
question, raise your hand and we'll call on you or the chair of
the session will call on you.
Breaks will take place on Gather Town. You can find a link
to that on the Indico agenda. I put a link here to the Slack
from last year. If you click you can join it. That might be
useful if you're trying to get in touch with someone or you just
want a chat room for something.
Now, the gather town is really cool. I was checking it out
before the conference. We have it open basically 24/7 right now.
There's lots of stuff that you can do. There are games for
relaxing and private spaces for arguing. So I'm sure people will
do both of those quite a bit. Feel free to join in and mingle
and socialize as much as you want.
We really want to thank Anna Benecke for her help in making
the space really, really fun. So thank you, very much, Anna.
The last thing we wanted -- the second to last thing we
wanted to mention is the BOOST community values are linked from
the conference webpage. We strongly encourage, and in fact expect, that all participants have read the community values. It's important that we BOOST each other up, and the values provide guidelines that everyone should follow to make sure we do.
If any issues come up during the conference, if you feel there is an issue, particularly one that goes against any of the community values written on this page, please contact Ayana and Jessie and the local committee, whoever you feel most comfortable contacting, and we'll do our best to address it.
The last thing that I wanted to say is a huge, huge thank
you to several people. First thanks is to Connie who is part of
the local organizing committee and she has been extremely helpful
trying to organize all of the behind-the-scenes things for the
conference. There are more than you think for an online event.
We really want to thank the IAC who provided a lot of advice
and feedback as we were putting together the program for this
week and provided quite a bit of financial support for the
conference.
And finally, we want to thank CERN, particularly Joachim the
director and the financial team. Finally, we want to thank all
of you, the participants, especially for your hard work reviewing
contributions already for the conference.
Okay. That's all I had. The last thing to do is to BOOST,
maybe you're BOOSTing from home or BOOSTing from some place that
looks like this around CERN this week.
If there are any questions for myself or any other members
of the LOC or any other things that people want to ask right now,
I'm sure we would be happy to answer them.
Otherwise, I will hand things over to our first speaker.
I don't know if I can see if hands are up. Yes, I can.
>> Siri had a question, sorry.
Okay. I don't see any raised hands. So Wouter, take it
away.
>> Let me make sure everything works as required.
Okay. Good afternoon. It's a pleasure, well at least good
afternoon for me and people in this time zone, good morning for
those in the US or wherever you may find yourself.
It's a pleasure to give this theory introduction at BOOST.
When I was asked to give this talk, I was told I had carte blanche
to talk about whatever I thought was interesting and relevant to
the BOOST community. This is a personal selection of some theory
developments.
I will try to keep it non-technical and instead emphasize the ideas: What is it? How does it work? And most importantly, why should you care about these developments?
Here is an overview of some of the topics that will come back: track-based measurements, precision calculations, anti-kT jet functions at NNLO, non-perturbative effects, and spin correlations.
I'm a theorist and sometimes our view of reality is more
like this. But for the experimentalists who are here, they have
to deal with other complications and I'll try to bridge this gap.
But there should be plenty of time for discussion at the end
and I encourage that as well. The very first topic I want to
talk about is calculations for track-based measurements.
The motivation for doing track-based measurements is
twofold. First of all, you have much better angular resolution.
Otherwise, you're limited by the calorimeter cell size. You can
remove pile up and identify the vertex where it comes from as you
can see in the picture here.
To maybe make this more concrete, you can see a measurement of the jet mass from ATLAS. To be precise, it's not the jet mass, it's the jet mass divided by the pT of the jet, and then you take the logarithm. You see the result using all particles versus using tracking.
First of all, you see here a large uncertainty on the data. That's the gray bands here. That is mostly coming from pileup, because the grooming that has been used is not very aggressive. If you pick a smaller -- if you groom more aggressively, you see this less.
In the other region, away from where you have a lot of pileup, if you look down here, there is also a large difference in uncertainty between using all particles and using tracking. That is something you can gain by using tracking.
Well, so you may want to do a track-based measurement, but can we theorists handle that? There are challenges: if you try to do a calculation for a track-based measurement, you run into divergences. So you do calculations with quarks and gluons and take hadronization effects into account. This is not too complicated in principle, because we've done this before.
When we talk about parton distribution functions and fragmentation functions, something similar happens. A partonic calculation contains divergences that are absorbed into a non-perturbative object, and in this case that object is a track function.
To give you some intuition for the track function: it describes the momentum fraction z of an initial parton that is converted to charged particles. The parton hadronizes and, say, we get two pi-pluses and a pi-zero. In this case the momentum fraction in terms of charged particles is z1 plus z2.
If you compare this with the fragmentation function for pions, you get two contributions: you find either this pion or that pion, and the two terms are added up. While for the fragmentation function we look at one hadron at a time, the track function looks at all hadrons at the same time, and that leads to different behavior.
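To make that counting concrete, here is a toy sketch (purely illustrative bookkeeping, not the actual track-function formalism): the charged momentum fraction of a hadronizing parton sums the fractions of its charged hadrons only.

```python
# Toy bookkeeping for the intuition above (hypothetical representation):
# the charged momentum fraction z of a hadronizing parton is the sum of
# the momentum fractions carried by its *charged* hadrons only.

def charged_fraction(hadrons):
    """hadrons: list of (momentum_fraction, electric_charge) pairs
    for one hadronization outcome of a single parton."""
    return sum(z for z, charge in hadrons if charge != 0)

# pi+ (z1), pi+ (z2), pi0 (z3): only the charged pions contribute,
# so z = z1 + z2, exactly as in the example above.
event = [(0.5, +1), (0.3, +1), (0.2, 0)]
print(charged_fraction(event))  # 0.8
```

Averaging this quantity over many hadronization outcomes is what the moments of a track function encode.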
Well, several years ago, in 2013, with Jesse, we first had a formalism and thought let's apply it, and we picked track-based thrust. You can see data from DELPHI.
The measurement of thrust in terms of tracks versus all particles shows something similar to the jet mass: the uncertainties on the data are smaller when you're using tracks.
So we went ahead and started calculating and the calculation
is fairly complicated. One of the ingredients you need is a jet function that describes both the contribution to the thrust from the radiation and, in addition, the momentum fraction of the charged particles in the jet.
That's the other variable, x. It really probes the details of the track functions that appear in there. So you need to know the quark and gluon track functions to make the prediction. Maybe you can extract them from the data in some way; if you're interested, I can tell you more about that after the talk.
After you've done the hard work, you get the plot out and
find out for basically almost the whole distribution, the tracks
and calorimeters of all particles are basically the same except
for this peak here. In this peak you have to deal with other non-perturbative effects: not just the effects of converting to charged particles, but of having gluons with small transverse momenta. You can feel a little dubious about this.
That felt a little unsatisfying. Fine, fine, the data, these curves are very similar; maybe that is just what it is. But doing all this work only to find curves that lie on top of each other is maybe not the best.
Thankfully, there has been some developments since then.
Quite a bit is looking at new observables. I'll highlight two
examples. The first is the azimuthal angle in the Z plus jet
production. Here you see the boson, here you see a jet, the red blobs are incoming protons, and we're looking at the azimuthal angle, which is the angle between the jet and the vector boson in the transverse plane.
The key ingredient here is that we use the winner-take-all axis. By doing this, the effect of soft radiation on the jet finding becomes suppressed, and as a consequence we don't need to know the details of how soft radiation is converted to charged hadrons. That doesn't play a role in this story.
So the effect of doing the track-based measurement is suppressed: it starts at next-to-next-to-leading-log accuracy.
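For readers who haven't met it, the winner-take-all axis comes from a recombination scheme in which the combined direction always follows the harder particle; a minimal sketch (the (pt, y, phi) representation and the function name are illustrative):

```python
def wta_recombine(p1, p2):
    """Winner-take-all recombination for (pt, y, phi) pseudo-particles:
    the pair is replaced by one object carrying the scalar pt sum but the
    direction (y, phi) of the *harder* input, so a soft emission never
    shifts the axis -- the property that suppresses soft effects here."""
    hard = p1 if p1[0] >= p2[0] else p2
    return (p1[0] + p2[0], hard[1], hard[2])

# A 1 GeV soft particle leaves the 100 GeV particle's direction untouched.
print(wta_recombine((100.0, 0.0, 0.0), (1.0, 1.0, 1.0)))  # (101.0, 0.0, 0.0)
```

With a standard E-scheme recombination the combined axis would recoil slightly against the soft particle; with winner-take-all it does not, which is why soft physics decouples from this observable.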
In addition, it doesn't require the full details of the track functions; we need only a single number for quark jets and another for gluon jets. To make this more concrete, we can look at the distributions for this angle delta phi in Pythia, using all particles or using only charged particles, and the curves are basically on top of each other.
If you look at the ratio plot below, within statistics it is one, except maybe at the very end of the distribution. This is an observable where you see small effects going from all particles to charged particles only. And we can understand the effect theoretically; the theoretical calculation is not very difficult.
If you want to properly account for this conversion, you can do it not with full non-perturbative functions but basically with two numbers. That is, I would say, a positive development compared to the complication of the earlier case. And again, there is not much effect when switching from tracks to all particles or vice versa.
The next example is a case where the effect of switching to tracks is a real effect; it's not that the observable doesn't change.
But this is an example where implementing tracks is easy.
And the example here is energy-energy correlators. These
are not cross section measurements. These are really weighted
cross sections. So you take the cross section and you look at
two points at some fixed angle chi. That is this angle here.
You take the energy deposits at these points that are separated
by this angle and you weight, you take the cross section weighted
by these energies.
The reason this is easy to convert to tracks is that you just have to say: instead of measuring the energy in all hadrons, I just want to know the energy in charged hadrons. You multiply by the momentum fraction. This x is drawn from the track function, but actually I can just put brackets around all this, and it becomes a single number, which is a moment of the track function.
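A sketch of how such a weighted cross section is tallied (toy code; the event format, bin count, and names are assumptions, and a real analysis would handle binning and normalization properly):

```python
import math

def eec_histogram(events, n_bins=50):
    """Toy energy-energy correlator: every pair of particles (i, j) in an
    event adds the weight E_i * E_j / Q^2 to the bin of their opening
    angle chi, where Q is the event's total energy. Switching to tracks
    would amount to replacing each E by the particle's charged energy."""
    hist = [0.0] * n_bins
    for particles in events:                      # particle = (E, px, py, pz)
        Q = sum(p[0] for p in particles)
        for Ei, *pi in particles:
            for Ej, *pj in particles:
                dot = sum(a * b for a, b in zip(pi, pj))
                norm = math.dist(pi, (0, 0, 0)) * math.dist(pj, (0, 0, 0))
                if norm == 0.0:
                    continue                      # skip zero-momentum entries
                cos_chi = max(-1.0, min(1.0, dot / norm))
                b = min(int(math.acos(cos_chi) / math.pi * n_bins), n_bins - 1)
                hist[b] += Ei * Ej / Q ** 2
    return hist

# Two back-to-back unit-energy particles: half the weight sits at chi = 0
# (self-pairs) and half at chi = pi.
h = eec_histogram([[(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, -1.0)]])
print(h[0], h[-1])  # 0.5 0.5
```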
I should mention here, if you want to do these measurements
at very small angle, chi, you want to, of course, use tracks.
So something that I've been working on recently with Yibei,
is the calculation for both the track function evolution and the
track based EEC.
So I just want to briefly show some results. This is the evolution of the third moment of the gluon track function, and this very first term is just the same as you expect for the evolution of the third moment of the fragmentation function.
We get additional terms, like this one that involves two track functions, or this one with three track functions. This nonlinear behavior arises because we're not measuring a single hadron but want to know about multiple hadrons in the final state.
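The origin of that nonlinearity can be sketched with elementary moment bookkeeping (illustrative names, assuming the two daughters' charged fractions are independent): expanding a moment of a combined charged fraction produces cross terms with moments of two different track functions, which never happens for a single-hadron fragmentation function.

```python
def third_moment_after_split(z, T1, T2):
    """<(z*x1 + (1-z)*x2)**3> for independent charged fractions x1, x2 with
    moments T1[n] = <x1**n>, T2[n] = <x2**n>, when a parton splits with
    momentum fraction z. The cross terms T1[2]*T2[1] etc. are the nonlinear
    pieces: products of moments of two different track functions."""
    w = 1.0 - z
    return (z ** 3 * T1[3]
            + 3 * z ** 2 * w * T1[2] * T2[1]
            + 3 * z * w ** 2 * T1[1] * T2[2]
            + w ** 3 * T2[3])

# Cross-check at z = 1/2 against flat x on [0, 1] (moments 1/2, 1/3, 1/4):
# <((x1 + x2)/2)**3> = 1.5 / 8 = 0.1875.
flat = {1: 0.5, 2: 1 / 3, 3: 0.25}
print(third_moment_after_split(0.5, flat, flat))  # 0.1875
```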
We can use these results to get a track-based prediction for the energy-energy correlator, and this plot is for the asymmetry of the energy-energy correlator, plotted against the angle chi again. The orange is the leading order, and in green the NLO, or alpha S squared, prediction. If you want to see how this compares to data, the INT is helpful.
There is more work to be done, because we should do the resummation.
Okay. Something a little related to track functions that I wanted to highlight is a recent development by -- where he looked at the spectrum of charged hadrons. The perspective he took: if you think of a quark or gluon, it will emit radiation, which is described by perturbative physics, and at the end of the day it will hadronize. What if we just pretend that all of this is produced by the evolution, the fragmentation process, and the hadronization is basically irrelevant?
The key thing he did: not just normal evolution, but evolution that involves the resummation at small z_h. It's not a normal DGLAP evolution.
At the end of the day he used three parameters. The cut off
of the evolution at small scales. How you freeze a coupling and
a normalization factor. With the three-parameter model, he could
compare to various measurements of the charged hadron spectrum at
different energies and the results are very promising.
This is perhaps the best picture. You get this red curve, the prediction, and it goes nicely through the dots. On the axis here is the momentum fraction, the log of one over the momentum fraction. And there are regions where, even though the agreement is fairly good, you shouldn't trust this prediction, because there are hadronization effects to worry about; and at the other end of the distribution, you have to worry about not just small-momentum-fraction resummation but also DGLAP evolution.
This looks promising for fragmentation; can you use it for track functions? It would be cool to get the results for the track functions.
Okay. Now I want to look at precision calculations and I'll
look at some different examples. The first one is the anti-KT
jet function at NNLO.
And maybe just to break this down a little bit: if you try to do higher order resummed calculations, then one of the things you need to calculate is this object. It has to do with, for example, three collinear partons, and then calculating the effect of the jet algorithm on them.
We have an incoming parton that comes out of the collision, and this object is produced. This object shows up in exclusive Higgs plus one jet production. You don't have to digest this whole formula for the cross section, but I wanted to highlight that the jet function appears in the formula: it's something that is relevant for physics.
The problem is that with three different partons, actually applying a jet algorithm becomes more cumbersome. If you have three partons and partons i and j are clustered first, then you have to impose that this distance measure is the smallest. Secondly, when you combine i and j, that distance should be small enough that they lie within one jet.
Of course, this doesn't sound too complicated, but the challenge is that you cannot simply do this numerically: there are divergences. You have to isolate the divergences, and then you can do the remaining integrals numerically.
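For reference, the distance measure that drives the clustering being integrated over here is the standard generalized-kT one; a small sketch that only computes the distances (the (pt, y, phi) representation and names are illustrative):

```python
import math

def akt_distances(particles, R=0.4, p=-1):
    """Generalized-kT distances (p = -1 is anti-kT, p = 0 Cambridge/Aachen,
    p = 1 kT): pairwise d_ij = min(pt_i^2p, pt_j^2p) * dR_ij^2 / R^2 and
    beam distance d_iB = pt_i^2p. At each step the smallest distance
    decides what gets clustered next. particles: list of (pt, y, phi)."""
    dij, diB = {}, {}
    for i, (pti, yi, phii) in enumerate(particles):
        diB[i] = pti ** (2 * p)
        for j in range(i + 1, len(particles)):
            ptj, yj, phij = particles[j]
            dphi = abs(phii - phij)
            dphi = min(dphi, 2 * math.pi - dphi)     # wrap phi around
            dr2 = (yi - yj) ** 2 + dphi ** 2
            dij[(i, j)] = min(pti ** (2 * p), ptj ** (2 * p)) * dr2 / R ** 2
    return dij, diB

# Two hard particles 0.1 apart in phi cluster before anything else.
parts = [(100.0, 0.0, 0.0), (50.0, 0.0, 0.1), (10.0, 2.0, 3.0)]
dij, diB = akt_distances(parts)
print(min(dij, key=dij.get))  # (0, 1)
```

The calculation in the talk imposes exactly these kinds of ordering conditions on the three collinear partons, but analytically, inside divergent integrals.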
There was a recent calculation earlier this year where they
went through different clustering histories, they did sector
decomposition and soft subtractions but they managed to do it.
This is the result for this jet function.
I should actually add this is only the Quark jet function.
They didn't do the gluon case yet. This is for a specific choice
of scales and also some logarithmic terms in there.
These you can already get from other results through consistency, and they provide a check on their calculation.
So you might say, well, this is, this cross section you're
talking about is maybe not the thing I'm interested in because
it's looking at a whole jet and not the jet substructure. But
the fact that you can do such a calculation holds promise for
doing these higher order calculations for jet substructure.
Okay. Another topic in the context of precision that I want
to talk about is precision calculation for Soft Drop. This is a
bit of overkill to talk about how Soft Drop works, but very briefly: you take your jet and recluster it using the Cambridge/Aachen algorithm, which results in the angular-ordered tree you see here.
What you do is go through the jet, and at every splitting you check whether the splitting is too asymmetric. The basic condition you use is this one here.
If the z, which is the momentum fraction of the softer subjet -- the softer branch, I should say -- is smaller than the cut, you throw away the softer branch.
You continue until a splitting is relatively symmetric; the momentum fraction of the soft branch there is called zG, and the angle between the two branches where the algorithm stops is the RG.
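That iteration can be sketched in a few lines (the tree representation with 'z', 'dR', 'hard' keys is made up for illustration; real analyses use a jet library such as fastjet):

```python
def soft_drop(node, zcut=0.1, beta=0.0, R=0.8):
    """Follow the harder branch of an angular-ordered splitting tree,
    dropping the softer branch at every splitting that fails the Soft Drop
    condition z > zcut * (dR / R)**beta. Returns (zg, Rg) of the first
    splitting that passes, or None if grooming eats the whole tree.
    node: {'z': softer-branch momentum fraction, 'dR': opening angle,
           'hard': subtree of the harder branch}, or None for a leaf."""
    while node is not None:
        if node['z'] > zcut * (node['dR'] / R) ** beta:
            return node['z'], node['dR']   # grooming stops here: zg, Rg
        node = node['hard']                # too asymmetric: drop softer branch
    return None

# A wide, very soft first splitting is groomed away; the second one passes.
tree = {'z': 0.02, 'dR': 0.6, 'hard': {'z': 0.2, 'dR': 0.3, 'hard': None}}
print(soft_drop(tree))  # (0.2, 0.3)
```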
Well, what we did is look at the momentum sharing, and we wanted to go beyond leading logarithmic accuracy. What makes this interesting is that it probes the -- function and it's measured extensively, so there was a lot of data to compare to.
How do we get to our result? There are a lot of technical things involved. But I want to actually just give you a picture.
These are the pictures that you may have seen before. You can
picture emissions in a jet. On this axis is the angle of the
emission. As you go to the right, you get closer to the center
of the jet.
As you go up, you go to more and more soft emissions. And
in this picture, you can see the region where the emissions are
not allowed.
If I know the value of zG and the groomed jet radius -- this line for zG and this one for the groomed jet radius -- then everything below this line is groomed away. Things above this line are not allowed, because they would pass the grooming condition and Soft Drop would terminate there, but it should stop only at this point here.
Well, so this is like the intuitive picture you can have.
You can see how the different measurements show up as different
lines here. The theta G measurements. The ZG measurement and
grooming condition. It turns out that from the point of view of soft collinear effective theory, you can identify the corners of this picture as different degrees of freedom, or different modes, in the effective theory.
The emission that passes grooming requires a little special
treatment so it's not sitting at the corner of the plane or of
this red area. But it's still needed to be included in some way.
A final thing to note: with this kind of picture you can also see non-global logarithms. These are things that theorists worry about, where you have a boundary in phase space and a different restriction on radiation on one side than on the other.
For example, this line here is the boundary of the jet. Inside, we cannot have emissions that are too energetic, because otherwise the Soft Drop algorithm would stop there, and we know that we wanted it to end up on this line for the measurement of theta G.
On the other hand, emissions outside are restricted differently. That is an example of such a boundary: here we have the same angular scale but different energies that play a role.
Okay. If you want to know more about the ins and outs of
this, you can watch Pedro's talk. I want to show you one plot
from the results.
Well, you can see the distribution of ZG, the ATLAS data,
and then our predictions at leading log and NLL prime plus LO.
So you can see that going to higher order in perturbation theory, you reduce the uncertainty, and you see the predictions are consistent. And you get better agreement with the data as you go to higher order.
So I want to switch to talking about non-perturbative
effects. But still within the context of Soft Drop and still at
some level related to precision calculations. The starting point of this next item is an analytic understanding of the non-perturbative effects in Soft Drop.
If you look not just at Soft Drop in general but at the groomed jet mass, you have the hadron level prediction and you can see how it differs from the parton level prediction. The leading effects are governed by two different terms, and these terms involve non-perturbative parameters.
These are numbers that you have to get from data; there is one here and there are two here. One thing to note: there is an explicit beta dependence, so if you extract these values for some value of beta, you can apply them for a different value of beta. Here is the cross section showing up again, and you get these coefficients.
The coefficients, C1 and C2, you can calculate in perturbation theory. Maybe to give a physical intuition for the
terms, the first term describes the effect of non-perturbative
radiation that is inside the jets.
At some level it goes with the area of the groomed jet: if you calculate C1, you find it is basically related to the average groomed jet radius. And C2 is related to the effect of non-perturbative radiation when the grooming stops.
That is a bit of a different effect. Previously, C1 and C2 were calculated at leading log. What these authors did is extend this to next-to-leading log, and the approach is soft collinear effective theory. You can draw the new planes with different measurement lines, and I'm not going to go through the different lines and what they represent.
I want to show you the results. Well, for more you can see
Adi's talk. Here you see the C1 parameter and C2 parameter
plotted. And you can see here the predictions at leading log and
NLL in blue and orange and compared to extracting the parameters
from Pythia. The cutoff is because at some point you get into the fixed-order region, so you maybe shouldn't be using this approach there.
But you can see there is good agreement between the resummed calculation and the parton showers, and that by going to higher order, you reduce the theory uncertainty.
So, staying a bit with the topic of non-perturbative effects, I want to talk about something maybe further from what many people in the BOOST community think about, since this is e+e- physics. But there have been extractions of the strong coupling from e+e- event shapes.
Some of these claim to be the most precise extractions, because the theory uncertainties are very, very small. One example is the C parameter, by these people here. This is
the definition of the C parameter. You sum over the particles
that are produced in the E plus E minus collision and do the
calculation at fixed order and then extract alpha S by fitting to
the data. That gives you this value, which is above the PDG value. With resummation you are still a bit above the PDG value. With the non-perturbative effects, you go down a lot. There are more refinements you can do, but they don't change it too much.
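For reference, the pairwise sum that defines the C parameter can be written out directly (massless approximation; a short illustrative implementation):

```python
import math

def c_parameter(momenta):
    """C parameter of an e+e- event from 3-momenta (massless particles):
    C = (3/2) * sum_{i,j} |p_i||p_j| sin^2(theta_ij) / (sum_i |p_i|)^2.
    It is 0 for a pencil-like two-jet event and 3/4 at the symmetric
    three-jet point, the Sudakov shoulder discussed below."""
    mags = [math.sqrt(px * px + py * py + pz * pz) for (px, py, pz) in momenta]
    total = sum(mags)
    s = 0.0
    for pi, mi in zip(momenta, mags):
        for pj, mj in zip(momenta, mags):
            dot = sum(a * b for a, b in zip(pi, pj))
            cos2 = (dot / (mi * mj)) ** 2 if mi * mj > 0 else 1.0
            s += mi * mj * (1.0 - min(cos2, 1.0))   # sin^2 = 1 - cos^2
    return 1.5 * s / total ** 2

# Symmetric "Mercedes" three-jet configuration sits exactly at C = 3/4.
mercedes = [(1.0, 0.0, 0.0),
            (-0.5, 3 ** 0.5 / 2, 0.0),
            (-0.5, -3 ** 0.5 / 2, 0.0)]
print(round(c_parameter(mercedes), 6))  # 0.75
```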
The thing you worry about here is this big jump, as you go
from here to here, that is all non-perturbative physics.
Well, the question is: do you understand this correctly? In particular, this non-perturbative correction is really derived in the two-jet limit and then, at some level, extrapolated to the rest of the distribution. And the question is, does this work?
So there was an explorative study at the end of last year
where they looked at the C parameter and they looked at the
non-perturbative effects at C → 0, which is where the Sudakov peak is, and at C = 3/4, the shoulder.
The Sudakov shoulder means that at lowest order in perturbation theory, the C parameter distribution stops at 3/4; it's zero above. This is the C axis and this is the cross section. I should have included the plot.
It jumps from a constant to zero. So of course, this gets
smeared by higher order corrections but these are two special
points. They looked at the effect of non-perturbative emissions on C. At C → 0 you can calculate the effect and you find this number; this is basically what is used everywhere. At C = 3/4, they did the calculation for how much the C parameter would change, and if you calculate the effect there, you find something that is almost half the value of what you would normally expect.
More non-perturbative effects means a smaller alpha S; if you have fewer effects, maybe you get a larger alpha S. Of course, the parameters used to describe the non-perturbative effects are known at C → 0 and at C = 3/4, and you can draw different interpolating lines between these two points.
Then you can do the alpha S extraction for the different interpolations, and that is what happens here. The standard one is right here. Actually, I was surprised: it doesn't give the smallest alpha S. As you interpolate between this value and this value in different ways, you get values of alpha S from the C parameter that are consistent with the PDG.
This is of course, potentially relevant for alpha S from jet
substructure.
Okay.
So now I want to look at non-perturbative effects one last
time from a different angle. Because there has been some work on
actually making the Hadronization model in the Monte Carlo
consistent with predictions from factorization.
The motivation for doing this is that you can then have a direct comparison between the parton shower and analytic resummation. For measurements like the top quark mass extraction, you get an m_top from the Monte Carlo, but can I use that directly in analytic resummed predictions of the cross section? You can't make an apples-to-apples comparison. In this case, well, what are you seeing here?
On one axis is a variable that describes how much soft radiation you have. On the other axis, a variable that describes how much extra contribution to the C parameter comes from non-perturbative physics.
The way you can read this is: at small k prime you get a larger shift in the C parameter, and for larger k prime you get smaller shifts.
That is, of course, in line with what I was saying on the
previous slide. Here they took the perspective that say from
factorization, we expect the shift to be always the same. Can we
make the Hadronization model do that and the answer is yes.
So here you see that independent of K prime you basically
get the same value for K or the same distribution for K.
Of course, this is interesting, but the real question is: what should we use for the non-perturbative shift? Because now we're making it fit some prediction, but we should of course know that this is what we should be making the hadronization model do.
So the very last topic I want to address is spin
correlations. This is, well, again, related to the energy-energy
correlator that I mentioned before. For the three-point energy
correlator, this is sensitive to spin effects.
So here you can see three partons: the quark producing a gluon, and the gluon splitting to a quark anti-quark pair. These three are being probed by the three-point energy correlator. If you look at the situation where this angle is small, then this gluon basically becomes on-shell and you can talk about the gluon; it turns out that an interference between its different helicities depends on the azimuthal angle -- the angle around this splitting of the gluon.
That's this phi here. If you calculate this at order alpha_s
squared, you find the following result. You get some number out,
and I should say I'm only focusing on the dependence on phi;
there is also a dependence on L and S. If you look at the phi
dependence, you get a cos(2 phi) dependence here.
There is an unfortunate thing here, because in the real
world n_f is equal to five and we cannot change it. If you
insert five in these calculations, it makes the effect harder to
see.
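Schematically (a hedged sketch of the structure described above, not the exact expression from the talk), the spin-interference effect enters the three-point energy correlator as an azimuthal modulation:

```latex
% Sketch: the interference of gluon helicities produces a cos(2\phi)
% modulation on top of the unpolarized result. The coefficient A also
% depends on the other correlator variables (denoted L and S in the
% talk) and on the number of flavors n_f; with n_f = 5 it is
% numerically small, which is why the effect is hard to see.
\frac{\mathrm{d}\sigma}{\mathrm{d}\phi} \;\propto\; 1 + A(L, S; n_f)\,\cos(2\phi)
```

The small size of A at n_f = 5 is why the modulation is more visible with b tagging, as discussed next.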
This was an order alpha_s squared calculation. There is a
resummation to be done, and for that I will refer to Ian's talk
about the light-ray OPE. If you look at the result that comes
out, the cos(2 phi) modulation that is plotted here for quark
jets is not very visible. But with b tagging it becomes more
visible; that is shown in this plot here.
Basically the last thing I want to mention is that spin
correlations have been implemented in the PanScales showers, and
they can look at the same observable. They find very good, or I
should say excellent, agreement, because the curves are right on
top of each other.
The pink curve in this plot is the analytic prediction, and
the blue dots come from the shower.
For more on this, and also for discussion of other
observables, I will refer to Alexander's talk.
That brings me to the end of my talk. Maybe let me summarize
it as a bit of a story, rather than firing all these separate
tidbits at you.
Calculations for track-based measurements are possible, and
we are extending this to order alpha_s squared.
Related to this is the observation that you can get the
charged hadron spectrum from this resummation. We can talk about
extracting the track function from data.
And then I talked about precision calculations, for example
for Soft Drop, which allow you to make precise comparisons to
data and are an ingredient for the description of
non-perturbative effects for the groomed jet mass. I also
mentioned the anti-kT jet functions: precision calculations with
real jet algorithms are in reach.
Calculating the invariant mass of, say, a hemisphere in e+e-
is easier than dealing with the clustering effects of an
algorithm.
I looked at hadronization effects, where on one hand people
have now shown that you can change the hadronization model such
that it agrees with the prediction from factorization. But of
course, you should have a discussion about what is the correct
description, and I personally would say the dust has not
completely settled on this.
The final topic was spin correlations, which can be probed by
jet substructure and are encoded in parton showers.
At this point, I want to thank you for your attention. And
I'm happy to take any questions.
>> Awesome, thank you very much. This was an awesome, you
know, sort of tour over many interesting things to think about.
We're happy to take questions now from the audience. If anyone
has a question, please raise your hand.
Okay. I see Raghav has his hand up. Go ahead.
>> Wouter, nice talk. That part you talked about, these
non-perturbative effects in the parton shower, is very
interesting.
You talked about this C parameter, right, and you say if you
increase this effective value with which you calculate C from 0
to three-halves, the contribution of that reduces by half, right?
So I wasn't able to follow 100 percent, which is why I'm
asking the question. If I have a jet at this effective k scale,
right, and you include this changed parameter and show that the
distributions are the same, what does that imply for the jet
shower? Does that mean you're actually changing the splitting at
each step?
>> I think there were two separate things here, and I hope I
didn't mix them up. On one hand there is this analytic study
where they said the effect of non-perturbative physics, of
hadronization, should reduce as you go away from c = 0, because
it becomes almost half of that.
And then there's this other approach where they took as
input the prediction that the shift should not change, and
asked: can we make the Monte Carlo do that? They didn't change
the shower for this, they changed the way hadronization works.
So they managed to change some of the ways hadronization was
done, going from this distribution, which is just whatever was
in there before, to this distribution that says: independent of
what the C parameter would be, I get basically the same shift.
>> This is like a change in the actual hadronization model,
but not the perturbative part.
>> It's changing the Hadronization step.
>> Very cool. Thanks.
>> Cool. I don't see any other hands up, so maybe I can ask
a quick one. It might be easier if you go back to slide 8, which
I think was the clearest example. You showed a couple of plots
in the talk where you go from, you know, one calculation to a
more precise one, like this one here. My question, first of all:
does the green band overlap with the orange band, or are they
just touching on the edge?
>> That is a good question. You can't really see that here.
They are not just touching on the edge, they do overlap a bit.
But we don't show the curves there, so we should really make
sure in the actual paper that you can see that.
>> Okay. I guess you know what the question is then, if I'm
asking about this, right? I mean, this isn't the only place
where I saw something like this. I'm wondering: are we missing
something when we make the error bars? Either some
non-perturbative corrections that are larger than we think they
are, or something else that goes into them that maybe we should
think about? I mean, I would expect the green and the orange to
overlap a bit more than they do here.
But maybe I don't know how that works.
>> That's a fair question. This is a fixed order
calculation, so the scale variation here comes from the one
scale that we vary.
If you do a resummed calculation, you have multiple scales
you can vary, probing the different physical scales in the
process. I have to admit, at some level this is our convention.
You can be more conservative and get larger bands that will
overlap better, but then, of course, other people looking at
your work will say: how did you get such large bands?
>> Oh yes.
>> That is why we stick with what is conventional, so you
can do an apples-to-apples comparison. But more generally, if
things don't overlap, do we understand why they don't overlap?
For example, for the Higgs cross section, we understand why
there is a big jump as you go from one order to the next.
There is also an example here in another plot where, if you
look carefully, things don't look so good, but that is simply
because at that point the leading log resummation doesn't
include any matching to the fixed order, and this is a region
where we really should have included that.
It's just missing something there. We're extrapolating into
a region where we know it shouldn't work.
>> Okay. Cool. I see a question from Clemens next.
>> Thank you for this nice talk. I have a question on the
spin effects in jet substructure, on slides 20/21. It might be a
naive question, but I was thinking about how to measure this
experimentally. In W boson decays or so, depending on the
polarization, we see spin effects in the decays. You're talking
about three-point energy correlators here, and I was wondering
where one can measure this experimentally.
Is it something one would measure, for instance, in boosted
top decays, or is that unrelated and one needs to look at pure
QCD jets?
>> What is shown here is not specifically for a top quark;
it's for light quark or gluon jets. We would look inside the
jets, well, we probe the energy at three different points, where
this one has a large separation, that is the L angle, and this
is a small one. So this is not something specifically for the
top quark.
>> Okay, so you basically just take QCD-initiated jets, take
a lot of them, and try to measure this.
>> Yes.
>> Okay. Thank you.
>> Okay. Andrew, I think you're last. Is it fast? Because
we should move on.
>> I'll try to be fast. With respect to the
non-perturbative corrections to the C parameter: if you have a
leading power factorization theorem, the shift only covers part
of the non-perturbative corrections.
How do you incorporate the three-jet non-perturbative
corrections, through a subleading factorization theorem or what?
>> That's a very good question. I do not know the answer; I
hadn't thought about this. I wanted to put something out here
that was thought provoking, and it's very interesting, but I
don't know how to do this. And to be honest, at some level this
is maybe something for the theorists to decide what they want to
do with. There are two different approaches, and the one
advocated in their paper is that maybe we should stop pushing so
hard on very high precision calculations, here we're talking
about next-to-next-to-next-to-leading log calculations, without
doing more work on the non-perturbative side.
But, yes, of course, the alternative is that we understand
these things better and know how to incorporate them. But I
don't know the answer to your question.
>> Something for BOOST in a couple years!
>> Yes.
>> Okay. So I guess we should move on now. Thank you very
much, Wouter, for this great first talk. Our next speaker is
Jennifer, who I can see. I guess you'll share your slides.
>> Yes.
Let me make it full screen. I want to -- wait. Okay. Can you
see the slides?
>> Yep.
>> Great. Hello everyone. It's a pleasure to be here to
give an overview of the latest results from the LHC experiments
and the topics that are relevant for this conference of course.
I would like to start with a slide from Petar's
experimental introduction from 2019, where he listed the urgent
motivations that brought us to this conference.
We all want to understand QCD. We like to play with machine
learning, and we try to improve our multijet background
estimates more and more. But our ultimate goal, at the end, is
that we want to find new physics.
That was 2019, quite some time ago, but it feels a little
like we can condense the last two years into one.
I'm not skipping 2020, because despite being a horrible year,
we have worked really hard to meet our goal of finding new
physics, hopefully in the immediate future.
In fact, the third LHC run is at the door, and it has great
discovery potential. All the new ideas that we have developed in
these last years will increase our chances.
Not only in Run 3 but also beyond, such that we can keep
improving our experimental results even with a limited increase
in dataset size.
So in these last years we have performed more and more
measurements. We enlarged the region of phase space. We
developed and applied new ideas to improve the trigger and data
acquisition systems, and we are upgrading our computing
paradigms. This is to make the best use of the data that we
collect.
From my perspective, deep learning now sits at the
fundamental basis of all this innovation.
In fact, we have seen major advances in deep learning models
for jet classification. For instance, in CMS we have studied a
variety of boosted jet tagging algorithms, and in the most
recent searches, deep learning models using state-of-the-art
technology are becoming more and more the baseline.
In particular, the newest searches use, for instance,
ParticleNet, which is based on a permutation-invariant graph
neural network that takes raw information, the jet constituents.
We see now how, by moving to more state-of-the-art
architectures, we can further improve our jet classification
performance and push it more and more to the limit.
You can see here, for example, the performance of the new
ParticleNet tagger for top tagging, compared to the previous
baseline, which was a deep neural network based on 1D
convolutions.
Moving to new architectures helps further.
Similarly, ATLAS also recently developed new neural network
based taggers, based on high-level information. They developed
new taggers for Higgs to bb, and also for V boson and top
tagging, using a new b tagger that combines the flavor
information of up to three subjets, compared to a standard
single-jet b tagging algorithm, together with the kinematic
information of the large-radius jet in which the subjets are
contained.
The increase in performance can be seen in these two plots
on the right, where the top one shows the rejection of the
multijet background and the bottom one the rejection of the top
background.
The light blue points show this latest algorithm, compared
to other, older ones. At the bottom, you can see the performance
for W and top tagging with a new neural network based algorithm
that uses jet substructure observables, obtained from the novel
and improved jet reconstruction using unified flow objects,
which are basically particle-flow objects combining track and
calorimeter information.
With respect to the previous algorithms, these give
additional discrimination power.
We're seeing now that these models are not only, say, R&D;
they are applied in searches. Here, for example, I'm showing
three recent results from CMS: searches for di-Higgs resonances
in the 4b final state or the 2b plus lepton final state, and
also a recent search for di-Higgs production in the VBF channel.
I'm showing the expected limits as a function of the
resonance mass for the two resonance searches, and for the VBF
search you see the exclusion limits as a function of the
coupling of the two vector bosons to the two Higgs bosons.
Here you see a comparison with the previous results, which
were based on 2016 data. There is a large improvement, obtained
thanks to the first deployment of these new taggers, and this
allows us to go beyond what one would obtain just from
increasing luminosity.
Okay. This is all very promising, but it doesn't come
without complications. In particular, the searches have to
develop and use more and more sophisticated background
estimation techniques.
In fact, with more powerful taggers we start seeing more and
more that a dijet search is no longer dominated by the QCD
background alone; you get roughly 50/50 QCD and top quark.
As an example, I'm showing here the jet mass distribution in
the recent di-Higgs to 4b search, and you can see how ttbar
becomes important.
We try to develop multidimensional background models and use
them to reduce our systematics. For example, in this search a
two-dimensional model is used, in a plane defined by the mass of
the Higgs candidate, that is, the mass of the jet, and the
di-Higgs invariant mass.
Then two regions of this phase space are identified, where
one of the two jets passes or fails the jet tagging algorithm,
in this case the Higgs to bb tagging algorithm.
A 2D transfer factor is computed to estimate the background
in the passing signal region. The complication is that this
transfer factor in the 2D plane only works because a lot of
effort was put in by the experiments to invent and apply tagger
de-correlation methods, at the cost of some performance loss.
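As a toy illustration of this pass/fail transfer-factor idea (a minimal sketch with made-up numbers and variable names, not the CMS implementation; the toy tagger is de-correlated from the jet mass by construction):

```python
import numpy as np

# Hypothetical background-only sample: a falling jet-mass spectrum and a
# tagger that (after de-correlation) passes ~5% of background
# independently of mass.
rng = np.random.default_rng(0)
jet_mass = rng.exponential(scale=60.0, size=100_000) + 40.0
passes = rng.random(jet_mass.size) < 0.05

# Signal mass window and surrounding sideband (illustrative values)
signal_window = (jet_mass > 110) & (jet_mass < 140)
sideband = ~signal_window

# Transfer factor measured in the (assumed signal-free) sideband:
# N_pass / N_fail
tf = passes[sideband].sum() / (~passes)[sideband].sum()

# Background prediction in the passing signal region = TF x N_fail there
n_fail_sr = (~passes)[signal_window].sum()
predicted = tf * n_fail_sr
observed = passes[signal_window].sum()
```

Because the toy tagger is mass-independent, `predicted` agrees with `observed` up to statistics; a mass-sculpting tagger would break exactly this closure, which is why the de-correlation effort matters.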
The plot on the top right shows the jet mass distribution
for the Higgs to bb tagger at different background rejection
rates, that is, different cuts on the tagger. You can see how
tighter and tighter cuts start sculpting the mass. In that case,
the region that fails the tagger is not usable for the
background estimation.
When applying the de-correlation methods, we instead obtain
very smooth distributions in the jet mass, as you can see in
these other plots on the left.
This also allows us to see a peak in the jet mass, which is
something we would like to see in case there is a signal.
Also, we have these very powerful taggers, which are,
however, usually trained on imperfect Monte Carlo. So the
obvious question is: is the score well modeled?
As a check, we compare the score of the taggers between data
and Monte Carlo, and we find that we must measure and apply
scale factors, which may sometimes be far from one and have
large error bars.
So how do we compute the scale factors? Well, it's easy for
top or W tagging, because we have a large sample of top quarks
available which can be isolated. For instance, here you see a
measurement for the W tagging algorithm in this sample of top
quarks, where we have a large fraction of merged W bosons that
can be used for the measurement.
It's not as easy, though, to compute the scale factors for
Higgs to bb or Higgs to cc tagging. We typically use gluon
splitting as a proxy; that is what we've done mostly so far.
This comes with complications, since it's not really the same
process that we want the scale factor for.
A more interesting option is the measurement in Z to bb,
which is now feasible with the full Run 2 statistics. This is,
for instance, what ATLAS does. Here you can see the jet mass
spectrum in a Z to bb enriched sample; there are enough Z plus
jets events to be able to perform such a measurement.
Another difficulty comes if you instead want to apply a jet
tagging algorithm to a four-prong jet. That is something we want
to do if, for instance, we're doing a search with a Higgs boson
decaying to WW, or a standard model measurement with Higgs to
WW.
This is more complicated because there is no obvious proxy
available. As an example, I'm showing here a recent search from
CMS which is focused on a triboson resonance, where you expect
cases with a Higgs decaying to two Ws, and you want to tag these
four-prong jets.
So you do what you can, and at the end, since no method is
clearly the best, you need to assign large systematics.
Can we have a better Monte Carlo? Well, in principle, yes,
because all the LHC experiments have performed many important
jet substructure measurements over the years. These can help
improve Monte Carlo generator development and tuning, among
other things like better understanding perturbative QCD and
measuring standard model parameters.
Here is a long list of jet substructure measurements
performed at the LHC. Among the most recent ones is the Lund jet
plane measurement from ATLAS, which came out already last year.
You have already heard in the previous talk about this
observable, which allows us to categorize all hard splittings at
once and to factorize hadronization and parton shower effects.
In the ATLAS paper they show these two nice plots, both
based on simulation. In the top one, the parton shower is
changed, for example to Herwig, and the ratio between the two
parton showers is shown. You can see that the Lund plane is
sensitive to the parton shower in the top left corner.
In the bottom one, the hadronization model is changed and
the ratio is shown, and you can see that the top right triangle
of the Lund plane is sensitive to hadronization.
So this is very interesting, and it can be done also in
data. That's the measurement. To make it more intuitive, they
show slices in the regions of the y axis. In particular, in this
slide you can see how in the non-perturbative region the
agreement with the data is worse or better depending on the
hadronization model, while the agreement in the perturbative
region on the left is more sensitive to the parton shower model.
There is also a new interesting measurement from CMS, which
you will hear more about in the next days. They perform a new
measurement of five observables sensitive to the jet
fragmentation in gluon and quark jets, in several different
variants.
These measurements are performed in a multijet sample, but
also, for the first time, using a Z plus jet sample enriched in
quark-initiated jets.
Here is the full list of observables. They make many
important observations about the over-prediction of gluon versus
quark discrimination, depending on the generator and parton
shower.
Among the many results, I'm showing here the measurements of
one of these observables in the Z plus jet region on the left
and the central dijet region on the right.
You can see the usual sandwich between Pythia and Herwig,
and you can see how the Z plus jet region is better described
than the dijet region, which is gluon enriched.
We have all these measurements, and it's probably time to
combine and make use of all this information for the next
generation of Monte Carlo generators, such that we
simultaneously describe everything we need and improve searches
and other measurements.
This is something that must be done. But at some point we
might ask: do we need Monte Carlo at all, since we can instead
just learn from data?
This is an approach that has been emerging a lot recently.
Why? Well, because we have performed thousands of
hypothesis tests and have not found any significant evidence of
new physics.
Either new physics is beyond the reach of LHC energies, or
we need more data, or we're not looking in the right place.
Maybe we have not yet imagined what new physics looks like.
There is an urgent need to generalize, and there are many recent
ideas that use deep learning to learn directly from data and
avoid signal priors. This is what we call anomaly detection.
The first time this idea was presented at BOOST was in 2017,
as a poster. Then in 2018 we introduced a dedicated machine
learning session, in which three out of seven talks were on
anomaly detection.
Since then, there has been a lot of effort to develop a
concept for an end-to-end analysis based on this idea, since
it's not obvious how to actually make a real analysis out of it.
In particular, there were a lot of interesting talks recently
describing these developments. I think you will see more about
all this in Barry's talk.
What is interesting is that we have finally seen this idea
applied to data. In fact, ATLAS made the first implementation of
this approach on collider data last year. The search is
basically a three-dimensional search for a generic resonance A
that decays to two other generic resonances B and C in the dijet
final state, and the approach is based on the CWoLa hunting
weakly supervised algorithm.
In this approach, one trains a classifier between two mixed
samples, one enriched in signal and one enriched in background.
In this first implementation, the simplest version was applied:
the network is trained using the masses of the two jets to do
the classification.
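The core of CWoLa can be sketched in a few lines (a toy with a single hypothetical Gaussian feature and a minimal logistic regression, not the ATLAS implementation): events in a dijet-mass signal region form mixed sample 1 (background plus possible signal), sideband events form mixed sample 0 (background only), and a classifier trained on these noisy labels still learns to separate signal from background.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bkg_sr, n_bkg_sb, n_sig = 5000, 5000, 500

# One made-up discriminating feature: background looks the same in both
# regions, signal (present only in the signal region) sits at higher x.
x = np.concatenate([
    rng.normal(0.0, 1.0, n_bkg_sr),   # background in the signal region
    rng.normal(2.0, 0.5, n_sig),      # signal, only in the signal region
    rng.normal(0.0, 1.0, n_bkg_sb),   # background in the sideband
])
# Mixed-sample labels: 1 = signal region, 0 = sideband (NOT truth labels)
y = np.concatenate([np.ones(n_bkg_sr + n_sig), np.zeros(n_bkg_sb)])
is_signal = np.concatenate(
    [np.zeros(n_bkg_sr), np.ones(n_sig), np.zeros(n_bkg_sb)])

# Minimal 1D logistic regression trained on the mixed-sample labels
w, b = 0.0, 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(w * x + b)))
    w -= 0.5 * np.mean((p - y) * x)
    b -= 0.5 * np.mean(p - y)

# Despite never seeing truth labels, the classifier ranks signal higher
score = 1.0 / (1.0 + np.exp(-(w * x + b)))
sig_score = score[is_signal == 1].mean()
bkg_score = score[is_signal == 0].mean()
```

Cutting on `score` then enhances the bump in the dijet mass spectrum, which is what the efficiency plots described next are probing.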
Here you can see the efficiency of the classifier with no
signal injected on the left, and with signal injected on the
right.
You can see how the efficiency goes to low values in the
region where there could potentially be a signal, which shows
that the approach could work.
Unfortunately, no highly significant excess was found, so
they had to set limits. You can see the observed limits for
different hypotheses for the masses of the two particles in the
final state, compared to the inclusive dijet search.
We can see that such an approach is sensitive to a large
phase space, not achievable by signal-dependent searches, but it
is less sensitive than supervised searches in the places these
are tailored for.
But of course, if we knew which signal to search for, we
wouldn't have to do this.
There is also the problem that the anomaly might be
discarded by the trigger. To cope with the high data rates of
the LHC, we implement a two-stage filter system. In the first
stage, the level-one trigger analyzes the data at 40 megahertz;
given these high rates, only a very coarse reconstruction is
performed, and it brings the rate down to 100 kilohertz.
The second trigger stage, with this reduced data rate, can
perform a more sophisticated reconstruction of the event, and it
brings the rate down further, to about 1 kilohertz.
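As a back-of-the-envelope check of the rates quoted above:

```python
# Two-stage trigger reduction with the rates quoted in the talk
bunch_crossing_rate_hz = 40_000_000  # collisions per second at the LHC
l1_output_rate_hz = 100_000          # after the hardware level-1 trigger
hlt_output_rate_hz = 1_000           # after the high-level trigger

l1_rejection = bunch_crossing_rate_hz / l1_output_rate_hz   # factor 400
hlt_rejection = l1_output_rate_hz / hlt_output_rate_hz      # factor 100
kept_fraction = hlt_output_rate_hz / bunch_crossing_rate_hz # 1 in 40,000
```

Only one event in 40,000 survives, which is exactly why a model-dependent trigger can silently discard an unforeseen signature.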
With 40 million collisions a second and only about 1,000
stored, we might just be writing the wrong events. In
particular, these trigger algorithms are model dependent; any
signature we did not think about could easily have been
discarded.
We can think of correcting the problem as early as possible
in this data reduction flow. If we want to apply deep learning
based anomaly detection here, we have to be careful: deep
learning algorithms can become relatively large, such that the
memory and number of operations required for the inference can
easily explode.
In particular, in the level-one trigger we are tightly
constrained, because the algorithm has to run with a latency of
just a few microseconds, and the algorithms run on FPGA
hardware. We have scarce resources, so how to fit a deep
learning algorithm there is not obvious.
But recently a library called hls4ml (high-level synthesis
for machine learning) was developed to automate the deployment
of deep neural networks on FPGAs, to obtain an implementation
that is optimized for ultra-low latency and small resource
usage. Anybody can try it and see how, fairly intuitively, one
can obtain a firmware implementation of a deep learning
algorithm.
In fact, there is now an ongoing effort to study anomaly
detection with autoencoders on level-one-trigger-like inputs,
which can help overcome the trigger limitations that I just
discussed.
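The autoencoder idea can be sketched with a linear stand-in (PCA via SVD) for the neural networks used in these trigger studies; the features and numbers here are made up. The model learns a compressed representation of "standard" events, and events with large reconstruction error are flagged as anomalous:

```python
import numpy as np

rng = np.random.default_rng(2)

# "Standard model" training events: 8 features that really live on a
# 2-dimensional latent manifold, plus small noise (all hypothetical)
latent = rng.normal(size=(5000, 2))
mixing = rng.normal(size=(2, 8))
train = latent @ mixing + 0.05 * rng.normal(size=(5000, 8))

# Linear autoencoder: the top-2 principal components act as
# encoder/decoder
mean = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean, full_matrices=False)
decoder = vt[:2]  # maps 2D latent space back to the 8 features

def anomaly_score(events):
    """Squared reconstruction error after encode/decode."""
    z = (events - mean) @ decoder.T      # encode
    recon = z @ decoder + mean           # decode
    return np.sum((events - recon) ** 2, axis=-1)

typical = anomaly_score(train).mean()
# An event far off the learned manifold reconstructs poorly
anomaly = anomaly_score(3.0 * rng.normal(size=(1, 8)))
```

A trigger-level selection would then keep events whose score exceeds a threshold tuned to the available output rate.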
These anomaly detection algorithms can be deployed at the
event level, which can be done already in Run 3, or also at the
jet level. In that case it would make more sense to have them in
Phase 2, where we can profit from higher granularity: in
particular, the CMS detector will be upgraded to provide
tracking and particle flow information at 40 megahertz.
That becomes a place where tagging anomalous jets can be
done.
I'm pointing here to some work that will appear soon on the
arXiv. In particular, there is ongoing work to do jet
classification in nanoseconds with graph convolutions,
interaction networks, and also more classic multilayer
perceptrons.
This is not unsupervised, but it's a proof of concept that
could be fundamental to later applying unsupervised algorithms.
In this ongoing work several approaches are studied, but there
can be other ones for anomaly detection optimized for
low-latency, low-resource experimental environments.
To stimulate the community effort, we have set up a new
challenge, which you can explore by looking at this link, where
you can find training and testing datasets and a lot of
information on how to estimate the latency and footprint of your
algorithm.
Of course, we cannot improve the level-one trigger without
improving the HLT, where we have more granular information,
closer to the offline reconstruction, such that we can have more
performant models there.
At the HLT we have more relaxed constraints, but algorithms
still have to run in a few hundred milliseconds or less, which
is still pretty tight.
I would like to conclude by saying that understanding the
computing limitations is fundamental when developing models, as
well as understanding the experiments' available resources and
how to best profit from them, for example from heterogeneous
computing farms. There is a change happening in our computing
paradigms.
This brings me to a summary. With deep learning algorithms
at the basis of many innovative new ideas, more can be achieved
with boosted objects.
We have seen how the sensitivity of searches can be pushed
beyond the slow increase in the size of the collected data
sample.
With progress in understanding jet substructure, searches
will benefit further from the application of these new
algorithms.
New approaches like anomaly detection applied to jets will
bring us to unexplored territories, and this can lead to the
study of new regions at the HL-LHC.
I hope to see a few of these searches for the unknown at
BOOST next year.
>> Thank you very much for that great overview. I'll be
taking over from Matt for these questions.
Opening it up for questions: please use the raised hand
reaction if you have questions. Max, go ahead.
>> Thanks for the nice talk. Along the lines of the anomaly
detection: how are we going to prevent ourselves from detecting
noise and other experimental effects, which are, you know, quite
anomalous events that happen often in things like this? Is this
something that you put into the training and specifically avoid?
What are your thoughts on the experimental aspects, a
calorimeter breaking or something like that, and just triggering
on that entirely?
>> Right. Well, I think we have seen so little of this
applied to data that I don't have a feeling for how much
detector noise would actually be a problem. In some sense we
have detector noise also in normal analyses, right, and we have
ways to clean our data from such problems.
So I would think that we will understand this more when we
have more examples of these algorithms applied.
>> Okay. Very good. Thank you.
>> I'll ask a question. For those of us that are not
experimentalists, can you give a lightning review of what the
next two years at CERN are supposed to look like? What has COVID
affected, and when might we see Run 3 and things like that?
>> Yes. Of course, things have gotten slightly delayed,
because it was supposed to start already this year.
The current schedule is spring next year. That's the final
schedule, so we'll see it soon.
>> Good. Excellent. Other questions?
I'll ask another question; I'll ask this of many of the
search talks. As the experiments move more and more to deep
learning, which is great, and which, as you emphasized,
dramatically increases efficiencies, lets us trigger and do
things at a faster rate: what can theorists do to help with
searches? If you just go deep learning, a computer is much
smarter than me. What can we do? Does it still help to design
fancy searches?
>> I think it's fundamental that theorists indeed help the
searches and the deep learning models, in terms of injecting
physics knowledge into the algorithms. When you inject physics
knowledge into an algorithm, you can make it more compact,
faster, and more intelligent than it is.
We will see examples like equivariant neural networks using
this type of knowledge.
Do you agree?
>> I do. To be fair, the Lorentz equivariant network was led
by experimentalists, who are perhaps smarter than theorists.
>> Other questions?
>> Hi, yes, I can ask a question quickly. Sorry, Jennifer,
something crazy happened and I missed most of your talk, which I
really apologize for. I was looking through your slides, and I
really like the slides you had about improving the Monte Carlo.
I agree, though maybe ditching it would be better. But I'm
wondering if you know of any progress that's been made to
incorporate all the measurements that you're listing into new
tunes of parton showers, or is this something we can try to push
on as a community?
>> Yes. I think that's the missing part, and we should push
more on that. Eventually maybe also restructure a little some of
the hierarchies in the experiments, like how groups talk to each
other, for instance the standard model group with the searches
group.
But I know that there are a lot of people interested in,
let's say, filling in the missing pieces of the puzzle, and I
think we will see more of that.
>> Okay. That would be cool. The thing that I wonder about
is whether we're measuring the right stuff to really get the
Monte Carlo people excited. I know I've seen talks from those
people where they're asking for us to measure different
observables than the ones that we're looking at. It could be
something worth thinking about, whether there are things that
we're just missing.
Like maybe not stuff that ATLAS can do, but LHCb or someone
that can do nice hadron spectroscopy and stuff like that inside
of jets.
That, I think, could be really useful, and it's not
something that we're pushing towards right now.
>> Yep.
>> Gregory, go ahead.
>> First, I want to agree with Matt. My question is a
follow up. If everything goes unsupervised, what are
experimentalists going to do?
>> Who.
>> Andrew asked the question, what are theorists going to do
to help the machine learning revolution, say. If everything goes
unsupervised, what will the experimentalists do to help?
>> I don't think everything has to go unsupervised. It's a
new approach. The way I see it is that if we see an anomaly, we
might need to understand it. So there we need a lot of work in
between, among, let's say, experimentalists, phenomenologists
and theorists.
>> That makes sense, thanks.
>> Any other questions? Max.
>> This is not so much a question. It's a comment, not a
question. I wanted to highlight on slide 9 the result from CMS,
the VBF HH to 4b. I think this is a really, really beautiful
result.
I think it's a really interesting new step for
substructure. Because here this search has shown for the first
time that there is a coupling between two Higgs bosons and two
vector bosons, and that's a strong statement about the standard
model, with boosted techniques at the heart of it. And it beat
the equivalent search from ATLAS.
I think this is a nice search, and I wanted to highlight
another interesting aspect: we're seeing a real, unique
statement on standard model physics with these techniques.
Thank you for showing this.
>> Okay. If there are no other questions, we can move to
our break. Thank you, again, Jennifer and Wouter for the great
review talks to kick off BOOST.
In the chat I have posted a link to the gather town for
BOOST. Please come over as you can with your break beverage of
choice.
We'll see you there. We'll reconvene, let's see, in about
15 minutes. At 1650. I had to do the time zone conversion. In
15 minutes.
We will stop recording this Zoom session but the Zoom room
will remain open if you want to hang out here and say hi to
people as well. See everyone in about 15 minutes.
>> Okay. Everyone. We're coming back to Zoom now.
I guess we'll give people a minute to transfer back over.
I see Barry, hello.
>> Hi.
>> Do you want to try sharing your slides, we're about ready
to go.
>> How is this?
>> Looks good.
Let's just maybe wait another minute or so and then you can
get started.
>> Sounds good.
>> We've crept back up over 75 people, so I think, why don't
you get started now.
Okay. So I guess our next speaker is Barry, who is going to
summarize the ML4Jets workshop that happened recently. In
particular, I think he is going to talk about many very
interesting machine learning related studies which have been done
also by members of the BOOST community.
Ones which we may not hear about otherwise this week. I
think it should be very interesting and I look forward to it.
Barry?
>> Yes, I think many is the keyword there. We had quite a
lot of talks and I apologize if I don't do justice to the
presentations.
First of all, I would just like to mention that all the talk
recordings and slides are available on the Indico page. I would like
to thank the organizers for having this summary talk. There is a
lot of overlap between the interests of the two communities.
We had the workshop almost a month ago in Heidelberg in
Germany. You can see the picture of the old bridge and the
castle in the background. The venue where we had the conference
is a short five-minute walk from there.
I will start with an overview of the workshop in general and
then I'll go through the sessions at lightning-fast speed.
Forgive me if I leave out details.
I'll introduce the next venue for ML4Jets and give a
summary of the different topics that were covered in the workshop.
The series follows on from workshops in 2017, 2018, and
2020. This year because of COVID, of course, we couldn't have
the full conference in person.
So we managed to get a hybrid workshop. We had 384
registered participants. We were able to have 30 people there in
person, mostly from Germany, Hamburg and so on, with everyone
else online.
For peace of mind, we had daily testing and socially
distanced lecture hall.
With the online format there were a lot of abstract
submissions. In three days we fit in 11 sessions and 99 talks.
This is good because we recorded everything and put it online.
But for the people in attendance in person, this meant 12 hours
of talks per day. It was intense but worth it.
One of the great things about the workshop is we had talks
from the theory community, experimental community and machine
learning community. We got a lot of interesting discussions
going on the breaks and so on.
And of course, the Euros. After 12 hours of talks what is
nice is to turn the projector over to football and have a beer.
What is more important is the physics. Here I listed the
main topics that were covered and at least my impression of what
the new ideas were and what was important.
The first obvious thing is the fact that many techniques had
big gains from the architecture. We have seen graph CNNs and
deep sets, transformers, and INNs and flows boost the
current techniques.
As you can see from Jennifer's talk, there have been many
impressive machine learning advances. We have seen lots of
progress in the simulation and generation direction, moving from
GANs to flows and INNs, and VAE architectures as well.
One of the cool things about this workshop was the
announcement of two challenges following up on the LHC Olympics
and top tagging challenge.
The anomaly detection at 40 megahertz challenge, and another
one, which is not as far along in the organization, the
calorimeter simulation challenge.
For me, three of the most important or most interesting
subjects that we're studying in machine learning, and that were
covered in detail at this conference, are the understanding of
uncertainties in machine learning tools, anomaly detection, and
symmetries in deep learning.
These anomaly detection techniques are used in experiment.
This is one direction that needs a lot of work in comparison with
everything else. It's one of the most interesting things for a
theorist or experimentalist to work on.
In the first session we had new architectures and here the
talks are mostly about incorporating symmetries into the neural
network architectures or optimization techniques.
We have equivariance: Lorentz-equivariant neural
networks, and permutation-invariant architectures like the deep
sets or the transformer architectures.
I mean, the permutation-invariant architectures are used
because in a jet, if you order the constituents, you have imposed
an artificial ordering on the data.
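The deep-sets idea mentioned here can be sketched in a few lines. This is a toy illustration only, not any of the actual ML4Jets models: the per-particle embedding, the readout weights, and the jet values are all made up. The point is that a symmetric sum pool makes the output independent of constituent ordering.

```python
def phi(constituent):
    # Toy per-particle embedding: (pt, eta, phi) -> 2 features.
    pt, eta, ph = constituent
    return (pt, pt * eta)

def deep_set_score(constituents):
    # Symmetric sum pooling: iteration order cannot affect the result.
    pooled = [0.0, 0.0]
    for c in constituents:
        f = phi(c)
        pooled[0] += f[0]
        pooled[1] += f[1]
    # Toy readout ("rho"): a fixed linear map on the pooled features.
    return 0.5 * pooled[0] - 0.1 * pooled[1]

jet = [(120.0, 0.3, 1.1), (45.0, -0.2, 1.0), (10.0, 0.5, 0.9)]
# Reordering the constituents leaves the score unchanged.
assert deep_set_score(jet) == deep_set_score(list(reversed(jet)))
```

In a trained network, phi and rho would be learned neural networks; only the sum-pooling structure is what guarantees permutation invariance.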
In part two we have new strategies or representations. Here
the shift is from encoding the invariance in the network
architecture to moving it into the representation of the data.
The most common jet representation we know of is the jet image.
Here other ideas were explored, such as the new unsupervised
learning by Peter.
We had the BSM session. The focus was on anomaly detection.
It's difficult to categorize the different anomaly detection
methods into a single box, but here are three main sections. The
first are the over-density methods. The anomalies are seen as an
over-density in a high-dimensional parameter space, like a
high-dimensional bump hunt: searching a high-dimensional space
for a region that is over-dense.
In the second part we look at latent space anomaly
detection: autoencoders where the input data is mapped to a
compressed latent space, and we look at the points in the
lowest-density regions of that space to identify the anomalies.
One highlight here was the review talk on the Dark Machines
challenge from Joe and Bryan. They compared different techniques
and, as far as I can tell, it looked very nice.
In the third part we have the data space searches with
autoencoders. This is what you traditionally associate with
anomaly searches. Essentially, the data is mapped to a
compressed space and decompressed, and the reconstruction error
between the input and output is used as an anomaly
detection metric.
It was interesting to see the three different approaches
combined in a single session. Some comparisons between the three
different approaches would be nice in the future.
So then we had the ML assisted measurements and searches
session. Here we saw some interesting updates from the NNPDF
collaboration. Talks on using machine learning and searches for
optimal EFT parameters, observables for EFT. And hypotheses
testing at the LHC.
We saw talks on new inference methods, particularly the
Ginkgo method.
One of the big advances in this session was the use of
invertible neural networks to perform measurements, particularly
in measuring QCD splittings, where the forward process can be
simulated with different showering parameters.
The normalizing flows can then be used to perform
inference on this process and to actually get a measurement of
the parameters.
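The key property being exploited is exact invertibility: the same map can be run forward (simulate) and backward (infer). As a minimal sketch, assuming nothing about the actual INN architectures in the talks, here is a hand-built composition of invertible steps showing that idea.

```python
import math

def forward(x):
    # Toy invertible "network": an affine layer followed by an
    # invertible nonlinearity. Forward direction = simulation.
    y = 2.0 * x + 0.5
    return math.exp(y)

def inverse(z):
    # Exact inverse, applied in reverse order. Backward = inference.
    y = math.log(z)
    return (y - 0.5) / 2.0

x = 1.3
assert abs(inverse(forward(x)) - x) < 1e-12
```

Real INNs stack many learnable coupling layers of this kind, so that after training on simulated forward samples the inverse direction yields the parameter inference (with normalizing flows also providing densities, not just point estimates).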
Then we had, I think this was in the first session of
the second day, the classification talks. The big talk here, I
think, was the improvements to the state-of-the-art
ParticleNet network. In the top tagging challenge a few years
ago, ParticleNet came out as the winner. And there was work
presented on ParticleNeXt, which incorporates more advanced
architectures to boost the classification performance.
One talk was on the use of physics-motivated representations
like the Lund plane, not just for top tagging but for Higgs
measurements, and it was really interesting.
The next session was on simulation and generative
models. In part one, we focused on detector simulation. The
motivation here is that in the future, LHC Run 3 and so on, the
computational overhead costs grow exponentially, or at least
badly. And the goal is to use GANs or other machine learning
tools to replace or aid full simulation.
One of the most interesting talks was the ATLAS talk. It
was interesting to see them using the GAN techniques.
This looks extremely promising.
Part two was the event and jet generation talks. There were
talks on a variety of architectures, GANs, and we had some flows
in here. This image comes from the OTUS talk from Jessica, where
they were using a VAE architecture inspired by optimal transport
for event generation.
Then the regression, calibration and fast inference session.
Here in the first section are updates on regression with pileup
mitigation in CMS. They compared PUPPI with new machine learning
techniques using attention mechanisms, particularly ABCNet
and another method, Pileup Mitigation with Attention.
We have the calibration talks next. Here one of the
interesting things, which is quite a new topic, is the idea of
super-resolution. We can start with a low granularity jet image
from the calorimeter. And using other information like tracking
information and so on, along with machine learning techniques, we
can upgrade the resolution of these images. But apologies if I'm
misrepresenting this work.
Lastly in this session we have the fast inference talks.
For example, jet identification in the level-one trigger.
This again is similar to what Jennifer was talking about. This
online trigger stuff is interesting: there is lots of data
thrown away, and there are anomalies that we can possibly detect
with machine learning.
We had one session dedicated to datasets and challenges.
Both are in a similar vein. In machine learning, one of the
difficulties when we develop a new technique is to have a
fair comparison across the board.
This is where the datasets come in. The reproducible open
benchmarks framework is there so you can upload your neural
network architecture, or whatever, and the framework will run the
architecture on some data and then provide a like-for-like
comparison of the different methods.
Secondly, which is a bit similar, is some shared data and
algorithms for deep learning in fundamental physics. This is the
ErUM-Data program. The data doesn't just focus on particle
physics. We have data from Pierre Auger and others.
The datasets are provided in a Python package and
everyone can do the training and test their methods.
We have two new challenges following on from the top tagging
challenge. The first is anomaly detection at 40 megahertz. The
goal is to have this run with a very small network that can run
on a chip with very fast inference times to be used on the online
trigger.
The second challenge is a proposal for a calorimeter
simulation challenge: a community challenge based on a common
dataset for using and benchmarking different approaches for
fast calorimeter simulation. Community input is welcome.
Then we have a session on exploring the latent structure of
data. This first part, some of the interesting talks here were
on learning symmetries and conserved quantities in physical
systems.
In the very first session, the new architectures one, we had
lots of work with people incorporating symmetries into
architectures; here, instead, machine learning is being used to
find conserved quantities and invariants in the data themselves.
They have taken the problem of finding invariances and
reframed it as an optimization problem for machine learning
tasks.
Then we have in the second part, talks on latent space
exploration. There is a variety of applications here which
aren't all in the exact same direction.
But one I picked out was the COBRA architecture. I found
this interesting because one of the problems we have with complex
final states is combinatorial backgrounds, and here they use a
machine learning architecture to overcome these backgrounds.
So that seems promising.
Then we have a session on interpretability, robustness and
uncertainties.
There are a lot of different talks in different
directions in this session. But there was an introduction on
interpretability and robustness and uncertainties with machine
learning. And then we came to uncertainties with generative
networks. Here I'm showing plots. Not only were they able to
use neural networks as a generator but they're using Bayesian
generative networks and can generate data and provide
uncertainties on the data that they generate.
It's a really important problem if these techniques are to
be used in the experiments.
Then we have talks on information content. For example,
here we have explainable AI for ML jet taggers. It's an
interesting technique. These neural networks are usually some
black box: you can't see, you can't really understand, how the
data is propagated through the network. Using a clever
backpropagation trick, they're able to see which of the inputs to
the neural network contributed the most to a certain decision at
the output layer. Here are some examples where the red pixels
indicate pixels that contributed most to the decision in the jet
classification task.
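The talks used dedicated relevance-propagation schemes; as a rough stand-in for that idea, here is the simplest cousin, input-times-gradient on a toy linear model. All numbers are hypothetical, and for a linear model the gradient with respect to each input is just its weight.

```python
def model(pixels, weights):
    # Toy linear "classifier" output.
    return sum(p * w for p, w in zip(pixels, weights))

def relevance(pixels, weights):
    # Input-times-gradient attribution: for a linear model,
    # d(output)/d(pixel_i) = weight_i, so relevance_i ~ |pixel_i * weight_i|.
    return [abs(p * w) for p, w in zip(pixels, weights)]

pixels = [0.9, 0.1, 0.5]
weights = [2.0, -0.1, 0.0]
rel = relevance(pixels, weights)
assert rel.index(max(rel)) == 0  # the first pixel dominates the decision
```

For deep nonlinear taggers, methods like layer-wise relevance propagation redistribute the output backwards layer by layer instead, but the output is the same kind of per-pixel relevance map described above.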
Lastly in this section, part 4, constructing observables.
Here we have, for example, Bayesian inference in four-top
production at the LHC. This is a mixture model that assumes there
are signal and background processes in the dataset and tries to
disentangle them through inference techniques. And there were
other interesting talks in this session.
In the last session, there were talks on general machine
learning applications, to cosmological simulations for example.
We had a talk on conditional invertible neural networks to probe
cosmic ray sources. If you have some physical parameters and you
can simulate the forward process, then using an INN you can
invert the process and provide input on the parameters.
We had a talk from Rutgers. Here we have a visualization
of the -- data. They're using the ANODE density estimation
technique. It's a technique developed for particle physics
applications, and they're using it to identify stellar streams.
To end the session, there was a talk from Michael on synergy
between quantum computing and machine learning. I wasn't able to
get images from this because they didn't upload the slides.
He gave a live demonstration of running a machine learning
algorithm during the talk. This was focused on quantum annealing
and how it can be useful for the optimization of particle physics
problems.
So quickly, the next ML4Jets is announced for January 2023
at Rutgers University. The plan is to have it in person. Having
visited there myself, I can recommend it as a venue. If you're
interested, sign up.
For the summary. Here I listed the exact same topics from
the beginning but I hope now this is more clear where these all
came in. We had big gains from the new architecture. Attention
mechanisms and transformers.
We had impressive advances in machine learning at ATLAS and
CMS.
And, as a personal taste, symmetries in deep learning,
uncertainties and anomaly detection really shone this year
at ML4Jets. I will end here, thank you.
>> Thank you, Barry. You weren't kidding about having a lot
of things. Okay. That was great.
Are there any questions from the audience, please raise your
hand?
People are clapping. Okay. Clemens has his hand up.
>> Hey, thanks a lot for this nice presentation. I have a
question, or maybe it's more asking for a comment from your side,
on the first point that you list here in the summary and that you
touched upon at the very beginning of your talk. Which is big
gains from new architectures when it comes to jet tagging. So
you had, on one of the early slides, a plot where you drew, like,
is this the limit, in the middle of the plot. You know, how far
can we actually push this further? Do you think this
red line is much closer to the lines that we have already in the
plot, or is there lots of space? What is your impression?
>> None of these plots are mine. These are from a talk that
you'll see again maybe on Thursday or if you watched the video.
It's really not clear. I don't think there's, I mean,
because of -- okay. So the best performing models are
sophisticated and complicated deep neural network models. It's
difficult to interpret what is going on. We don't really have an
analytical grasp on the upper limit. It's a really difficult
question to answer. I don't have a good answer for it.
>> I saw a question submitted I think for Frederic's talk
and maybe we'll come back to this talk about how much information
is there in a jet and how much information do you show to a
neural network? And whether that is good or bad. Probably we
should talk about it later instead of now. But I think this is
something that will come up, definitely.
Andrew?
>> Thanks for the great review, Barry. I have a
metaquestion. In the history of BOOST, ten years ago the name of
the game was designing new observables that have a very concrete
physical interpretation but then have some practical application
for tagging or whatever.
From the machine learning side, you can say, well, I want to
work with the lowest level observables, calorimeter hits or
whatever and throw it in a machine and see what the machine can
learn. Clearly, these are two ends of the spectrum.
One is very physically understandable but very limited in
scope because it's a single observable. And one is extremely
general but is, from one perspective, maybe impossible to
understand. Individual calorimeter hits are not modeled well
theoretically. Where do you see on this spectrum, studies in
machine learning and kind of theory or experimental analyses from
the other side, kind of ending up? What do you see as the happy
medium? What's the, you know, what's the, what did each side
need to give up and what did each side gain from the growth of
machine learning in BOOST physics?
>> I suppose one question is how much interpretability you
can get, and understand well enough, to use in an experiment. I
think as long as you can calibrate the observable, it's fine.
Okay.
But how much interpretability you are willing to give up is
maybe a matter of taste. Low level data, this is interesting,
and your question in Jennifer's talk I found interesting. I was
hoping to address it here, but during the talk I didn't have
enough time. There are symmetries you can impose at low level,
at the level of the constituents, which propagate through the network.
For example, the simplest thing you can do is take the jet
constituents, order them by pT, flatten them and pass them
through the network. It's the worst thing you can do. You can
say the neural network is a general function and it should be
able to extract all the information. But in practice this isn't
the case.
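That naive baseline, sketched as a toy so the preprocessing it imposes is explicit (the maximum length, zero-padding, and (pt, eta, phi) layout are illustrative choices, not a prescription from the talk):

```python
def flatten_jet(constituents, n_max=4):
    # Sort by descending pt, truncate/pad to a fixed length, and
    # flatten into one vector for a generic dense network. This bakes
    # an arbitrary ordering and padding scheme into the input.
    ordered = sorted(constituents, key=lambda c: -c[0])[:n_max]
    ordered += [(0.0, 0.0, 0.0)] * (n_max - len(ordered))
    return [x for c in ordered for x in c]

jet = [(45.0, -0.2, 1.0), (120.0, 0.3, 1.1)]
vec = flatten_jet(jet)
assert len(vec) == 12     # n_max constituents times 3 features
assert vec[0] == 120.0    # hardest constituent is forced to come first
```

The point of the surrounding discussion is precisely that choices like these leak into the network's output, which is what permutation-invariant architectures avoid.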
If you incorporate the symmetries at the low level, you are
helping the network not only gain performance but produce
interpretable results. Bias isn't the right word, but you're not
going to see influences of the preprocessing on your neural
network output. One example is permutation invariance. The big
gains came first from the graph net architectures, and these
permutation-invariant networks showed a big performance
improvement over things like the images or permutation-variant
representations.
Big performance with the graph nets, which is interesting.
Another thing is the rotational invariance of the jet. It's not
an exact symmetry. So we have several neural networks, like the
SL3 and so on, but also other methods for embedding the
symmetries in the low-level representations. We haven't seen this
play out fully in the literature. But my expectation is this will
give not only improved performance of the networks but more
interpretable outputs. It's a vague thing to say, but I think,
more interpretable network outputs.
>> Thanks.
Petar?
>> Yes. I want to comment on this. I think Andrew asked the
right question, and I think this comes back to
Jennifer's talk also. In the sense that we need better detector
simulation, and we also need somehow to connect the theory that
you do to the Monte Carlo simulation. I mean, this is where
the link has to be made, because we are training these machine
learning algorithms on Monte Carlo. Or even if we train them on
data, you can then also train them on Monte Carlo for a specific
model and basically learn something. Suppose we had a detector
simulation which is perfect, which of course is not true.
But if we had one, then the second step would be to connect
theoretical calculations to Monte Carlo simulation in some way.
And I don't think the framework for that exists at all. And I
think this is something that we as a community need to figure out
how to do.
So I think absolutely there has to be connection between
deep thinking and deep learning as we discussed a couple of
BOOSTs ago. I think this is where there is a missing step in
this whole process: we are running Monte Carlo simulations
that were written 40 years ago and that have been tuned on data
without actually adding all of the theoretical improvements in
the understanding of soft QCD. I mean, none of that propagated
into the Monte Carlo simulations that we use to train machine
learning algorithms.
So I think that was, yes, I mean, I just wanted to say this.
Because this is sort of the missing link that connects all of
these things, and there's very little work being put into it.
>> You're saying the bottleneck in this fast simulation is
not in the machine learning but on the Monte Carlo side?
>> I think machine learning, I think all the aspects are
advancing. So, I mean, maybe I'm wrong, but as far as I
can tell, there are many things that do not propagate
into measurements, and even the things that do propagate into
measurements carry large theoretical uncertainties that really
don't need to be there, simply because there is no progress in
certain areas.
I attend every BOOST, and from year to year there are huge
advances in machine learning and huge advances in calculations,
and there are huge advances in, you know, getting the technology
of Monte Carlo simulations to work correctly. But there is no
connection between improvements in soft QCD and the Pythia, you
know, parton shower. And maybe this is fantasy. Maybe this is
just too hard and it's going to take ten years to do this. But
we as a field are not putting enough effort into this.
I feel.
Maybe other people can, voice different opinions and correct
me.
I mean I would love to be wrong but that is what I feel.
>> John, did you have a quick comment about that?
>> It was a follow up on that.
So the, first of all, there is a lot of theoretical
advancement in Monte Carlo generators. But it's focused mostly
on getting the hard part of the event right, all the technology
to deal with including higher order matrix elements and parton
showers and higher order parton showers are on the way.
I don't think we should give the impression there is no work
going on in Monte Carlo, because there is. There is a lot of
theoretical connection there. The state-of-the-art theoretical
calculation for a given differential cross section at the LHC is
more than likely to be embedded in a full final-state Monte Carlo
these days, once you move away from anything simple.
So I think in that sense, you know, we shouldn't give
the impression there is no work. There is work. I think you're
right, it hasn't been focused on getting the soft QCD right. At
some level that is a bit of an afterthought, retuned after the
parton shower. The parton shower interpolates between the two.
But then you need to think: even for the Monte Carlos
where we have state-of-the-art calculations for a given final
state embedded, in Sherpa or Herwig with Matchbox or something
like that, it's very, very challenging to quantify the
uncertainties on it.
If you want to feed all the information about a full final
state into a machine learning algorithm and really understand the
result, in principle you need to have theoretical control over
all of the uncertainties of everything you put in. We're a long,
long way from that. I think one approach to making better use of
machine learning is maybe to start being stricter about what we
allow the machine to know from the Monte Carlo, only feeding it
things that we have good theoretical control over, and then
gradually trying to grow that list, working with the theory
community to put them in there.
I think if we throw everything we know about an event from
the data and everything that we know about an event from the
Monte Carlo into a machine, we'll never get there. Everything
you throw in, you have to have theoretical control over the
uncertainties, and at the experimental level too. I think we
should step back and build up what we can give a machine to
learn from, step by step, and see how far we get that way.
That is where I think the collaboration with the theory
community can come in.
>> Okay. Great comments. Jon, I think that is a very good
segue into one of the talks in the next session that I suggest we
move onto now. We're running a couple of minutes late.
The chair of the next session is TJ who I guess is here
somewhere.
Yes. Hi TJ. The first speaker, I should check is --
>> His video is on. We can hand over to Samuel and a
summary of what PIRANHAs can do for your jets.
>> Sam, you're muted.
>> There we go. Thanks so much, Matt. Okay. Hi everyone.
As you know, I'll be doing a short review of Pileup and Infrared
Radiation Annihilation, PIRANHA. It's a strategy for continuous
grooming that my collaborators Patrick, Eric, Jesse and I have
developed.
Before I go on, can someone verify, I've been having
problems, when I go through the slides, are people seeing this?
>> Yep.
>> Thanks. In my talk, I mentioned that grooming is
important: it's a procedure for removing contaminating soft
radiation from our data. You all know this. It has experimental
and theoretical benefits.
More on that in the questions, actually. For the
sake of this talk and for simplicity, I will focus on the
modified mass drop tagger, or Soft Drop with beta equals zero.
One of the main points I made is that Soft Drop with beta
equals zero has a hard cutoff, and I looked at two events, E plus
and E minus, each with two particles: they have soft particles
with energy fractions z plus just above the cutoff and z minus
just below the cutoff.
Even though the events start close together in event space,
in a way that we can make precise, and look similar, because they
straddle this cutoff they have distinct final states after the
grooming procedure.
We mentioned how this leads to problems in predicting
responses to detector effects and to hadronization. More
on that in the questions. But it seems natural to ask whether we
can have a continuous grooming procedure which does not present
this difficulty in this measure-zero region of phase space where
we straddle the cutoff.
We do introduce such a procedure, or a
strategy, for continuously removing contaminating soft radiation.
We call this PIRANHA. And it's based on some techniques that I
mentioned briefly in the long talk.
Now, the intuition is that we can think of this as a group
of PIRANHAs that we're optimally transporting to eat up the
offending contamination in the event. I introduced recursive
safe subtraction, a tree-based implementation of this PIRANHA
strategy.
Let's compare them. You have seen Soft Drop many times, but
for now look at recursive safe subtraction here. They're
similar: you start with a jet and recluster it to get an
angularly ordered set of emissions, and loop through them, widest
first. In Soft Drop, there is a hard cutoff, and we completely
eliminate the emissions that fail the cutoff until we find one
that survives.
In the case of recursive safe subtraction, we have a set of
PIRANHAs that remember how much they eat. They're going to get
full.
At each step of the grooming procedure, we're going to use
up some grooming parameter and reduce the amount of grooming we
do in the future until eventually one of the emissions is going
to survive the grooming procedure and then we'll keep the rest of
the jet.
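The two recursions just described can be contrasted in a toy sketch. This is not the authors' implementation: the real algorithms walk the full Cambridge/Aachen clustering tree, while here the jet is reduced to a pre-clustered, angle-ordered list of (z, theta) emissions, and the z_cut value is arbitrary.

```python
def soft_drop(emissions, z_cut=0.1):
    # Hard cutoff: discard emissions entirely, widest first,
    # until one passes z > z_cut; then keep it and the rest.
    for i, (z, theta) in enumerate(emissions):
        if z > z_cut:
            return emissions[i:]
    return []

def recursive_safe_subtract(emissions, z_cut=0.1):
    # Continuous version: the "piranhas" share a grooming budget and
    # remember how much they have eaten; the emission that survives
    # is only partially groomed, so nearby events stay nearby.
    budget = z_cut
    for i, (z, theta) in enumerate(emissions):
        if z >= budget:
            return [(z - budget, theta)] + emissions[i + 1:]
        budget -= z
    return []

ems = [(0.02, 0.4), (0.05, 0.2), (0.30, 0.1)]
assert soft_drop(ems) == [(0.30, 0.1)]          # soft emissions removed whole
groomed = recursive_safe_subtract(ems)          # last z reduced by leftover
assert len(groomed) == 1 and abs(groomed[0][0] - 0.27) < 1e-9
```

The difference the talk emphasizes is visible here: in Soft Drop an emission with z just above or just below z_cut is kept whole or dropped whole, while recursive safe subtraction changes the output smoothly as z crosses the threshold.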
We can look at this on our simple examples from earlier and
because the PIRANHAs are eating up the event in the case of E
plus and E minus, we don't have the same issues of discontinuity
for the events straddling the cutoff.
First, one manifestation of this continuity is that the
distributions of observables tend to be smoother in the case of
recursive safe subtraction. Here is an example for C1(2),
which is m squared over the pT squared of the jet.
In the case of Soft Drop, there is a kink in the
distribution. In the case of recursive safe subtraction there is
no such kink.
We saw how this newfound continuity responded to
hadronization. I'm showing parton-level C1(2) on the X axis and
hadron-level on the Y axis, and there is a larger spread, or less
linear correlation, as reflected by the linear correlation
coefficient here, in the case of Soft Drop.
More spread for Soft Drop, less for recursive safe
subtraction, when it comes to the response to hadronization.
For detector effects, I presented a naive analysis: when I want
to have no detector effects, I keep all hadrons in the jet; in
the case of post-smearing, quote unquote, I consider only the
charged hadrons in the event.
We see, though it's not quite as obvious just by looking at
it, that there is less linear correlation, more unpredictable
nonlinear responses, of the Soft Drop jets to this very naive
smearing procedure.
So we learned that grooming is important, which you knew. I
introduced the PIRANHA grooming strategy and the recursive safe
subtraction implementation of that strategy, which overcome some
of the discontinuities of previous methods, and I'm looking
forward to all your questions.
>> Thanks, that was an excellent summary.
I think the way we will do this is we'll start with one
question from the discussion documents and then we'll also have
questions from the floor.
And do raise your hands if you want to interject.
One question from the document is as follows. You've
demonstrated that you have to choose the Z cut quite precisely.
It's sensitive to the goals that you're trying to achieve and
what your event environment is.
How robust are these choices, and do you see a way to reduce
the need for tuning? You need to have sensitivity to the W mass,
but you also need to deal with underlying event and pileup.
>> Thanks so much. I don't have a very complete answer for
this question and I think it's an excellent question that merits
further study. But let me show what happens when I consider top
jets for example with the same PT as the W jets that I showed in
my long talk.
The same choices of Z cut and the same rescaling approach
appear to work well in that case.
It seems to me, based on this naive example, that because the
grooming we do scales with the energy of the hard process, if the
energy scales involved in the hard process are similar, the
amounts of grooming we need to do are similar.
On the other hand, if I increase the amount of energy
associated with the hard process, for example, here I'm showing
you jets produced in Z plus quark processes at 3 TeV rather than
the 500 GeV jets I showed earlier, because the amount of energy we
groom scales with the hard process, while the energy associated
with the underlying event is only weakly correlated with the
energy of the hard process, it seems that we need to be more
careful as we change the energy scale of our processes.
Does that answer the question?
>> I'll leave it open for a moment in case anyone wishes to
ask a follow up?
>> Thanks.
>> Okay. I think we can take a question from Robin, go
ahead.
>> Hi, thanks.
I just got back from vacation, so I haven't opened the Google
doc or watched anything in advance. But it occurred to me, first
of all, that this is a super interesting method.
I'm going to read about it now.
But one of the issues with the modified tagger in Soft Drop
mode is you need to calibrate things back up to their proper
masses because you lose the mass peak.
And then it gets shifted. I was wondering if any work has
been done comparing, well, experimentally, maybe or even just
with the Monte Carlo to show whether this helps the peaks be more
on mass. And you sort of showed, on slide 9, a plot that answers
that a little bit. But I can't totally see the peak underneath
all the different Z cuts from Soft Drop in blue.
>> Yes. Absolutely. Yes. Let me see if I have a better
plot here.
Looks like I don't. I have one in my long talk. Worst case
scenario, if I can't answer the question here, I will pull that
up.
But for now, let me know if this plot begins to answer your
question. Here I'm showing the modified mass drop groomed W jet
at 500 GeV.
You see the mass peak here appears close to the W mass for
recursive safe subtraction. In the long talk, I discuss how with
a different choice of Z cut we actually get a slightly shifted
value of the W mass, or of the value at the peak, simply because
in the case of recursive safe subtraction we're removing some
fixed amount of energy.
Whereas in the case of Soft Drop, that is not necessarily
the case.
And so we do a very, very naive rescaling procedure here, and
we'll do something more sophisticated in the future.
But for now, it's just a very naive rescaling procedure where
we shift the mass by 2 percent to recover the W mass.
Please tell me, does this give the additional information you
wanted? Please help me follow up, or kill the question.
>> Well, I think it answers my question. I think it means
that there's no improvement in -- you still have to do the
rescaling.
>> Yes, absolutely. You still have to calibrate. Thanks so
much. Thanks.
>> I wonder if, Matt, do you want to follow up because you
had a question on calibration effects?
>> I did. It might be a little niche, but let me try to
summarize my question for you. One thing that we've seen in
ATLAS when doing all those studies for the UFO paper last summer
is that the jet response in ATLAS of signal and background jets
can be a bit different. And a jet with real substructure in it
versus a quark or gluon jet can respond differently to different
grooming algorithms.
This is a total pain in the butt if you want to calibrate
the mass of the jets. There is a section in the paper about
this.
The question is, does RSS treat W or top jets any differently
than background jets? Have you looked at what it's actually
doing in these different topologies?
>> I don't have a specific answer to your question. Again,
thanks, I think it's a really important and beautiful question.
First, I will say I don't trust my calibration procedure nearly
as much as I trust the more sophisticated calibration procedures
used in that paper and in experimental analyses. We hope to have
slightly more sophisticated ones.
But one question I have for you is, do you know how the issue
manifests for different types of jets, so we can address it more
in the future? I have another slide with plots for you, but I
could address the question a little more accurately in the
future.
>> Maybe it would be better if we follow up about this.
Basically, in a nutshell, the problem is if you have a W jet and
a gluon jet with the same mass and pT, you could imagine the zg
of the jets is different or something like that. Because you
have more soft particles in one than the other, we think the jet
mass response differs.
And this means the calibration factors that you assign to the
jets are different.
This actually prevented us from using some of the options
that we looked at. Recursive Soft Drop was sensitive to this
effect and we couldn't calibrate it. It's a weird technical
thing that we noticed that we're still trying to understand.
>> Thanks so much. In general, I will say, I expect very
naively that the increased continuity of recursive safe
subtraction, for example, leads to better responses or behavior.
But without further study, I don't know. I wanted to show these
plots in case they give additional information for you, where I
do the procedure for the QCD background at the same energy scale
and draw the ROC curve to present a little bit of additional
information.
I think to answer the question fully, I need to do a more
precise study. So thanks.
>> Okay.
>> I think we will have to move on shortly. But Akitya, if
you have a short question.
>> I doubt it's a short question. I'm sorry I didn't write
my question in the Google doc.
It's very interesting how the peaks behave, but the fact that
in Soft Drop you have a hard cutoff allows you to understand the
effect of hadronization quite simply. Because all it means is
that if there's a perturbative subjet that is close to the Soft
Drop boundary, the hadronization corrections at the boundary end
up translating the theta function into a delta function.
There are two non-perturbative effects. One is how much of
the area of the jet you collect; that scales as the groomed
radius. The other is the boundary effect, which, as you pointed
out in the talk, is a sharp boundary and ends up being a delta
function. Now that you have softened the boundary, how do I
think of these boundary effects?
How do I think of -- I mean you don't have to answer this
question now. This can be very complicated.
I'm happy to discuss this over the week. But how does the
softening of the boundary translate -- how does this boundary
correction from hadronization that happens in Soft Drop change,
or what does it look like, for this kind of observable?
Let me also explain what the boundary effect means. The
boundary correction arises if there is a subjet that barely
passed or barely failed; it happens right at the boundary.
The perturbative coefficient involved in this contribution
basically puts a delta function at the Soft Drop condition
because of the sharp cutoff.
Now that you have softened the boundary, how does that
picture change?
>> I think to be fair to the other speakers we should carry
on this discussion elsewhere. But thanks for the comment. I
encourage, also, if in other sessions we have follow-up questions
that we don't quite manage to get to, to add them to the document
or chase after the speakers.
Thanks, again, Sam. Now over to Yongbin for particle
identification with Graph Neural Networks.
Looks good.
>> Okay. Thank you. I'm glad to present our studies on what
we call a semi-supervised Graph Neural Network for PUPPI. I'm at
Fermilab, and this work is done by me together with Pan,
Miaoyuan, and Shikun, who are computer science experts.
So to start, a quick recap. We need pileup mitigation, and
there are previous studies using charged hadron subtraction,
SoftKiller, which removes the low-pT particles, and PUPPI, which
makes use of the neighboring particle information.
Recently, there have been machine learning studies for pileup
mitigation using convolutional neural networks and Graph Neural
Networks, and the problem with the machine learning approach is
that you need the truth information for the neutral particles.
In a real case, like full simulation or data, we can do this
for the charged particles. This is easy because we have tracking
and vertex information.
But for neutral particles in the full simulation, this
information is currently really hard to recover. In the end, we
don't have this information at all.
So if you want to train a neural network with the previous
approaches, you need a perfect, or at least very good, simulation
of the data, which is also something we are discussing.
So the idea we're having here is: how about we train our
model on the charged particles and then do the inference on the
neutral particles? This is called the semi-supervised approach,
and it would allow us to train directly on real data or full
simulation: we can just train on the data and apply it to the
data. I also want to emphasize that in this study we have our
own model, but the semi-supervised training strategy will work
with other machine learning models. We just need to control the
input features and make sure the model transfers well from
charged particles to neutral particles.
Just for illustration purposes, this is a distribution made
by the CMS collaboration, of a feature of the particles. We can
look at the neutral particles and the charged particles, and they
look similar. So we can make use of these features, train on the
charged particles using the reconstructed information, and then
apply it on the neutral particles. That is the basic idea.
This is still with fast simulation, and we just take the
PUPPIML datasets as the training datasets.
So we train at 80 and 140 pileup, and we also train on 80 and
test on 140. In this specific dataset, the flags for the charged
particles are assumed to be perfect; there is no mislabeling. In
the real data case we will need to handle these things, but here
it's sort of like a toy model, so we don't have this problem.
>> You need to speed up a bit if you want time for
questions.
>> We build the graph in eta-phi space and connect the
particles, and we use what we call a gated graph model. In the
graph, messages are passed from the neighboring nodes and a gate
is applied, so the message is essentially a gate-weighted
average. When we update the node information, there is another
gate that sets the weights of the neighboring particles and the
weight of the node itself.
So there are two layers of graph convolution and two layers
of MLP. We take the GNN output and the PUPPI weight and combine
them to get the final score.
Then there is the masking procedure: we randomly select
charged particles and mask them so they look like neutral
particles.
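As a toy sketch of the two ingredients just described, the gate-weighted average of neighbor messages and the charged-as-neutral masking, one could write something like the following. This is not the actual Fermilab model; the function names and shapes are hypothetical illustrations.

```python
import numpy as np

def gated_average(features, adjacency, gates):
    """One toy message-passing step: each node receives the gate-weighted
    average of its neighbors' features. adjacency and gates are (N, N);
    features is (N, F)."""
    weighted = adjacency * gates                     # per-edge gate in [0, 1]
    norm = weighted.sum(axis=1, keepdims=True) + 1e-9
    return (weighted @ features) / norm

def mask_charged_as_neutral(is_charged, frac, rng):
    """Randomly flag a fraction of the charged particles as 'pretend
    neutral': their truth labels are hidden and the network must predict
    them, mimicking inference on real neutral particles."""
    charged_idx = np.flatnonzero(is_charged)
    n_mask = int(frac * charged_idx.size)
    masked = rng.choice(charged_idx, size=n_mask, replace=False)
    visible = is_charged.copy()
    visible[masked] = False                          # labels hidden here
    return visible, masked
```

The point of the masking step is that the loss is evaluated only on the masked charged particles, whose truth labels are known, so the trained model can later be applied to genuine neutrals.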
This is the performance at 80 pileup for the semi-supervised
approach; the PUPPI scores are here. These are better compared
with PUPPI, and similar to the fully supervised case.
Here are similar performance plots; we still observe
consistently better performance.
We also test the performance on high-level physics variables
like the jet mass. You can see that basically the supervised
learning and the semi-supervised learning have similar
performance, and both are better than PUPPI.
The same holds at 140 pileup.
This is an event display comparing the truth particles and
the PUPPI particles against the Graph Neural Network; most of the
pileup particles get cleaned.
Here are some weight distributions for the GNN compared with
PUPPI. The GNN weights are more peaked at the two ends, while
the PUPPI weights are more spread out.
In summary, we do semi-supervised learning: we train on the
charged particles and apply the model to the neutral particles.
We're working on testing the performance of this technique on
real data and full simulation; in that case it's more
complicated, and we are working on this. That's it. Thanks.
>> Thanks, and sorry to rush you.
Once again, I will combine a synthesis of two comments from
the document. PUPPI and the GNN seem to produce quite different
weights, and also, in the event display, your network can
eliminate a few more neutrals that PUPPI doesn't manage to deal
with.
Do you understand what the differences are in the inputs, or
what it is in the inference that the Graph Neural Network is
picking up on? Would it be possible to use that information to
improve our non-ML techniques by adding jet pT or isolation
requirements?
>> Okay, thanks for the question. Yes. So basically, if you
look at the inputs to the Graph Neural Network, they are things
like the pT and the charge; they're almost the same as for PUPPI.
Our goal here is that we want to make use of the particle
features themselves, like a cut on the pT, plus the neighboring
features, combining them together.
So instead of the same approach as PUPPI, which builds on the
pT, et cetera, the GNN makes use of the neighboring features and
explores more complicated structures. For example, it could be
pT squared divided by delta R, or whatever; some function of
delta R and pT. This is something we think it learns.
We think it's something similar to PUPPI, but it does a
better job. That is why we want to reduce the input variables
and make the network as small as possible, so we can be more
confident that it really does work.
If you look at these distributions -- if you tune really
hard, you can probably remove these neutral particles. The idea
here is to let the GNN tune the metric such that pileup particles
that are not far from the leading particles can be cleaned. Like
this one: it's sort of close to the leading particles, but not so
close. These particles can be really cleaned.
I have more displays here. Particles like these can be
cleaned.
>> I think the question is in some sense about
interpretability. We know what you've put in, and your network
is getting some results, but do you know how it is using that
information? Maybe this is something that is still further work.
>> Yes, we need to understand it better. But for now, it
makes use of -- it can find a better metric, like pT squared
divided by delta R or something like that, which is a function of
the pT, instead of a relatively straightforward metric as in
PUPPI.
>> Thanks.
Let's see. Raise your hand if you have spontaneous questions
or would like to ask questions that you posted in the doc.
Maybe while people are thinking, we can have one more. You
showed that you're able to transfer the training. On the other
hand, this will limit you to using only the information that you
can treat identically between the two. I guess, maybe this is
slightly broader than the scope of your specific study, but do
you foresee a way to do a better association of the labels for
the neutrals, and/or extend this to using -- information, bearing
in mind that their charged and neutral terms may differ?
>> I mean, okay. So for the neighboring particles, we can
still use low-level showering information of the neighboring
particles. For the target particle, it's not straightforward to
make use of such information.
But we could think about doing some training, for example for
photons: we can still do the matching at the simulation level, in
the full simulation, and then train on a small sample of photons,
making use of -- showering information, and then combine the
training of this -- plus the training on the photons. Maybe that
would do a better job, I don't know. That remains to be seen.
We're also studying adaptation techniques to transfer from the
charged particles to the neutral particles better. Maybe we can
do something there.
>> Great. Thanks.
Okay. I think we can wrap it up here. So thank you.
Our last speaker today will be Frederic on jet tagging with
graph networks on the Lund plane.
>> Hello, can you see me?
>> We hear you. Your slides are gone.
>> The slides are gone, sorry. One second.
It says I'm still sharing. I will just re-log into Zoom.
Okay. Sorry about that. Can you hear me now?
>> Yes. Looks good.
>> Sorry for the technology problems.
Thanks for giving me the opportunity to talk about this
work. I haven't prepared a separate set of slides so I will go
through the full presentation and skip parts.
I will talk about work I did on jet tagging in the Lund plane
with Huilin, and a bit of work with Gregory and Adam.
So most of this is around using the Lund plane as an input to
machine learning models, so I will start by giving a brief
overview of how the Lund plane is defined and how it can be used
as a way of representing jets. The idea is that you represent
emissions in a log/log plane of the log of the angle of the
emission and the log of its transverse momentum. This is useful
because it separates out different kinematic regimes.
Non-perturbative contributions are located in the lower part of
the Lund plane, so you can remove contributions from that regime
by imposing cuts in the Lund plane on the momentum of the
emission. And soft collinear emissions are distributed uniformly
in this plane at leading order.
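A minimal sketch of the coordinates and cut just described: an emission with opening angle delta and transverse momentum kt maps to a point in the log/log plane, and a kt cut removes the lower, non-perturbative region. The cut value below is an arbitrary placeholder, not one from the talk.

```python
import math

def lund_coordinates(delta, kt):
    """Map an emission to the Lund plane: (log(1/delta), log(kt))."""
    return (math.log(1.0 / delta), math.log(kt))

def is_perturbative(kt, kt_cut=1.0):
    """Keep only emissions above an (illustrative) kt cut, i.e. the
    upper part of the Lund plane."""
    return kt > kt_cut
```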
So you can have a region of the Lund plane that is purely
dominated by perturbative radiation and another region of the
Lund plane that is sensitive mostly to non-perturbative
emissions.
We can use this representation as a way of creating,
essentially, fingerprints of jets through the Cambridge/Aachen
clustering sequence. We go through the branches of this
clustering sequence, and at each step we define two subjets
ordered in transverse momentum and save the kinematics that
correspond to that branching.
The information we save is the two coordinates of the Lund
plane, plus things like the momentum fraction of the splitting,
the mass of the pair, and the azimuthal angle. We repeat this
procedure on both subjets until we have the full tree of the
Cambridge/Aachen clustering, so we have a node of Lund
kinematics, this tuple T, for each of the declusterings along the
sequence.
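The declustering walk just described can be sketched as follows, assuming the Cambridge/Aachen tree has already been built. The toy tree structure and the exact tuple contents are illustrative, not the paper's implementation: a leaf is a `(pt, eta, phi)` triple, an internal node is a pair of children, and the subjet axis is approximated by a crude pT-weighted centroid.

```python
import math

def subjet_kinematics(node):
    """Toy stand-in for a subjet's (pt, eta, phi): sum the pT of all
    leaves under a node and take a pT-weighted eta/phi centroid."""
    if isinstance(node[0], float):           # leaf: (pt, eta, phi)
        return node
    a, b = (subjet_kinematics(child) for child in node)
    pt = a[0] + b[0]
    eta = (a[0] * a[1] + b[0] * b[1]) / pt
    phi = (a[0] * a[2] + b[0] * b[2]) / pt
    return (pt, eta, phi)

def lund_declusterings(node, out=None):
    """Record an illustrative Lund tuple (log 1/Delta, log kt, z) for
    every branching, recursing into both subjets."""
    if out is None:
        out = []
    if isinstance(node[0], float):           # leaf: nothing to decluster
        return out
    hard, soft = sorted((subjet_kinematics(c) for c in node),
                        key=lambda s: -s[0])      # pT-ordered subjets
    delta = math.hypot(hard[1] - soft[1], hard[2] - soft[2])
    kt = soft[0] * delta
    z = soft[0] / (soft[0] + hard[0])
    out.append((math.log(1.0 / delta), math.log(kt), z))
    for child in node:                        # repeat on both subjets
        lund_declusterings(child, out)
    return out
```

In a real implementation the tree would come from a clustering library and the angular distance would account for phi periodicity; this sketch only shows how each branching contributes one node of Lund kinematics.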
So we can map, essentially, a jet onto a tree of these Lund
declusterings from the clustering sequence. You have here a
representation in terms of Lund planes of each of the emissions,
with secondary planes and tertiary planes, which corresponds to a
binary tree with kinematic information for each of the splittings
in the tree.
One sequence is the primary sequence, which can be used for
measurements and visualization.
One thing that we can do with this plane is use it to
identify the origin of jets. In particular, if we can include
information from all these branches at once -- this fractal
representation of the Lund plane with many secondary branches and
tertiary branches -- this provides a strong basis for jet
tagging, in particular in regimes where you have complicated
topologies, like top decays or some decays of the Higgs, where
you have information that can be in secondary branches. It's
necessary to take into account the full Lund plane and not just
the primary plane.
The way this can be done is by treating each declustering in
the Lund tree as a node on a graph. So we map this tree of Lund
declusterings to a graph where the edges of the graph correspond
to connections along the Cambridge/Aachen clustering, and then
use the coordinates of each splitting.
>> Sorry, I hate to say this, but do you think you can wrap
up in one minute?
>> This is the structure of the neural network; you can look
at it in the full talk or in the paper. The summary is that for
things like top tagging, you have on the right the background
rejection of QCD against top efficiency, and you can see a factor
of 2 or 3 improvement over ParticleNet for some of these LundNet
models. We designed two models based on different kinematic
inputs. We see a significant improvement, and the computational
complexity is also quite a lot lower, because you can use
essentially the structure of the Cambridge/Aachen clustering.
You don't have to do a nearest-neighbor search over the
particles within the jet.
And so I think, I can leave it at that. You can look at
more information on the talk in the paper.
Please go ahead with any questions.
>> Thanks, again, sorry to rush you. I think we can start
with a comment that was foreshadowed earlier, which is that you
have shown that you can get some improved discrimination using
these LundNet models. And this seems to be based on using more
information than exists in a jet, where, for reference, you have
something like a 5N minus 5 dimensional input to LundNet-5 versus
3N minus 4 for the phase space.
What you seem to gain from this information in the ideal case
also degrades rapidly. Do you think it's possible to provide a
rule of thumb for what information in jets is robust and
resilient as an input to these neural networks?
>> I think the reason the LundNet with more information does
poorly in the resilience plot is because it has, as part of its
inputs, the mass of the pair of particles. This is sensitive to
emissions that are further down the tree, if you have a soft
wide-angle emission further down the tree. Even if you remove
the node by imposing a kT cut, you still have some sensitivity to
it in the value of the mass of some pair above it in the tree.
So because of that, you basically have some information about
the low-kT Lund plane region in the mass that you feed into the
network.
So that's the main reason, I think, why you don't gain any
resilience. I don't think it has so much to do with
dimensionality; it has more to do with the physics that these
networks get as input.
In some sense, whether there is redundant information doesn't
really make that much of a difference.
For example, in the input you have here the kT and the
momentum fraction Z. Those inputs are quite heavily correlated:
if you have delta and kT, in some sense you kind of know Z
already. But it's still given as input. We found that in
practice there was a small performance gain in including it.
So you are adding, in some sense, redundant information, but
that does not reduce the resilience. What reduces the resilience
is information that adds sensitivity to regions of the Lund plane
that you want to remove, to be insensitive to non-perturbative
effects, for example.
I don't know if that answers the question or not.
>> Thanks. Well, seeing as there was earlier question on,
in general, the sorts of information that is useful to put into
networks, maybe I'll give a chance for that question to be
followed up on.
>> Sorry, what was the question? About what physical
information needs to be on the input?
>> Well, okay. There was some discussion earlier on
low-level input versus structuring networks and more broadly, I
guess, the question about how, what choices one would want to
make about what is useful information and to feed to a network.
Bearing in mind robustness against modeling effects and so on.
>> I think the main problem with giving it low-level
information, like directly particle-level information, is that
it's very difficult to have some handle on robustness once
you've given it that kind of information. With structured
information, the only reason we can make these plots with
resilience is that we can add a cut in the Lund plane that we can
increase, and that slides the model up in resilience and down in
performance.
If you take a model like ParticleNet, once you have the
model, we cannot make it less sensitive to non-perturbative
effects.
At least not trivially.
>> Fair enough. There is a hand raised from somebody.
>> Thank you for the great talk. I was wondering, if one
wants to construct an X to bb tagger with this kind of LundNet,
how should one proceed? Is it theoretically well motivated to
look for a two-body resonance tagger using the LundNet
construction?
>> Yes, in principle, if the particle is heavy enough and
boosted enough that this is a single jet with two b quarks. I
mean, you can -- yes.
>> Recently, it was shown that ParticleNet actually also
improves Higgs to bb tagging. So I was wondering if something
similar can be studied here, or is expected.
>> No.
>> Okay.
>> I would imagine so, yes. I mean, with Monte Carlo data it
just takes a few hours to train a model. But other than that,
it's a straightforward application. So I would imagine so.
>> Okay. Thank you.
>> Thanks.
>> There was a recent paper on Higgs tagging using the Lund
plane. So presumably it's quite similar.
>> Thanks everyone. Now, technically we're a little past
the end of the session. Unfortunately, any further discussion
will have to wait.
But I think we did a good amount already today. Thanks
again to the speakers and those who participated. And once more,
we have various means to carry on the discussion afterwards.
So in case the local organizers want to round off?
>> I don't think that we had anything else to say. Thank
you very much. I think these discussions were really interesting
and we'll pick back up tomorrow at 3 p.m. CERN time or converted
to your local time zone.
Talk to everyone then or before that on gather town or
elsewhere.
Cool, bye.