August 2, 2021 BOOST Conference "This text is being provided in a rough draft format. Communication Access Realtime Translation (CART) is provided in order to facilitate communication accessibility and may not be a totally verbatim record of the proceedings." >> Hey, Andrew, can you hear me still? >> Yep. >> As usual, we're getting an exponential increase in participants. So maybe wait a minute. >> I think we can get started slowly. And people can join during the introduction. Just so that we don't start running late already. Okay. >> I'll start recording. >> Okay. You should be able to -- okay. I guess I can't see everyone now. Good. Okay. So to everyone who is already connected and those joining us, as I go through these slides, welcome enthusiastically to BOOST 2021. It is online again this year. I look forward to seeing everyone in person again, hopefully next year. But we hope that even though we're online, we have a very enjoyable few days of thinking about physics ahead of us. Okay. The conference is online this year and we've changed the format a little bit. It's quite different, actually, from previous BOOSTs. The major change is that all the plenary talks have been prerecorded and they're all already available, and they have been, most of them, for almost a week now. You can watch them at your leisure. The plenary talks will have already been watched by you, the participants, and the live periods of the conference are devoted to discussion sessions on the results themselves. But we won't go through the prerecorded videos at length. Instead, we'll have small reminders of the most salient aspects of the different results that we're going to talk about. Here is a very good pro tip if you haven't watched the videos yet: you can watch them at higher speed in a video player. 2x speed is maybe a little fast for me, but even 1.25 or 1.5 might mean you can watch a couple of extra videos and participate more effectively in the discussion. I want to emphasize that the format probably won't work if people haven't watched the videos, so we encourage you to take a look and put discussion points into the Google doc ahead of time. We have 29 excellent talks, which cover a really wide variety of subjects, on CDS already. There is experimental representation from ATLAS, CMS, LHCb, and STAR this year. We have, in the end, over 7 hours of content. It's hard to watch it all at the last minute. Hopefully, you've already seen some of the videos. I'll mention the schedule that we have planned for this week. Today we have three live summary talks: one from the theory side, one from the experimental side, and then the recent summary for end all jets. And then we'll have our first discussions, one related to ML results and pileup mitigation, and one for jet tagging. The next few days we will have discussion sessions about various topics. On Thursday, at the end of the day, we'll close the conference out with a panel discussion. We have a great panel, and you can put questions or topics that you would like to discuss in the Google doc, and I think the idea is we want to talk about where we want to BOOST to next. Either in the next year or in the future as we think about future colliders, the FCC, the EIC and things like this. I guess it's also up to the panel to come up with the BOOST catch phrase for this year. But I think that task is in good hands. We wanted to make a couple of comments about the videos and some of the things that we've been trying to make sure were done for the conference.
All of the prerecorded videos have been captioned. You can enable the captions, which are not turned on by default, by clicking this button on the video player and turning on the English subtitles slider. The live event will also be captioned. We have one of our awesome captioners already connected. You may see the subtitles on your screen, or you may need to turn them on with the live transcript in Zoom. It's also possible to follow the transcript at a separate link, and you can click that. One of the reasons that we changed the format is to try to make things more accessible. We have a lot of collaborators around the world and we are in a very, sort of, Europe/US-centric time zone with the live sessions. So it was our hope, partially, that for people who are in Asia or Australia or otherwise just can't connect for full days of Zoom meetings, the prerecorded talks might give them a chance to see some of the things that we're talking about and asynchronously participate in the discussion. Which is not the best, but maybe it's better than other formats and it's something we wanted to try. Okay. I've mentioned that we are collecting points for discussion in a Google doc, so I won't repeat myself here. You can find the link to it on the Indico page. You have to be signed in to see the page. If you don't leave your name in the Google doc, the chairs in the sessions can't call on you. We encourage you to leave your name, but it's not mandatory. During the discussion sessions, we don't want to be too formal about them. We encourage you to turn your video on if you want. You don't have to. Some people think it's nice and some people can't, and that's okay, too. Yes, the Zoom information is on Indico. I assume that you found it if you're listening to me now. If you want to ask a question, raise your hand and we'll call on you, or the chair of the session will call on you. Breaks will take place on Gather Town. You can find a link to that on the Indico agenda. I put a link here to the Slack from last year. If you click it you can join. That might be useful if you're trying to get in touch with someone or you just want a chat room for something. Now, the Gather Town is really cool. I was checking it out before the conference. We have it open basically 24/7 right now. There's lots of stuff that you can do. There are games for relaxing and private spaces for arguing, so I'm sure people will do both of those quite a bit. Feel free to join in and mingle and socialize as much as you want. We really want to thank Anna Benecke for her help in making the space really, really fun. So thank you very much, Anna. The last thing we wanted -- the second to last thing we wanted to mention is that the BOOST community values are linked from the conference webpage. We strongly encourage, in fact we expect, that all participants have read the community values. It's important that we BOOST each other up, and they provide good guidelines that everyone should follow to make sure we do. If any issues come up during the conference which you feel, well, if you feel there is an issue, particularly if it goes against any of the community values that are written on this page, please feel free to contact Ayana and Jessie and the local committee, whoever you feel most comfortable contacting, and we'll do our best to address it. The last thing that I wanted to say is a huge, huge thank you to several people.
First thanks is to Connie, who is part of the local organizing committee and has been extremely helpful trying to organize all of the behind-the-scenes things for the conference. There are more than you think for an online event. We really want to thank the IAC, who provided a lot of advice and feedback as we were putting together the program for this week and provided quite a bit of financial support for the conference. And finally, we want to thank CERN, particularly Joachim, the director, and the financial team. Finally, we want to thank all of you, the participants, especially for your hard work reviewing contributions already for the conference. Okay. That's all I had. The last thing to do is to BOOST; maybe you're BOOSTing from home or BOOSTing from some place that looks like this around CERN this week. If there are any questions for myself or any other members of the LOC, or any other things that people want to ask right now, I'm sure we would be happy to answer them. Otherwise, I will hand things over to our first speaker. I don't know if I can see if hands are up. Yes, I can. >> Siri had a question, sorry. Okay. I don't see any raised hands. So Wouter, take it away. >> Let me make sure everything works as required. Okay. Good afternoon. It's a pleasure, well at least good afternoon for me and people in this time zone, good morning for those in the US or wherever you may find yourself. It's a pleasure to give this theory introduction at BOOST. When I was asked to give this talk, I was told I had carte blanche to talk about whatever I thought was interesting and relevant to the BOOST community. This is a personal selection of some theory developments. I will try to keep it not too technical, but instead emphasize the ideas: what is it? How does it work? And most importantly, why should you care about these developments? Here is an overview of some of the topics that will come back: track-based measurements; precision calculations, such as the anti-kT jet function at NNLO; non-perturbative effects; and spin correlations. I'm a theorist and sometimes our view of reality is more like this. But the experimentalists who are here have to deal with other complications, and I'll try to bridge this gap. There should be plenty of time for discussion at the end, and I encourage that as well. The very first topic I want to talk about is calculations for track-based measurements. The motivation for doing track-based measurements is twofold. First of all, you have much better angular resolution; otherwise, you're limited by the calorimeter cell size. Second, you can remove pileup by identifying the vertex it comes from, as you can see in the picture here. To maybe make this more concrete, you can see a measurement of the jet mass from ATLAS. And to be precise, it's not the jet mass, it's the jet mass divided by the pT of the jet, and then you take the logarithm. They show it both using all particles and using tracking. First of all, you see here a large uncertainty on the data; that's the gray band here. That is mostly coming from pileup, because the grooming that has been used is not very aggressive. If you groom more aggressively, you see this less. In the other region, away from where you have a lot of pileup, if you look down here, there is also a large uncertainty if you use all particles versus using tracking. That is something that you can gain by using tracking. Well, so you may want to do a track-based measurement, but can we theorists handle that?
There are challenges: if you try to do a calculation for a track-based measurement, you run into divergences. So you do calculations with quarks and gluons and have to take into account hadronization effects. This is not too complicated in principle, because we've done this before: when we talk about parton distribution functions and fragmentation functions, something similar happens. A partonic calculation contains divergences that are absorbed into a non-perturbative object, and in this case that is a track function. To give you intuition for the track function: it is the momentum fraction z of an initial parton that is converted to charged particles. It hadronizes, and we get, say, two charged pions and a pi zero. In this case you say the momentum fraction in terms of charged particles is z1 plus z2. If you compare this with the fragmentation function for pions, you get two contributions: you find either this pion or this pion, and the two terms are added up. While for the fragmentation function we look at one hadron at a time, these track functions look at all hadrons at the same time, and that leads to different behavior. So, when we looked at this -- well, several years ago, in 2013, with Jesse, we first had a formalism and thought let's apply it, and we picked track-based thrust. You can see data from DELPHI: the measurement of thrust in terms of tracks and in terms of all particles, and thrust is something like the jet mass. The uncertainties are smaller on the data when you're using tracks. So we went ahead and started calculating, and the calculation is fairly complicated. One of the ingredients you need is a jet function that describes the contribution to the thrust from the radiation, and in addition you also need to know the momentum fraction of the charged particles in the jets; that's the other variable x. It really probes the details of the track function that appears in there. So you need to know the quark and gluon track functions to make the prediction. Maybe you can extract that from the data in some way; if you're interested, I can tell you more about that after the talk. After you've done the hard work, you get the plot out and find that for basically almost the whole distribution, tracks and all particles are basically the same, except for this peak here. For this peak you have to deal with other non-perturbative effects: not just the effects of converting to charged particles, but from having gluons with small transverse momenta. You can feel a little dubious about this; that felt a little unsatisfying. Fine, fine, the data, these curves are very similar, maybe that is just what it is. But you do all this work and you find curves that are lying on top of each other, which is maybe not the best. Thankfully, there have been some developments since then. Quite a bit is looking at new observables. I'll highlight two examples. The first is the azimuthal angle in Z plus jet production. Here you see the boson, here you see a jet, the red blobs are incoming protons, and we're looking at the azimuthal angle, which is the angle between the jet and the Z boson in the transverse plane. The key ingredient here is that we use the winner-take-all axis. By doing this, the effect of soft radiation on the jet axis becomes suppressed, and as a consequence we don't need to know the details of how soft radiation is converted to charged hadrons. That doesn't play a role in this story. So the effect of doing the track-based measurement is suppressed; it only starts at next-to-next-to-leading-log accuracy.
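[Illustrative aside, not part of the talk: a toy Python sketch of what a track function encodes, namely the distribution of the charged momentum fraction z of a fragmenting parton, and its low moments, the single numbers referred to just below. The "hadronization" here is completely made up; a real track function would be extracted from an event generator or from data.]

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_fragmentation(n_hadrons=8):
    """Toy stand-in for hadronization: split the parton momentum into
    n_hadrons pieces and randomly flag each piece as charged or neutral.
    Returns z, the total momentum fraction carried by charged hadrons."""
    fractions = rng.dirichlet(np.ones(n_hadrons))   # momentum fractions, sum to 1
    charged = rng.random(n_hadrons) < 2.0 / 3.0     # roughly 2/3 of hadrons charged
    return float(np.sum(fractions[charged]))

# Toy "track function" T(z): the distribution of z over many fragmentations
z_samples = np.array([toy_fragmentation() for _ in range(100_000)])
hist, edges = np.histogram(z_samples, bins=50, range=(0.0, 1.0), density=True)

# Low moments of T(z); for some observables only one such number is needed
for n in (1, 2, 3):
    print(f"<z^{n}> =", np.mean(z_samples**n))
```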
And in addition, it didn't require the full details of the track functions; we just need a single number for quark jets and another for gluon jets. To make this more concrete, we can look at the distributions for this angle delta phi in Pythia, using all particles or using only charged particles, and the curves are basically on top of each other. If you look at the ratio plot below, within statistics it is one, except maybe at the very end of the distribution. So this is an observable where you see small effects going from all particles to charged particles only. And the effect we can understand theoretically, and the theoretical calculation is not very difficult. If you wanted to properly account for this conversion, we can do it not with full non-perturbative functions but basically with two numbers. That is, I would say, a positive development compared to the complication of this, where also, again, there is not much effect when switching from tracks to all particles or vice versa. The next example is a case where the effect of switching to tracks -- there is a real effect. It's not that the observable doesn't change. But this is an example where implementing tracks is easy. And the example here is energy-energy correlators. These are not cross section measurements; these are really weighted cross sections. So you take the cross section and you look at two points at some fixed angle chi -- that is this angle here. You take the energy deposits at these points that are separated by this angle, and you take the cross section weighted by these energies. The reason this is easy to convert to tracks is you just have to say: well, instead of measuring the energy in all hadrons, I just want to know the energy in charged hadrons. You multiply by the momentum fraction. And then this x is drawn from the track function, but actually I can just put brackets around all this, and it is just a single number, which is a moment of the track function. I should mention here, if you want to do these measurements at very small angle chi, you of course want to use tracks. So something that I've been working on recently with Yibei is the calculation of both the track function evolution and the track-based EEC. So I just want to briefly show some results. So this is the evolution of the third moment of the gluon track function, and this very first term is just the same as you expect for the evolution of the third moment of the fragmentation function. We get additional terms like this one that involves two track functions, or this one with three track functions. And this nonlinear behavior arises because we're not measuring a single hadron, but we want to know about multiple hadrons in the final state. We can use these results to get a track-based prediction for the energy-energy correlator, and this plot is for the asymmetry of the energy-energy correlator, which is plotted here against the angle chi again. The orange is the leading order, and then in green the NLO or alpha s squared prediction. If you want to see how this compares to data, the INT is helpful. There is more work to be done because we should do the resummation. Okay. Something a little related to track functions that I wanted to highlight is a recent development by -- where he looked at the spectrum of charged hadrons. And the perspective he took: if you think of a quark or gluon, it will emit radiation, and that is described by perturbative physics, and at the end of the day it will hadronize.
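[Illustrative aside, not part of the talk: a minimal Python sketch of the energy-energy correlator described above, as a pair-weighted histogram in the angle chi, and the "track" variant that simply restricts the sum to charged hadrons. The input format, the use of cos(chi) as the histogram variable, and the normalization of the track version are assumptions for illustration, not the conventions of the calculation being discussed.]

```python
import numpy as np

def eec(particles, bins=50):
    """Energy-energy correlator: histogram of cos(chi) over all particle pairs,
    weighted by E_i * E_j / Q^2.  `particles` is a list of (E, px, py, pz, charge)."""
    arr = np.asarray(particles, dtype=float)
    E, p = arr[:, 0], arr[:, 1:4]
    Q = E.sum()
    n = p / np.linalg.norm(p, axis=1, keepdims=True)     # unit momentum directions
    cos_chi = np.clip(n @ n.T, -1.0, 1.0)
    weights = np.outer(E, E) / Q**2
    iu = np.triu_indices(len(E), k=1)                    # distinct pairs only
    hist, edges = np.histogram(cos_chi[iu], bins=bins, range=(-1, 1),
                               weights=weights[iu])
    return hist, edges

def eec_tracks(particles, bins=50):
    """Track-based EEC: the same weighted sum, but only over charged particles."""
    charged = [prt for prt in particles if prt[4] != 0]
    return eec(charged, bins=bins)

# toy usage with a handful of made-up particles (E, px, py, pz, charge)
toy_event = [(10, 0, 0, 10, +1), (8, 0.5, 0, 7.9, -1),
             (5, 0, 4.8, 1.0, 0), (12, -1, -2, 11.7, +1)]
hist_all, _ = eec(toy_event)
hist_trk, _ = eec_tracks(toy_event)
```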
What if we just pretend that all of this is produced by this evolution, this fragmentation process, and the hadronization is basically irrelevant? The key thing that he did is not just normal evolution, but evolution that involves the resummation of small-zh logarithms. It's not a normal DGLAP evolution. At the end of the day he used three parameters: the cutoff of the evolution at small scales, how you freeze the coupling, and a normalization factor. With this three-parameter model, he could compare to various measurements of the charged hadron spectrum at different energies, and the results are very promising. This is perhaps the best picture: you get this red curve, this prediction, and it goes nicely through the dots. Here on the axis is the momentum fraction, the log of one over the momentum fraction. And there are regions where, even though the agreement is fairly good, you shouldn't trust this prediction because there are hadronization effects to worry about, and at the other end of the distribution you have to worry about not just small-momentum-fraction resummation but also DGLAP evolution. This looks promising for fragmentation; can you use it for track functions? It would be cool to get the results for the track functions. Okay. Now I want to look at precision calculations, and I'll look at some different examples. The first one is the anti-kT jet function at NNLO. And maybe just to break this down a little bit: if you try to do higher order resummed calculations, then one of the things you need to calculate is this object. It has to do with, for example, three collinear partons and then calculating the effect of the jet algorithm on them. We have an incoming parton that comes out of the collision and these are produced. This object shows up in exclusive Higgs plus one jet production. And this whole formula for the cross section you don't have to digest, but I wanted to highlight that this jet function appears in the formula; it's something that is relevant for physics. The problem is that with these three different partons, actually applying a jet algorithm becomes more cumbersome. If you have three partons and partons i and j are clustered first, then you have to impose that their distance measure is the smallest. Secondly, when you combine i and j, that distance should be small enough for them to lie within one jet. Of course, this doesn't sound too complicated, but the challenge is you cannot just do this numerically; there are divergences. You have to isolate the divergences, and then you can do the remaining integrals numerically. There was a recent calculation earlier this year where they went through the different clustering histories; they did sector decomposition and soft subtractions, but they managed to do it. This is the result for this jet function. I should actually add this is only the quark jet function; they didn't do the gluon case yet. This is for a specific choice of scales, and there are also some logarithmic terms in there. These you can already get from other results through consistency, and they provide a check on their calculation. So you might say, well, this cross section you're talking about is maybe not the thing I'm interested in, because it's looking at a whole jet and not the jet substructure. But the fact that you can do such a calculation holds promise for doing these higher order calculations for jet substructure. Okay. Another topic in the context of precision that I want to talk about is precision calculations for Soft Drop.
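[Illustrative aside, not part of the talk: a compact, unoptimized Python sketch of the anti-kT clustering whose distance measures make the jet function above cumbersome to compute. The calculation in the talk is analytic; this only illustrates the d_ij and d_iB measures and the clustering loop. The recombination scheme is simplified (pT-weighted averages rather than four-momentum addition), and in practice one would use FastJet.]

```python
import numpy as np

def anti_kt(pseudojets, R=0.4):
    """Minimal O(N^3) anti-kT clustering.
    pseudojets: list of dicts with keys pt, y (rapidity), phi.
    Returns the list of final jets in the same format."""
    jets = []
    objs = [dict(pj) for pj in pseudojets]

    def dij(a, b):
        dphi = np.mod(a["phi"] - b["phi"] + np.pi, 2 * np.pi) - np.pi
        dR2 = (a["y"] - b["y"]) ** 2 + dphi ** 2
        return min(a["pt"] ** -2, b["pt"] ** -2) * dR2 / R ** 2   # anti-kT weighting

    while objs:
        # start from the smallest beam distance d_iB = 1/pt^2
        ib = min(range(len(objs)), key=lambda i: objs[i]["pt"] ** -2)
        best, best_val = ("beam", ib, ib), objs[ib]["pt"] ** -2
        for i in range(len(objs)):
            for j in range(i + 1, len(objs)):
                d = dij(objs[i], objs[j])
                if d < best_val:
                    best, best_val = ("pair", i, j), d
        if best[0] == "beam":
            jets.append(objs.pop(best[2]))            # promote to a final jet
        else:
            i, j = best[1], best[2]
            a, b = objs[i], objs[j]
            pt = a["pt"] + b["pt"]                    # simplified recombination
            y = (a["pt"] * a["y"] + b["pt"] * b["y"]) / pt
            phi = (a["pt"] * a["phi"] + b["pt"] * b["phi"]) / pt
            objs.pop(j); objs.pop(i)
            objs.append({"pt": pt, "y": y, "phi": phi})
    return jets

# toy usage: two nearby particles cluster, the third becomes its own jet
toy = [{"pt": 120., "y": 0.10, "phi": 0.20}, {"pt": 30., "y": 0.15, "phi": 0.25},
       {"pt": 50., "y": -1.2, "phi": 2.90}]
print(anti_kt(toy, R=0.4))
```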
It is a bit of overkill for this audience to explain how Soft Drop works, but very briefly: if you have your jet, you recluster it using the Cambridge/Aachen algorithm, and that results in the angular-ordered tree that you see here. What you do is go through the tree, and at every splitting you check whether the splitting is too asymmetric. The basic condition you use is this one here. If z, which is the momentum fraction of the softer branch, is smaller than the cut, you throw away the softer branch. You continue until the splitting is relatively symmetric, and then the momentum fraction of the soft branch is called zg, and the angle between the two branches where the algorithm stops is Rg. Well, what we did is look at the momentum sharing, and we wanted to go beyond leading logarithmic accuracy. What makes this interesting is that it probes the -- function and it's measured extensively; there is a lot of data to compare to. How do we get to our result? There are a lot of technical things involved, but I want to actually just give you a picture. These are pictures that you may have seen before: you can picture emissions in a jet. On this axis is the angle of the emission; as you go to the right, you get closer to the center of the jet. As you go up, you go to more and more soft emissions. And in this picture, you can see the region where emissions are not allowed. If I know the value of zg and the groomed jet radius, which is this line for zg and this one for the groomed jet radius, then everything below this line is groomed away. Things above this line are not allowed, because they would pass the grooming condition and then Soft Drop would terminate there, but it should stop only at this point here. Well, so this is the intuitive picture you can have. You can see how the different measurements show up as different lines here: the theta g measurement, the zg measurement and the grooming condition. It turns out that from the point of view of soft-collinear effective theory, you can identify the corners of this picture as different degrees of freedom, or different modes, in the effective theory. The emission that passes grooming requires a little special treatment, so it's not sitting at a corner of the plane, or of this red area, but it still needs to be included in some way. A final thing to note with this kind of picture: you can also see non-global logarithms. These are things that theorists worry about, where you have a boundary in phase space and a different restriction on radiation on one side than on the other. For example, this line here is the boundary of the jet. Inside, we cannot have emissions that are too energetic, because otherwise the Soft Drop algorithm would stop there, and we know that we wanted it to end up on this line for the measurement of theta g. On the other hand, emissions outside are restricted differently. That is an example of such a boundary: here we have the same angular scale but different energies that play a role. Okay. If you want to know more about the ins and outs of this, you can watch Pedro's talk. I want to show you one plot from the results. Well, you can see the distribution of zg, the ATLAS data, and then our predictions at leading log and NLL prime plus LO. So you can see that going to higher order in perturbation theory, you reduce the uncertainty, you see the predictions are consistent, and you get better agreement with the data as you go to higher order. So now I want to switch to talking about non-perturbative effects.
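[Illustrative aside, not part of the talk: a minimal Python sketch of the Soft Drop grooming loop just described, assuming you already have the primary Cambridge/Aachen declustering sequence as a list of (z, dR) splittings ordered from large to small angle. Real analyses use FastJet's SoftDrop; the toy splittings below are made up.]

```python
def soft_drop(splittings, zcut=0.1, beta=0.0, R0=0.8):
    """Walk through an angular-ordered list of primary C/A splittings,
    each given as (z, dR) with z the softer branch's momentum fraction
    and dR the angle between the two branches.  Drop the softer branch
    until a splitting passes z > zcut * (dR / R0)**beta.
    Returns (z_g, theta_g) of the first passing splitting, or None if
    the whole jet is groomed away."""
    for z, dR in splittings:
        if z > zcut * (dR / R0) ** beta:
            return z, dR          # z_g and the groomed radius R_g
        # otherwise: discard the softer branch and keep declustering
    return None

# toy usage: a few made-up (z, dR) splittings, angular ordered
toy_splittings = [(0.02, 0.7), (0.05, 0.4), (0.25, 0.2), (0.4, 0.05)]
print(soft_drop(toy_splittings, zcut=0.1, beta=0.0))   # -> (0.25, 0.2)
```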
But still within the context of Soft Drop, and still at some level related to precision calculations. So the starting point of this next item is an analytic understanding of non-perturbative effects in Soft Drop. If you look at Soft Drop -- not Soft Drop in general, but at the groomed jet mass -- you have the hadron level prediction, and you can see how that is different from the parton level prediction. The leading effects are governed by two different terms, and these terms involve non-perturbative parameters. These are numbers that you have to get from data; there is one here and there are two here. And one of the things to note is that there is an explicit beta dependence: if you extract these values for some value of beta, you can apply them for a different value of beta. Here is the cross section showing up again, and you get these coefficients. The coefficients C1 and C2 you can calculate in perturbation theory. Maybe to give a physical intuition for the terms: the first term describes the effect of non-perturbative radiation that is inside the jet, and at some level it goes with the area of the groomed jet. And so if you calculate C1, you find it is basically related to the groomed jet radius, the average groomed jet radius, and C2 is related to the effect of non-perturbative radiation on where the grooming stops. That is a bit of a different effect. Up to this point, or previously, C1 and C2 were calculated at leading log. What these authors did is extend this to next-to-leading log, and the approach is soft-collinear effective theory. You can draw the new planes with different measurement lines, and I'm not going to go through the different lines and what they represent; I want to show you the results. Well, for more you can see Adi's talk. Here you see the C1 parameter and the C2 parameter plotted. And you can see here the predictions at leading log and NLL in blue and orange, compared to extracting the parameters from Pythia. The cutoff is there because at some point you get into a fixed-order region, so you maybe shouldn't be using this approach there. But you can see there is good agreement between the resummed calculation and the parton showers, and that by going to higher order, you reduce the theory uncertainty. So, staying a bit with the topic of non-perturbative effects, I want to talk about something a bit further away from where many people in the BOOST community usually think, since this is e+e- physics. But there have been extractions of the strong coupling from e+e- event shapes. These claim to be among the most precise extractions, because the theory uncertainties are very, very small. And, well, one example is the C parameter, by these people here. Well, this is the definition of the C parameter: you sum over the particles that are produced in the e+e- collision. You do the calculation at fixed order and then extract alpha s by fitting to the data. That gives you this value, which is above the PDG value. With resummation you are still a bit above the PDG value. With the non-perturbative effects, you go down a lot. There are more refinements you can do, but they don't change it too much. The thing you worry about here is this big jump as you go from here to here: that is all non-perturbative physics. Well, the question is, do you understand this correctly? In particular, this non-perturbative correction is really derived in the two-jet limit and then, at some level, extrapolated as the leading correction over the whole distribution. And the question is, does this work?
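[Illustrative aside, not part of the talk: a direct Python implementation of the standard C-parameter definition mentioned above, as a sum over pairs of final-state momenta in an e+e- event; the 3-vector input format is an assumption. The symmetric three-jet check below also shows the C = 3/4 endpoint that comes up in the Sudakov-shoulder discussion that follows.]

```python
import numpy as np

def c_parameter(momenta):
    """C parameter for an e+e- event:
    C = (3/2) * sum_{i,j} |p_i||p_j| sin^2(theta_ij) / (sum_i |p_i|)^2,
    with the sums over all final-state particles (momenta: array of 3-vectors)."""
    p = np.asarray(momenta, dtype=float)
    norms = np.linalg.norm(p, axis=1)
    cos_th = (p @ p.T) / np.outer(norms, norms)
    sin2_th = 1.0 - np.clip(cos_th, -1.0, 1.0) ** 2
    return 1.5 * np.sum(np.outer(norms, norms) * sin2_th) / norms.sum() ** 2

# toy check: a symmetric three-jet ("Mercedes") configuration gives C = 3/4,
# the endpoint where the leading-order distribution has its Sudakov shoulder
mercedes = [[1.0, 0.0, 0.0],
            [-0.5,  np.sqrt(3) / 2, 0.0],
            [-0.5, -np.sqrt(3) / 2, 0.0]]
print(c_parameter(mercedes))   # ~0.75
```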
So there was an exploratory study at the end of last year where they looked at the C parameter, and they looked at the non-perturbative effects at C = 0, which is where the Sudakov peak is, and at C equals 3/4, the shoulder. The Sudakov shoulder means that at leading order in perturbation theory, the C parameter distribution stops at 3/4; it's zero above. This is the C axis and this is the cross section -- I should have included the plot -- it jumps from a constant to zero. So of course this gets smeared by higher order corrections, but these are two special points. They looked at what the effect of non-perturbative emissions on C is. At C = 0, you can calculate the effect and you find this number; this is basically what would be used everywhere. At 3/4, they did the calculation for how much the C parameter would change, and if you calculate the effect, you find something that is almost half the value of what you normally expect. More non-perturbative effects means a smaller alpha s; if you have smaller non-perturbative effects, maybe you get a larger alpha s. Of course, this parameter used to describe non-perturbative effects is now known at C = 0 and at C = 3/4, and you can make different interpolations between these two points. Then you can do the alpha s extraction for the different interpolations, and that is what happens here. The standard one is right here. It's not actually -- I was surprised -- it doesn't give the smallest alpha s. As you interpolate between this value and this value in different ways, you get values of alpha s, some of them consistent with the PDG value, for the C parameter. This is, of course, potentially relevant for alpha s from jet substructure. Okay. So now I want to look at non-perturbative effects one last time, from a different angle, because there has been some work on actually making the hadronization model in the Monte Carlo consistent with predictions from factorization. The motivation for doing this is that you can then have a direct comparison between the parton shower and analytic resummation. For measurements like the top quark mass extraction, you get an m_top from the Monte Carlo, but can I use that directly in the analytic resummed predictions, or predictions of the cross section? You can't make an apples-to-apples comparison. In this case, well, what are you seeing here? On one axis is the variable that describes how much soft radiation you have. On the other axis, you have a variable that describes how much extra contribution to the C parameter you get from non-perturbative physics. So the way you can read this is: at small k prime you get a larger shift in the C parameter, and for larger k prime you get smaller shifts. That is, of course, in line with what I was saying on the previous slide. Here they took the perspective that, say, from factorization we expect the shift to always be the same: can we make the hadronization model do that? And the answer is yes. So here you see that, independent of k prime, you basically get the same value for k, or the same distribution for k. Of course, yes, this is interesting, but the real question is, what should we use for the non-perturbative model? Because now we're making it fit some prediction, but we should, of course, know whether that is what we should be making the hadronization model do. So the very last topic I want to address is spin correlations. This is, well, again related to the energy-energy correlator that I mentioned before: the three-point energy correlator is sensitive to spin effects.
So here you can see three partons: the quark, and the gluon it emits splitting into a quark anti-quark pair. And these three are being probed by a three-point energy correlator. If you look at the situation where this angle is small, then this gluon basically becomes on-shell and you can talk about the gluon, and it turns out that an interference between different helicities depends on the azimuthal angle -- a dependence on the angle around this splitting of the gluon. That's this phi here. If you calculate this at order alpha s squared, we find the following result. You get some number out, and I should say I'm only focusing on the dependence on phi; there is also a dependence on the large and small angles. If you look at the phi dependence, you get a cos 2 phi dependence here. There is an unfortunate thing here, because in the real world nf is equal to five and we cannot change it, and if you insert five in these calculations it makes it harder to see the effect. This was an order alpha s squared calculation; there is a resummation to be done, and for that I will refer to Ian's talk about the light-ray OPE. If you look at the result that comes out -- this is the cos 2 phi modulation that is plotted here -- for quark jets it's not very visible, but with b tagging it becomes more visible. That is shown in this plot here. Basically the last thing I want to mention is that spin correlations have been implemented in the PanScales showers, and they can look at the same observable. They find very good agreement, or excellent I should say, because they're right on top of each other. The pink curve in this plot is the analytic prediction, and then the blue dots here come from the shower. For more on this, and also for a discussion of other observables, I will refer to Alexander's talk. That brings me to the end of my talk. Maybe just to summarize it as a bit of a story, rather than firing all these separate tidbits at you: calculations for track-based measurements are possible, and we are extending this to order alpha s squared. Related to this is the observation that you can get the charged hadron spectrum from small-zh resummation. We can talk about extracting the track functions from data. And then I talked about precision calculations, for example for Soft Drop, that allow you to make precise comparisons to data and are an ingredient in the description of non-perturbative effects for the groomed jet mass. And I also mentioned the anti-kT jet function: precision calculations with real jet algorithms are in reach. Calculating the invariant mass of, say, a hemisphere in e+e- is easier than dealing with the clustering effects of an algorithm. I looked at hadronization effects, where on one hand people have shown that you can change the hadronization model such that it agrees with the prediction from factorization, but of course you should have a discussion about what the correct description is, and I would personally say the dust has not completely settled on this. The final topic is spin correlations, which can be probed with jet substructure and are encoded in parton showers. At this point, I want to thank you for your attention. And I'm happy to take any questions. >> Awesome, thank you very much, this was an awesome, you know, sort of tour all over many interesting things to think about. Yes, so we're happy to take questions now from the audience. If anyone has a question, please raise your hand. Okay. I see Raghav has his hand up. Go ahead. >> Wouter, nice talk.
Section 7 that you talked about, this non-perturbative effect in the parton shower, very interesting. You talked about this C parameter, right, and you say if you increase this effective value at which you calculate C from 0 to 3/4, the contribution reduces by about half, right. So I wasn't able to follow 100 percent, that is why I'm asking the question. So if I have a jet where this effective k scale, right, you include this changed parameter and show that the distributions are the same, what does that imply for the jet shower? Does that mean you're actually changing the splitting at each step? >> I think, so, there were two separate things here and I hope I didn't mix them up. There is on one hand this analytic study where they said the effect of non-perturbative physics, hadronization, should reduce as you go away from C = 0, because it becomes almost half of that. And then there's this other approach where, well, here they took the assumption, or took as input, the prediction that it should not change, and asked can we make the Monte Carlo do that. They didn't change the shower with this; they changed the way hadronization works. So they managed to change some of the way hadronization was done, so you go from this distribution, which is just whatever was in there before, to this distribution that says, oh, independent of what the C parameter would be, I get basically the same shift. >> This is like a change in the actual model itself but not the perturbative part. >> It's changing the hadronization step. >> Very cool. Thanks. >> Cool. I don't see any other hands up. Maybe I can ask a quick one. It might be easier if you go back to slide 8; I think that was the clearest example. You showed, in a couple of places in the talk, plots where you go from, you know, one calculation to a more precise one, like this one here. I guess my question, first of all: does the green band overlap with the orange band, or are they just touching on the edge? >> That is a good question. You can't really see that. They are not just touching on the edge, they do overlap a bit. But we don't have the curve there, so we really should make sure that we do this for the actual paper, so that you can really see that. >> Okay. I guess you know what the question is then, if I'm asking about this, right. So I mean this isn't the only place where I saw something like this. I'm wondering, are we missing something when we make the error bars -- either some non-perturbative corrections that are larger than we think they are, or something else that goes into them that maybe we should think about? I mean, I guess I would expect the green and the orange maybe to overlap a bit more than they do here. But maybe I don't know how that works? >> That's a fair question. This is a fixed order calculation, so the scale variation here is one scale that we vary. If you do a resummed calculation, you have multiple scales you can vary, probing the different physical scales in the process. I have to admit, at some level this is our convention. You can be more conservative and get larger bands that will overlap better, but then, of course, other people looking at your work will say, how did you get such large bands. >> Oh yes. >> That is why we are stuck with sticking to what is conventional, so you can do an apples-to-apples comparison. But more generally, if things don't overlap, do we understand why they don't overlap? For example, for the Higgs cross section, we understand why there is a big jump as you go from one order to the next.
There is also an example here in another plot where, if you look carefully -- I mean, here things don't look so good, but that is simply because at that point the leading log resummation doesn't include any matching to the fixed order. This is a region where we really should have included that; it's just missing something there. We're extrapolating into a region where we know it shouldn't work. >> Okay. Cool. I see a question from Clemens next. >> Thank you for this nice talk. I have a question on the spin effects and jet substructure on slides 20/21. Well, it might be a naive question, but I was thinking about how to measure this experimentally. We see, like in W boson decays or so, the spin is always the same, but depending on the polarization we see effects in the decays now. You're talking about three-point energy correlators here. I was wondering where one can measure this experimentally. Is it something, for instance, one would measure in boosted top decays, or is that unrelated and one needs to look at pure QCD jets? >> What is here is not specifically for a top quark. It's for light quark or gluon jets. We would take a look inside the jet; well, we calculate the energy, we probe the energy at three different points, where this one has a large separation, that is the large angle, and this is a small one. So this is not something specifically for the top quark. >> Okay, you basically just take QCD-initiated jets and take a lot of them and try to measure this. >> Yes. >> Okay. Thank you. >> Okay. Andrew, I think you're last. Is it fast? Because we should move on. >> I'll try to be fast. With respect to the non-perturbative corrections in the C parameter: if you have a leading power factorization theorem, the non-perturbative correction comes from a single factor. How do you incorporate the three-jet non-perturbative corrections -- a subleading factorization theorem or what? >> That's a very good question. I do not know the answer. That's a good question. I hadn't thought about this. I wanted to put something out here that was thought provoking, and it's very interesting, but I don't know how to do this. And I have to be honest, at some level, okay, this is maybe something for the theorists to decide what they want to do with. But of course, there are two different approaches, and one of them, that is advocated in their paper, is maybe we should stop pushing so hard on very high precision calculations -- because here we're talking about next-to-next-to-next-to-leading log calculations -- without doing more work on the non-perturbative side. But, yes, of course, the alternative is we understand these things better and know how to incorporate them. But I don't know the answer to your question. >> Something for BOOST in a couple years! >> Yes. >> Okay. So I guess we should move on now. Thank you very much, Wouter, for this great first talk. Our next speaker is Jennifer, who I can see. I guess you'll share your slides. >> Yes. Let me make it full screen. I want to -- wait. Okay. Can you see the slides? >> Yep. >> Great. Hello everyone. It's a pleasure to be here to give an overview of the latest results from the LHC experiments, on the topics that are relevant for this conference of course. I would like to start with a slide from Petar's experimental introduction from 2019, where he listed the urgent motivations that brought us to this conference. We all want to understand QCD.
We like to play with machine learning, and we also try to improve our multijet background estimates more and more. But our ultimate goal at the end is that we want to find new physics. That was 2019, quite some time ago, but it feels a little like we can condense the last two years into one. I'm not skipping 2020 because, despite being a horrible year, we have worked really hard to meet our goal of finding new physics, hopefully in the immediate future. In fact, the third LHC run is at the door, and by itself it won't add that much discovery potential. All the new ideas that we have developed in these last years will increase our chances, not only in Run 3 but also beyond, such that we can keep improving our experimental results despite a limited increase in dataset size. So in these last years we have performed more and more measurements, we enlarged the region of phase space, and we developed and applied new ideas to improve the trigger and the data acquisition systems, upgrading our computing paradigms. This is to make the best use of the data that we collect. From my perspective, deep learning is now a fundamental basis of all this innovation. In fact, we have seen major advances in deep learning models for jet classification. For instance, in CMS we have studied a variety of boosted jet tagging algorithms. In the most recent searches, deep learning models using state of the art technology are becoming more and more the baseline. In particular, the newest searches use, for instance, ParticleNet, which is based on a permutation-invariant graph neural network that takes raw information, like the constituents of the jet. We see now how, moving to more state-of-the-art architectures, we can further improve our jet classification performance and push it more and more to the limit. You can see here, for example, the performance of the new ParticleNet tagger for top tagging, compared to the previous baseline, which was a deep neural network algorithm based on 1D convolutions. Moving to new architectures helps further. Similarly for ATLAS: they also recently developed new neural-network-based taggers, based on high-level information. They developed new taggers for Higgs to bb and also for the V bosons and top tagging, where they use the new b tagger that combines the flavour information of up to three subjets, compared to a standard single-jet b tagging algorithm, together with the kinematic information of the large-radius jet in which the subjets are contained. And the increase in performance can be seen in these two plots on the right, where the top one shows the rejection of the multijet background and the bottom one the rejection of the top background. The light blue points show this latest algorithm, which is then compared to older ones. At the bottom, you can see the performance for W and top tagging with a new neural-network-based algorithm that uses jet substructure observables, but obtained from the novel and improved jet reconstruction which uses unified flow objects, which are basically like particle flow and use a combination of track and calorimeter information. So these, with respect to the algorithms from before, give additional discrimination performance. We're seeing now also that these models are not only, say, R&D, but they are applied in searches. So here, for example, I'm showing three recent results from CMS.
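[Illustrative aside, not part of the talk: this is not ParticleNet itself, but a minimal permutation-invariant ("Deep Sets"-style) classifier over raw jet constituents in PyTorch, to give a feel for the kind of constituent-based architecture being discussed. The feature set, layer sizes and toy data are arbitrary assumptions.]

```python
import torch
import torch.nn as nn

class PermInvariantTagger(nn.Module):
    """Minimal permutation-invariant jet tagger: a per-constituent MLP,
    summed over constituents (order-independent pooling), followed by a
    jet-level MLP.  Zero-padded constituents are masked out before the sum."""
    def __init__(self, n_features=4, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))   # signal-vs-background logit

    def forward(self, constituents, mask):
        # constituents: (batch, n_const, n_features); mask: (batch, n_const), 1 = real particle
        per_particle = self.phi(constituents) * mask.unsqueeze(-1)
        pooled = per_particle.sum(dim=1)                 # permutation-invariant pooling
        return self.rho(pooled).squeeze(-1)

# toy usage: 32 jets, up to 50 constituents, 4 features each (e.g. pt, eta, phi, charge)
x = torch.randn(32, 50, 4)
mask = (torch.rand(32, 50) > 0.3).float()
labels = torch.randint(0, 2, (32,)).float()
model = PermInvariantTagger()
loss = nn.functional.binary_cross_entropy_with_logits(model(x, mask), labels)
```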
Searches for di-Higgs resonances in the 4b final state or the 2b plus lepton final state, and also a recent search for di-Higgs production in the VBF channel. Here I'm showing the expected limits as a function of the resonance mass for the two resonance searches, and for the VBF search you instead see the exclusion limits as a function of the coupling between the two vector bosons and the two Higgs bosons. Here you see a comparison with the previous results, which were based on 2016 data. There is a large improvement, which was obtained thanks to the deployment of these new taggers, and this allows us to go beyond what one would obtain just from increasing luminosity. Okay. This is all very promising, but it doesn't come without complications. In particular, the searches have to develop and use more and more sophisticated background estimation techniques. In fact, with more powerful taggers we also start seeing more and more that the dijet search is not only dominated by QCD background: you get roughly 50/50 QCD and top quark. As an example, I'm showing here the jet mass distribution in the recent di-Higgs to 4b search, and you can see how ttbar becomes important. We try to develop multidimensional models and use them to reduce our systematics. For example, in this search a two-dimensional model is used, defined by the mass of the Higgs candidate -- so the mass of the jet -- and the di-Higgs invariant mass. Then two regions of this phase space are identified, where one of the two jets passes or fails the jet tagging algorithm, in this case the Higgs to bb tagging algorithm, and a 2D transfer factor is computed to estimate the background in the pass, signal, region. Of course, the complication is that this works -- this transfer factor in the 2D plane works -- because a lot of effort was put in by the experiments to invent and apply tagger decorrelation methods, at the cost of some performance loss. The plot on the top right shows the jet mass distribution for a Higgs to bb tagger at different background rejection rates and cuts on the tagger, and you see how tighter and tighter cuts start sculpting the mass. In this way, a background estimate that uses the region failing the tagger is not usable. When applying the decorrelation methods, we obtain instead very smooth distributions in the jet mass, as you can see in these other plots on the left. This also allows us to see a peak in the jet mass, which is something we would like to see in case there is a signal. Also, we have these very powerful taggers, which are, however, usually trained on imperfect Monte Carlo. So the obvious question is: is the score well modeled? We check the agreement of the tagger score between data and Monte Carlo, and we find that we must measure and apply scale factors, and sometimes these scale factors may be far from one and with large error bars. So how do we compute the scale factors? Well, it's easy to check for top or W tagging, because we have a large sample of top quarks available which can be isolated. So for instance, here you see a measurement for the W tagging algorithm in this sample of top quarks, and we have a large fraction of merged W bosons that can be used for this measurement. It's not as easy, though, to compute the scale factors for Higgs to bb or Higgs to cc tagging. So we typically use gluon splitting as a proxy; that is what we've done mostly so far. This comes with complications: it's not really the same process that we want the tagger scale factor for.
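[Illustrative aside, not part of the talk: a schematic numpy version of the pass/fail transfer-factor idea described above. It measures TF = N_pass/N_fail in the jet-mass sideband, binned in the di-jet invariant mass, and applies it to the fail region inside the jet-mass window. This is a 1D simplification of the 2D method mentioned in the talk; the binning, window and toy data are assumptions, and the real analyses fit smooth functions rather than taking raw ratios.]

```python
import numpy as np

def transfer_factor_estimate(mjet, mhh, passes_tag, mjet_window=(110., 140.),
                             mhh_bins=np.linspace(250., 2000., 15)):
    """Pass/fail background estimate:
    - 'pass'/'fail' is whether the jet passes the (decorrelated) tagger;
    - the jet-mass sideband (outside mjet_window) is background dominated.
    TF(mhh) = N_pass / N_fail is measured in the sideband and applied to the
    fail events inside the window to predict the pass-region background."""
    mjet, mhh, passes_tag = map(np.asarray, (mjet, mhh, passes_tag))
    sideband = (mjet < mjet_window[0]) | (mjet > mjet_window[1])
    window = ~sideband

    n_pass_sb, _ = np.histogram(mhh[sideband & passes_tag], bins=mhh_bins)
    n_fail_sb, _ = np.histogram(mhh[sideband & ~passes_tag], bins=mhh_bins)
    tf = np.divide(n_pass_sb.astype(float), n_fail_sb.astype(float),
                   out=np.zeros(len(n_pass_sb)), where=n_fail_sb > 0)

    n_fail_win, _ = np.histogram(mhh[window & ~passes_tag], bins=mhh_bins)
    return tf * n_fail_win        # predicted pass-region background per mhh bin

# toy usage with random numbers standing in for data
rng = np.random.default_rng(0)
mjet = rng.uniform(50., 200., 50_000)
mhh = rng.uniform(250., 2000., 50_000)
passes = rng.random(50_000) < 0.1
print(transfer_factor_estimate(mjet, mhh, passes).sum())
```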
It's becoming more interesting to do the measurement in Z to bb, which is now feasible with the full Run 2 statistics. This is, for instance, what ATLAS does, and here you can see the jet mass spectrum in a Z to bb enriched sample: there is enough Z plus jets to be able to perform such a measurement. Now, another difficulty comes if you instead want to apply a jet tagging algorithm to a four-prong jet. That is something that we want to do if, for instance, we're doing a search with a Higgs boson decaying to WW, or a standard model measurement with Higgs to WW. This is more complicated because there is no obvious proxy available in this case. As an example, I'm putting here a recent search from CMS which is focused on a triboson resonance, where you expect cases in which you have this boosted resonance to two Ws and you want to tag these four-prong jets. So what we do: you do what you can, and at the end, since no method is the best, you need to assign a large amount of systematics. Can we have better Monte Carlo? Well, in principle, yes, because all LHC experiments have performed over the years many important jet substructure measurements, which can help us to improve Monte Carlo generator development and tuning, among other things like understanding perturbative QCD better and measuring standard model parameters. Here is a long list of jet substructure measurements performed at the LHC. And among them, one of the most recent ones is the measurement of the Lund jet plane from ATLAS, which came out already last year. You have already heard in the talk before about this observable, which allows us to categorize all hard splittings at once and to factorize hadronization and parton shower effects. In the ATLAS paper they show these two nice plots, which are both based on simulation. In the one on the top, based on Herwig, two parton showers are compared and the ratio is shown here. You can see that the Lund plane is sensitive to the parton shower in the top left corner. And on the bottom, the hadronization model is changed and the ratio is shown here. You can see that the top right triangle of the Lund plane is sensitive to this effect. So this is very interesting, and it can be done also in data; that's the measurement. And to make it more intuitive, they show slices, basically in regions of the Y axis. In particular in this slide, you can see how in the non-perturbative region the agreement with the data is worse or better depending on the hadronization model, and instead the perturbative region on the left is more sensitive to the parton shower model. There is also a new interesting measurement from CMS which you will hear more about in the next days. In particular, they perform this new measurement of five observables sensitive to the jet fragmentation in gluon and quark jets, and also in several different variants. These measurements are performed in a multijet sample, but also using Z plus jet for the first time as a sample enriched in quark-initiated jets. Here is the full list of observables. They make many important observations about the over-prediction of gluon versus quark discrimination depending on the generator and parton shower.
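[Illustrative aside, not part of the talk: a small Python sketch of how primary Lund jet plane points can be built from a Cambridge/Aachen declustering sequence, in the same (z, dR) splitting format used in the Soft Drop sketch earlier. The coordinates (ln(R0/dR), ln(kT)) with kT approximated as z*pT*dR, and the way the hard-branch pT is followed, are simplifying assumptions, not the exact ATLAS definition.]

```python
import numpy as np

def primary_lund_plane(pt_jet, splittings, R0=0.4):
    """Primary Lund jet plane points from an angular-ordered list of C/A
    splittings of the hardest branch.  Each splitting is (z, dR), with z the
    momentum fraction of the softer branch and dR the opening angle; its Lund
    coordinates are (ln(R0/dR), ln(kT)) with kT ~ z * pT * dR."""
    points = []
    pt = pt_jet
    for z, dR in splittings:
        kt = z * pt * dR
        points.append((np.log(R0 / dR), np.log(kt)))
        pt *= (1.0 - z)          # approximately follow the harder branch down the tree
    return points

# toy usage: a 500 GeV jet with a few made-up splittings
print(primary_lund_plane(500., [(0.05, 0.35), (0.2, 0.1), (0.01, 0.03)]))
```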
Among the many results they show, here I'm showing the measurements of one of these observables, in the Z plus jet region on the left and the central dijet region on the right. You can see the usual sandwich between Pythia and Herwig. You can see how the Z plus jet region is better described than the dijet region, which is gluon enriched. We have all these measurements, and it's probably time to combine and make use of all this information for the next generation of Monte Carlo generators, such that we can simultaneously describe everything we need and improve searches and other measurements. This is something that must be done. But at some point we might ask: do we need Monte Carlo at all, since we can instead just learn from data? This is an approach that is emerging a lot recently. Why? Well, because we have performed thousands of hypothesis tests and have not found any significant evidence of new physics. Either new physics is beyond the reach of LHC energies, or we need more data, or we're not looking in the right place. We have not yet imagined what new physics looks like. There is an urgent need to generalize, and there are many recent ideas that make use of deep learning to learn directly from data and avoid signal priors. This is also what we call anomaly detection. The first time this idea was presented at BOOST was 2017, and it was a poster. Then in 2018 we introduced a dedicated machine learning session, in which three out of seven talks were on anomaly detection. Since then, there has been a lot of effort to develop concepts for end-to-end analyses based on this idea, since it's not obvious, like, how to actually make a real analysis out of this. In particular, there were a lot of interesting talks describing recent developments on weakly supervised and unsupervised learning; I think you will see more in Barry's talk about all this. But what is interesting is that we have also finally seen this idea applied in data. In fact, ATLAS made the first implementation of this approach on collider data last year. The search is basically a three-dimensional search for a generic resonance A that decays to two other generic resonances B and C in the dijet final state. And the approach is based on the CWoLa hunting weakly supervised algorithm, in which basically one performs a classification between two mixed samples, one enriched in signal and one enriched in background, and the classifier is trained on some variables. In this first implementation, the simplest way of doing it was applied, in which the network is trained using the masses of the two jets to do the classification. Here you can see the efficiency of the classifier with no signal injected on the left and with signal injected on the right. You can see, indeed, how the efficiency goes to low values in the region where there could potentially be a signal. So this shows that the approach could potentially work. Unfortunately, no highly significant excess was found, so they had to set limits. You can see the observed limits for different hypotheses for the masses of the two jets, the two particles in the final state, compared to the dijet search. We can see that such an approach is sensitive to a large phase space, not achievable by signal-dependent searches, but it's less sensitive than supervised searches in the places these are tailored for. But, of course, if we knew which signal to search for, we wouldn't have to do this.
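[Illustrative aside, not part of the talk: a bare-bones Python sketch of the CWoLa idea just described, using toy numbers: train an ordinary classifier to separate two mixed samples, here "dijet-mass signal window" vs "sidebands", with the two jet masses as input features and no truth labels. If a signal is localized mostly in one of the samples, the classifier becomes signal-sensitive. scikit-learn is used only for convenience; the toy data, window fraction and threshold are arbitrary assumptions.]

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)

# toy data: smoothly falling background jet masses, plus a small signal with
# both jet masses near 100 GeV, concentrated in the dijet-mass signal window
n_bkg, n_sig = 20_000, 200
bkg = rng.exponential(60., size=(n_bkg, 2)) + 20.
sig = rng.normal(100., 8., size=(n_sig, 2))
in_window_bkg = rng.random(n_bkg) < 0.2            # 20% of background in the window
in_window_sig = np.ones(n_sig, dtype=bool)         # signal lives in the window

X = np.vstack([bkg, sig])                          # features: (m_jet1, m_jet2)
mixed_label = np.concatenate([in_window_bkg, in_window_sig]).astype(int)

# CWoLa: classify "signal window" vs "sideband" -- no signal labels used
clf = GradientBoostingClassifier().fit(X, mixed_label)

# the highest-score events define the anomaly-enhanced selection
scores = clf.predict_proba(X)[:, 1]
selected = X[scores > np.quantile(scores, 0.99)]
```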
There is also one problem here: the anomaly might be discarded by the trigger. In fact, to cope with the high data rates of the LHC, we implement a two-stage filter system, where the first stage, the level-one trigger, analyzes the data at 40 megahertz. Given these high rates, just a very coarse reconstruction is performed, and it brings the rate down to 100 kilohertz. The second stage, at reduced data rates, can perform a more sophisticated reconstruction of the event, and brings the rate down further to about 1 kilohertz. With 40 million collisions a second and only 1,000 stored, we might just be writing the wrong events. In particular, these trigger algorithms are model dependent: any other signature we did not think about could have easily been discarded. We can think of correcting the problem as early as possible in this data reduction flow. If we want to apply deep-learning-based anomaly detection here, we have to be careful: deep learning algorithms can become relatively large, such that the memory and the number of operations required for the inference can easily explode. In particular in the level-one trigger we are strongly constrained, because the algorithm has to run with a latency of just a few microseconds, such that the algorithms run on hardware, FPGA hardware. We have scarce resources, and how to fit a deep learning algorithm there is not obvious. But recently a library called hls4ml, high level synthesis for machine learning, was developed to automate the deployment of deep neural networks on FPGAs, to obtain an implementation that is optimized for good overall latency and good resource usage. So anybody can try it and see how, fairly intuitively, one can obtain a firmware implementation of your deep learning algorithm. In fact, there is now an ongoing effort to study anomaly detection with autoencoders with level-one-trigger-like inputs, which can help overcome these trigger limitations that I just discussed. These detection algorithms can be employed at the event level, and this can be done already in Run 3, or also at the jet level. But in that case it would make more sense to have it in Phase 2, where we could profit from higher granularity: in particular for CMS, the detector will be upgraded so as to provide tracking and particle flow information at 40 megahertz. That becomes a place where anomalous jet detection can be done. I'm pointing here to some work that will appear soon on the arXiv. In particular, there is ongoing work to do jet classification in nanoseconds with graph convolutions, interaction networks and also more classic multilayer perceptrons. This is not unsupervised, but it's a proof of concept that could be fundamental to then apply unsupervised algorithms later. In this ongoing work several approaches are studied, but there can be other ones for optimized anomaly detection in low-latency and low-resource experimental environments. To stimulate the community effort, we have set up a new challenge, which you can explore by looking at this link, where you can find training and testing datasets and a lot of information on how to estimate the latency and footprint of your algorithm. Of course, we cannot improve the level one without improving the HLT, where we have more granular information, closer to the offline reconstruction, such that we can have more performant models there. And of course, at the HLT we have more relaxed constraints, but still algorithms have to run in a few hundred milliseconds or less.
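[Illustrative aside, not part of the talk: a minimal PyTorch autoencoder of the kind being studied for trigger-level anomaly detection, using the per-event reconstruction error as the anomaly score. The input is a made-up fixed-length vector of trigger-level quantities, and nothing about quantization, pruning or FPGA deployment (hls4ml) is shown.]

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Small dense autoencoder: compress a fixed-length event vector into a
    low-dimensional latent space and reconstruct it.  Events the model
    reconstructs poorly (high MSE) are flagged as anomalous."""
    def __init__(self, n_inputs=56, latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, 32), nn.ReLU(),
                                     nn.Linear(32, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 32), nn.ReLU(),
                                     nn.Linear(32, n_inputs))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = TinyAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# train on (toy) "standard" events only
standard_events = torch.randn(10_000, 56)
for epoch in range(5):
    loss = nn.functional.mse_loss(model(standard_events), standard_events)
    opt.zero_grad(); loss.backward(); opt.step()

# anomaly score at inference: per-event reconstruction error
new_events = torch.randn(100, 56)
with torch.no_grad():
    score = ((model(new_events) - new_events) ** 2).mean(dim=1)
```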
A few hundred milliseconds is still pretty tight. I would like to conclude by saying that understanding the computing limitations is fundamental when developing models, and so is understanding the resources available to the experiments and how to best profit from them, including heterogeneous computing farms. There is a change happening in our computing algorithms. This brings me to the summary. With deep learning algorithms at the basis of many new innovative ideas, more can be achieved with boosted objects. We have seen how the sensitivity of searches can be pushed beyond the slow increase in the size of the collected data sample. With progress in understanding jet substructure, searches will benefit further from the application of these new algorithms. New approaches like anomaly detection applied to jets will bring us to unexplored territories, and this can lead to the study of new regions at the HL-LHC. I hope to see a few of these searches for the unknown at BOOST next year. >> Thank you very much for that great overview. I'll be taking over for Matt for these questions. Opening it up for questions: please use the raise hand reaction if you have questions. Max, go ahead. >> Thanks for the nice talk. Along the lines of anomaly detection, how are we going to prevent ourselves from detecting anomalies from noise and other experimental effects, which are quite anomalous events that happen fairly often? Is this something that you put into the training to specifically avoid? What are your thoughts on the experimental aspects of, say, a calorimeter breaking or something like that, and triggering on that entirely? >> Right. Well, I think we have seen so little of this applied to data that I don't have a feeling yet for how much detector noise would be a problem. In some sense we have detector noise also in normal analyses, and we have ways to clean our data from such problems. So I would think that we will understand this more when we have more examples of these algorithms applied. >> Okay. Very good. Thank you. >> I'll ask a question: for those of us who are not experimentalists, can you give a lightning review of what the next two years at CERN are supposed to look like? What has COVID affected, and when might we see Run 3 and things like that? >> Yes. Of course, things have been slightly delayed, because it was supposed to start already this year. The current schedule is spring next year. That's the final schedule, so we'll see it soon. >> Good. Excellent. Other questions? I'll ask another question, and I'll ask this of many of the search talks. As the experiments move more and more to deep learning, which is great, and which, as you emphasized, dramatically increases efficiencies so we can trigger and do things at a faster rate, what can theorists do to help with searches? If you just use deep learning, a computer is much smarter than me. What can we do to help searches? Does it help to make fancy searches? >> I think it's fundamental that theorists help the searches and the deep learning models by injecting physics knowledge into the algorithms. When you inject physics knowledge into the algorithm, you can make it more compact, faster, and more intelligent than it would otherwise be. We have seen examples like equivariant neural networks that use this type of knowledge. Do you agree? >> I do. To be fair, the Lorentz-equivariant network was led by experimentalists who are perhaps smarter than theorists. >> Other questions?
>> Hi, yes, can I ask a question quickly? Sorry, Jennifer, something crazy happened and I missed most of your talk, which I really apologize for. I was looking through your slides and I really like the slides you had about improving the Monte Carlo. I agree; maybe ditching it would be better. But I'm wondering if you know of any progress that's been made to try to incorporate all the measurements that you're listing into new tunes of parton showers, or is this something we can try to push on as a community? >> Yes. I think that's the missing part. That's the missing part and we should push more on that, and eventually maybe also restructure a little some of the hierarchies in the experiments, like how groups talk to each other, for instance the standard model group with the searches group. But I know that there are a lot of people interested in, let's say, filling in the missing pieces of the puzzle, and I think we will see more of that. >> Okay. That would be cool. The thing that I wonder about is whether we're measuring the right stuff to really get the Monte Carlo people excited. I've seen talks from those people where they're asking us to measure different observables than the ones that we're looking at. It could be something worth thinking about: are there things that we're just missing? Maybe not stuff that ATLAS can do, but LHCb or someone who can do nice hadron spectroscopy and things like that inside of jets. This, I think, could be really useful, and it's not something that we're pushing towards right now. >> Yep. >> Gregory, go ahead. >> First, I want to agree with Matt. My question is a follow-up. If everything goes unsupervised, what are experimentalists going to do? >> Who? >> Andrew asked the question of what theorists are going to do to help the machine learning revolution, say. If everything goes unsupervised, what will the experimentalists do to help? >> I don't think everything has to go unsupervised. It's a new approach. How I see it is that if we see an anomaly, we will need to understand it, and there we need a lot of work among, let's say, experimentalists, phenomenologists, and theorists. >> That makes sense, thanks. >> Any other questions? Max. >> This is not so much a question; it's a comment, not a question. I wanted to highlight on slide 9 the result from CMS, the HH to 4b search. I think this is a really, really beautiful result and a really interesting new step for substructure, because this search has shown for the first time that there is a coupling between two Higgs bosons and two vector bosons, and that's a strong statement about the standard model with boosted techniques at the heart of it. And they beat the equivalent from ATLAS, from us. I think this is a nice search, and I wanted to highlight this other interesting aspect: we're seeing a real, unique statement on standard model physics made with these techniques. Thank you for showing this. >> Okay. If there are no other questions, we can move to our break. Thank you again, Jennifer and Wouter, for the great review talks to kick off BOOST. In the chat I have posted a link to the gather town for BOOST. Please come over when you can with your break beverage of choice. We'll see you there. We'll reconvene, let's see, in about 15 minutes, at 1650; I had to do the time zone conversion. In 15 minutes.
We will stop recording this Zoom session, but the Zoom room will remain open if you want to hang out here and say hi to people as well. See everyone in about 15 minutes. >> Okay, everyone. We're coming back to Zoom now. I guess we'll give people a minute to transfer back over. I see Barry, hello. >> Hi. >> Do you want to try sharing your slides? We're about ready to go. >> How is this? >> Looks good. Let's maybe wait another minute or so and then you can get started. >> Sounds good. >> We've crept back up over 75 people, so why don't you get started now. Okay. Our next speaker is Barry, who is going to summarize the ML4Jets workshop that happened recently. In particular, I think he is going to talk about many very interesting machine-learning-related studies which have been done, also by members of the BOOST community, and which we may not hear about otherwise this week. I think it should be very interesting and I look forward to it. Barry? >> Yes, I think many is the keyword there. We had quite a lot of talks, and I apologize if I don't do justice to the presentations. First of all, I would just like to mention that all the talk recordings and slides are available on the Indico page. And I would like to thank the organizers for having this summary talk; there is a lot of overlap between the interests of the two communities. We had the workshop almost a month ago in Heidelberg in Germany. You can see the picture of the old bridge and the castle in the background. The venue where we had the conference is a short five-minute walk from here. I will start with an overview of the workshop in general and then I'll go through the sessions at lightning-fast speed; forgive me if I leave out details. I'll introduce the next venue for ML4Jets and give a summary of the different topics that were covered in the workshop. The series follows on from workshops in 2017, 2018, and 2020. This year, because of COVID, we couldn't have the full conference in person, so we managed to put on a hybrid workshop. We had 384 registered participants and were able to have 30 people here in person, mostly from Germany, Hamburg and so on, with everyone else online. For peace of mind, we had daily testing and a socially distanced lecture hall. With the online format there were a lot of abstract submissions: in three days we fit in 11 sessions and 99 talks. This is good because we recorded everything and put it online, but for the people attending in person this meant 12 hours of talks per day. It was intense but worth it. One of the great things about the workshop is that we had talks from the theory community, the experimental community, and the machine learning community, and we got a lot of interesting discussions going during the breaks and so on. And, of course, the Euros: after 12 hours of talks, what is nice is to turn the projector over to football and have a beer. What is more important is the physics. Here I have listed the main topics that were covered, and at least my impression of what the new ideas were and what was important. The first obvious thing is that many techniques had big gains from the architecture. We have seen graph CNNs, deep sets, transformers, INNs, and flows boosting the current techniques. As you could see from Jennifer's talk, there have been many impressive machine learning advances. We have also seen lots of progress in the simulation and generation direction, moving from GANs to flows, INNs, and VAE architectures as well.
One of the cool things about this workshop was the announcement of two challenges following up on the LHC Olympics and the top tagging challenge: the anomaly detection at 40 megahertz challenge, and another one which is not as far along in the organization, the calorimeter simulation challenge. For me, three of the most important or most interesting subjects being studied in machine learning, and covered in detail at this conference, are the understanding of uncertainties in machine learning tools, anomaly detection, and symmetries in deep learning. These anomaly detection techniques are starting to be used in experiments; this is one direction that needs a lot of work in comparison with everything else, and it's one of the most interesting things for a theorist or experimentalist to work on. In the first session we had new architectures, and here the talks were mostly about incorporating symmetries into the neural network architectures or optimization techniques. We had equivariance, Lorentz-equivariant neural networks, and permutation-invariant architectures like deep sets or transformers. The permutation-invariant architectures are used because in a jet, if you order the constituents, you impose an ordering on the data that isn't really there. In part two we had new strategies or representations. Here the shift is from encoding the invariance in the network architecture to moving it into the representation of the data. The most common jet representation we know of is the jet image, but other ideas were explored here, such as the new unsupervised learning by Peter. Then we had the BSM session, where the focus was on anomaly detection. It's difficult to put the different anomaly detection methods into single boxes, but there were three main sections. The first is the over-density methods, where anomalies are seen as an over-density in a high-dimensional parameter space, like a high-dimensional bump: you look for a region that is over-dense. In the second part we looked at latent-space anomaly detection: autoencoders where the input data is mapped to a compressed latent space, and we look at the points in the lowest-density regions of that space to identify the anomalies. One highlight here was the review talk on Dark Machines from Joe and Bryan; they compared different techniques, and as far as I can tell it looked very nice. In the third part we had the data-space searches with autoencoders. This is what you traditionally associate with anomaly searches: essentially, the data is mapped to a compressed space and decompressed, and the reconstruction error between the input and the output is used as an anomaly detection metric. It was interesting to see the three different approaches combined in a single session; some comparisons between the three approaches would be nice in the future. Then we had the ML-assisted measurements and searches session. Here we saw some interesting updates from the NNPDF collaboration, talks on using machine learning in searches for optimal EFT parameters and observables for EFT, and on hypothesis testing at the LHC. We saw talks on new inference methods, particularly the Ginkgo method. One of the big advances in this session was the use of invertible neural networks to perform measurements, particularly in measuring QCD splittings, where the forward process can be simulated with different shower parameters and then normalizing flows can be used to perform inference on this process and actually get a measurement of the parameters.
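To make the reconstruction-error metric mentioned above concrete, here is a minimal sketch of a data-space autoencoder anomaly score. The input dimension, architecture, and random stand-in data are placeholders and are not tied to any particular workshop result.

```python
# Minimal data-space autoencoder sketch: train on (mostly background) events,
# then use the per-event reconstruction error as the anomaly score.
import numpy as np
from tensorflow import keras

n_features = 57  # placeholder for some flattened event representation

autoencoder = keras.Sequential([
    keras.layers.Input(shape=(n_features,)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(8, activation="relu"),    # compressed latent space
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dense(n_features, activation="linear"),
])
autoencoder.compile(optimizer="adam", loss="mse")

x_train = np.random.rand(10000, n_features).astype("float32")  # stand-in data
autoencoder.fit(x_train, x_train, epochs=5, batch_size=256, verbose=0)

x_test = np.random.rand(1000, n_features).astype("float32")
reco = autoencoder.predict(x_test, verbose=0)
anomaly_score = np.mean((x_test - reco) ** 2, axis=1)  # larger error = more anomalous
```

The latent-space variant mentioned in the same session would instead estimate the density of events in the compressed 8-dimensional layer and flag the lowest-density points.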
Then we had, I think in the first session of the second day, the classification talks. The big talk here, I think, was the improvements to the state-of-the-art ParticleNet network. In the top tagging challenge a few years ago, ParticleNet came out as the winner, and there was work presented on ParticleNeXt, which incorporates more advanced architectures to boost the classification performance. Another talk was on the use of physics-motivated representations like the Lund plane, not just for top tagging but for Higgs measurements, which was really interesting. The next session was on simulation and generative models. In part one we focused on detector simulation. The motivation here is that in the future, for LHC Run 3 and beyond, the computational cost grows badly, and the goal is to use GANs or other machine learning tools to replace or aid the full simulation. One of the most interesting talks was the ATLAS talk; it was interesting to see them using the GAN techniques, and it looks extremely promising. Part two was the event and jet generation talks, with a variety of architectures: GANs and some flows. This image comes from the OTUS talk from Jessica, where they were using a VAE architecture inspired by optimal transport for event generation. Then came the regression, calibration, and fast inference session. In the first part were updates on regression and pileup mitigation in CMS, comparing PUPPI with new machine learning techniques using attention mechanisms, particularly ABCNet and another method, Pileup Mitigation with Attention. We had the calibration talks next. One of the interesting things here, which is quite a new topic, is the idea of super-resolution: we can start with a low-granularity jet image from the calorimeter and, using other information like tracking information along with machine learning techniques, upgrade the resolution of these images. Apologies if I'm misrepresenting this work. Lastly in this session we had the fast inference talks, for example jet identification in the level-one trigger. This again is similar to what Jennifer was talking about: this online trigger direction is interesting, there is lots of data thrown away, and there are anomalies that we can possibly detect with machine learning methods. We had one session dedicated to datasets and challenges, and both are in a similar vein. In machine learning, one of the difficulties when developing a new technique is to have a fair comparison across the board, and this is where the datasets come in. The reproducible open benchmarks framework is there so you can upload your neural network architecture, and the framework will run it on some data and then provide a like-for-like comparison of the different methods. Secondly, and a bit similar, there is shared data and algorithms for deep learning in fundamental physics; this is the ErUM-Data program. The data doesn't just focus on particle physics: we have data from Pierre Auger and other experiments. The datasets are provided in a Python package and everyone can do the training and test their methods. We have two new challenges following on from the top tagging challenge. The first is anomaly detection at 40 megahertz; the goal is a very small network that can run on a chip with very fast inference times, to be used in the online trigger. The second challenge is a proposal for a calorimeter simulation challenge.
We have a community challenge based on a common dataset for using and benchmarking different approaches to fast calorimeter simulation; community input is welcome. Then we had a session on exploring the latent structure of data. In the first part, some of the interesting talks were on learning symmetries and conserved quantities in physical systems. In the very first session we had the new architectures, where people incorporate symmetries into the architectures; here, instead, machine learning is being used to find conserved quantities and invariants in the data themselves. They have taken the problem of finding invariances and reframed it as an optimization problem, a machine learning task. Then in the second part there were talks on latent space exploration. There is a variety of applications here, not all in the exact same direction, but one I picked out was the COBRA architecture. I found this interesting because one of the problems we have with complex final states is combinatorial backgrounds, and here they use a machine learning architecture to overcome these backgrounds, which seems promising. Then we had a session on interpretability, robustness, and uncertainties. There were a lot of different talks in different directions in this session, but there was an introduction on interpretability, robustness, and uncertainties in machine learning, and then we came to uncertainties with generative networks. Here I'm showing plots: not only were they able to use neural networks as a generator, but they're using Bayesian generative networks, so they can generate data and provide uncertainties on the data they generate. That is a really important problem if these techniques are to be used in the experiments. Then we had talks on information content, for example explainable AI for ML jet taggers. It's an interesting technique: these neural networks are to some extent a black box, and you can't really understand how the data is propagated through the network. Using a clever backpropagation trick, they're able to see which of the inputs to the neural network contributed the most to a certain decision at the output layer. Here are some examples where the red pixels indicate the pixels that contributed most to the decision in the jet classification task. Lastly in this section, part 4, constructing observables. Here we had, for example, Bayesian inference in four-top production at the LHC. This is a mixture model that assumes there are signal and background processes in the dataset and tries to disentangle them through inference techniques, and there were other interesting talks in this session. In the last session there were talks on general machine learning applications, to cosmological simulations for example. We had a talk on conditional invertible neural networks to probe cosmic ray sources: if you have some physical parameters and you can simulate the forward process, then using an INN you can invert the process and infer the parameters. We had a talk from Rutgers with a visualization of the data, using the ANODE density estimation technique; it's a technique developed for particle physics applications and they're using it to identify stellar streams. To end the session, there was a talk from Michael on the synergy between quantum computing and machine learning. I wasn't able to get images from this because they didn't upload the slides.
He gave a live demonstration of running a machine learning algorithm during the talk. This was focused on quantum annealing and how it can be useful for the optimization of particle physics problems. So, quickly, the next ML4Jets has been announced for January 2023 at Rutgers University. The plan is to have it in person; having visited there myself, I can recommend it as a venue. If you're interested, sign up. For the summary: here I have listed the exact same topics from the beginning, but I hope it is now clearer where they all came in. We had big gains from the new architectures, attention mechanisms and transformers. We had impressive advances in machine learning at ATLAS and CMS. And, as a personal taste, symmetries in deep learning, uncertainties, and anomaly detection really shone this year at ML4Jets. I will end here, thank you. >> Thank you, Barry. You weren't kidding about having a lot of things. Okay. That was great. Are there any questions from the audience? Please raise your hand. People are clapping. Okay. Clemens has his hand up. >> Hey, thanks a lot for this nice presentation. I have a question, or maybe it's more asking for a comment from your side, on the first point that you list here in the summary and that you touched upon at the very beginning of your talk, which is the big gains from new architectures when it comes to jet tagging. On one of the early slides you drew, like, "is this the limit?" in the middle of the plot. How far can we actually push this further? Do you think this red line is much closer to the lines that we already have in the plot, or is there lots of space? What is your impression? >> None of these plots are mine; they are from a talk that you'll see again maybe on Thursday, or if you watched the video. It's really not clear. The best performing models are sophisticated and complicated deep neural network models, and it's difficult to interpret what is going on. We don't really have an analytical grasp on the upper limit. It's a really difficult question to answer and I don't have a good answer for it. >> I saw a question submitted, I think for Frederic's talk, and maybe we'll come back to it, about how much information there is in a jet, how much information you show to a neural network, and whether that is good or bad. Probably we should talk about it later instead of now, but I think this is something that will come up, definitely. Andrew? >> Thanks for the great review, Barry. I have a meta-question. In the history of BOOST, ten years ago the name of the game was designing new observables that have a very concrete physical interpretation but then have some practical application for tagging or whatever. From the machine learning side, you can say, well, I want to work with the lowest-level observables, calorimeter hits or whatever, and throw them into a machine and see what the machine can learn. Clearly, these are two ends of a spectrum. One is very physically understandable but very limited in scope because it's a single observable, and one is extremely general but, from one perspective, maybe impossible to understand; individual calorimeter hits are not modeled well theoretically. Where do you see studies in machine learning, and theory or experimental analyses from the other side, ending up on this spectrum? What do you see as the happy medium?
What does each side need to give up, and what does each side gain, from the growth of machine learning in BOOST physics? >> I suppose one question is how much interpretability you can get and understand well enough to use in an experiment. As long as you can calibrate the observable, it's fine. But how much interpretability you are willing to give up is maybe a matter of taste. Low-level data, this is interesting, and your question in Jennifer's talk I found interesting; I was hoping to address it here, but during the talk I didn't have enough time. There are symmetries you can impose at the low level, at the level of the constituents, which propagate to the network. For example, the simplest thing you can do is take the jet constituents, order them by pT, flatten them, and pass them through the network. It's the worst thing you can do. You can say the neural network is a general function and it should be able to extract all the information, but in practice this isn't the case. If you incorporate the symmetries at the low level, you are helping the network not only gain performance but also produce interpretable results. Bias isn't the right word, but you're not going to see influences of the preprocessing on your neural network output. One example is permutation invariance. The big gains came first from the graph-net architectures, and these permutation-invariant networks showed a big performance gain over things like images or representations that are not permutation invariant. Big performance gains with the graph nets, which is interesting. Another thing is the rotational invariance of the jet; it's not an exact symmetry, so we have several networks and other methods for embedding these symmetries in the low-level representations. We haven't seen this play out fully in the literature, but my expectation is that this will not only improve the performance of the networks but also give more interpretable outputs; it's a vague thing to say, but I think, more interpretable network outputs. >> Thanks. Petar? >> Yes. I want to comment on this. I think Andrew asked the right question, and I think this comes back to Jennifer's talk also, in the sense that we need better detector simulation and we also need somehow to connect the theory that you do to the Monte Carlo simulation. This is where the link has to be made, because we are training these machine learning algorithms on Monte Carlo, or even if we train them on data, you can then also train them on Monte Carlo for a specific model and basically learn something. If we had a detector simulation which is perfect, which is of course not true, but if we had one, then the second step would be to connect theoretical calculations to Monte Carlo simulation in some way. And I don't think the framework for that exists at all. I think this is something that we as a community need to figure out how to do. So I think there absolutely has to be a connection between deep thinking and deep learning, as we discussed a couple of BOOSTs ago.
I think this is where there is a missing step in this whole process: we are running Monte Carlo simulations that were written 40 years ago and that have been tuned on data without adding all of the improvements in the theoretical understanding of soft QCD; none of that has propagated into the Monte Carlo simulations that we use to train machine learning algorithms. So I just wanted to say this, because this is the missing link that connects all of these things, and there's very little work being put into it. >> You're saying the bottleneck in this fast simulation is not on the machine learning side but on the Monte Carlo side? >> I think all the aspects are advancing, and maybe I'm wrong, but as far as I can tell there are many things that do not propagate into measurements, and even for the things that do propagate into measurements, there are large theoretical uncertainties that really don't need to be there, simply because there is no progress in certain areas. I attend every BOOST, and from year to year there are huge advances in machine learning, huge advances in calculations, and huge advances in getting the technology of Monte Carlo simulations to work correctly. But there is no connection between improvements in soft QCD and, you know, the Pythia parton shower. Maybe this is fantasy; maybe this is just too hard and it's going to take ten years to do. But we as a field are not putting enough effort into this, I feel. Maybe other people can voice different opinions and correct me; I would love to be wrong, but that is what I feel. >> Jon, did you have a quick comment about that? >> It was a follow-up on that. First of all, there is a lot of theoretical advancement in Monte Carlo generators, but it's focused mostly on getting the hard part of the event right: all the technology to deal with including higher-order matrix elements in parton showers, and higher-order parton showers are on the way. I don't think we should give the impression there is no work going on in Monte Carlo, because there is. There is a lot of theoretical connection there. The state-of-the-art theoretical calculation for a given differential cross section at the LHC is more than likely to be embedded in a full final-state Monte Carlo these days, once you move away from anything simple. So in that sense we shouldn't give the impression there is no work; there is work. But I think you're right, it hasn't been focused on getting the soft QCD right. At some level that is a bit of an afterthought that gets retuned around the parton shower, and the parton shower interpolates between the two. But then, even for the Monte Carlos where a state-of-the-art calculation for a given final state is embedded, in Sherpa, or in Herwig with Matchbox, or something like that, it's very, very challenging to quantify the uncertainties on it. If you want to feed all the information about a full final state into a machine learning algorithm and really understand the result, in principle you need to have theoretical control over all of the uncertainties of everything you put in. We're a long, long way from that.
I think one approach to making better use of machine learning is maybe to start being stricter about what we allow the machine to know from the Monte Carlo, only feeding it things that we have good theoretical control over, and then gradually trying to grow that list, working with the theory community to put them in there. I think if we throw everything we know about an event from the data and everything we know about an event from the Monte Carlo into a machine, we'll never get there; everything you throw in, you have to have control over its uncertainties, theoretically and at the experimental level. I think we should step back, build up what we can give a machine to learn from step by step, and see how far we get that way. That is where I think the collaboration with the theory community can come in. >> Okay. Great comments. Jon, I think that is a very good segue into one of the talks in the next session, which I suggest we move on to now. We're running a couple of minutes late. The chair of the next session is TJ, who I guess is here somewhere. Yes. Hi, TJ. The first speaker, I should check, is -- >> His video is on. We can hand over to Samuel and a summary of what PIRANHAs can do for your jets. >> Sam, you're muted. >> There we go. Thanks so much, Matt. Okay. Hi everyone. As you know, I'll be doing a short review of pileup and infrared radiation annihilation, PIRANHA. It's a strategy for continuous grooming that my collaborators Patrick, Eric, Jessie, and I have developed. Before I go on, can someone verify, I've been having problems: when I go through the slides, are people seeing this? >> Yep. >> Thanks. In my talk, I mentioned that grooming is important; it's a procedure for removing contaminating soft radiation from our data. You all know this. It has experimental and theoretical benefits, and, well, more on that in the questions, actually. For the sake of this talk and for simplicity, I will focus on the modified mass drop tagger, or Soft Drop with beta equals zero. One of the main points I made is that Soft Drop with beta equals zero has a hard cutoff, and I looked at two events, E plus and E minus, each with two particles, whose soft particles have energy fractions z plus just above the cutoff and z minus just below the cutoff. Even though the events start close together in event space, in a way that we can make precise, and look similar, they have distinct final states after the grooming procedure because they straddle this cutoff. We mentioned how this leads to problems in predicting detector responses and hadronization responses; more on that in the questions. But it seems natural to ask whether we can have a continuous grooming procedure which does not present this difficulty in this measure-zero region of phase space where we straddle the cutoff. We do introduce such a procedure, or strategy, for continuously removing contaminating soft radiation, and we call it PIRANHA; it's based on some techniques that I mentioned briefly in the long talk. The intuition is that we can think of a group of piranhas that we are optimally transporting to eat up the offending contamination in the event. I introduced recursive safe subtraction, a tree-based implementation of this PIRANHA strategy. Let's compare it: you have seen Soft Drop many times, but now put recursive safe subtraction next to it. They're similar: you start with a jet, recluster it to get an angularly ordered set of emissions, and loop through them, widest angle first.
In Soft Drop, there is a hard cutoff and we completely eliminate the emissions that fail the cutoff until we find one that survives. In the case of recursive safe subtraction, we instead have a set of piranhas that remember how much they have eaten; they're going to get full. At each step of the grooming procedure, we use up some of the grooming parameter and reduce the amount of grooming we do in the future, until eventually one of the emissions survives the grooming procedure and then we keep the rest of the jet. We can look at this for our simple examples from earlier: because the piranhas are eating up the event in the case of E plus and E minus, we don't have the same issues of discontinuity for events straddling the cutoff. One manifestation of this continuity is that the distributions of observables tend to be smoother in the case of recursive safe subtraction. Here is an example for C1(2), which is roughly m squared over pT squared of the jet: in the case of Soft Drop there is a kink in the distribution, while in the case of recursive safe subtraction there is no such kink. We also saw how this newfound continuity responds to hadronization. I'm showing parton-level C1(2) on the x axis and hadron-level on the y axis, and there is a larger spread, or less linear correlation, as reflected by the linear correlation coefficient, in the case of Soft Drop: more spread for Soft Drop, less for recursive safe subtraction, when it comes to the response to hadronization. For detector effects, I presented a naive analysis: when I want no detector effects, I consider all hadrons in the jet, and in the quote-unquote post-smearing case I consider only the charged hadrons in the event. It's not quite as obvious just by looking at it, but there is less linear correlation, and more unpredictable nonlinear responses, of the Soft Drop jets to this very naive smearing procedure. So we learned that grooming is important, which you knew, and I introduced the PIRANHA grooming strategy and the recursive safe subtraction implementation of that strategy, which overcome some of the discontinuities of previous methods. I'm looking forward to all your questions. >> Thanks, that was an excellent summary. I think the way we will do this is we'll start with one question from the discussion document and then we'll also take questions from the floor; do raise your hands if you want to interject. One question from the document is as follows. You've demonstrated that you have to choose the z cut quite precisely; it's sensitive to the goals that you're trying to achieve and to your event environment. How robust are these choices, and do you see a way to reduce the need for tuning, given that you need to have sensitivity to the W mass but you also need to deal with underlying event and pileup? >> Thanks so much. I don't have a very complete answer to this question, and I think it's an excellent question that merits further study. But let me show what happens when I consider top jets, for example, with the same pT as the W jets that I showed in my long talk. The same choices of z cut and the same rescaling approach seem to work well in that case. It seems to me, based on this naive example, that because the grooming we do scales with the energy of the hard process, if the energy scales involved in the hard process are similar, the amount of grooming we need to do is similar.
On the other hand, if I increase the amount of energy associated with the hard process, for example here I'm showing jets produced in Z plus quark processes at 3 TeV rather than the 500 GeV jets I showed earlier, then because the amount of grooming scales with the hard process while the energy associated with the underlying event is only weakly correlated with the energy of the hard process, it seems that we need to be more careful as we change the energy scale of our processes. Does that answer the question? >> I'll leave it open for a moment in case anyone wishes to ask a follow-up. >> Thanks. >> Okay. I think we can take a question from Robin, go ahead. >> Hi, thanks. I just got back from vacation, so I haven't opened the Google doc or watched anything in advance. But first of all, this is a super interesting method; I'm going to read about it now. One of the issues with the modified mass drop tagger and Soft Drop is that you need to calibrate things back up to their proper masses, because you lose the mass peak and it gets shifted. I was wondering if any work has been done comparing, experimentally maybe, or even just with Monte Carlo, whether this helps the peaks be more on mass. You sort of showed, on slide 9, a plot that a little bit answers that, but I can't totally see the peak underneath all the different z cuts from Soft Drop in blue. >> Yes. Absolutely. Let me see if I have a better plot here. Looks like I don't; I have one in my long talk, so worst case, if I can't answer the question here, I will pull that up. But for now, let me know if this plot begins to answer your question. Here I'm showing the modified-mass-drop-groomed W jets at 500 GeV, and you see the mass peak appears close to the W mass for recursive safe subtraction. In the long talk I discuss how, with a different choice of z cut, we actually get a slightly shifted value of the W mass, or of the value at the peak, simply because in the case of recursive safe subtraction we're removing some set amount of energy, whereas in the case of Soft Drop that is not necessarily the case. So we do a very, very naive rescaling procedure here, and we'll do something more sophisticated in the future, but for now it's just a naive scaling where we shift the mass by 2 percent to recover the W mass. Please tell me, does this give the additional information you wanted? Please help me follow up, or kill the question. >> Well, I think it answers my question. I think it means that there's no improvement in, well, you still have to do the rescaling. >> Yes, absolutely. You still have to calibrate. Thanks so much. >> I wonder, Matt, do you want to follow up, because you had a question on calibration effects? >> I did. It might be a little niche, but let me try to summarize my question for you. One thing that we've seen in ATLAS when doing all those studies for the UFO paper last summer is that the jet response in ATLAS for signal and background jets can be a bit different, and it can respond differently to different grooming algorithms depending on whether you have a jet with real structure in it or a quark or gluon jet. This is a total pain in the butt if you want to calibrate the mass of the jets; there is a section in the paper about this. The question is: does RSS treat W or top jets any differently than background jets? Have you looked at what it's actually doing in these different topologies? >> I don't have a specific answer to your question; again, thanks.
I think it's a really important and beautiful question. First, I will say that I don't trust my calibration procedure nearly as much as I trust the more sophisticated calibration procedures used in that paper and in experimental analyses; we hope to have slightly more sophisticated ones. But one question I have for you is: do you know how the issue manifests for different types of jets, so we can address it better in the future? I have another slide with plots for you, but I can address the question more accurately in the future. >> Maybe it would be better if we follow up about this. Basically, in a nutshell, the problem is that if you have a W jet and a gluon jet with the same mass and pT, you could imagine the zg of the jets is different or something like that. Because you have more soft particles in one than in the other, we think the jet mass response differs, and this means the calibration factors that you assign to the jets are different. This actually prevented us from using some of the options that we looked at: Recursive Soft Drop was sensitive to this effect and we couldn't calibrate it. It's a weird technical thing that we noticed and that we're still trying to understand. >> Thanks so much. In general, I will say that I expect, very naively, that the increased continuity of recursive safe subtraction leads to better responses and behavior, but without further study I don't know. I wanted to show these plots in case they give additional information for you, where I do the procedure for the QCD background at the same energy scale and draw the ROC curve. I think that to answer the question fully, I need to do a more precise study. So thanks. >> Okay. >> I think we will have to move on shortly. But Aditya, if you have a short question. >> I doubt it's a short question; I'm sorry I didn't write my question in the Google doc. It's very interesting how the peaks look, but the fact that in Soft Drop you have a hard cutoff allows you to understand the effect of hadronization quite simply, because all it means is that if there's a perturbative subjet that is close to the Soft Drop boundary, the hadronization corrections at the boundary end up translating the theta function into a delta function. There are two non-perturbative effects: one is the area of the jet, how much you collect, which scales as the groomed jet radius; and the other is the boundary effect, which, as you pointed out in the talk, comes from the sharp boundary and ends up being a delta function. Now that you have softened the boundary, how do I think of these boundary effects? I mean, you don't have to answer this question now; this can be very complicated, and I'm happy to discuss it over the week. But how does this boundary correction to the Soft Drop hadronization change, or what does it look like, for this kind of observable? To explain what the boundary effect means: there is a boundary correction when a subjet barely passed or barely failed, right at the boundary. The perturbative coefficient involved in this contribution basically picks up a delta function from the Soft Drop condition because of this sharp cutoff. Now that you have softened that, how does the picture change? >> I think, to be fair to the other speakers, we should carry on this discussion elsewhere. But thanks for the comment.
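As a rough illustration of the contrast discussed in this talk, here is a toy caricature based only on the description above, not on the authors' code: a hard-cutoff loop versus a continuous "budget" loop over an angular-ordered list of declusterings. The actual recursive safe subtraction redistributes transverse momentum along the full tree and its normalization details differ, so the function names and the simplified bookkeeping here are assumptions for illustration.

```python
# Toy caricature of grooming an angular-ordered list of declusterings.
# Each entry is (z_soft, branch): the momentum fraction of the softer branch
# and a placeholder payload.  Not the PIRANHA authors' implementation.

def soft_drop_beta0(declusterings, z_cut):
    """mMDT / Soft Drop with beta = 0: discard soft branches entirely
    until one passes the hard z_cut, then keep everything that remains."""
    for i, (z_soft, _branch) in enumerate(declusterings):
        if z_soft >= z_cut:
            return declusterings[i:]
    return []  # everything was groomed away

def recursive_safe_subtraction(declusterings, z_cut):
    """Continuous-grooming sketch: each soft branch eats part of a shared
    grooming budget instead of being removed outright, so events that
    straddle z_cut give nearby groomed results."""
    budget = z_cut
    groomed = []
    for z_soft, branch in declusterings:
        if budget <= 0.0:
            groomed.append((z_soft, branch))           # budget exhausted: keep as is
        elif z_soft <= budget:
            budget -= z_soft                           # branch fully eaten
        else:
            groomed.append((z_soft - budget, branch))  # partially eaten branch survives
            budget = 0.0
    return groomed

# Two events that straddle z_cut = 0.1 differ by a full branch after the hard
# cut, but only by a small shift after the continuous procedure:
print(soft_drop_beta0([(0.11, "soft"), (0.4, "hard")], z_cut=0.1))
print(soft_drop_beta0([(0.09, "soft"), (0.4, "hard")], z_cut=0.1))
print(recursive_safe_subtraction([(0.11, "soft"), (0.4, "hard")], z_cut=0.1))
print(recursive_safe_subtraction([(0.09, "soft"), (0.4, "hard")], z_cut=0.1))
```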
I encourage you also, if in other sessions we have follow-up questions that we don't quite manage to get to, to add them to the document or chase after the speakers. Thanks again, Sam. Now over to Yongbin for particle identification with graph neural networks. Looks good. >> Okay. Thank you. I'm glad to present our studies on what we call a semi-supervised graph neural network for PUPPI. I'm at Fermilab and this work was done by me together with Pan, Miaoyuan, and Shikun, who are computer science experts. To start, a quick recap: we need pileup mitigation, and there are previous approaches such as charged hadron subtraction, SoftKiller, which removes low-pT particles, and PUPPI, which makes use of the neighboring particle information. Recently there have been machine learning studies for pileup mitigation using convolutional neural networks and graph neural networks, and the problem with those machine learning approaches is that you need the truth labels for the neutral particles. In a real case, like full simulation or data, we can do the labeling for the charged particles; this is easy because we have tracking and vertex information. For the neutral particles, even in full simulation this information is really hard to recover, and in data we don't have it at all. So with the previous approaches, to train a neural network we need a perfect, or at least very good, simulation of the data, which is also what we were just discussing. So the idea we're having here is: how about we train our model using the charged particles and then do the inference on the neutral particles? This is what we call the semi-supervised approach, and it would allow us to train directly on real data or full simulation: we can just train on the data and apply it on the data. I also want to emphasize that we have our own model in this study, but the semi-supervised training strategy will work with other machine learning models; we just need to control the input features and make sure the model transfers well from charged particles to neutral particles. Just for illustration, this is a distribution made by the CMS collaboration of a feature of the particles; you can look at the neutral particles and the charged particles and they look similar. So we can make use of these features, train on the charged particles using the reconstructed information, and then apply it to the neutral particles. That is the basic idea. This is still done on fast simulation, and we take the PUPPIML datasets as the training datasets: we have the 80 and 140 pileup samples, and we train on 80 and test on 140. In this specific dataset, the labels for the charged particles are assumed to be perfect, with no mislabeling. In the real-data case we would need to handle that, but here it's sort of a toy model and we don't have this problem. >> You need to speed up a bit if you want time for questions. >> We build the graph in eta-phi space and connect nearby particles, and we use a gated graph model: messages are passed from the neighboring nodes, a gate is applied, and the aggregated message is essentially a gated average over the neighbors. When we update the node information, there is another gate that weighs the neighboring particles against the node itself. So, two layers of graph convolution and two layers of MLP; we take the GNN output together with the PUPPI weight and get the final score.
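To illustrate the train-on-charged, infer-on-neutral idea in the simplest possible terms, here is a hedged toy sketch. The features, classifier, and random data are placeholders and have nothing to do with the gated GNN described in the talk; the point is only that the labels used in training come exclusively from the charged particles.

```python
# Toy semi-supervised pileup sketch: labels exist only for charged particles
# (from track-vertex association), so we fit on those and apply the model to
# neutral particles that share the same input features.  All data is fake.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
n_particles = 20000

# Per-particle features shared by charged and neutral candidates:
# e.g. pT, |eta|, and a crude local-neighbourhood activity proxy (all made up).
features = rng.random((n_particles, 3))
is_charged = rng.random(n_particles) < 0.6
is_pileup = rng.random(n_particles) < 0.5  # known from tracking only for charged particles

model = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300, random_state=1)
model.fit(features[is_charged], is_pileup[is_charged])

# Per-particle weight for the neutrals: probability of coming from the hard scatter.
neutral_weight = model.predict_proba(features[~is_charged])[:, 0]
```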
There is also a masking procedure: we randomly select charged particles and mask them so that they look like neutral particles. This is the performance at pileup 80: this is the semi-supervised approach and these are the PUPPI scores; ours are better compared with PUPPI and similar to the fully supervised case. Here are similar performance plots, and we still observe consistently better performance. We also tested the performance on high-level physics variables like the jet mass; you can see that the supervised and semi-supervised learning have similar performance, both better than PUPPI, and also at 140 pileup. This is an event display showing the truth particles and the PUPPI particles against the graph neural network, and most of the pileup particles get cleaned away. Here are some weight distributions for the GNN compared with PUPPI: the GNN is more peaked at the two ends, and the PUPPI weights have a flatter dispersion. In summary, we do semi-supervised learning, training on the charged particles and applying to the neutral particles. We're working on testing the performance of this technique on real data and full simulation; in that case it's more complicated and we are working on it. That's it. Thanks. >> Thanks, and sorry to rush you. Once again, I will combine a synthesis of two comments from the document. PUPPI and the GNN seem to produce quite different weights, and also, in the event display, you can eliminate a few more neutrals that PUPPI doesn't manage to deal with. Do you understand what the differences are in the inputs, or what it is in the inference that the graph neural network is picking up on? Would it be possible to use that information to improve our non-ML techniques, by adding pT or isolation requirements? >> Okay, thanks for the question. Yes. Basically, if you look at the inputs to the graph neural network, they are things like the pT and the charge; it's almost the same as PUPPI. Our goal here is to make use of the particle features themselves. Instead of the fixed combination that PUPPI uses, built from the pT and the neighboring particles, the GNN makes use of the neighboring features and can explore more complicated structures: for example, something like pT squared divided by delta R, or whatever, depending on delta R and pT. This is something we think it learns; we think it's similar to PUPPI but it does a better job. That is why we want to reduce the input variables and make the network as small as possible, so we can be more confident that it really does work. If you look at these distributions: if you tune really hard, you can probably remove these neutral particles by hand, but the idea here is to let the GNN tune the metric such that pileup particles that are not too far from the leading particles can still be cleaned, like this one: it's sort of close to the leading particles, but not so close, and these particles can really be cleaned. I have more displays here; these particles, too, can be cleaned. >> I think the question is in some sense about interpretability. We know what you've put in, and your network is getting some results, but do you know how it is using that information? Maybe this is something that still needs further work. >> Yes, we need to understand it better. But for now, it makes use of, it can find, a better metric, like pT squared divided by delta R or something like that, which is a function of the pT.
Instead of the relatively straightforward metric that PUPPI uses. >> Thanks. Let's see. Raise your hand if you have spontaneous questions or would like to ask the questions that you posted in the doc. Maybe while people are thinking we can have one more. You showed that you're able to transfer the training; on the other hand, this will limit you to using only the information that you can treat identically between the two. I guess this is slightly broader than the scope of your specific study, but do you foresee a way to do a better association of the labels for the neutrals, and/or to extend this to using additional information, bearing in mind that the charged and neutral inputs may differ? >> I mean, okay. For the neighboring particles, we can still use low-level shower information. For the target particle it's not straightforward to make use of such information, but we could think about doing some training, for example for photons: we can still do the matching at the simulation level in full simulation, then train on a small sample of photons making use of the shower information, and then combine that training with the training on the charged particles. Maybe that would do a better job; I don't know, that remains to be seen. We're also studying domain adaptation to transfer better from the charged particles to the neutral particles; maybe we can do something there as well. >> Great. Thanks. Okay. I think we can wrap it up here. So thank you. Our last speaker today will be Frederic, on jet tagging with graph networks on the Lund plane. >> Hello, can you see me? >> We hear you. Your slides are gone. >> The slides are gone, sorry. One second. It says I'm still sharing. I will just log back into Zoom. Okay. Sorry about that. Can you hear me now? >> Yes. Looks good. >> Sorry for the technology problems. Thanks for giving me the opportunity to talk about this work. I haven't prepared a separate set of slides, so I will go through the full presentation and skip parts. I will talk about work I did on jet tagging in the Lund plane with Huilin, and a bit of work with Gregory and Adam. Most of this is around using the Lund plane as an input to a machine learning model, so I will start by giving a brief overview of how the Lund plane is defined and how it can be used as a way of representing jets. The idea is that you represent emissions in a log-log plane of the logarithm of the angle of the emission and the logarithm of its transverse momentum. This is useful because it separates out different kinematic regimes. Non-perturbative contributions are located in the lower part of the Lund plane, so you can remove contributions from that regime by imposing cuts in the Lund plane on the momentum of the emission. And soft-collinear emissions are distributed uniformly in this plane at leading order. So you can have a region of the Lund plane that is purely dominated by perturbative radiation and another region that is sensitive mostly to non-perturbative emissions. We can use this representation as a way of creating essentially fingerprints of jets through the Cambridge/Aachen clustering sequence. We go through the branches of this clustering sequence, and at each step we define two subjets ordered in transverse momentum and save the kinematics that correspond to that branching.
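For orientation, here is a minimal, self-contained sketch of the primary Lund-plane coordinates obtained from such a declustering. The toy tree class is a stand-in for a real Cambridge/Aachen clustering history (for example from fastjet), and the full LundNet input would recurse into both branches rather than following only the harder one.

```python
# Toy sketch: walk the primary Cambridge/Aachen declustering sequence of a jet
# and record the Lund-plane coordinates (ln(1/DeltaR), ln(kT)) of each emission.
import math

class Node:
    """Either a leaf particle (pt, eta, phi) or a merger of two children."""
    def __init__(self, pt, eta, phi, children=None):
        self.pt, self.eta, self.phi = pt, eta, phi
        self.children = children  # None for a leaf, else (child_a, child_b)

def delta_r(a, b):
    deta = a.eta - b.eta
    dphi = (a.phi - b.phi + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(deta, dphi)

def primary_lund_coords(jet):
    coords = []
    node = jet
    while node.children is not None:
        a, b = node.children
        hard, soft = (a, b) if a.pt >= b.pt else (b, a)   # order the two subjets in pT
        dr = delta_r(hard, soft)
        kt = soft.pt * dr                                  # transverse momentum of the emission
        coords.append((math.log(1.0 / dr), math.log(kt)))
        node = hard                                        # keep declustering the harder branch
    return coords

# Tiny example: a jet built from three particles (numbers are arbitrary).
p1, p2, p3 = Node(100, 0.0, 0.0), Node(30, 0.3, 0.1), Node(5, 0.8, -0.4)
jet = Node(135, 0.1, 0.0, children=(Node(130, 0.07, 0.02, children=(p1, p2)), p3))
print(primary_lund_coords(jet))
```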
The information we save at each node is the two Lund-plane coordinates, the momentum fraction of the splitting, the mass of the pair, and the azimuthal angle, and we repeat this procedure on both subjets until we have the full tree of the Cambridge/Aachen clustering, so we have a node of Lund kinematics, this tuple T, for each declustering along the sequence. So we can map, essentially, a jet onto a tree of these Lund declusterings from the clustering sequence. You have here a representation in terms of Lund planes of each of the emissions, with secondary planes and tertiary planes, corresponding to a binary tree with kinematic information for each of the splittings along the tree. One sequence is the primary sequence, which can be used for measurements and visualization. One thing that we can do with this plane is use it for identifying the origin of jets. In particular, if we include information from all these branches at once, this fractal representation of the Lund plane with many secondary and tertiary branches provides a strong basis for jet tagging, in particular in regimes where you have complicated topologies like top decays, or some decays of the Higgs, where discriminating information can sit in the secondary branches; there it's necessary to take into account the full Lund plane and not just the primary plane. The way this can be done is by treating each declustering on the Lund tree as a node on a graph. So we map this tree of Lund declusterings to a graph where the edges correspond to connections along the Cambridge/Aachen clustering, and then use the kinematics of each splitting. >> Sorry, I hate to say this, but do you think you can wrap up in one minute? >> This is the structure of the neural network; you can look at it in the full talk or in the paper. The summary is that for things like top tagging you have, on the right, the background rejection of QCD against the top efficiency, and you can see a factor of 2 or 3 improvement over ParticleNet for some of these LundNet models. We designed two models based on different kinematic inputs. We see a significant improvement, and the computational complexity is also quite a lot lower, because you can use the structure of the Cambridge/Aachen clustering directly and you don't have to do a nearest-neighbor search between particles within the jet. I think I can leave it at that; you can find more information in the talk and in the paper. Please go ahead with any questions. >> Thanks, again, sorry to rush you. I think we can start with a comment that was foreshadowed earlier, which is that you have shown that you can get some improved discrimination using these LundNet models, and this seems to be based on using more information than exists in a jet: for reference, you have something like a 5N minus 5 dimensional input to LundNet-5 versus 3N minus 4 for the phase space. What you seem to gain from this information in the ideal case also degrades rapidly. Do you think it's possible to provide a rule of thumb for what information in jets is robust and resilient as an input to these neural networks? >> I think the reason the LundNet with more information does poorly in the resilience plot is because it has, as part of its inputs, the mass of the pair of particles, which is sensitive to emissions further down the tree if you have a soft wide-angle emission further down the tree.
Even if you remove the node by imposing a kT cut, you still have some sensitivity to it in the value of the mass of some pair above it in the tree. So because of that, you basically have some information about the low-kT region of the Lund plane in the mass that you feed into the network. So that's, I think, the main reason why you don't gain any resilience. I don't think it has so much to do with dimensionality. It has more to do with the physics that these networks get as input. In some sense, whether there is redundant information, I don't think it really makes that much of a difference. For example, in the input you have here the kT and the momentum fraction z. Those inputs are quite heavily correlated. If you have delta and kT, in some sense you kind of know z already. But it's still given as input. So we found that, in practice, there was a small performance gain in including it. But you are adding, in some sense, redundant information. But that does not reduce the resilience. What reduces the resilience is the information that adds sensitivity to regions of the Lund plane that you want to remove, to be insensitive to non-perturbative effects, for example. I don't know if that answers the question or not. >> Thanks. Well, seeing as there was an earlier question on, in general, the sorts of information that are useful to put into networks, maybe I'll give a chance for that question to be followed up on. >> Sorry, what was the question? About what physical information needs to be in the input? >> Well, okay. There was some discussion earlier on low-level inputs versus structuring networks and, more broadly, I guess, the question of what choices one would want to make about what information is useful to feed to a network, bearing in mind robustness against modeling effects and so on. >> I think the main problem with giving it low-level information, like direct particle-level information, is that it's very difficult to have some handle on robustness once you've given it that kind of information. With structured information, the only reason we can do these plots here with resilience is that we can add a cut in the Lund plane that we can increase, and that slides the model up in resilience and down in performance. If you take a model like ParticleNet, you have a model and then we cannot make it less sensitive to non-perturbative effects, at least not trivially. >> Fair enough. There is a hand raised from somebody. >> Thank you for the great talk. I was wondering, if one wants to construct an X to bb tagger, how do you think one should proceed with this kind of LundNet? Is it theoretically well motivated to look for a two-body resonance tagger using the LundNet construction? >> Yes. In principle, if it's heavy enough that you have both boosted, that the particle is boosted enough that this is a single jet with two b quarks. I mean, you can -- yes. >> Recently, it was shown that ParticleNet actually also improves the Higgs to bb tagging. So I was wondering if something similar can be studied or is expected. >> No. >> Okay. >> I would imagine so, yes. I mean, with Monte Carlo data it just takes a few hours to train a model. But other than that, it's a straightforward application. So I would imagine so. >> Okay. Thank you. >> Thanks. >> There was a recent paper on Higgs tagging using the Lund plane. So presumably it's quite similar. >> Thanks, everyone. Now, technically we're a little past the end of the session. Unfortunately, any further discussion will have to wait.
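The cut in the Lund plane that the discussion refers to, which slides a model between performance and resilience, can be illustrated on the graph representation from the previous sketch. Again, this is only an assumed, simplified version with invented names.

```python
# Toy version of a kT cut in the Lund plane on the graph from the previous sketch.
import numpy as np

def apply_kt_cut(features, edge_index, ln_kt_cut, ln_kt_col=1):
    """Drop Lund nodes with ln kT below the cut and re-index the surviving edges.
    Edges with a removed endpoint are simply dropped (no re-linking across gaps)."""
    keep = features[:, ln_kt_col] >= ln_kt_cut
    new_index = -np.ones(len(features), dtype=int)
    new_index[keep] = np.arange(keep.sum())
    edge_mask = keep[edge_index[0]] & keep[edge_index[1]]
    return features[keep], new_index[edge_index[:, edge_mask]]

# toy node features (ln kT in column 1) and a bidirectional chain of edges
x = np.array([[1.2, 3.0, -1.5, 2.1, -0.8],
              [2.3, 1.1, -2.0, 0.5, 0.3],
              [3.0, -0.7, -3.1, -0.2, 1.0]])
edge_index = np.array([[0, 1, 1, 2], [1, 0, 2, 1]])
# raising ln_kt_cut trades raw performance for resilience; here it drops node 2
x_cut, edges_cut = apply_kt_cut(x, edge_index, ln_kt_cut=0.0)
```

In the actual studies the cut is presumably applied when the tree is built rather than on a finished graph; this sketch only illustrates the idea that emissions in the low-kT region can be kept away from the network inputs.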
But I think we did a good amount already today. Thanks again to the speakers and those who participated. And, once more, we have our various means to carry on the discussion afterwards. So, in case the local organizers want to round off? >> I don't think that we had anything else to say. Thank you very much. I think these discussions were really interesting and we'll pick back up tomorrow at 3 p.m. CERN time, or converted to your local time zone. Talk to everyone then, or before that on gather town or elsewhere. Cool, bye.