Unknown Speaker 27:17 Perfect. We can see the slides.
Unknown Speaker 27:25 That's okay. So can you hear me well?
Unknown Speaker 27:28 Yes, we can hear you, Kati. Yeah. Okay. Thank you.
Unknown Speaker 27:32 So I'm very pleased to be here today for the introduction to the CMS Open Data workshop. We've been planning for this for a long time, and we are very excited and somewhat intimidated about it. I'm Kati Lassila-Perini, I'm the CMS data preservation and open access coordinator, and I'm from Helsinki Institute of Physics. I'd like to welcome you, first of all, on behalf of the CMS Open Data team.
I've shown here pictures of the people who are involved in the preparation of this workshop. This is Edgar Carrera, who is my co-deputy, and we are competing about extremes: he's on the equator and I'm at the Arctic Circle, so we represent diversity in two extremes. He's one of the lead organizers of this workshop and, with Matt Bellis from Siena College, has been working hard on it for four months now. I'd also like to introduce Clemens Lange, who has probably been solving most of your problems. I particularly like working with Clemens because he usually has an answer before I understand the question, and that's a nice way to go forward.
But we're not alone here. As Gabriela already showed, there's a tutorial team: the list of speakers, whom you will hear either in video recordings or in tutorials, and who are responsible for their lectures or lessons. And we have a list of facilitators: all the speakers, and also all these people, many of whom you already met in the Mattermost channel. There are also two CMS participants who agreed to help, so thanks a lot for their help. And I'd like to address special thanks to the LPC event committee and coordinators, who have been very helpful in getting this organized and done.
So this is a remote workshop. Normally, when we do this kind of workshop, an integral part of the work is to sit together, to look at the screen, and also to have the social aspect of talking about what you're doing. Obviously, in a remote workshop we can't do that, but we still want to get to know each other. So please tell something about yourself in the slide set that has been set up; it's also attached to the agenda. Please add a page about yourself, a figure if you like, and tell something about yourself, so we can look around and learn about each other.
When we started planning for this workshop, we obviously started to think about goals. What do you expect and what do you want to learn as a participant? And what do we expect as organizers? For participants, we had to make some assumptions. We think that you want to learn about the basic physics objects like photons, electrons and jets: how they are accessed from the CMS open data, how they are identified, whether the values we have there need to be corrected, and then how to write them out. You're certainly also interested in event selection and triggers, which is one of the key features of experimental physics and can sometimes be tricky to understand. Then, if you want to produce real results, you will finally need to evaluate the luminosity. Otherwise, you will get no number out. And to do real-scale physics studies, you will need to understand the possibilities for large-scale data processing. We also think you will be interested in examples of how to estimate systematic uncertainties, something that is typical for an analysis of experimental data. And what is certainly interesting is an example analysis that has already been carried out, and the lessons learned. But that's not all: we want to get something as well.
So what we want to do is build a community of users. When you work in a large experiment, a very important part of the knowledge is what I would call collaborative knowledge: we rely on the knowledge that other people around us have.
Unknown Speaker 32:22 And when you're working with open data, there's no such thing if you don't have a community. That's why it's very important to build a group of people with similar interests, so we can share information and knowledge. And that's why we want to introduce, as Gabriela already mentioned, the Open Data forum, which can serve as a discussion forum for different kinds of things: different questions, technical issues, and also suggestions.
In this workshop, we will propose a certain way of doing things. However, we want to get an understanding of the usage patterns and needs, so please give us that feedback. Tell us what kind of tools you would like to use in your work. We'd also like to get feedback on what is missing in the documentation and the tutorial material, and there's certainly quite a lot. Based on that, we want to build a proper CMS Open Data User Guide, and I'll come to that a little later.
Okay, so these are quite ambitious goals. So how do we reach them? That's a good question. As you all know, this is the very first workshop of this kind, and people have been working quite hard to get the first tutorial material available. So it's natural that some of it is still version 0.0. It will get improved, but bear with us: this is the first trial.
Now we have these goals, so how do we get there? I will say a few words about the workshop structure and the working methods. As we built the program, we built it thinking of you writing your own analysis. So we're trying to follow the steps of an existing simplified example, which you have probably already seen: the Higgs to tau leptons.
It is a very typical two-step process. We have the first part, which reads in the open data files and writes them out in a smaller format. This first step is almost always done in the CMS computing environment and with CMS software. Then, as an output, we have the smaller, reduced files, and the next step is to analyze that format. That may or may not be in the CMS-type environment; you would probably like to do it in your own environment, and that's perfectly possible. Going through the tutorials, we will expand this simplified example in some areas, to get an idea of what it takes to bring this kind of example to the research level. Obviously, we won't achieve this in these three days, but at least we want to give you some idea of what kind of issues you would need to add to a simple example analysis.
As you know, we've sent a set of mandatory pre-exercises. They are meant to cover the technical issues: how to get you ready for the work, and to get you to know the tools. We didn't ask you to send us replies or that kind of thing, so, very importantly, you see the poll in the Zoom window. I think you see that. Please reply to that poll. The poll is asking if you were able to go through the pre-exercises, if you really have a working environment available, and what kind of difficulties you encountered when you went through them. This is very important: we need this information in order to adapt the program and tutorials over the next three days. As I said, the goal of the pre-exercises was to set up and test your working environment before the workshop. Those of you who have done them certainly noticed that it requires some time and some effort.
Unknown Speaker 36:53 Also, the pre-exercises were there to give some background information on the tools in use at this workshop. So many thanks to those of you who have done them.
In this slide I have two screenshots. The one on the left is what you get from the Docker container when it starts; the one on the right is what you get from the VM image. That's the working environment there. If neither of these two is familiar to you, you are in trouble, because you will not be able to do the exercises in this workshop. However, for this time, we have a temporary solution. We have a browser-based Docker setup, which is a ready-made environment that doesn't require any installation. Nothing is saved after the session, but it still behaves like the normal one, and you can do the normal exercises there. If you click on that link, you will find a few very streamlined tutorials. One takes the very basic things from the pre-exercises, and you can go through it very quickly. If you just click through it, you won't learn anything, so you should still go back to the pre-exercises and understand what you're doing. But at least you will have a working environment available.
In addition to this learning environment, where we have the tutorial on the left side of the screen and the Docker environment on the right side of the screen, we have another Docker environment available for those who really haven't succeeded in building their own environment. You can start it with the instructions on this slide. Again, I remind you that one session is a maximum of four hours, and be aware that nothing is saved after the session. You should really make sure that you get your working environment, either the VM or Docker, installed properly on your computer, so that you can work on the CMS open data also after this workshop. But with these two options, you can follow the tutorials and learn.
For the tutorial workflow, I have taken this from the physics object demo video, which you will see later today.
Unknown Speaker 39:29 So, the first two lessons today: one is data scouting, so you will learn how to find the data, and you will look a little bit inside the data files as well. Then we have a lesson about trigger manipulation, so you learn to select events. Later on, we will go to physics objects, which is really the core of this tutorial. First we will have an introductory video, and then you have two different lessons on physics objects. One is mainly about electrons and photons, and common access methods. Then, in physics objects two, you have more complicated objects like jets, and also tagging and correcting. Then we have lessons about pre-selection and skimming: how to reduce the output files to something you are interested in. And then an important section about identification: how do you identify a physics object, and how do you select them? Later on, we will also address how to work on this reduced data to plot the quantities you would like to have. So that's what we're going to do.
Then, on Friday morning, we have a special session on CMS analysis in a cloud environment. You will have an opportunity to learn how to run CMS open data processing in a real-scale commercial cloud environment. It will be all hands-on, and you will get a temporary account; details of that will be sent to you later. We got some resources for this through the ARCHIVER project, in which CMS open data is a use case, and for that we will be using Kubernetes Engine on Google Cloud Platform. I would really like to encourage everyone to participate, even if you don't know what Kubernetes is, as this is really the occasion to learn about it. So really, don't miss it. It will be hands-on, and it's a great opportunity to be able to rehearse on a real commercial cloud environment.
And then on Friday afternoon, we have a feature demo about analysis description languages. You can see further information about this project at this link. This is not part of the CMS open data distribution, but we thought that you may find it interesting. It will be built around the Higgs to tau example, so it's connected to the examples that we're working on in this workshop.
For the schedule, you can find it on the schedule page and also in the Indico agenda. As Gabriela mentioned, we will have live hands-on sessions and demos. In the mornings they will be through Zoom, with dedicated Mattermost channels, so you can use Zoom and Mattermost to communicate on these topics. Then, in the afternoon, you will work at your own pace. There are lectures, exercises and demos, and again, you can communicate through the dedicated Mattermost channel. We will leave Zoom as an open office for everyone, so there will be someone there. You can drop in, you can ask questions, and if needed, you can share your screen so we can discuss it in more detail and
Transcribed by https://otter.ai