Monday Morning Session
-----------------------
(Eastern Time)


[Enrico Fermi Institute] 11:09:56
Everybody can hear just fine? Yeah, because I'm sitting here I'm just getting picked up by the mic on the ceiling. Okay?

[David Mason] 11:09:59
We can hear.

[Enrico Fermi Institute] 11:10:01
Great, great. Thank you. Okay, so if you go to the next slide: the first area that we want to cover is looking a little bit at what we're doing in terms of workflows on HPC and cloud, and to do that, maybe at the very first we look at what resources we are actually looking at

[Enrico Fermi Institute] 11:10:19
here right now. So if you look at what's available for us on HPC: we have broadly two types of facilities, and they have different user experiences in terms of how you approach them and how you can use them. There are the leadership class facilities funded by DOE: Argonne,

[Enrico Fermi Institute] 11:10:38
Oak Ridge, and so on, and they are kind of —

[Enrico Fermi Institute] 11:10:41
they're very restricted. They focus on accelerators to get the most FLOPS for a given power budget.

[Enrico Fermi Institute] 11:10:47
They don't care too much about making it easy for the user.

[Enrico Fermi Institute] 11:10:50
You are expected to adjust your workflows to be able to run there, and they target large-scale workflows.

[Enrico Fermi Institute] 11:10:57
This is the kind of stuff that you can do nowhere else.

[Enrico Fermi Institute] 11:10:59
And then there are the user facilities: NERSC, TACC, the XSEDE sites, which are usually a mix.

[Enrico Fermi Institute] 11:11:11
Some of them are straight-out HPC — like, they look like HPC compute clusters in how they are built.

[Enrico Fermi Institute] 11:11:17
Some of them have interconnects. There might be a mix of GPUs and CPUs, mostly still CPUs, and they take all comers.

[Enrico Fermi Institute] 11:11:25
Basically you can get an allocation. You can get going.

[Enrico Fermi Institute] 11:11:28
They work with you to try to make it easy, so you can get on the facility and get your work done. Next slide. And at any time, if you want to make a comment or ask a question, please just ask it; we're not supposed to go

[Enrico Fermi Institute] 11:11:42
through the big presentation. So it's a discussion. Yep.

[Enrico Fermi Institute] 11:11:48
So, then, looking at that, with that in mind: what are we currently running there?

[Enrico Fermi Institute] 11:11:53
So this is the right now, and the green — if you see green, that's a straightforward copy from the charge; there's a question we were asked, so we answer that here: what we're doing right now. So for CMS,

[Enrico Fermi Institute] 11:12:05
what we're doing is basically anything that starts with a generator step and has no input except for pileup

[Enrico Fermi Institute] 11:12:12
we currently assign to a lot of US HPC sites. You don't have to do anything special; the workflow gets injected there automatically.

[Enrico Fermi Institute] 11:12:19
It can run there, and that was the majority of Run 2 Monte Carlo

[Enrico Fermi Institute] 11:12:24
workflows, and the Run 3 Monte Carlo workflows as well. And for ATLAS it's primarily simulation.

[Enrico Fermi Institute] 11:12:33
Usually they are specifically assigned to the HPC sites. So you select a bunch of — I guess you pick:

[Enrico Fermi Institute] 11:12:40
this is a good fit, and then you assign it there, and it runs.

[Enrico Fermi Institute] 11:12:42
And they also have the goal to expand on that. The limiting factors

[Enrico Fermi Institute] 11:12:53
in what workflows you target at HPC are usually based on machine characteristics. So, CPU architectures at certain HPCs — I mean, Intel is easy; when it gets beyond that it's currently still a little bit difficult. Do you have a GPU accelerator? How much memory

[Enrico Fermi Institute] 11:13:10
do you have per core — and remember, KNL, kind of the dying breed, is kind of disappearing a bit.

[Enrico Fermi Institute] 11:13:16
So it's usually okay now. Then, network connectivity. And it's not just to and from the node, like at the LCFs; it's also for the facility as a whole.

[Enrico Fermi Institute] 11:13:27
Sometimes HPC facilities have restrictions or firewall limits where, when you scale up, you hit scaling limits where you basically overload the pipe, because they don't —

[Enrico Fermi Institute] 11:13:38
they're not used to such data-intensive workflows.

[Enrico Fermi Institute] 11:13:41
So, a quick question: back when we were talking about CPU architecture and floating point operations —

[Enrico Fermi Institute] 11:13:47
yeah, what in particular is making that hard from your perspective? Is it basically, say, ARM or something? — Going to ARM is not harder.

[Enrico Fermi Institute] 11:13:57
It's just a matter of extra work to validate the platform.

[Enrico Fermi Institute] 11:14:00
Okay, so it's really about numerical outcomes and making sure that things agree between — Yeah, it's basically a one-time investment of being able to support the platform. — Is that true on all of them, though? Because that's not true for the — yeah, OLCF was, is a bit of

[Enrico Fermi Institute] 11:14:18
an outlier; it also has POWER, you know. CMS just finished the POWER validation.

[Enrico Fermi Institute] 11:14:23
Okay. So the requirement, then — the effective requirement — is, for a given sort of CPU architecture,

[Enrico Fermi Institute] 11:14:32
the upstream code has to be validated. — Well, firstly, you have to build your code.

[Enrico Fermi Institute] 11:14:38
It's got to be buildable, and then you need to run whatever physics validation — you produce some samples,

[Enrico Fermi Institute] 11:14:43
and then the physicists, the physics group, whoever in the global collaboration, needs to go in and say: this is actually okay.

[Enrico Fermi Institute] 11:14:50
So, therefore, there's a dependency on something external to you.

[Enrico Fermi Institute] 11:15:02
It requires labor from outside of us, because the US

[Enrico Fermi Institute] 11:15:07
can't just say this platform is validated; the experiment as a whole has to say that. — So, coming back to the why: you couldn't do pileup during digitization, because you had to read it remotely? — You can do it.

[Enrico Fermi Institute] 11:15:24
And that's the sweet spot, basically. We currently —

[Enrico Fermi Institute] 11:15:28
we currently don't run anything that needs primary input, but pileup is supported, because pileup is so unevenly distributed, because of its size, that anyways, for normal production, even on some Tier-2 sites they also read it remotely. So that's a use case that is supported anyways via

[Enrico Fermi Institute] 11:15:48
XRootD; for the HPCs it just expanded. So it's not a limitation.
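
The routing rule just described — workflows that start with a generator step and read nothing but pileup go to US HPC sites automatically — can be sketched as a simple eligibility check. This is a hypothetical illustration; the `Workflow` class and its field names are invented for this sketch, not taken from the actual production system:

```python
from dataclasses import dataclass, field

@dataclass
class Workflow:
    # Hypothetical, simplified stand-in for a production request.
    first_step: str                    # e.g. "GEN", "DIGI", "RECO"
    inputs: list = field(default_factory=list)

def eligible_for_hpc(wf: Workflow) -> bool:
    """Auto-route to US HPC sites only if the workflow starts with a
    generator step and reads no input except pileup (which is read
    remotely anyway)."""
    non_pileup_inputs = [d for d in wf.inputs if d != "pileup"]
    return wf.first_step == "GEN" and not non_pileup_inputs

print(eligible_for_hpc(Workflow("GEN", ["pileup"])))  # True
print(eligible_for_hpc(Workflow("RECO", ["RAW"])))    # False
```

The point is only the shape of the rule: eligibility is decided from the workflow description alone, so no per-workflow manual assignment is needed.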

[Enrico Fermi Institute] 11:15:53
No — yeah, I mean, eventually, as you scale up, it comes up. Then the network connectivity comes in.

[Enrico Fermi Institute] 11:15:59
We have to look at that. For instance, at Frontera we're hitting scaling limits because of remote pileup.

[Enrico Fermi Institute] 11:16:06
I thought at Frontera there was a limit on the amount of — yeah, the amount of remote access you could do from a node.

[Enrico Fermi Institute] 11:16:13
You see — yeah. So we actually hit the external connectivity limit of the facility.

[Enrico Fermi Institute] 11:16:19
And as I recall, at Frontera they mostly consider their Ethernet to be like a control plane.

[Enrico Fermi Institute] 11:16:26
Each node in the rack is connected at one gig, and each rack is connected

[Enrico Fermi Institute] 11:16:31
at ten gig, I think, something like that, to the core.

[Enrico Fermi Institute] 11:16:36
So in that case you probably weren't doing a lot of pileup at Frontera. —

[Enrico Fermi Institute] 11:16:38
We were reading pileup. — Okay, so you aren't hitting — I mean, but you are running.

[Enrico Fermi Institute] 11:16:42
You're accessing your pileup data sets by Ethernet, though. —

[Enrico Fermi Institute] 11:16:48
Yeah. — So you're still hitting the overall capacity of TACC,

[Enrico Fermi Institute] 11:16:54
then — yeah, like 100 gig or something. — Well, in the beginning we actually hit the scaling limitations in phase one, trying to — okay.

[Enrico Fermi Institute] 11:17:03
And then they limited us. But it's fine; I mean, the limit is not restricting.

[Enrico Fermi Institute] 11:17:10
The limit is still high enough that we don't have a problem using up the allocation over the year.

[Enrico Fermi Institute] 11:17:14
We just couldn't do what we tried to do, which is do these 100k-core runs,

[Enrico Fermi Institute] 11:17:20
because at that point the traffic was too high. Yeah.
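
The limit being described is simple arithmetic: the per-core streaming rate times the core count has to fit within the facility's external link. A rough back-of-envelope sketch — the rates here are illustrative placeholders, not measured Frontera values:

```python
def aggregate_gbps(cores: int, mb_per_s_per_core: float) -> float:
    """Aggregate WAN bandwidth needed if every core streams input
    (e.g. remote pileup) at the given rate; 1 byte = 8 bits."""
    return cores * mb_per_s_per_core * 8 / 1000  # MB/s -> Gb/s

# Illustrative numbers only: 100,000 cores each reading ~0.25 MB/s
# of pileup would need ~200 Gb/s of external connectivity, enough
# to saturate a typical 100 Gb/s facility uplink twice over.
print(f"{aggregate_gbps(100_000, 0.25):.0f} Gb/s")  # 200 Gb/s
```

This is why the same workflow that runs fine at grid scale can overload a facility's pipe once the core count gets large enough.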

[Enrico Fermi Institute] 11:17:27
Oh yeah, I was at network connectivity. So we discussed the potential facility limits.

[Enrico Fermi Institute] 11:17:35
Then another limitation can be storage. If you use shared storage for input and output data, you would have to integrate it into the data management solution, because you basically have to pre-stage the data you want to process and then stage out the output data later, asynchronously

[Enrico Fermi Institute] 11:17:51
from the job execution, back to your own storage. But also, another

[Enrico Fermi Institute] 11:18:01
The consideration is whether job scratch is local or shared.

[Enrico Fermi Institute] 11:18:04
For instance, the LCFs usually have only shared storage.

[Enrico Fermi Institute] 11:18:08
They don't give you any local storage. Most of the NSF-funded sites — XSEDE and Frontera —

[Enrico Fermi Institute] 11:18:16
give you local scratch, and that is another area where you can run into scaling limitations.

[Enrico Fermi Institute] 11:18:22
And looking a bit ahead — so this is what we're doing now.

[Enrico Fermi Institute] 11:18:26
If you look ahead to the HL-LHC era — assuming the resource mix shifts, and we get more HPC

[Enrico Fermi Institute] 11:18:36
resources — can we still afford to restrict the workflows

[Enrico Fermi Institute] 11:18:42
we run there? Or is that basically restricting ourselves in terms of what we can do operationally?

[Enrico Fermi Institute] 11:18:53
Right now we do what's easiest, and that just came out of starting this up.

[Enrico Fermi Institute] 11:18:59
And of course, you start up with what's easy to just get something to run.

[Enrico Fermi Institute] 11:19:03
But as you become experienced with it, and as the amount of resources goes up, that might not be enough to keep scaling up and to take advantage of opportunities.

[Enrico Fermi Institute] 11:19:15
Now, a question from Shigeki.

[Shigeki] 11:19:18
Just out of curiosity: this is sort of the state of trying to get to work at the HPC

[Shigeki] 11:19:26
centers as they exist now. Is there any general motivation on the HPC side to sort of meet us halfway? And/or do they recognize that maybe this is the future, that they really need to adapt to meet the external workflows

[Enrico Fermi Institute] 11:19:44
There is, but you have to again distinguish between the user facilities and the LCFs. So with the user facilities we've had very good experience, especially with NERSC, working with them.

[Shigeki] 11:19:46
halfway, in a common sort of way?

[Enrico Fermi Institute] 11:20:00
At NERSC we started in like 2016; CMS had our first allocation there, and we started to work, and we started to target these types of workflows.

[Enrico Fermi Institute] 11:20:10
First we tested remote data access, and it was kilobytes per second to each node — and the claimed Cori design goal was gigabit to the node

[Enrico Fermi Institute] 11:20:22
for external connectivity. And then, obviously, something in the stack didn't work.

[Enrico Fermi Institute] 11:20:25
So we worked with them for multiple years, and now we're actually kind of there,

[Enrico Fermi Institute] 11:20:29
where we're supposed to be. Everything works great, so they are very interested in working with us.

[Enrico Fermi Institute] 11:20:36
The LCFs — I don't think we have that relationship.

[Steven Timm] 11:20:40
cool.

[Enrico Fermi Institute] 11:20:42
It would be great if we had it, but we don't.

[Steven Timm] 11:20:46
So NERSC is also already going over — for NERSC-10, which is the machine that comes after Perlmutter — talking to,

[Steven Timm] 11:20:54
oh, what do you call it, high-throughput people, and seeing: what do we need for the next thing? So they're talking —

[Steven Timm] 11:20:59
they're talking to us, they're talking to DUNE, whatever. So those meetings are already happening for the next round.

[Enrico Fermi Institute] 11:21:04
Yeah.

[Steven Timm] 11:21:05
But the others, as you say, are not happening at the moment.

[Enrico Fermi Institute] 11:21:08
Yeah, the feedback we got from NERSC is that they're very interested in supporting data-intensive science, and they took what they learned

[Enrico Fermi Institute] 11:21:16
on Cori, running these kinds of workloads, and they take that into consideration for designing the next machine. — Yeah, and in fact, I think he will hopefully say something: does data-intensive science assume data-intensive pulling of stuff from the WAN? Because that's a different — it's a

[Enrico Fermi Institute] 11:21:34
different issue, right? I mean — yeah, it can be streaming things, you mentioned that. — Yes; as they scale up, you know, we want to put more workflows on.

[Enrico Fermi Institute] 11:21:45
We have to be cognizant of the intrinsic design limitations of the clusters. I mean, running data-intensive science on a facility means either you stream everything in and stream it out, or you need local storage to cache what you will process

[Enrico Fermi Institute] 11:22:01
later. That's the simple — those are the two options here, and that's what I mentioned about storage.

[Enrico Fermi Institute] 11:22:10
It depends what each facility gives you. If you don't have a lot of attached storage, and you can get only a small storage quota compared to your CPU quota, then you don't have a lot of options in terms of how to make use

[Enrico Fermi Institute] 11:22:23
of that CPU quota. If you do get a lot of storage, you can run it like we run regular production on a grid site:

[Enrico Fermi Institute] 11:22:34
we pre-stage within our data management systems, we run, you stage things back out. That makes things simple.
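
The grid-style mode of operation described here — pre-stage inputs into the facility's storage, run, then stage the outputs back out — can be sketched as a toy loop. Everything below is a stand-in: local directories play the role of the experiment's storage and of the HPC scratch quota, and the function name is invented for illustration; in reality the transfers are asynchronous and driven by the data management system, not by the job itself:

```python
import shutil
from pathlib import Path

def run_campaign(inputs: list[Path], grid_storage: Path,
                 hpc_scratch: Path, quota_bytes: int) -> list[Path]:
    """Toy pre-stage -> process -> stage-out cycle.

    Copies each input into `hpc_scratch` (respecting a storage quota),
    'processes' it, writes the result back to `grid_storage`, and
    frees the scratch space again.
    """
    outputs = []
    used = 0
    for src in inputs:
        size = src.stat().st_size
        if used + size > quota_bytes:
            break  # quota exhausted; the rest waits for the next cycle
        staged = hpc_scratch / src.name
        shutil.copy(src, staged)  # "pre-stage" (a real transfer here)
        used += size
        result = grid_storage / (src.name + ".out")
        result.write_text(staged.read_text().upper())  # the "processing"
        staged.unlink()  # "stage-out" done, free the scratch quota
        used -= size
        outputs.append(result)
    return outputs
```

The sketch makes the constraint visible: the scratch quota, not the CPU quota, bounds how much work fits in one cycle.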

[Ian Fisk] 11:22:40
oh!

[Enrico Fermi Institute] 11:22:41
Say — do you have an idea of, like, what the scale there would be to make it —

[Enrico Fermi Institute] 11:22:45
make these facilities more usable? — I mean, I know the ballpark figure we usually say: a CMS site with a sizable amount of CPU would like to have like 500 TB of space, roughly. I'd say simply hundreds of terabytes. Yeah, I mean, we could use probably

[Enrico Fermi Institute] 11:23:02
300, 400 — but around that point; if it's less than 100, it gets difficult.

[Enrico Fermi Institute] 11:23:06
Yeah. And that's usually where we are: with the experience from ALCC grants, for instance, usually 150 is kind of the cutoff. That's not a lot.

[Enrico Fermi Institute] 11:23:23
Of course, it would be nice if we could ask for a large storage allocation and just, you know, use your storage element —

[Enrico Fermi Institute] 11:23:29
treat it like another site. But then that also comes into, you know, expecting storage allocations over long periods of time, rather than a yearly kind of allocation.

[Enrico Fermi Institute] 11:23:43
Yeah.

[Ian Fisk] 11:23:43
Yeah, I'm wondering if somehow the concept of streaming in versus to local storage is a distinction without a lot of a difference.

[Ian Fisk] 11:23:52
It's more about the time scale, right? They have 100 TB of data;

[Enrico Fermi Institute] 11:23:54
Yeah.

[Ian Fisk] 11:23:56
you're either streaming it directly in real time, or you're staging it in and staging it out, because 100 TB of data is not a ton of space at a large scale.

[Enrico Fermi Institute] 11:24:01
Yeah, there's a small technical difference, because in one case you just keep the data in job scratch,

[Enrico Fermi Institute] 11:24:12
and in the other case you have to place it somewhere that's independent of job execution. And that can make a technical difference because, for instance, I don't think NERSC counts job scratch against your scratch quota,

[Ian Fisk] 11:24:27
Okay.

[Enrico Fermi Institute] 11:24:29
while if you put something in via the DTNs, the data transfer nodes, that does count against it. — And I think a lot of it's also cultural, right? In terms of

[Enrico Fermi Institute] 11:24:42
not commonly seeing workflows that stream data. For better or worse, what most people expect and cater toward is data coming in through the DTNs to the file system,

[Enrico Fermi Institute] 11:25:00
and some time later a process runs on it. So —

[Ian Fisk] 11:25:04
But somehow, like, there's a balance here between the networking and the local storage and the I/O of the jobs: you need to have a sufficient amount of

[Ian Fisk] 11:25:12
I/O to keep the resources busy. And so it's not much more complicated, like —

[Ian Fisk] 11:25:20
and it has to be a convergent system, in the sense that you're not going to be able to have the storage forever.

[Enrico Fermi Institute] 11:25:27
Okay, yeah, you're of course right that — yeah, it's a storage management problem more than it

[Enrico Fermi Institute] 11:25:35
is a storage problem, and you —

[Ian Fisk] 11:25:39
I'm claiming it's a data delivery problem, whether it's being streamed in or whether it's being cached from a stream.

[Ian Fisk] 11:25:45
They are effectively — both of them are the same problem, which is: how do I get data in?

[Ian Fisk] 11:25:52
And if I look at the time scale — if something is streaming in, it's sort of a real-time problem, and it's a little bit simpler in the sense that it's a network thing: I know the I/O, there's

[Ian Fisk] 11:26:03
not a long lag. But if I expand it out to the time scale of even just a couple of weeks, it's still — staging it in requires a certain amount of networking, staging it out too.

[Ian Fisk] 11:26:13
How much time do I have on this particular resource?

[Enrico Fermi Institute] 11:26:17
So — doesn't this depend on the scheduling modality of the HPC?

[Enrico Fermi Institute] 11:26:22
Because, like — because they tend to come —

[Enrico Fermi Institute] 11:26:26
you know, you tend to get put into a queue;

[Enrico Fermi Institute] 11:26:29
you're waiting forever, and then suddenly you have nodes to use.

[Enrico Fermi Institute] 11:26:37
It's simpler if you stream: you remove the data management part from the equation, because you assume you can just pull it when you need it.

[Enrico Fermi Institute] 11:26:48
But you can't do that if you're being scheduled in a queue, where you're suddenly getting 50,000 cores. You've been waiting for two weeks, then on Monday morning they give you 50,000 cores,

[Enrico Fermi Institute] 11:26:57
and you've got no data there, right? — Well, if you assume that these 50,000 cores can access the data via streaming, then you can feed them.

[Steven Timm] 11:27:04
Great

[Enrico Fermi Institute] 11:27:06
Yeah — you feed them from somewhere else, and you don't need to schedule the data.

[Enrico Fermi Institute] 11:27:09
So data delivery is on demand. Of course, eventually you hit scaling limits.

[Steven Timm] 11:27:09
Right.

[Steven Timm] 11:27:12
Good.

[Enrico Fermi Institute] 11:27:18
But that's more a question of — then the network comes in, and how our own sites are dimensioned.

[Enrico Fermi Institute] 11:27:23
This is still the introduction; we have the HPC focus area,

[Enrico Fermi Institute] 11:27:27
we also have a couple of sessions on that, so I don't want to go too deep into it.

[Steven Timm] 11:27:28
Okay.

[Enrico Fermi Institute] 11:27:30
But I think the point is, if you think about it from an architectural point of view:

[Enrico Fermi Institute] 11:27:33
having the data that you need on site for your run could help enormously, because it's presumably sized for you.

[Steven Timm] 11:27:37
Alright.

[Enrico Fermi Institute] 11:27:43
You hope that the site is sized appropriately for the cores — may or may not be true in all cases — and there's also the question of reliability.

[Steven Timm] 11:27:48
Great

[Steven Timm] 11:27:52
Okay.

[Enrico Fermi Institute] 11:27:53
The last thing you want to do is wait two weeks, get your 50,000 cores, and find out

[Enrico Fermi Institute] 11:27:58
today was the day that there was an outage. — So we got a couple of questions.

[Ian Fisk] 11:27:59
But

[Enrico Fermi Institute] 11:28:04
Shigeki again?

[Steven Timm] 11:28:05
Yeah. So you have to consider not only the size of the file system —

[Shigeki] 11:28:08
Hello!

[Steven Timm] 11:28:12
sorry — but also the reliability of the file system, and also the IOPS of reading the file system, because we managed to scramble the Lustre file system pretty badly several times.

[Steven Timm] 11:28:24
At TACC — I'm not sure it's Lustre —

[Steven Timm] 11:28:26
but anyway, I mean, it just scrambled their scratch very badly

[Steven Timm] 11:28:29
multiple times. And Perlmutter is having issues too — it's not our fault.

[Steven Timm] 11:28:33
But these scratch file systems are not always meant to take CMS-level

[Steven Timm] 11:28:38
I/O. — Yep, we have to be prepared for this.

[Enrico Fermi Institute] 11:28:43
Our I/O, especially if you look at generator-type workflows, is not great —

[Steven Timm] 11:28:43
Something won't be

[Enrico Fermi Institute] 11:28:49
no, they're basically built for desktops, and we scale it up to grid level.

[Enrico Fermi Institute] 11:28:54
I think we have a question coming in.

[Shigeki] 11:28:56
Yeah, I guess my fundamental question is: all of these issues are sort of best addressed at the design phase of the HPC center.

[Shigeki] 11:29:06
And I'm kind of wondering: does the community have an official avenue in which to present our issues and work with them in the design phase of the HPC center, where we can both agree on the mechanism for moving the data in and out?

[Enrico Fermi Institute] 11:29:27
Not really, not at the moment. I think the user facilities are at least aware of what we're doing —

[Enrico Fermi Institute] 11:29:34
the type of work we're doing — because they see this more often.

[Enrico Fermi Institute] 11:29:37
The LCFs — I don't think so, not at this level, because they are really —

[Enrico Fermi Institute] 11:29:45
they're targeting these things: give me a thousand nodes for my lattice QCD

[Enrico Fermi Institute] 11:29:50
Calculation or protein folding, or whatever they're doing.

[Enrico Fermi Institute] 11:29:52
That's still their target market, basically.

[Shigeki] 11:29:55
but I mean, probably that's because of the fact that that's the target market that they see.

[Shigeki] 11:30:00
And it's sort of a chicken and egg problem.

[Shigeki] 11:30:01
They're not going to see the high-throughput issues, because it's so hard to do it, and they're not going to do anything about it because they just don't see it. It's really a chicken-and-egg.

[Enrico Fermi Institute] 11:30:11
But that also falls out of their Congressional mandate.

[Enrico Fermi Institute] 11:30:13
So why would they go against the Congressional mandate? I think this is also a discussion

[Enrico Fermi Institute] 11:30:18
that's too high-level for us to have any input on.

[Enrico Fermi Institute] 11:30:25
So, I know they have discussions going on at a very high level for them to support these types of science better.

[Enrico Fermi Institute] 11:30:33
But, as Brian said, until there's actually a mandate for them that they're supposed to support us better,

[Enrico Fermi Institute] 11:30:41
I don't think they're going to move a lot in terms of making their facilities work better for the computation that we're doing. — So what I mean

[Enrico Fermi Institute] 11:30:53
is that the APS works with ALCF on taking data from their light source and streaming it.

[Enrico Fermi Institute] 11:31:04
I believe NERSC is in conversations with a couple of the West Coast light sources, and I remember one talk I was at —

[Enrico Fermi Institute] 11:31:13
I think OLCF was talking about doing that also, from like the neutron source and some of the accelerators on campus.

[Taylor Childers] 11:31:21
Can I — right, yeah — sorry. So I was just

[Enrico Fermi Institute] 11:31:22
So we have a comment from Taylor. — Correct, yeah.

[Taylor Childers] 11:31:28
going to — and I mean, Doug brought up another good point. But so, just to comment on a few of the things:

[Taylor Childers] 11:31:36
I'll go to the APS first. So our new Polaris machine actually has 60-some-odd nodes dedicated — like, we purchased them in addition, for the APS, for real-time processing. So the idea is that, you know, workflows there have live detectors that are

[Taylor Childers] 11:31:59
taking data, and we want to see if we can get those scientists on our machines. When it comes to the design process for the new machines —

[Taylor Childers] 11:32:10
Right, for instance, with Aurora we had the Aurora Early Science program.

[Taylor Childers] 11:32:16
OLCF had a similar program; same for Perlmutter.

[Taylor Childers] 11:32:20
Those are entirely designed for how communities get on, you know —

[Taylor Childers] 11:32:27
get early access to our machines. ATLAS submitted one of those projects, and has had myself and, in fact, a postdoc funded through ALCF to help, mostly on event generators,

[Enrico Fermi Institute] 11:32:28
Yeah.

[Taylor Childers] 11:32:46
at this point, to use Aurora moving forward. So there is a program for helping to be involved in the early design process for the machine.

[Taylor Childers] 11:33:02
So, for instance, with the ATLAS case, MadGraph is constantly reported on in the Intel meetings for Aurora

[Taylor Childers] 11:33:11
as far as performance and capability, because, you know, we're one of the Early Science projects.

[Taylor Childers] 11:33:23
But the other — I would say the other end of the spectrum

[Taylor Childers] 11:33:27
there is, of course: if you're a big user, right —

[Taylor Childers] 11:33:30
and I think HEP has always had the potential to be big users at the LCFs —

[Enrico Fermi Institute] 11:33:31
Okay.

[Taylor Childers] 11:33:39
granted, there are hurdles, especially now with the architectures — but if you're a big user, you have big sway, right?

[Taylor Childers] 11:33:49
I mean, the lattice QCD groups — they can use our entire machines.

[Taylor Childers] 11:33:53
They use them effectively, and of course we pander to them — I would say unofficially, I guess — but I mean, they get huge sway at our meetings because they are able to effectively use our resources. And same for, like — I mean, everybody knows the HACC group, Salman's group,

[Taylor Childers] 11:34:12
and the climate scientists, right, material scientists — the software and the community base are such that they're easy to port to the next generation of

[Taylor Childers] 11:34:23
hardware. They move quickly, the communities move quickly, and they all use similar software.

[Taylor Childers] 11:34:28
They get a lot of pull in those discussions. Now, the last thing I wanted to mention: the difference between NERSC and the LCFs, I would say, is that the LCFs

[Taylor Childers] 11:34:42
get less. They have less

[Taylor Childers] 11:34:48
funding for deploying a lot of user-centric hardware.

[Taylor Childers] 11:34:54
So we've been talking at ALCF — I don't know how long — about trying to set up, you know,

[Taylor Childers] 11:35:01
a side cluster for Kubernetes and stuff like that, where you guys could run all of these services. And as far as I can tell, our ops team, our operations team, is just swamped with stuff to do, and so that becomes a limiting factor for us.

[Enrico Fermi Institute] 11:35:21
Thanks, Taylor. I think that was kind of the the direction of my comment.

[Enrico Fermi Institute] 11:35:26
We have to make sure — you know, to do well at an LCF, if they build a machine to be an HPC machine, you want to make yourself look like the QCD folks and do HPC

[Enrico Fermi Institute] 11:35:39
work. It becomes a huge ask for them to try to do HTC-type workflows, because of the exact sort of pressures you just outlined.

[Taylor Childers] 11:35:51
Yeah.

[Enrico Fermi Institute] 11:35:52
So we have a couple more questions on Zoom. Let's take these questions and then move on to the cloud section. Paolo?

[Paolo Calafiura (he)] 11:36:02
Hi guys. So it's actually a comment following up on this.

[Paolo Calafiura (he)] 11:36:07
And I find it useful sometimes to put myself in the shoes of the other partner when we have any discussion. I mean, think of it from the point of view of an LCF. Today, basically, HEP

[Paolo Calafiura (he)] 11:36:25
is using HPCs at arm's length — let's be honest. I mean, we have some nice Tier-2-like facilities.

[Paolo Calafiura (he)] 11:36:31
At NERSC, we are pretty happy with the way NERSC is working. But, you know, the QCD

[Paolo Calafiura (he)] 11:36:42
we're talking about — if the LCFs

[Paolo Calafiura (he)] 11:36:44
did not exist today, they would not be able to do their science.

[Paolo Calafiura (he)] 11:36:46
And so that is something that anyone will consider: am I fundamental, or am I just one of the 25

[Paolo Calafiura (he)] 11:36:55
or 32 in the federation?

[Paolo Calafiura (he)] 11:37:00
So I think, at least for the next generation of HPCs — not Aurora, but the ones after Aurora, the ones which will start in the 2030s or so — maybe we have a shot. But we will need to make a

[Paolo Calafiura (he)] 11:37:21
commitment today, which I don't know if we are ready to make, which is to say that, at least in the US,

[Paolo Calafiura (he)] 11:37:29
the HPCs would become a fundamental part, and not just a beyond-the-pledge accessory to our Tier-1s and Tier-2s. Yeah — and that's also because of the enormous amount of effort we would have to put in, as has been said a couple

[Paolo Calafiura (he)] 11:37:47
of times, to be able to exploit these architectures.

[Paolo Calafiura (he)] 11:37:51
So I think either we jump, or we stay with our friendly TACC and NERSC, and we don't put people to work there.

[Enrico Fermi Institute] 11:37:58
Okay.

[Enrico Fermi Institute] 11:38:06
Ian — comments?

[Ian Fisk] 11:38:07
Yeah, my comment was sort of along the lines of — I also responded to Shigeki — I think one of the things we need to be a little bit careful of is what our expectations are. And the biggest one is that these facilities were not built for us, and we know

[Ian Fisk] 11:38:23
that. But that doesn't mean that they can't be useful to us.

[Ian Fisk] 11:38:27
At the same time, we can't expect to use all of them. And I think — well, Frontier is 10 times the size of the WLCG

[Ian Fisk] 11:38:36
combined in terms of FLOPS, and so we wouldn't even want to use the whole thing.

[Ian Fisk] 11:38:42
But from the standpoint of, like, the stability of the file systems, as Steve was saying, the scale of the file system —

[Ian Fisk] 11:38:49
I think all these things are things that we actually can measure and benchmark, and look at how much of an LCF

[Ian Fisk] 11:38:55
we might reasonably be able to take advantage of with a workflow that is not designed for it.

[Ian Fisk] 11:39:00
And instead of having an expectation that they will somehow be different — that they will design these facilities for us —

[Ian Fisk] 11:39:04
they won't; they're built already. And the question is: is a Ferrari still useful to us at some scale? And the only real way to answer that is to measure it — to have a benchmark

[Ian Fisk] 11:39:16
which we can use that says: this is how many resources you can expect to take advantage of before you exceed the local file system, or the local network, or the local whatever else. And it seems like this is a tractable problem, and these resources exist.

[Ian Fisk] 11:39:33
Over the course of time, if we demonstrate that we use them at all, maybe we'll have an influence on the next generation to make them useful for us too. But I think we're not gonna be in a situation where basically all of our stuff looks like AI

[Ian Fisk] 11:39:49
and so it's a simple transition over to HPC.

[Ian Fisk] 11:39:53
We're not gonna — our stuff looks like our stuff.

[Ian Fisk] 11:39:56
It's not gonna look like lattice, it's not gonna look like AI, necessarily, completely.

[Ian Fisk] 11:40:00
But I think, if we say we know what our workflows look like —

[Ian Fisk] 11:40:04
how many of them could we run? — there's the possibility that we could get a lot of work done.

[Enrico Fermi Institute] 11:40:11
I think that's a good point. One more thing from — oh, there's another one. Yeah, and then we're gonna move on.

[Enrico Fermi Institute] 11:40:17
Okay.

[Dale Carder] 11:40:20
yeah, I was just gonna chime in. There was a question about, you know, how this compares to what you see at the other light sources, like APS-U and ALCF. The closest analogy is probably LCLS-II, which is at SLAC, with the compute —

[Dale Carder] 11:40:36
some of that being at NERSC. So there's a review underway, much like how

[Dale Carder] 11:40:42
ESnet did the requirements review with high energy physics.

[Dale Carder] 11:40:45
There's a review underway right now with Basic Energy Sciences,

[Dale Carder] 11:40:49
BES. I think that should be published, at least in draft form, in a matter of weeks.

[Dale Carder] 11:40:54
So that may be something you could look at in the timeline of this workshop and its deliverables.

[Enrico Fermi Institute] 11:41:01
Great. Thanks. Yes — okay, let's move on, away from HPC

[Enrico Fermi Institute] 11:41:08
for the moment, and on to cloud — I think we'll now go through these slides.

[Fernando Harald Barreiro Megino] 11:41:14
hi. Yeah, so now it's a similar discussion, but for cloud: what are the workloads that can be executed on

[Fernando Harald Barreiro Megino] 11:41:26
cloud resources? And before getting there — what we have been mostly considering during our previous discussions for the blueprint process are the major commercial cloud providers, like Google, Amazon, Microsoft, which are the ones that we have been really testing in the last couple of years.

[Fernando Harald Barreiro Megino] 11:41:45
And all of these have different service levels: they provide infrastructure as a service, where you rent a machine and install whatever you want; platform as a service; and, for higher-level services, software as a service. But nowadays all of these clouds also have emerging intermediate levels, in particular

[Fernando Harald Barreiro Megino] 11:42:05
container platforms as a service — for example, Kubernetes, or other variants like serverless container execution.

[Fernando Harald Barreiro Megino] 11:42:16
We build on these services, along with cloud-native approaches, to integrate our experiment

[Fernando Harald Barreiro Megino] 11:42:24
frameworks across the cloud providers, so that all of them look the same.

[Fernando Harald Barreiro Megino] 11:42:30
Yeah. And then the other cloud provider that has been appearing lately is Lancium.

[Fernando Harald Barreiro Megino] 11:42:47
They differentiate themselves in particular through sustainability and the usage of renewable energy. They are also much more affordable than Google, but they are also not a full-blown cloud — they just have a limited set of services — and also

[Fernando Harald Barreiro Megino] 11:43:09
reliability probably depends on how much renewable energy there is at the moment.

[Fernando Harald Barreiro Megino] 11:43:17
And so CMS is trying it — I've integrated them once

[Enrico Fermi Institute] 11:43:18
Okay.

[Fernando Harald Barreiro Megino] 11:43:20
already for some simple tests. So, next slide.

[Fernando Harald Barreiro Megino] 11:43:29
So for ATLAS, then, coming to the question: what are the workloads that it is possible to execute on the cloud?

[Fernando Harald Barreiro Megino] 11:43:39
We have lately been integrating clouds as completely independent, self-managed sites, with a storage element and also compute that is integrated in PanDA. What we have the most experience

[Fernando Harald Barreiro Megino] 11:44:01
with is Google, and we started in the middle of the year to run

[Fernando Harald Barreiro Megino] 11:44:05
something similar to a US Tier-2 sized cluster, and we are running now

[Fernando Harald Barreiro Megino] 11:44:11
10,000 cores. Currently we are limited to production workloads,

[Fernando Harald Barreiro Megino] 11:44:17
but that's just because we are reorganizing the storage behind it, and we plan to enable analysis in a couple of weeks.

[Fernando Harald Barreiro Megino] 11:44:29
The one thing that maybe you want to control is the amount of egress, to bring down the cost.

[Fernando Harald Barreiro Megino] 11:44:39
If you want to do that, the obvious choice is to run simulation.

[Fernando Harald Barreiro Megino] 11:44:44
But we are also now starting to experiment with full chain, where you run all of the tasks within —

[Fernando Harald Barreiro Megino] 11:44:55
I mean, the simulation chain or the production chain — and we don't export the intermediate products,

[Fernando Harald Barreiro Megino] 11:45:01
just the final output. In the plot I wanted to show that, depending on the workload you are running, your egress costs can vary by a lot, and that's what motivates trying to keep things inside. And then the other thing

[Fernando Harald Barreiro Megino] 11:45:24
that we have been experimenting with in the cloud is analysis facility type setups

[Fernando Harald Barreiro Megino] 11:45:31
with elastic scaling. So we set up an analysis facility with Jupyter and Dask; we keep the general components running on the cloud to a minimum, and only scale out and add a lot of VMs when they are requested by a user to

[Fernando Harald Barreiro Megino] 11:45:49
run a Dask computation. And this is also a very suitable setup for the cloud, because you just pay for the resources that you are using at the moment.
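
The pay-for-what-you-use argument above can be made concrete with a toy cost model. All rates and usage numbers below are invented placeholders for illustration, not actual cloud prices:

```python
# Toy comparison of an always-on cluster vs. elastic scale-out for
# bursty analysis use. All rates and usage figures are made up.

VM_HOURLY_RATE = 0.05  # assumed $/core-hour, placeholder value

def always_on_cost(cores: int, hours: int) -> float:
    """Cluster provisioned for peak load, billed around the clock."""
    return cores * hours * VM_HOURLY_RATE

def elastic_cost(busy_core_hours: float, control_plane_cores: int,
                 hours: int) -> float:
    """Only a small control plane runs continuously; workers are
    billed just for the core-hours users actually consume."""
    return (busy_core_hours + control_plane_cores * hours) * VM_HOURLY_RATE

# One month: peak demand of 1000 cores, but only 10% average utilization,
# with a 4-core control plane kept running the whole time.
month = 30 * 24
static = always_on_cost(1000, month)
elastic = elastic_cost(0.10 * 1000 * month, 4, month)
print(f"always-on: ${static:,.0f}  elastic: ${elastic:,.0f}")
```

At these assumed numbers the elastic setup is roughly a tenth of the cost, which is the effect being described: the bill tracks actual use, not provisioned capacity.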

[Fernando Harald Barreiro Megino] 11:46:03
then in the next slide

[Fernando Harald Barreiro Megino] 11:46:07
So this is the landscape of clouds for CMS.

[Fernando Harald Barreiro Megino] 11:46:12
I don't know if Kenyi wants to talk about it, or me to

[Fernando Harald Barreiro Megino] 11:46:16
go through it.

[Kenyi Paolo Hurtado Anampa] 11:46:18
yes. So, in essence, back in 2016 a demonstration was done

[Kenyi Paolo Hurtado Anampa] 11:46:26
to try different cloud providers to run production workloads, and what was done there with real workloads basically shows that we can run any kind of production workflow in the cloud, and you can see the

[Enrico Fermi Institute] 11:46:34
Okay.

[Kenyi Paolo Hurtado Anampa] 11:46:52
diagram there on the right. This was when the Fermilab facility was extended

[Kenyi Paolo Hurtado Anampa] 11:47:00
in order to get twice the number of resources that were initially available from the global pool.

[Kenyi Paolo Hurtado Anampa] 11:47:06
So this is showing like 150,000 cores

[Kenyi Paolo Hurtado Anampa] 11:47:11
there, on top of the base; basically, the resources were integrated through HEPCloud — that was also integrated with glideinWMS as part of it — and as of today we can still use this. There is some work

[Enrico Fermi Institute] 11:47:34
Yeah.

[Kenyi Paolo Hurtado Anampa] 11:47:39
going on to use this for, for example, specialized analysis workloads that depend on machine learning inference.

[Kenyi Paolo Hurtado Anampa] 11:47:48
So there is some work there to

[Enrico Fermi Institute] 11:47:59
Okay.

[Kenyi Paolo Hurtado Anampa] 11:48:01
utilize GPUs and to use different cloud providers.

[Kenyi Paolo Hurtado Anampa] 11:48:10
There is an inference server called Triton

[Kenyi Paolo Hurtado Anampa] 11:48:18
that was also integrated as part of SONIC, and with that

[Kenyi Paolo Hurtado Anampa] 11:48:25
you can offload, while running the analysis pipeline, the machine learning inference through Triton to

[Kenyi Paolo Hurtado Anampa] 11:48:37
cloud providers there, on GPUs or using CPUs.

[Enrico Fermi Institute] 11:48:41
I can put some numbers in. I think they ran on 10,000 CPU cores.

[Enrico Fermi Institute] 11:48:50
There were 10,000 CPU cores, and they rented 100 GPUs and sped up the workflow that was running on the CPUs by 10%. So in that game you basically invest a little bit in GPUs to speed up the calculation that runs on

[Enrico Fermi Institute] 11:49:05
the CPUs. So they used 10,000 CPUs to how many GPUs? 100. I mean, it's early work, so hopefully that ratio can be reduced, but that was what they were testing.
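
The trade-off in those numbers can be checked with a quick back-of-the-envelope calculation. The 10,000-core / 100-GPU / 10% figures are the ones quoted above; the CPU price is an invented placeholder:

```python
# Back-of-the-envelope check of when renting GPUs to accelerate a
# CPU-bound pipeline pays off. The CPU rate is a placeholder, not a quote.

def breakeven_gpu_rate(cpu_cores: int, cpu_rate: float,
                       speedup: float, n_gpus: int) -> float:
    """Max GPU $/hour at which the offload is cost-neutral.

    Speeding the CPU fleet up by `speedup` (e.g. 0.10 for 10%) saves
    that fraction of the CPU bill; the GPUs pay off as long as they
    cost less than that saving."""
    cpu_bill_per_hour = cpu_cores * cpu_rate
    saved_per_hour = cpu_bill_per_hour * speedup
    return saved_per_hour / n_gpus

# 10,000 cores at an assumed $0.04/core-hour, 10% speedup, 100 GPUs:
rate = breakeven_gpu_rate(10_000, 0.04, 0.10, 100)
print(f"GPUs break even below ${rate:.2f}/hour each")
```

At these assumed prices each GPU would have to rent for well under a dollar an hour to pay for itself, which is why improving the CPU:GPU ratio (or the speedup) matters so much in this setup.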

[Enrico Fermi Institute] 11:49:20
Okay — questions or comments on the landscape of cloud?

[Enrico Fermi Institute] 11:49:30
Anything more to bring up? Otherwise we can move on to acquisition and operation.

[Fernando Harald Barreiro Megino] 11:49:35
okay.

[Ian Fisk] 11:49:36
sorry — I have a comment, and I thought I was talking, sorry.

[Ian Fisk] 11:49:42
This is Ian. So the general comment was: we have this issue about the egress charges, which we don't ever seem to have a solution for, except not to export data.

[Enrico Fermi Institute] 11:49:43
Okay.

[Enrico Fermi Institute] 11:49:43
Okay, Got it.

[Steven Timm] 11:49:56
no, not so. There are agreements.

[Ian Fisk] 11:50:03
But the agreements are always things like: if it's within 15% of the billing charges, they won't charge it.

[Ian Fisk] 11:50:09
There are ways to reduce it.

[Ian Fisk] 11:50:11
But fundamentally this is a business practice that they use for vendor lock-in, and so

[Ian Fisk] 11:50:19
far, at least, no one's been proposing to not do it.

[Ian Fisk] 11:50:21
And so we're always okay.

[Enrico Fermi Institute] 11:50:23
Two things: Lancium does not have egress charges.

[Ian Fisk] 11:50:26
Okay.

[Enrico Fermi Institute] 11:50:27
With the limitation that we're still exploring them, and it's very early going.

[Steven Timm] 11:50:28
Pretty good.

[Enrico Fermi Institute] 11:50:32
But by design, at least what they're saying now. They don't charge egress.

[Ian Fisk] 11:50:37
Right.

[Enrico Fermi Institute] 11:50:38
And then, Fernando, you want to say something about this subscription.

[Enrico Fermi Institute] 11:50:41
what that model is, because I —

[Fernando Harald Barreiro Megino] 11:50:43
I could discuss that tomorrow during the cloud session.

[Ian Fisk] 11:50:47
Okay.

[Fernando Harald Barreiro Megino] 11:50:48
But I mean — so, basically, the agreement we have with Google is a subscription agreement,

[Fernando Harald Barreiro Megino] 11:50:57
and that's basically like a flat rate.

[Fernando Harald Barreiro Megino] 11:51:00
You agree on a price and on the amount of resources that are included,

[Fernando Harald Barreiro Megino] 11:51:03
and it will not be touched — like, there is no meter on how much egress you have.

[Fernando Harald Barreiro Megino] 11:51:08
It's a fixed price for your 15 months.
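
A minimal sketch of the difference between the two billing models being discussed — metered pay-as-you-go with a per-GB egress charge, versus an all-inclusive flat-rate subscription. All rates here are invented for illustration, not real provider prices:

```python
# Contrast a metered cloud bill (compute + per-GB egress) with a
# flat-rate subscription where egress is simply not metered.
# Every price below is an illustrative placeholder.

def metered_bill(core_hours: float, egress_gb: float,
                 core_rate: float = 0.04, egress_rate: float = 0.09) -> float:
    """Pay-as-you-go: every exported GB shows up on the invoice."""
    return core_hours * core_rate + egress_gb * egress_rate

def subscription_bill(monthly_fee: float, months: int = 15) -> float:
    """Flat rate over the agreement period; egress volume is irrelevant."""
    return monthly_fee * months

# A data-heavy month: 1M core-hours and 500 TB exported.
print(metered_bill(1_000_000, 500_000))  # egress is a large fraction here
print(subscription_bill(50_000))         # unchanged whatever you export
```

The design difference is the point made in the discussion: under the subscription, the last-month "export everything and leave" scenario costs nothing extra, whereas under metering it would dominate the bill.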

[Ian Fisk] 11:51:14
Yeah. Okay. So I guess the question is: at the end of your 15 months, if you want to use the last month only to export your data and get out of the cloud, that would be within the confines of the model — is that a true statement?

[Fernando Harald Barreiro Megino] 11:51:33
Well — as you are running jobs, the output is always exported, and so we are always incurring the egress cost.

[Ian Fisk] 11:51:40
Okay. Alright — I guess my point is: this is a fundamental problem, which is that we can essentially only use the cloud a lot like HPC, except that with HPC we don't pay for the data.

[Enrico Fermi Institute] 11:51:42
Yeah.

[Steven Timm] 11:51:51
Yeah.

[Enrico Fermi Institute] 11:52:01
Good. Yeah — I mean, my opinion on the cloud is that workflow selection and capability is not the issue, because we can do anything we want on the cloud; it's just a machine you rent. The question comes down to: what's the cost?

[Steven Timm] 11:52:22
Great Great. Well, this one, you

[Enrico Fermi Institute] 11:52:23
And how do they structure the pricing — what do they want you to do and allow you to do, and in what way?

[Enrico Fermi Institute] 11:52:29
What are the restrictions?

[Ian Fisk] 11:52:30
And the other point I wanted to make was: one of the fundamental differences between HPC

[Ian Fisk] 11:52:37
and cloud is that HPC relies almost exclusively, at the leadership class, on accelerated GPU-style

[Ian Fisk] 11:52:44
hardware — and it's not that the clouds

[Ian Fisk] 11:52:48
don't have them, but that's the most expensive element on the cloud, because GPUs depreciate so fast that the cloud providers need to recoup that cost in a shorter period of time

[Ian Fisk] 11:52:59
than they do for CPUs. You find that the economics of the GPU and the CPU are different on the cloud.

[Enrico Fermi Institute] 11:53:09
It's also structural. I — no, I'll leave that comment, because we do have the cloud focus session tomorrow.

[Ian Fisk] 11:53:15
Okay, right

[Enrico Fermi Institute] 11:53:15
We should not try to have all the discussions now. Let's take a comment from

[Enrico Fermi Institute] 11:53:20
Johannes.

[Johannes Elmsheuser] 11:53:22
yeah — just to follow up on the egress, right?

[Johannes Elmsheuser] 11:53:26
So if you go one slide back, to slide 11, Fernando has a little bit of a breakdown there

[Johannes Elmsheuser] 11:53:34
of the different costs. And there's always, I think, some fear that egress is really humongous compared to everything else. But from what we are seeing — we were running, for example, on AWS,

[Johannes Elmsheuser] 11:53:47
and doing physics validation there — the egress is not the overall driver here, unless you do really crazy stuff.

[Johannes Elmsheuser] 11:53:57
So when you have a regular simulation task, egress is not dominant; it's really the CPU

[Johannes Elmsheuser] 11:54:03
that you are scaling up that is driving the cost.

[Johannes Elmsheuser] 11:54:06
It is obviously something that you are paying with egress on top, which

[Johannes Elmsheuser] 11:54:13
you have to pay compared to HPC — that's no discussion here.

[Johannes Elmsheuser] 11:54:17
But it's also not humongous when you compare everything and fold everything in here. I just wanted to make that statement, and I think we can discuss this in more detail later in the dedicated cloud session.

[Ian Fisk] 11:54:29
I would claim that it was not humongous as long as you're in a very structured environment

[Ian Fisk] 11:54:35
and you are behaving in a predictable way in which the data will be output to analysis. At least for us —

[Johannes Elmsheuser] 11:54:38
Yeah.

[Ian Fisk] 11:54:41
we had a user browse some data that we weren't expecting, and they ran up a $75,000 export bill in a month.

[Johannes Elmsheuser] 11:54:50
Sure — I mean, that is then about how you structure your workflows. Absolutely, I fully agree.

[Johannes Elmsheuser] 11:54:57
So if you have an agreed workflow there — and here we are showing production, that's totally clear, right? — then you don't want to have the surprises from some unstructured user analysis. Fully agreed.

[Enrico Fermi Institute] 11:55:14
Is there a comment from Paolo?

[Paolo Calafiura (he)] 11:55:16
yes — I mean, I feel I'm becoming like a broken record.

[Paolo Calafiura (he)] 11:55:25
But once again, I think this slide shows you the benefits of committing versus taking, you know, an arm's-length approach.

[Paolo Calafiura (he)] 11:55:34
So we have always said that the cloud is a great way, you know, to do

[Paolo Calafiura (he)] 11:55:39
elastic computing, like the slide at the bottom kind of suggests — you know, when we need something for doing analysis, we will use it,

[Enrico Fermi Institute] 11:55:39
Hmm.

[Paolo Calafiura (he)] 11:55:49
and then our loads will be elastic, and that's what's expensive.

[Paolo Calafiura (he)] 11:55:55
But of course — once again, take the point of view of the vendor, or one of the vendors. Yes, they want to lock you in, and not necessarily with some evil mechanism, but just by offering you a good subscription deal, so that you take some of the money that

[Enrico Fermi Institute] 11:55:55
Okay.

[Enrico Fermi Institute] 11:56:11
Yeah.

[Paolo Calafiura (he)] 11:56:13
you otherwise would spend on your own hardware and give it to them. And so there is

[Paolo Calafiura (he)] 11:56:19
a lock-in there, because, of course, the price is constant for 12 months or 15 months, but it can change from one year to the next — and it will, as it should. So you are locked in, because then you don't have anymore, let's say, all of your Tier-1 or

[Paolo Calafiura (he)] 11:56:38
Tier-2 hardware, and then you are locked in with them.

[Enrico Fermi Institute] 11:56:44
Kaushik?

[Kaushik De] 11:56:47
yeah, coming back to the other point — I'm sure it will be discussed tomorrow

[Kaushik De] 11:56:55
during the dedicated session also, but since it came up: the issue of heterogeneity in the cloud. The heterogeneity is actually extremely useful and extremely good.

[Kaushik De] 11:57:09
In the cloud we are using both Amazon and Google for studies with FPGAs, with ARM, with GPUs, and there is no cost in setting up those resources, because they're already available in the cloud. So I think there is great usefulness in highly specialized

[Kaushik De] 11:57:41
hardware at minimal cost, because we don't pay for setting it up in the cloud.

[Kaushik De] 11:57:47
It's already there; we can go in and use it, and that is an enormous resource for the experiments — because if we had to set up our own FPGA farm or ARM farm or GPU farm in order to do some of these studies,

[Kaushik De] 11:58:03
it would be prohibitively expensive.

[Ian Fisk] 11:58:07
right — and I didn't mean to imply that there wasn't real value in the diversity of resources on the cloud.

[Ian Fisk] 11:58:14
I was only commenting that at production scales it can become very expensive.

[Enrico Fermi Institute] 11:58:25
A comment from Fernando?

[Fernando Harald Barreiro Megino] 11:58:27
yeah, it's a question — again about the egress cost.

[Fernando Harald Barreiro Megino] 11:58:34
So there is always this legend that if there is peering between, let's say, a cloud provider and,

[Fernando Harald Barreiro Megino] 11:58:43
for example, ESnet, you can bring down the egress cost, and I wanted to ask if that's really true, or just something that we heard

[Fernando Harald Barreiro Megino] 11:58:55
but no one really knows about.

[Enrico Fermi Institute] 11:59:01
Okay — I think we're definitely gonna have some dedicated time to talk about that on Wednesday.

[Enrico Fermi Institute] 11:59:07
I know Dale is gonna have a slide or two for us, so maybe we move that question to Wednesday specifically, unless somebody wants to jump in right now.

[Fernando Harald Barreiro Megino] 11:59:17
okay.

[Enrico Fermi Institute] 11:59:21
A comment from Alexei?

[Alexei Klimentov] 11:59:22
okay. So my comment is related to the comments from Ian and Paolo — they were different comments.

[Alexei Klimentov] 11:59:30
So I can disagree that we use clouds as HPCs — we use clouds in completely different ways.

[Alexei Klimentov] 11:59:40
The whole idea of trying clouds is what was written on this slide: that we can elastically scale resources,

[Alexei Klimentov] 11:59:48
so we can have this elasticity of resources, and we can build our own architecture, at least.

[Enrico Fermi Institute] 11:59:50
Yeah.

[Enrico Fermi Institute] 11:59:53
Excellent.

[Alexei Klimentov] 11:59:57
Compare with what we have, especially with the LCFs:

[Alexei Klimentov] 12:00:02
there you have boundary conditions, because this machine was built, as was mentioned correctly, not for HEP

[Alexei Klimentov] 12:00:09
but for other domains. And for Paolo, my colleague — with the clouds,

[Alexei Klimentov] 12:00:19
what we have is many years of experience now. I don't think it is the right way to mirror

[Alexei Klimentov] 12:00:26
our understanding of commercial companies onto what we are doing with clouds right now. So certainly they want to make money,

[Alexei Klimentov] 12:00:34
but we're not so stupid — we are not so stupid as to stop our Tier-

[Alexei Klimentov] 12:00:38
2s and to use just clouds, and the whole idea of the 15-month project with Google is just to learn it better.

[Alexei Klimentov] 12:00:47
So I think we are at a very early stage with clouds, in understanding the cost model and how it can be integrated with our grid model.

[Enrico Fermi Institute] 12:00:50
Okay.

[Alexei Klimentov] 12:00:59
Those were my two comments.

[Enrico Fermi Institute] 12:01:05
So, it is 11. We're going to have a short presentation

[Enrico Fermi Institute] 12:01:11
from Wahid. Are you out there?

[Wahid Bhimji] 12:01:13
Yes. Hello!

[Wahid Bhimji] 12:01:18
Yeah, hold on — I'll just move rooms right now.

[Enrico Fermi Institute] 12:01:18
yeah, okay.

[Wahid Bhimji] 12:01:22
I'm just gonna move into a meeting room

[Enrico Fermi Institute] 12:01:25
So you had your workshop — were you discussing NERSC-11?

[Wahid Bhimji] 12:01:29
Just 10, yes. Yeah, we're not quite that far ahead.

[Enrico Fermi Institute] 12:01:30
Oh, that's 10 — I got the number wrong.

[Wahid Bhimji] 12:01:36
Yeah, So it's good good timing to have this conversation.

[Wahid Bhimji] 12:01:39
Actually So yeah. So I have a few slides.

[Wahid Bhimji] 12:01:46
I don't. I don't necessarily need to talk to them.

[Wahid Bhimji] 12:01:50
I wasn't sure if you wanted slides or not.

[Enrico Fermi Institute] 12:01:56
You want to share? Can you allow sharing, or —

[Enrico Fermi Institute] 12:01:59
Are you allowed to share

[Wahid Bhimji] 12:02:02
Yeah, I think so. Well — hang on.

[Enrico Fermi Institute] 12:02:03
Oh!

[Wahid Bhimji] 12:02:05
I'm just

[Enrico Fermi Institute] 12:02:06
Great

[Wahid Bhimji] 12:02:09
Let's see — does that work? You see some window? Yeah, let's see if slide show mode messes it up.

[Enrico Fermi Institute] 12:02:17
Yes, yes.

[Enrico Fermi Institute] 12:02:20
Right.

[Wahid Bhimji] 12:02:23
so — I mean, this is actually just based on some slides I showed at the CCE meeting, or Debbie showed them.

[Wahid Bhimji] 12:02:29
So there's no particular news here, but just to share — just to set the context.

[Enrico Fermi Institute] 12:02:34
Thank you.

[Wahid Bhimji] 12:02:36
And then we can just talk about, you know, whatever you want to talk about.

[Wahid Bhimji] 12:02:38
I guess so. This is the current state of the NERSC systems.

[Wahid Bhimji] 12:02:46
So this shouldn't actually say Phase 1 anymore: now we have the full Perlmutter, both the A100-accelerated GPU nodes and the CPU-only nodes.

[Wahid Bhimji] 12:02:59
But this is still not quite yet in production. As was mentioned briefly earlier, we do have some file system problems in the last stage of kind of upgrading them to use this new Slingshot high-speed interconnect — there have been a few snags, I guess — but

[Wahid Bhimji] 12:03:15
those are being resolved, and I'd say it's probably within a month of being at the point of being fully available in production

[Wahid Bhimji] 12:03:25
kind of mode. And, as you probably know, so far it's been in an early-science kind of free mode, where you don't have to use your allocation in order to use

[Wahid Bhimji] 12:03:35
it, but that's coming soon. And then we still have Cori in production — that is the main production machine at the moment — and the goal is to retire that at the start of next year, pending Perlmutter actually being fully in production.

[Wahid Bhimji] 12:03:58
And so, yeah, there's just a comment here that we do, you know,

[Wahid Bhimji] 12:04:06
look at what user requirements are. In order to deliver increased computing resources, it is necessary to move to accelerated nodes, as that's the only way we could offer the kind of increase in performance we need from this machine over the previous machine. We do recognize that many communities are not ready to use GPUs for all

[Wahid Bhimji] 12:04:26
of their workload, and so that's why there are

[Wahid Bhimji] 12:04:29
CPU-only nodes that actually provide all of the capability of Cori

[Wahid Bhimji] 12:04:35
in these nodes. Okay — yeah, so that's the system.

[Wahid Bhimji] 12:04:41
This is a bit more of kind of where we're going:

[Wahid Bhimji] 12:04:43
we're only gonna have Perlmutter. So there's a bit more detail on the CPU nodes here. And then, just to say — it was on the previous slide as well —

[Wahid Bhimji] 12:04:51
there are these file systems that are made available, and also we do put a kind of focus on having connections with external facilities, including other HPC centers as well as, you know, science user facilities.

[Wahid Bhimji] 12:05:08
Okay. And then — you know, we've shown this many times — we had this Superfacility project, and this was really about trying to improve the engagement with data-intensive workloads that also need workflow services running alongside. So we have an infrastructure that's

[Wahid Bhimji] 12:05:23
Kubernetes-based for services on the side; we have —

[Wahid Bhimji] 12:05:27
you know, we put focus into things like Jupyter notebooks that can also run on the big machines, and we're really pushing for federated identity.

[Wahid Bhimji] 12:05:35
I mean, that's kind of rolled out now, so that you can use credentials from other places to access NERSC,

[Wahid Bhimji] 12:05:43
assuming you have a NERSC account now — so you kind of pair the two — and hopefully that will be pushed out further, and that's coming

[Wahid Bhimji] 12:05:50
in the coming months as part of this Integrated Research Infrastructure task force, which is trying to really get kind of cooperation across different centers for these

[Wahid Bhimji] 12:06:04
things. So that's just an example with, you know,

[Enrico Fermi Institute] 12:06:07
Please.

[Wahid Bhimji] 12:06:07
a HEP-type workflow, LZ — I mean, you know, we are the primary center for them, and the only center in the US,

[Wahid Bhimji] 12:06:16
so they really have to have all aspects of their workflow working well at NERSC, and it takes a lot of engagement to achieve.

[Wahid Bhimji] 12:06:26
I guess this is saying: okay, so we engage with scientists in lots of ways.

[Wahid Bhimji] 12:06:32
So there's the NESAP program — and, you know, ATLAS and

[Wahid Bhimji] 12:06:35
CMS are both part of that — which can help provide resources to port to new architectures, and also to explore AI methods, which is really also a way of using the GPU resources, as well

[Wahid Bhimji] 12:06:51
as having, you know, the same benefits in terms of transformative change to the way science works. And then we also have the Superfacility project that is trying to build more workflow stuff. So, for the future NERSC-10 that I'm just mentioning — we have a workshop about it now, internally —

[Wahid Bhimji] 12:07:08
it has achieved CD-0.

[Wahid Bhimji] 12:07:10
So that means there's a mission need for it, and we're really putting together an RFP

[Wahid Bhimji] 12:07:15
now, which will go out to vendors to bid — to provide us with the machine.

[Wahid Bhimji] 12:07:21
So that's the stage it's at. And part of the way the

[Wahid Bhimji] 12:07:25
mission need has been phrased is that we need a machine to support workflows rather than just applications.

[Wahid Bhimji] 12:07:32
So I think that helps the experimental HEP community as well. And then I briefly mentioned this thing: the Integrated Research Infrastructure effort.

[Wahid Bhimji] 12:07:41
That is another DOE-wide effort to build workflow technologies and support across different centers.

[Wahid Bhimji] 12:07:51
I guess this is just the NERSC-10 mission statement here;

[Wahid Bhimji] 12:07:56
probably there's nothing new for you there. And this is just saying again that we expect this machine to really stretch out into ESnet and other places, and provide, you know, a way people can run stuff using data from outside.

[Wahid Bhimji] 12:08:13
Then I just briefly wanted to mention the — yes, sure.

[Enrico Fermi Institute] 12:08:16
I have a quick question on that slide. So that means essentially streaming —

[Enrico Fermi Institute] 12:08:23
then, also, streaming in and streaming out?

[Wahid Bhimji] 12:08:25
Yes. So, that comment was made earlier, and there are various use cases, not just HEP, who want to do this, including the light sources, like you mentioned,

[Wahid Bhimji] 12:08:37
so we do anticipate supporting that better. In principle

[Wahid Bhimji] 12:08:42
it should already be much better on Perlmutter than it was on Cori.

[Wahid Bhimji] 12:08:44
I mean — yeah, not to mention the problems we've had on Cori, which really never were properly resolved. Perlmutter already

[Enrico Fermi Institute] 12:08:46
Okay.

[Wahid Bhimji] 12:08:53
should have better capabilities to do this.

[Enrico Fermi Institute] 12:08:59
Okay, Great: Thanks.

[Wahid Bhimji] 12:09:02
Okay, this is just a couple of slides of context as well. You know, the landscape as a whole is getting increasingly challenging with heterogeneity. In some ways there may be advances: this is Grace — the NVIDIA Grace Hopper architecture — which has CPUs and

[Wahid Bhimji] 12:09:20
GPUs with, you know, access to memory between them.

[Enrico Fermi Institute] 12:09:22
Yeah.

[Wahid Bhimji] 12:09:26
So in some sense this could reduce data movement costs, and so make this easier to program than current architectures.

[Wahid Bhimji] 12:09:36
But, on the other hand, Grace is an ARM CPU, so there are already some differences

[Wahid Bhimji] 12:09:42
there. And then there's also this move to chiplets —

[Wahid Bhimji] 12:09:48
AMD, for example, having all kinds of different cores on there;

[Wahid Bhimji] 12:09:50
there's DPUs, so programming in the network;

[Wahid Bhimji] 12:09:54
and then there's all this AI-hardware-specific architecture. And then, a bit longer term, there's the idea of processing in storage, and there's also a move we see on the NERSC-10 timeframe kind of coming in towards disaggregation, which

[Wahid Bhimji] 12:10:09
potentially allows more efficient use of resources. So this is the idea that you could have a disaggregated memory pool, which gives you increased memory capacity, but not on the node. So you'd be incorporating memory from outside the node. But that means that people who need

[Wahid Bhimji] 12:10:27
much higher memory capacity would actually be able to access that without us having to buy that much in every single node. So there's opportunities here,

[Wahid Bhimji] 12:10:37
but also quite a complex landscape. And then, you know, there's this rise of the cloud market that really is driving everything, so, you know, it's very likely that... So this is an opportunity, of course, because we can capitalize on all this investment going into cloud interfaces and so forth. But it means

[Wahid Bhimji] 12:10:56
that, you know, we also have to recognize that in the kind of machines that we have access to, and we can expect that these interfaces will become the standard way of accessing machines.

[Wahid Bhimji] 12:11:11
So this is also good, I think. It means that if we use these cloud interfaces, then there's, you know, probably a good expectation that these should be well supported. We

[Wahid Bhimji] 12:11:25
should definitely work with the other compute centers to make sure these are well supported at the various compute centers.

[Wahid Bhimji] 12:11:34
And this is just one slide on... I mean, since this was the HPC

[Wahid Bhimji] 12:11:38
and Cloud workshop, I just thought I'd mention this. You know,

[Enrico Fermi Institute] 12:11:39
Yes.

[Wahid Bhimji] 12:11:41
we're already kind of using that in there, as mentioned, in the Spin services that sit on the side.

[Wahid Bhimji] 12:11:46
But we're increasingly seeing a tighter integration to the main system.

[Wahid Bhimji] 12:11:51
And so I expect on NERSC-10 there'll be an increasing ability to use cloud-type interfaces to access the big supercomputing resources as well.

[Enrico Fermi Institute] 12:12:04
Okay.

[Wahid Bhimji] 12:12:04
No, it's good. At least. Okay. So I think that's all I really had.

[Wahid Bhimji] 12:12:10
This one is just about data management as well; I think we see that also having an increased role here in the NERSC-10 timeframe, which I think should also be open to this community. But again, and this is probably a general point that I thought of as the discussion was going on earlier, we do have to cater for a very

[Wahid Bhimji] 12:12:29
wide community. So that's maybe one of the disadvantages we have compared to the leadership computing facilities: we do try to support different user communities.

[Wahid Bhimji] 12:12:39
But we have, you know, thousands of users and several hundred projects that have different needs.

[Wahid Bhimji] 12:12:46
Some of them are traditional HPC projects, so they need, you know, tightly coupled, large-scale resources;

[Wahid Bhimji] 12:12:54
some are more similar to experimental HEP, but have their own,

[Wahid Bhimji] 12:13:00
you know, their own ways of doing things. They're a little bit different to how experimental HEP is doing it. And so we have to come to some sort of balance of supporting all of these.

[Wahid Bhimji] 12:13:12
Okay, I think that's me. Yeah, any questions?

[Enrico Fermi Institute] 12:13:18
Thank you. I have one question. I think you mentioned that NERSC-10 is going to have a lot of, you know, accelerators for performance and things like that.

[Enrico Fermi Institute] 12:13:29
Do you guys have any feeling for what the mix will be

[Enrico Fermi Institute] 12:13:34
of accelerators and CPUs in the next machine?

[Wahid Bhimji] 12:13:39
Oh, well, we don't, and we're having that discussion. So one thing, and these other things might come into play here as well: I mean, I think you can guarantee that there will be some GPUs in this machine, pretty

[Enrico Fermi Institute] 12:13:41
Okay.

[Enrico Fermi Institute] 12:13:41
Yeah.

[Wahid Bhimji] 12:13:54
much. Realistically, that will be the most likely, you know, generally usable accelerator

[Wahid Bhimji] 12:13:59
that's around today. Then, as I mentioned, there are these disaggregation technologies, and also several of the vendors are talking about multi-tenancy and so forth.

[Wahid Bhimji] 12:14:10
So it is possible that, you know, one could run the CPU-only workload alongside, without any dedicated CPU-only nodes. So that would be a judgment on whether that technology really allows that, and whether it would provide sufficient resources.

[Wahid Bhimji] 12:14:31
So the codes that are super GPU-heavy and accelerated would leave enough of the CPU to allow other jobs to run on there that are CPU-only. But anyway, it is clear that a certain part of the community, even on the 2026 timescale, won't be ready

[Wahid Bhimji] 12:14:50
for accelerated-only. So, you know, there will continue to be some CPU resource. And then on the more exotic accelerators, I think it is likely that we will, in the RFP, have some place where people can pitch AI

[Enrico Fermi Institute] 12:14:52
Okay.

[Wahid Bhimji] 12:15:10
accelerators, for example. You know, whether those are offering a significant benefit above GPUs, I don't think is yet clear.

[Wahid Bhimji] 12:15:20
I mean, I don't think they particularly are now, but they may do

[Wahid Bhimji] 12:15:24
on the 2026 timescale. But the AI workload, you know, is currently not a very big fraction of what we're running, and so it would, you know, have to be sized accordingly. And I would say, on

[Wahid Bhimji] 12:15:38
the integration with cloud, we're also looking at that. As the point was made earlier, there's a huge variety of technology on the cloud, and even though we try to deploy cutting-edge technology, you know, obviously they're quicker to deploy various new technologies. So it

[Wahid Bhimji] 12:15:53
may be that we can, you know, partner with cloud providers to provide some of this capability for experiments and the particular workloads that need to run on different accelerators.

[Enrico Fermi Institute] 12:16:11
But would it be fair to say that we shouldn't expect a significant scale-up of the CPU? Because if I look at Cori to Perlmutter, the CPU basically stayed pretty much flat, more or less, because the CPU fraction of Perlmutter is somewhat equivalent in performance

[Wahid Bhimji] 12:16:21
Hmm.

[Enrico Fermi Institute] 12:16:30
to what we had on Cori. And just for power budget reasons, I wouldn't expect that NERSC-10 gives us 3 times the CPU.

[Enrico Fermi Institute] 12:16:40
Is that a problem? I don't know. Probably not. Yeah, okay.

[Wahid Bhimji] 12:16:41
right.

[Wahid Bhimji] 12:16:42
Right. Yeah, I think, in terms of CPU-only resources,

[Wahid Bhimji] 12:16:47
I think that would be a reasonable expectation

[Enrico Fermi Institute] 12:16:50
Good.

[Enrico Fermi Institute] 12:16:58
Other questions for Wahid? One more short-term technical one.

[Enrico Fermi Institute] 12:17:06
So for transferring data out, Globus is not the be-all and end-

[Enrico Fermi Institute] 12:17:10
all for the LHC. I know that there was some work to do something with XRootD.

[Wahid Bhimji] 12:17:17
Yeah, So that's still ongoing. I mean.

[Enrico Fermi Institute] 12:17:20
How's that going?

[Wahid Bhimji] 12:17:23
Well, I mean, I think, yeah, we're still working on it, right?

[Wahid Bhimji] 12:17:28
I mean, it's gone a bit slower now, but I think we are trying to do that. And I think if both ATLAS and CMS can use the same interface, and also other,

[Wahid Bhimji] 12:17:39
you know, HEP experiments, and even potentially the light sources,

[Wahid Bhimji] 12:17:42
then it's something worth us putting effort into supporting. I also think we need to...

[Wahid Bhimji] 12:17:49
So at the moment Spin, these kind of containerized services, hasn't been optimized for use for data management services, but I think that's another thing that we should be able to support in the longer run, which will allow people to run all kinds of different things on that side. I mean, Globus

[Wahid Bhimji] 12:18:08
is for us the best: you know, it's supported by the largest number of other communities, so it's really worth us putting effort into supporting it.

[Wahid Bhimji] 12:18:20
But yeah, I do appreciate that. Not everyone uses it.

[Wahid Bhimji] 12:18:23
And so we do need other things.

[Wahid Bhimji] 12:18:26
I saw Ian Foster, actually, at a conference a couple of weeks ago, and so I did have a brief chat with him,

[Wahid Bhimji] 12:18:34
about ways we can maybe improve Globus and our kind of interoperation,

[Wahid Bhimji] 12:18:43
but that was no more than a chat at this point, but he seemed open to more discussions on that front

[Enrico Fermi Institute] 12:18:52
And it's probably not for this talk; with the time going, we can chat later on that.

[Enrico Fermi Institute] 12:18:59
Technically, we were stuck on third-party things. But yeah, we can talk it over.

[Enrico Fermi Institute] 12:19:07
We should be on time. Okay, other questions, from anybody else?

[Enrico Fermi Institute] 12:19:16
Anybody on zoom

[Enrico Fermi Institute] 12:19:24
By the way, just to note: we had this plan for the afternoon for the HPC focus area, but due to the ongoing workshop there was a little bit of a scheduling conflict here.

[Enrico Fermi Institute] 12:19:35
So we okay, alright.

[Wahid Bhimji] 12:19:35
Yeah, so I won't be around in the afternoon. So if you wanna attack me, you should do it now. But yeah, we'll be interested in also seeing the blueprint

[Enrico Fermi Institute] 12:19:42
considering.

[Wahid Bhimji] 12:19:45
As well once you have it, or whatever, because I think that will help, you know, as was mentioned

[Enrico Fermi Institute] 12:19:49
I'm not... I mean, at that level it's probably not gonna be fully public, but there might be a version of it that's going to be public.

[Wahid Bhimji] 12:19:57
Right? Yeah, I mean again for influencing kind of architectural decisions.

[Enrico Fermi Institute] 12:19:57
We'll have to see what

[Wahid Bhimji] 12:20:02
I mean, it's really when we're evaluating the Rfp.

[Wahid Bhimji] 12:20:05
And stuff that we can bring in these considerations

[Enrico Fermi Institute] 12:20:09
So, are you looking at things like the very low-power cores, like ARM?

[Wahid Bhimji] 12:20:14
Yeah, I mean, you know, NVIDIA wants to sell you this Grace Hopper architecture now.

[Wahid Bhimji] 12:20:22
So they're selling ARM CPUs with the GPU,

[Wahid Bhimji] 12:20:28
so at least for the GPU-accelerated nodes,

[Wahid Bhimji] 12:20:31
if they're NVIDIA, then they would be ARM. And they also sell CPU-only, or will do so.

[Enrico Fermi Institute] 12:20:39
That's which

[Wahid Bhimji] 12:20:43
But, you know, again, it depends on the workload. And for the CPU-only nodes, I mean, given the communities involved, you know, many of them are not that flexible.

[Wahid Bhimji] 12:20:53
So it may be that that doesn't really make sense for CPU-

[Wahid Bhimji] 12:20:57
only nodes, to have ARM.

[Enrico Fermi Institute] 12:21:04
Okay, Anything else.

[Enrico Fermi Institute] 12:21:08
Okay. Wahid, thank you so much for attending. Appreciate the presentation.

[Wahid Bhimji] 12:21:10
Thanks, everyone.

[Enrico Fermi Institute] 12:21:19
So slides again. I need to share them.

[Enrico Fermi Institute] 12:21:32
Do you want to do these? Or here? I can go through them.

[Enrico Fermi Institute] 12:21:37
So one of the other questions on the charge was what metrics should be used to decide whether a workflow is executed efficiently, both in how we acquire the resources and also then how we operate the workflows. Is it efficient to get a certain resource, to basically spend the effort to get

[Enrico Fermi Institute] 12:22:01
it (that's a cost), and then actually to run our workflows, our science? And acquiring in this context means two things. One is to actually get access to the resources, which for HPC and cloud means we're talking about proposals; HPC has competitive proposals, where you put

[Enrico Fermi Institute] 12:22:22
them in. Usually, at the moment, these are yearly proposals.

[Enrico Fermi Institute] 12:22:27
You have to follow a certain procedure; every HPC

[Enrico Fermi Institute] 12:22:29
facility is different. XSEDE, now ACCESS, is like an umbrella organization where you can ask for time on multiple facilities

[Enrico Fermi Institute] 12:22:37
in one proposal, but other proposals are unique to

[Enrico Fermi Institute] 12:22:43
one facility. And on cloud, either you just pay as you go, paying whatever the list price is, on demand, or spot, or preemptible,

[Enrico Fermi Institute] 12:22:53
or whatever the instance type is called, but it's publicly available

[Enrico Fermi Institute] 12:22:55
pricing, available to everyone: if you show up with a credit card, you can get it. Or, like what ATLAS is doing right now,

[Enrico Fermi Institute] 12:23:04
a subscription based on a negotiation: basically, we commit to a certain amount of money, and you get a certain block of resources, with limitations and rules on how you can use them and how things are run. And the second part of acquiring is the actual provisioning: once someone gives you

[Enrico Fermi Institute] 12:23:23
the keys, basically says here are the resources, you actually have to figure out

[Enrico Fermi Institute] 12:23:27
how you actually tie them into our systems so that you can make use of them.

[Enrico Fermi Institute] 12:23:34
So at the HPC level it's things like batch queues, the unit of provisioning, the number of nodes, scheduler policies; all of that comes into play, because it's all different from what we are used to on our own resources

[Enrico Fermi Institute] 12:23:51
that we own, where we have a fixed quota. We say you get 4,000 cores.

[Enrico Fermi Institute] 12:23:55
Okay, there might be a 24-hour wait as capacity goes back to other people.

[Enrico Fermi Institute] 12:24:01
But eventually, if you provide a stable and basically sufficient amount of work, it will always give you 4,000 cores.

[Enrico Fermi Institute] 12:24:09
That's different on the HPC side; you don't have any guarantees there.

[Enrico Fermi Institute] 12:24:14
And then cloud. Cloud is less problematic in terms of provisioning, because you pay your money. But, depending on what pricing model and what rules you follow, you

[Enrico Fermi Institute] 12:24:26
can still have to deal with contention in

[Enrico Fermi Institute] 12:24:29
certain regions, which depends on the size of the region, the

[Enrico Fermi Institute] 12:24:35
activity of other customers, what instance types you request, and so on.

[Enrico Fermi Institute] 12:24:40
Yes. And then, once you have the resources and they are available, and you've provisioned them and they're integrated,

[Enrico Fermi Institute] 12:24:49
Then you look at what metrics are interesting to determine whether you actually operate efficiently.

[Enrico Fermi Institute] 12:24:56
The standard one we use everywhere is CPU efficiency. GPU efficiency?

[Enrico Fermi Institute] 12:25:05
There's basically nothing; it's an open question. We don't have anything that measures how efficiently we use the GPU.
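
[Editor's note] For reference, the CPU-efficiency metric mentioned here is conventionally computed from batch accounting as CPU time over allocated core-walltime; a minimal sketch, with all job numbers invented for illustration:

```python
# Minimal sketch of the conventional "CPU efficiency" metric:
# CPU-seconds actually consumed divided by the core-walltime allocated.
# The job numbers below are hypothetical.

def cpu_efficiency(cpu_time_s: float, wall_time_s: float, cores: int) -> float:
    """Fraction of the allocated core-walltime that did useful CPU work."""
    return cpu_time_s / (wall_time_s * cores)

# A hypothetical 8-core job: 10 h of walltime, 60 core-hours of CPU time.
eff = cpu_efficiency(cpu_time_s=60 * 3600, wall_time_s=10 * 3600, cores=8)
print(f"CPU efficiency: {eff:.0%}")  # 60 of 80 core-hours -> 75%
```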

[Enrico Fermi Institute] 12:25:14
On the cloud, what it eventually comes down to is the dollars per event, or the dollars paid per HS06-hour

[Enrico Fermi Institute] 12:25:21
you get. On HPC there's no direct

[Enrico Fermi Institute] 12:25:27
cost associated, so the outlay is zero in monetary terms, but of course it's not free in effort.
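
[Editor's note] To make the cloud-side metrics concrete, a hedged sketch of the two figures mentioned, dollars per event and dollars per HS06-hour; every number is invented for illustration:

```python
# Hypothetical sketch of the cloud cost metrics discussed above:
# dollars per event and dollars per HS06-hour. All figures are made up.

def dollars_per_event(total_cost_usd: float, events_processed: int) -> float:
    """Cloud spend divided by the number of events produced."""
    return total_cost_usd / events_processed

def dollars_per_hs06_hour(total_cost_usd: float, core_hours: float,
                          hs06_per_core: float) -> float:
    """Cloud spend normalized to HS06-hours delivered."""
    return total_cost_usd / (core_hours * hs06_per_core)

cost = 12_000.0        # hypothetical monthly cloud bill, USD
events = 3_000_000     # hypothetical events simulated
core_hours = 500_000   # hypothetical core-hours consumed
hs06 = 10.0            # hypothetical HS06 rating per core

print(f"{dollars_per_event(cost, events):.4f} USD/event")
print(f"{dollars_per_hs06_hour(cost, core_hours, hs06):.6f} USD/HS06-hour")
```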

[Enrico Fermi Institute] 12:25:36
Then you look at overall utilization. So if you have a certain number of cloud credits, or if you have a certain allocation size, are you using these up? Because you spent some effort to get them, so you should use them. Like the subscription

[Enrico Fermi Institute] 12:25:52
model: if Google lets you use 10,000 cores for free as part of the subscription, it doesn't make sense to only use 1,000.

[Enrico Fermi Institute] 12:26:00
There is no benefit in not using up your full amount.

[Enrico Fermi Institute] 12:26:04
Yes. The other thing is turnaround time. By turnaround time, I mean provisioning

[Enrico Fermi Institute] 12:26:13
turnaround. This comes in especially for HPC, if you talk about the LCFs: if you talk about the unit of provisioning, what's associated with that is also

[Enrico Fermi Institute] 12:26:26
the latency, which is very different from what we are used to with our normal grid operations.

[Enrico Fermi Institute] 12:26:33
At an LCF, if you ask for 1,000 nodes, you can wait, and you have no idea when you're gonna get them.

[Enrico Fermi Institute] 12:26:39
Eventually you'll get them; it's not under your control.

[Enrico Fermi Institute] 12:26:44
And for all these metrics: how are we gathering them?

[Enrico Fermi Institute] 12:26:48
Okay, I mean, on our own resources we have services in place; we have many years of operation, so we just have to

[Enrico Fermi Institute] 12:26:56
get them to forward the information, to collect it. HPC and

[Enrico Fermi Institute] 12:27:00
cloud are different. Especially HPC; on the cloud you can run whatever you want,

[Enrico Fermi Institute] 12:27:04
but HPC is problematic: you need to collect statistics from the batch system, from the job system, and so on,

[Enrico Fermi Institute] 12:27:13
and then how you forward them so they're actually collected in the right place and you can compare them.

[Enrico Fermi Institute] 12:27:19
We've got a question. Yep.

[Enrico Fermi Institute] 12:27:24
yes.

[Ian Fisk] 12:27:25
Yeah, I had a question, which was about the comment that you have nothing for GPU efficiency.

[Ian Fisk] 12:27:31
If you have nothing for GPU efficiency, it's just that you haven't asked. The GPUs themselves monitor

[Ian Fisk] 12:27:38
their utilization very well. The command is nvidia-smi; it will tell you how much of the memory and how much of the theoretical processing capacity you're using.

[Enrico Fermi Institute] 12:27:49
I'm not saying that there is nothing that you can run on a GPU to tell you to what degree it's utilized.

[Enrico Fermi Institute] 12:27:56
What I'm saying is, I don't think we have any tool where, with our GPU workflows, we actually record this information and keep track of it.

[Ian Fisk] 12:27:56
Okay.

[Ian Fisk] 12:28:06
Okay. But at this point, in the same way that you record the CPU efficiency with top, you simply should; there are tools that do exactly the same thing for GPU.

[Enrico Fermi Institute] 12:28:18
And

[Enrico Fermi Institute] 12:28:18
Yes, it just needs to be put in place and put into the monitoring. It's more an indicator of how early we are in terms of adoption of GPU workflows in the experiments

[Ian Fisk] 12:28:19
Okay.

[Enrico Fermi Institute] 12:28:33
than it is an indicator of a lack of low-level tools.

[Enrico Fermi Institute] 12:28:36
So all of the tools are there; it's just a matter of threading it all through.
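
[Editor's note] The per-sample GPU record Ian mentions can come from nvidia-smi's query mode, e.g. `nvidia-smi --query-gpu=utilization.gpu,utilization.memory --format=csv,noheader,nounits`. Since a GPU can't be assumed here, this sketch parses canned sample output; the sample values are invented:

```python
# Sketch of collecting GPU utilization as described above, using
# nvidia-smi's query mode. On a real node a monitoring daemon would run:
#   nvidia-smi --query-gpu=utilization.gpu,utilization.memory \
#              --format=csv,noheader,nounits
# Here we parse canned sample lines so the sketch is self-contained.

def parse_gpu_sample(csv_line: str) -> dict:
    """Parse one 'util.gpu, util.memory' CSV sample into percentages."""
    gpu_pct, mem_pct = (float(f.strip()) for f in csv_line.split(","))
    return {"gpu_util_pct": gpu_pct, "mem_util_pct": mem_pct}

def mean_gpu_utilization(samples: list) -> float:
    """Average GPU utilization over a series of periodic samples."""
    return sum(s["gpu_util_pct"] for s in samples) / len(samples)

# Hypothetical samples, as a daemon might record every few seconds.
raw = ["87, 54", "91, 60", "12, 8"]
samples = [parse_gpu_sample(line) for line in raw]
print(f"mean GPU utilization: {mean_gpu_utilization(samples):.1f}%")
```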

[Ian Fisk] 12:28:39
Yeah, it's just... I can tell you:

[Ian Fisk] 12:28:43
I send a lot of email every week about people who are not using the GPUs especially

[Ian Fisk] 12:28:47
well. And so it's probably something that should go in early on, in the monitoring system, because it's

[Ian Fisk] 12:28:55
not like it's

[Ian Fisk] 12:28:56
hard to get.

[Enrico Fermi Institute] 12:28:58
One thing

[Enrico Fermi Institute] 12:28:59
One thing that is conceptually not quite as mature is, you know, what these different things mean when

[Enrico Fermi Institute] 12:29:07
you're comparing across sites.

[Enrico Fermi Institute] 12:29:12
How do you compare a 1080 versus an A100? Although maybe in that example you just round the 1080 down to 0 and the problem is solved.

[Enrico Fermi Institute] 12:29:22
But trying to aggregate and cross-compare is tricky.

[Enrico Fermi Institute] 12:29:29
I mean, eventually one of the things you're asking here is: am I getting my money's worth?

[Enrico Fermi Institute] 12:29:36
And that will be asked from many different directions, including from the site.

[Enrico Fermi Institute] 12:29:41
Once you start to do that, you start doing accounting, and these sorts of things aren't as accepted as what...

[Ian Fisk] 12:29:45
right, but I

[Ian Fisk] 12:29:52
Right, but, like, one of the reasons why we have HS

[Ian Fisk] 12:29:57
06 was that we had a variety of CPUs, we weren't sure what the performance was going to be between them, and this is a benchmark to figure out the relative capacity of each of the sites. It's not intrinsically more difficult than that; it's just that there's a much

[Ian Fisk] 12:30:12
wider variation in the performance of GPUs.
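
[Editor's note] The HS06-style normalization Ian describes can be sketched as a per-hardware weighting of delivered hours; the scores below are pure placeholders, since no agreed "HS06 for GPUs" exists:

```python
# Sketch of benchmark normalization as discussed: multiply raw hours by
# a per-hardware benchmark score so heterogeneous resources can be
# compared. All scores and usage numbers below are invented placeholders.

BENCH_SCORE = {          # hypothetical benchmark units per device-hour
    "cpu-core": 10.0,
    "gtx1080": 40.0,
    "a100": 400.0,
}

def normalized_delivery(usage: dict) -> float:
    """Sum benchmark-weighted hours over a {device: hours} record."""
    return sum(BENCH_SCORE[dev] * hours for dev, hours in usage.items())

# Two hypothetical sites with very different hardware mixes end up
# roughly comparable once weighted.
site_a = {"cpu-core": 100_000, "gtx1080": 2_000}
site_b = {"cpu-core": 50_000, "a100": 1_500}
print(normalized_delivery(site_a))
print(normalized_delivery(site_b))
```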

[Enrico Fermi Institute] 12:30:17
We need an HS06 for GPUs, maybe.

[Ian Fisk] 12:30:20
Maybe you need a "GS06". But yeah.

[Enrico Fermi Institute] 12:30:25
Another thing is, once you collect this, it's it's not good to me that we know, I mean.

[Enrico Fermi Institute] 12:30:29
on CPU efficiency we kind of have an idea of what's bad. I mean,

[Enrico Fermi Institute] 12:30:34
usually it's bad when we get dinged by the review boards that look at our CPU efficiency and tell us we make bad use of the resources. It's been unclear to me on the GPU side what is bad. I would say we don't have any clue

[Enrico Fermi Institute] 12:30:50
on the CPU side, but we pretend we do. We have about the same amount of clue. On GPUs we have even less, because from, you know, architecture generation to architecture generation

[Ian Fisk] 12:30:59
right.

[Enrico Fermi Institute] 12:31:05
of GPUs, things are changing pretty wildly, and the performance basis is very, very different for, you know, a Turing-class chip versus... Okay, so the conclusion is we need to learn to pretend to know what we're doing. Exactly: we need to come up with sufficiently

[Enrico Fermi Institute] 12:31:22
obfuscated language, so that we can sound like we know what we're talking

[Enrico Fermi Institute] 12:31:25
about. The fact that it's 2022 and our review boards

[Enrico Fermi Institute] 12:31:31
don't understand hyperthreading suggests that by 2040, maybe, we'll have... Okay.

[Enrico Fermi Institute] 12:31:46
So I guess... or sorry, the one thing I wanted to say is that, in terms of even coming up with performance benchmarks,

[Enrico Fermi Institute] 12:31:55
I don't know if it makes a lot of sense to compare how you're doing with a bunch of Turings versus how you're doing with a bunch of Amperes, or A100s, or whatever, just because the way those processors work is so different. Yeah, that does

[Enrico Fermi Institute] 12:32:10
back up my point. Sorry, we agree. Great! That's awesome.

[Enrico Fermi Institute] 12:32:15
You don't need to tell me. This next slide: particularly when it comes to acquisition, at some point you have to feed back to the powers that be how you spent the money, whether it was funny money or real money. And that starts to get into things like pledging, and, you know, actually having

[Enrico Fermi Institute] 12:32:35
these resources more formally acknowledged by the experiments.

[Enrico Fermi Institute] 12:32:42
Are we gonna touch that third rail today? Well, there's an accounting and pledging discussion on Wednesday morning, and benchmarking is part of that. Because, yeah, as you said, the thing with HPC is, as long as it's opportunistic,

[Enrico Fermi Institute] 12:32:57
no one cares; it's free resources. When people actually start pledging it,

[Enrico Fermi Institute] 12:33:02
then you get into comparing performance numbers, and are you meeting your pledge or not meeting your pledge, and then things like measuring these things correctly, or at least measuring them in a way that you come up with a defensible number. Okay. One interesting

[Enrico Fermi Institute] 12:33:23
thing I observe on the WLCG side this month is that some sites in Europe are saying, you know, we can't keep

[Enrico Fermi Institute] 12:33:33
the CPUs running over the winter, but we'd like to, you know, have the same number of hours delivered, which of course is not what we pledge on. So I think there's gonna be more interest in examining some alternate models where there wasn't interest

[Enrico Fermi Institute] 12:33:51
before. But I think we really need to push

[Enrico Fermi Institute] 12:33:56
having things like HPCs, quote-unquote, count. Right now

[Enrico Fermi Institute] 12:34:03
the value delivered officially to the experiments is rounded to 0, even though we know from the resource graphs these have delivered lots, months' worth in terms of events. That's gonna break at some point. And the fact that some of the traditional WLCG

[Enrico Fermi Institute] 12:34:25
sites are also hitting the brakes on the old pledge models might be as close as you can get to turning it into an option. Yeah. In 2021, for the CRSG, we basically added up what we actually delivered in 2021:

[Enrico Fermi Institute] 12:34:44
US HPC was slightly above Fermilab. Now, okay, the normalization

[Enrico Fermi Institute] 12:34:49
factors have large error bars, so it was basically comparable.

[Enrico Fermi Institute] 12:34:53
But again, right now, viewed from some angle, you're saying

[Enrico Fermi Institute] 12:35:01
we delivered as much as Fermilab did, but then the value was rounded to 0, because none of it, quote, officially counts.

[Enrico Fermi Institute] 12:35:09
And that's a problem, and the problem keeps getting bigger. Yeah, a question about the turnaround:

[Enrico Fermi Institute] 12:35:16
so on turnaround time, some HPC centers allow you to make reservations, where you plan ahead.

[Enrico Fermi Institute] 12:35:26
Does that change some of these metrics? And does it also

[Enrico Fermi Institute] 12:35:35
simplify things operationally? I can just tell you the experience we had.

[Enrico Fermi Institute] 12:35:44
We haven't used reservations for CMS, and that's mainly for the reason that the type of workflow we're sending always works, so we don't really care when it runs.

[Enrico Fermi Institute] 12:35:57
I know that for some of the neutrino and other science experiments, where they had a big specific production that they targeted at an HPC, that they had scheduled and planned ahead, it makes perfect sense to do a reservation in that scenario. For us, I don't see that it

[Enrico Fermi Institute] 12:36:14
would help as much, because the turnaround time is not so much a problem in terms of

[Enrico Fermi Institute] 12:36:24
basically not being able to plan work, because for most of our work we don't care if it runs this week, or next week, or a couple of weeks later.

[Enrico Fermi Institute] 12:36:31
I mean, there's high-priority stuff, but we usually submit it as usual and then play with the prioritization.

[Enrico Fermi Institute] 12:36:37
The turnaround time is actually more an issue for us in our software stack, because the system is just not designed with, like, a two-week provisioning time; there are provisioning assumptions in there. So this is more a software

[Enrico Fermi Institute] 12:36:53
problem than an actual work-planning problem.

[Enrico Fermi Institute] 12:37:00
It's still a useful metric to have, right?

[Enrico Fermi Institute] 12:37:05
So I mean, if provisioning needs a week, you cannot put high-priority stuff

[Enrico Fermi Institute] 12:37:08
there. That's a relatively small limitation, because most of our work is not high priority. Whether these things hold, that's a different issue.

[Enrico Fermi Institute] 12:37:19
But most of the work is just: get it done; we come back a month later and check that everything is done.

[Enrico Fermi Institute] 12:37:30
Okay.

[Enrico Fermi Institute] 12:37:34
Other comments or questions from Zoom? Okay. Are there any questions that we didn't ask that we should be asking?

[Enrico Fermi Institute] 12:37:51
I mean, just to beat the dead horse again: I think the accounting and reporting of resources has to be a top-level item.

[Enrico Fermi Institute] 12:38:07
That's got to be appropriate. So, not a particularly interesting technical topic.

[Enrico Fermi Institute] 12:38:18
But that's okay. I mean, we did talk a lot about this in the context of HPC. Were there any specific comments folks wanted to make about this on cloud?

[Enrico Fermi Institute] 12:38:33
We'll save some of that for the discussion tomorrow.

[simonecampana] 12:38:36
Sorry, can I ask a question? It's following up on what Brian said.

[Enrico Fermi Institute] 12:38:38
Yep.

[simonecampana] 12:38:43
I think it would be interesting, in fact, if those resources, that today are a bit special but might not be in the future, could be accounted properly, which means basically being reported back

[Enrico Fermi Institute] 12:38:45
Okay.

[simonecampana] 12:38:57
through the official accounting tools we use. Do we understand that?

[simonecampana] 12:39:02
So, what I didn't get from the discussion: is the problem technical?

[simonecampana] 12:39:07
Is it well understood how to do it, and someone just has to do the work? Because there are vertical ways, for example, of integrating HPCs:

[simonecampana] 12:39:16
if you use an engine like HEPCloud, then you can put some of the intelligence there, and it reports upstream

[simonecampana] 12:39:24
your accounting records. But if you have something like a direct integration of the HPC

[simonecampana] 12:39:29
with the workload management system of the experiment, like, for example, the way ATLAS is doing it, you don't have that gateway.

[simonecampana] 12:39:39
You don't have that service; you need, in practice, PanDA, or whatever the workload management system

[simonecampana] 12:39:46
is, to report upstream. So I think it's a good idea to look into that.

[simonecampana] 12:39:51
Do you have a view of how to do it?

[Enrico Fermi Institute] 12:40:01
In terms of the technical pieces, I'm not so worried. All right,

[Enrico Fermi Institute] 12:40:04
we've done this several times across multiple generations of technology.

[Enrico Fermi Institute] 12:40:09
It's not like it's the first time we've had to do an integration like that in the last 5 years.

[Enrico Fermi Institute] 12:40:15
Again, my worry is, if we come in and say, you know, Oak Ridge delivered 100 million CPU hours to ATLAS,

[Enrico Fermi Institute] 12:40:29
you know, how does that get counted as part of a delivered resource to the experiment? How do we formally say this? Then, does that help meet the US's commitments to the WLCG? Because right

[Enrico Fermi Institute] 12:40:46
now it's a very different thing, saying that

[Enrico Fermi Institute] 12:40:48
okay, a resource counts toward the WLCG, and

[Enrico Fermi Institute] 12:40:53
making that count in the official way, which we haven't touched in two decades, versus the technical mechanism to get an integer for reporting.

[simonecampana] 12:40:59
okay.

[Enrico Fermi Institute] 12:41:04
Yeah, we seem to reinvent that every 5 years or so.

[simonecampana] 12:41:08
No, I see. So it's basically a question of policy you are raising, which is a good one.

[Enrico Fermi Institute] 12:41:11
Got it. It's

[simonecampana] 12:41:14
It has to do a bit with what the experiment considers a pledged resource, and a lot of the experiments consider a pledged resource something that they can use to run any workflow in a transparent way. So I think, as long as one goes in this

[simonecampana] 12:41:29
direction you will get the buy-in from an experiment.

[simonecampana] 12:41:33
Otherwise there might be some discussions to have

[Enrico Fermi Institute] 12:41:37
Oh, I think there has to be some discussion, because I don't think any resource can run everything, with maybe an exception for the cloud.

[Enrico Fermi Institute] 12:41:46
And even then, Lancium is a good counterexample, where it probably can't run everything.

[Enrico Fermi Institute] 12:41:52
But to say that, yeah, the experiment got 1 billion CPU hours, again, just making up numbers, and on average it's worth nothing because we can't run everything on there, I think, is pretty short-sighted. But it is a very important discussion. And, you know, what I find is policies that are

[Enrico Fermi Institute] 12:42:14
older tend to be harder to update. The fact that we haven't really dug into this in 20 years means it's gonna take some effort to come to a place where everybody is happy and feels that their concerns are heard.

[simonecampana] 12:42:34
Yeah, I see your point. I think part of the problem

[simonecampana] 12:42:39
is that there is a large spectrum, right? There are HPCs that can be used for almost everything, which you could call sort of a pledged resource.

[simonecampana] 12:42:46
There are HPCs that can be used to run one generator.

[Enrico Fermi Institute] 12:42:50
Okay.

[simonecampana] 12:42:51
It's a bit short-sighted to say that those are like any other facility.

[simonecampana] 12:42:56
So, because the spectrum is broad, it's difficult to,

[Enrico Fermi Institute] 12:42:59
Please.

[simonecampana] 12:42:59
I agree with you, it's something that

[Enrico Fermi Institute] 12:43:08
Going back to the acquiring of resources, specifically for HPC: so right now, you mentioned that for HPC,

[Enrico Fermi Institute] 12:43:18
you know, there are a couple of different kinds of proposals, submission types, whether it's leadership class or user facility. These still require proposals, which get re-adjudicated each year or so, and so forth. And to me it seems like you

[Enrico Fermi Institute] 12:43:39
can't tie that to any sort of pledge situation.

[Enrico Fermi Institute] 12:43:43
If you've got a proposal that has to be approved by a bunch of outside scientific, you know, committee members, right?

[Enrico Fermi Institute] 12:43:51
There was something we heard recently, and I'm not sure on this,

[Enrico Fermi Institute] 12:43:58
but something came up: Lis mentioned that there were high-level discussions within the agencies about better support for data sciences.

[Enrico Fermi Institute] 12:44:08
That was one of the areas of discussion. She didn't say anything more about it;

[Enrico Fermi Institute] 12:44:12
at least there are discussions. I don't know to what extent that will go anywhere.

[Enrico Fermi Institute] 12:44:19
The one thing that is nice with this year's NERSC proposal:

[Enrico Fermi Institute] 12:44:24
they've started asking, the last couple of years, about special needs and like a multi-year planning horizon.

[Enrico Fermi Institute] 12:44:31
And it seems this year they kind of already know what they're going to give us for the next 3 years. It almost sounded like that from the feedback. We still have to write a proposal, but they ask us to write a simpler proposal. So on the NSF side, we're

[Enrico Fermi Institute] 12:44:47
starting to see, not at the biggest Frontera-type scales,

[Enrico Fermi Institute] 12:44:53
NSF start to give allocations as part of the grant proposal.

[Enrico Fermi Institute] 12:45:02
So if you win a proposal like US CMS Ops,

[Enrico Fermi Institute] 12:45:05
it comes with the allocation, as opposed to having some other, you know,

[Enrico Fermi Institute] 12:45:10
peer review committee. They basically see this as double jeopardy: that they could give you the money and then have somebody else make you unable to use it.

[Enrico Fermi Institute] 12:45:24
Okay. So that's beginning to go into the system, but not at the US

[Enrico Fermi Institute] 12:45:33
LHC Ops scales. So you know, there's more than discussion;

[Enrico Fermi Institute] 12:45:39
there are actually a couple of examples of doing this at modest scale, but not at the biggest ones.

[Enrico Fermi Institute] 12:45:51
Not across the finish line, but you know it's starting to show up in solicitations and things like this.

[Enrico Fermi Institute] 12:45:57
We have provided this feedback to the funding agencies

[Enrico Fermi Institute] 12:46:03
before, I think at the 2019 meeting, and it was discussed there.

[Enrico Fermi Institute] 12:46:10
It's difficult to write a generic proposal, okay, you know, for a collaboration that has some broad mix of workflows, that is competitive against a specific

[Enrico Fermi Institute] 12:46:26
you know, scientific one. They get compared by scientific merit, right?

[Enrico Fermi Institute] 12:46:31
And so they're looking for specific outcomes, like: what did you discover on this machine,

[Enrico Fermi Institute] 12:46:39
because, you know, we awarded you this allocation? And that's difficult

[Enrico Fermi Institute] 12:46:47
if you're saying: okay, we ran, you know, just a generic mix of simulation for the experiment. So at least on the NSF side,

[Enrico Fermi Institute] 12:47:03
this is where they are looking to tie this to: you get the allocation as part of the US ATLAS or US CMS operations proposal. It's just not there yet, obviously; it takes time for this proposal and funding round, and it's still scaling up. I think this has

[Enrico Fermi Institute] 12:47:27
to be addressed. I don't know if this is in this blueprint

[Enrico Fermi Institute] 12:47:30
process. You know, we need to have some time dedicated to this, but this is an issue. I mean, this is

[Enrico Fermi Institute] 12:47:42
We spend a lot of time writing proposals, and they get reviewed by a committee.

[Enrico Fermi Institute] 12:47:47
And yeah, the LCF proposals: basically, you have to dress it up.

[Enrico Fermi Institute] 12:47:52
We actually tried that, because we did 2 proposals this year. One was GPU

[Enrico Fermi Institute] 12:47:59
reconstruction on Summit, and that was approved, because it's something new, something we haven't done before. The other one we intentionally kept plain;

[Enrico Fermi Institute] 12:48:12
we didn't dress it up. That was general Monte Carlo production on Theta, like:

[Enrico Fermi Institute] 12:48:17
get us some resources, an increase of like 10% extra, just standard Monte Carlo production. That was rejected.

[Enrico Fermi Institute] 12:48:25
And basically I expected that, because it's not exciting.

[Enrico Fermi Institute] 12:48:32
It's something you can do anywhere. They look at it, they say: why are you on the LCF?

[Enrico Fermi Institute] 12:48:37
You can do this somewhere else. And that's the tension with the pledged allocation, where the pledge is supposed to be able to do everything. And again, I think that's why it has to be a major outcome

[Enrico Fermi Institute] 12:48:49
of this report: we have to make the agencies realize that the experiments, the global collaborations, will write down their contributions to $0 and 0 cents unless they can get some of these

[Enrico Fermi Institute] 12:49:06
resources, you know, in a way that we can actually plan on and get them counted.

[Enrico Fermi Institute] 12:49:14
And part of it's gonna be a shift on the WLCG

[Enrico Fermi Institute] 12:49:17
side, I think. But we also have to kind of throw some cold water on the agencies to make them

[Enrico Fermi Institute] 12:49:25
wake up and sit up and realize: oh, I'm not getting my credit for the money. Because, you know, effectively they're putting in money

[Enrico Fermi Institute] 12:49:32
and getting no credit for the money, and they should be

[Enrico Fermi Institute] 12:49:39
mad about it. But this is what we've talked about for, yeah, 5, 6 years now. Not to delay lunch: let's move on to the last slide before the break, on future workflows,

[Enrico Fermi Institute] 12:49:53
just looking forward. Will we need to restrict the kinds of workflows that we run on clouds and on HPCs? Where do we want to be

[Enrico Fermi Institute] 12:50:00
5 years from now? Will it make sense for us to partition our workflows?

[Enrico Fermi Institute] 12:50:05
Will we be able to expect HPCs to just run all types of jobs? Will clouds be able to do that?

[Enrico Fermi Institute] 12:50:11
It sounds like for clouds the answer is kind of yes already, but it remains to be seen

[Enrico Fermi Institute] 12:50:16
whether the HPCs will be able to do that. You know, what technologies, features, or policies are needed?

[Enrico Fermi Institute] 12:50:25
Are there any capabilities provided by HPC or cloud that would allow us to run workflows that we can't run other places, right?

[Enrico Fermi Institute] 12:50:33
We started to sketch some ideas here, but we will have some further discussion in the R&D session.

[Enrico Fermi Institute] 12:50:40
Yeah, for the cloud, it seems like we can basically run whatever we want,

[Enrico Fermi Institute] 12:50:43
but we're limited by cost, and we can talk about that more in the cloud focus area.

[Enrico Fermi Institute] 12:50:49
But yeah, it's an open question whether we should be restricted, because it's obviously easier:

[Enrico Fermi Institute] 12:51:00
if you can run everything, it's better. But, you know, maybe we really should think whether sometimes some machines should get different workflows, because that's what they're designed for.

[Enrico Fermi Institute] 12:51:15
But it's a balance, because if you restrict it too much, it's completely uninteresting for the experiment, and you will never be able to pledge it.

[Enrico Fermi Institute] 12:51:28
I mean, if you want to pledge it, it has to be something that can run the majority of what you're doing. Otherwise, I expect,

[Enrico Fermi Institute] 12:51:37
I mean, we discussed this over the weeks; we got the comment:

[Enrico Fermi Institute] 12:51:42
if it can only run a generator, that will probably still get your proposal through,

[Enrico Fermi Institute] 12:51:47
but you're not going to get the hours credited.

[Enrico Fermi Institute] 12:51:52
At least not easily. Maybe that's one of the outcomes from this: to push towards the actual WLCG.

[Enrico Fermi Institute] 12:52:02
Simone, I hope you're hearing this: that we should get credit.

[Enrico Fermi Institute] 12:52:08
We should work towards a thing where useful computation gets credit, no matter what it is. But the question for the pledging comes before the useful computation: if the resource is limited, that makes it a less useful resource. Basically, I see the argument

[Enrico Fermi Institute] 12:52:28
that when you go in and say: I have this allocation, 100 million hours at the HPC

[Enrico Fermi Institute] 12:52:34
center, and I can run one generator; and then there's the Tier 1 site:

[Enrico Fermi Institute] 12:52:38
I have 100 million hours equivalent over the whole year, or the allocation period;

[Enrico Fermi Institute] 12:52:44
it's worth more to the experiment.

[Enrico Fermi Institute] 12:52:47
And I see that point

[Enrico Fermi Institute] 12:52:50
Okay, yeah. But again, where we are right now is,

[Enrico Fermi Institute] 12:52:55
we're saying its worth is zero, right?

[Enrico Fermi Institute] 12:53:01
It's worthwhile that I hope to get 1 billion hours of MadGraph,

[Enrico Fermi Institute] 12:53:05
1 billion hours of that running somewhere, because the hope is that it enables something for the experiment, or it offloads a lot from back home, which is flexible.

[Enrico Fermi Institute] 12:53:18
So do we need to be pledging at a different quality-of-service level?

[Enrico Fermi Institute] 12:53:25
I don't know. I think it doesn't have to be for this blueprint, but I think somebody actually needs to step up and provide a proposal

[Enrico Fermi Institute] 12:53:37
that people can disagree with. But actually, somebody at some point needs to do some writing to say: here's a model I think is useful, and be willing to take criticism, right?

[Taylor Childers] 12:53:49
Isn't this avoiding, I mean, the larger question, which is: how do you make

[Taylor Childers] 12:54:01
the LHC workflows compatible with modern architectures, right?

[Taylor Childers] 12:54:07
I mean, and of course I understand all the hang ups there.

[Taylor Childers] 12:54:12
I'm just saying that we can talk about what architectures aren't working for the HEP

[Taylor Childers] 12:54:22
community as long as we want, but we also need to be moving our software in a direction that makes it easier to approach different hardware. 'Cause, I mean, it's just gonna get worse before it gets better. The Europeans are going in their own direction

[Enrico Fermi Institute] 12:54:41
Yeah.

[Taylor Childers] 12:54:45
with hardware; the Japanese are going in their own direction with hardware.

[Taylor Childers] 12:54:49
The US is probably gonna, I assume, continue with the US manufacturers,

[Taylor Childers] 12:54:55
okay, for political reasons. And of course, the Chinese are developing their own, and plan on having tons of compute power available.

[Taylor Childers] 12:55:04
So it's really a question of: why can't we move in that direction?

[Taylor Childers] 12:55:11
And of course, I think we all know those answers.

[Taylor Childers] 12:55:13
But it maybe needs to travel up the chain.

[Taylor Childers] 12:55:19
One

[Enrico Fermi Institute] 12:55:22
Contract. It was

[Kaushik De] 12:55:26
Yeah, right. I wanted to make a few comments about this.

[Kaushik De] 12:55:36
I mean, it's not that it isn't useful for experiments to get access to resources

[Kaushik De] 12:55:46
that may not be globally useful, but provide value for particular workflows.

[Kaushik De] 12:55:56
I mean, we have the tools to make use of resources like that.

[Kaushik De] 12:56:00
Assuming we are not spending years of development and operational effort to use the resource, I think there's nothing wrong with having specialized resources, as long as they're easy to use.

[Kaushik De] 12:56:17
I mean the experiments know how to use them.

[Kaushik De] 12:56:19
I think the question is: how do you assign a value to that resource?

[Kaushik De] 12:56:28
I mean, clearly, using the example that was given, comparing, you know, 100 million hours at an HPC

[Kaushik De] 12:56:38
that only runs generators versus 100 million hours at a Tier 1 that can do everything for the experiment:

[Kaushik De] 12:56:43
clearly the 2 things are not the same. So the question is: how do we assign different values to those 2 different kinds of resources?

[Kaushik De] 12:56:50
And I think that is the real challenge for this working group.

[Kaushik De] 12:56:55
I mean, what we really need to come up with out of this workshop is how we assign a fair value to one versus

[Kaushik De] 12:57:03
the other.
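
[Editor's note] One way to make this "fair value" question concrete: a first-order model could discount raw allocated hours by the fraction of the experiment's workflow mix the resource can actually run. The sketch below is an illustration only; the workflow shares, the numbers, and the linear weighting are all assumptions, not a proposed WLCG policy:

```python
# Hypothetical workflow mix for an experiment: each workflow's share of
# total CPU demand (made-up numbers, not any experiment's real profile).
workflow_mix = {
    "generation": 0.10,
    "simulation": 0.40,
    "reconstruction": 0.30,
    "analysis": 0.20,
}

def effective_hours(raw_hours, runnable_workflows):
    """Discount a raw allocation by the fraction of the experiment's
    workflow mix the resource can actually run."""
    coverage = sum(workflow_mix[w] for w in runnable_workflows)
    return raw_hours * coverage

# A generator-only HPC allocation vs. a Tier-1-like resource that runs
# the full mix: same raw hours, very different effective value.
hpc_value = effective_hours(100e6, ["generation"])
tier1_value = effective_hours(100e6, workflow_mix)
print(f"generator-only HPC: {hpc_value:.0f} effective hours")
print(f"full-mix Tier 1:    {tier1_value:.0f} effective hours")
```

A real model would also need to weight by workflow priority and by how predictably the hours can be scheduled, which is exactly the policy discussion above.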

[Enrico Fermi Institute] 12:57:09
That's a great point to maybe break for lunch on. We have the HPC focus area, where we have more time to go into some of these things in more detail, and we'll have more slides prepared to cover some of that. One thing I just want to mention before we close:

[Enrico Fermi Institute] 12:57:25
framework developments are specifically supposed to be outside the scope.

[Enrico Fermi Institute] 12:57:31
I mean, we'll touch on it a little bit, because it sets some of the scope of what's usable and what's not,

[Enrico Fermi Institute] 12:57:36
but we don't want to go into that. Yeah, we have to partition somewhere.

[Enrico Fermi Institute] 12:57:39
Yeah, please, let's not go and design that ourselves.

[Enrico Fermi Institute] 12:57:52
We'll hear that from the HPC people later. Okay, so we'll break for an hour.

[Enrico Fermi Institute] 12:58:04
We'll be back at one o'clock US Central time, and we'll do the HPC

[Enrico Fermi Institute] 12:58:10
focus area.

[Enrico Fermi Institute] 12:58:12
See everybody then.

[Andrew Melo] 13:00:38
Everybody, I had to pop out for a second. Are we scheduled at 1 PM?

[John Steven De Stefano Jr] 13:00:48
Scheduled to reconvene in 1 hour, at 2 PM,

[John Steven De Stefano Jr] 13:00:53
here in Eastern.

[Andrew Melo] 13:00:56
Gotcha. Okay, So we're on schedule.

[Andrew Melo] 13:00:57