[Eastern Time]

ACCOUNTING SLIDES

[Enrico Fermi Institute] 11:00:03
Cool Dale is indeed here

[Enrico Fermi Institute] 11:00:25
We'll wait a couple more minutes for folks to filter in

[Enrico Fermi Institute] 11:01:41
oh!

[Enrico Fermi Institute] 11:02:29
e

[Enrico Fermi Institute] 11:02:37
Yes, slides

[Enrico Fermi Institute] 11:03:24
okay, we're gonna get here started here just a minute.

[Enrico Fermi Institute] 11:03:28
So thanks everyone for making it to the to the final day of the workshop goal.

[Enrico Fermi Institute] 11:03:35
For this morning will be to do a discussion about number of topics, one which is counting the other would be to continue our pledging discussion.

[Enrico Fermi Institute] 11:03:50
assuming, you know, of all the appropriate people are here for that.

[Enrico Fermi Institute] 11:03:53
another topic. We wanted to cover was security, you know, both on the on the clouds and the Hpcs.

[Enrico Fermi Institute] 11:04:02
And I think ultimately the the stuff that we cover this morning will inform for poly recommendations that we would, that we would make as part of the report in the afternoon.

[Enrico Fermi Institute] 11:04:17
I don't expect us to take the full 2 h may not even take the full 2 h this morning.

[Enrico Fermi Institute] 11:04:26
but in the afternoon we'll talk about the We'll have a presentation from the vera ribbon folks, and then we'll I think we have a couple other minor topics any other, business, and and sort of next steps for work to go with with the us.

[Enrico Fermi Institute] 11:04:47
see So about 20 people. Online So that's probably good to going here.

[Enrico Fermi Institute] 11:04:56
So yeah.

[Enrico Fermi Institute] 11:05:01
Dirk, did you want to make comments about this one?

[Dirk Hufnagel] 11:05:03
yeah, I I can. I can. Get the started. So as as with previous slides, so that the green is is a question that's directly copied from the charge and the the question, was what can the us at less than us, Cms: recommend to the collaboration to improve the utilization of commercial and Sbc

[Dirk Hufnagel] 11:05:23
resources, and both in terms of enabling more work for so we can run there, and also improving, and also improving the cost effectiveness of using these resources, and and one thing that that was all that already came up in the previous days was but we'll have a dedicated slot

[Dirk Hufnagel] 11:05:44
for here, so we'll see how much excellent discussion we're good is that at the moment these resources are nice to have say opportunistically, they give us some free cycles where we can run some things but they're not included.

[Dirk Hufnagel] 11:05:58
In the pledge, so we don't get full credit for them, and we don't them down included in the planning.

[Dirk Hufnagel] 11:06:06
another. The thing is that to make use of sites like the Lcf.

[Enrico Fermi Institute] 11:06:08
Let's

[Dirk Hufnagel] 11:06:12
Specifically, but also other opportunities that are available in cloud.

[Enrico Fermi Institute] 11:06:15
Okay.

[Dirk Hufnagel] 11:06:16
And then our own good sides potentially, we need Gpu payloads.

[Dirk Hufnagel] 11:06:20
We need some way to utilize, deployed Gpu resources on the cloud side We had a lot of discussion yesterday on that in the cost section and doing the networking the the pick, big worry a big part of the where you're on cost on cloud is egress.

[Enrico Fermi Institute] 11:06:24
See.

[Enrico Fermi Institute] 11:06:27
Okay.

[Dirk Hufnagel] 11:06:41
So minimizing egress or negotiating some way, be the peering agreements.

[Dirk Hufnagel] 11:06:49
Either subscription. Where this cost is basically removed or targeting cloud resources that don't have egress charges.

[Dirk Hufnagel] 11:06:57
We basically want to avoid the the egress costs that currently are, they're not dominating.

[Dirk Hufnagel] 11:07:03
But they are large factor of the whole. Cloud cost calculation, and on the Hbc side the focus will be on reducing operational overheads, especially on the Lcf.

[Dirk Hufnagel] 11:07:19
Side, where this is still a little bit of an on D area in terms of figuring out what the what the final operational model will look like, and then another way to reach cost and make make it easier to operate hpc resources is to get sizable storage allocations because it makes

[Dirk Hufnagel] 11:07:39
something simpler.

[Dirk Hufnagel] 11:07:47
We could move on, Maybe at this point, if you're talking about accounting, which is, which is part of like having this these types of resources be a fully equivalent player, I mean, pledging is one side but then you need to take care of of accounting to make sure that you actually deliver what

[Dirk Hufnagel] 11:08:06
you promise to deliver. We have a comment. I guess we can.

[Dirk Hufnagel] 11:08:10
We can do that right right away

[Enrico Fermi Institute] 11:08:14
Yeah, feel free to jump in at any time.

[Dirk Hufnagel] 11:08:16
Eric

[Eric Lancon] 11:08:17
Yes, good morning. Before we change the slide. I think the first recommendation would be that Atlas and Cms both work on the

[Eric Lancon] 11:08:35
Having a Gpu, or whatever or accelerator find the payloads. Because if you don't have a generic software, you you would not be able to use at hi resources

[Eric Lancon] 11:08:56
And if they are very restricted, these resources to a specific kind of application, let's say, just specific.

[Eric Lancon] 11:09:07
evolution, because it's suited to to Hey, Gpus, it There would be no way to get them plagued, because pledges are as they, are for now.

[Enrico Fermi Institute] 11:09:13
Okay.

[Eric Lancon] 11:09:19
It's for all will work close together. It's not greatest by type of workflow, so

[Dirk Hufnagel] 11:09:28
Yes, that that's that's one problem, And we can discuss this later that that the moment the pledges I mean, you can.

[Enrico Fermi Institute] 11:09:33
Sure.

[Dirk Hufnagel] 11:09:34
You can pledge across Cpu architecture. You can see that because you can.

[Dirk Hufnagel] 11:09:39
You can run your normalized, whatever this be. Right now. Hs: 0 6 in the future will be half score.

[Dirk Hufnagel] 11:09:45
You can run your benchmark, you get your Cpu speed it.

[Dirk Hufnagel] 11:09:49
It includes some averaging for the airbus, but at least it's in principle possible.

[Dirk Hufnagel] 11:09:54
Gpu is more tricky, because how do you account for that?

[Dirk Hufnagel] 11:09:56
Do you give like a 20% bonus? Do you look at the most commonly used workflow?

[Dirk Hufnagel] 11:10:02
And I think that's a that's something we can't decide here.

[Dirk Hufnagel] 11:10:06
That's something we'll have to discuss with Wlcg how they once we have these, these, these workloads I can use Gpu and get a benefit from them.

[Dirk Hufnagel] 11:10:16
How does the the how do you factor? That in and the performance normalization?

[Dirk Hufnagel] 11:10:21
Is there gonna be some extra factors? Is a different category that you pledge?

[Dirk Hufnagel] 11:10:25
I I don't know. I don't have the answers, but it's it's an area that that needs to be discussed.

[Dirk Hufnagel] 11:10:30
But Wlcg

[Eric Lancon] 11:10:33
Yes, but the recommendation to the experiments would be first to actually right.

[Enrico Fermi Institute] 11:10:39
The

[Eric Lancon] 11:10:41
The development of a Gpu friendly software

[Dirk Hufnagel] 11:10:45
Yes, I mean, we're doing that, I mean, there are 2 avenues and cms at least at 2 avenues, and and that's that's the machine.

[Dirk Hufnagel] 11:10:53
Learning that we talked about yesterday, which is that that might be the most common at the moment between Cms and Atlas, because similar training frameworks, and so on.

[Dirk Hufnagel] 11:11:04
That you could use in the framework itself. For obvious reasons.

[Dirk Hufnagel] 11:11:08
This is somewhat distinct from each other. But we're both looking at that.

[Dirk Hufnagel] 11:11:12
Cms: maybe is a little bit more advanced than Atlas because of the the the push from the hlt, and the deployment of Gpu Also resources there.

[Dirk Hufnagel] 11:11:21
But I'm sure that Atlas is also at least looking at this

[Enrico Fermi Institute] 11:11:31
Okay.

[Dirk Hufnagel] 11:11:36
Lincoln. Can you go to the next slide?

[Dirk Hufnagel] 11:11:39
If that comment was addressed. So, looking at accounting, so maybe at this point, this is a good time.

[Dirk Hufnagel] 11:11:46
We have an invited contribution for benchmarking, because for accounting and for pledging one of the prerequisites is that you have to know what you're actually pledging in terms not just not in terms of course, but in terms, of some normalized numbers, hs, 6 or in the

[Dirk Hufnagel] 11:12:05
future have score, so that there that Maria offered contributed Talk about benchmarking of Hbc I think

[Enrico Fermi Institute] 11:12:06
Good.

[Dirk Hufnagel] 11:12:16
And David wanted to give the toocacy connected

HPC BENCHMARKS PRESENTATION

[David Southwick] 11:12:20
yeah, I am. Can you hear me?

[Dirk Hufnagel] 11:12:21
Okay, Yeah, Do you want to? Okay, Do you want to share the slides or otherwise Link can also share

[Enrico Fermi Institute] 11:12:22
Yeah, we'll I'm not sharing anything

[David Southwick] 11:12:25
Okay. Just like my headphones have

[David Southwick] 11:12:35
Just a second here

[David Southwick] 11:12:38
Okay, you can still hear me. Great. So I do have just a few short slides.

[Enrico Fermi Institute] 11:12:41
We can.

[Enrico Fermi Institute] 11:12:45
Okay, okay.

[David Southwick] 11:12:45
That I'd be happy to hear

[David Southwick] 11:12:49
See if I can do this

[Enrico Fermi Institute] 11:12:51
Sure.

[David Southwick] 11:12:59
Yes, hmm.

[David Southwick] 11:13:06
Okay.

[David Southwick] 11:13:13
Okay, Do you see the? So I do. Okay? Great: So Yeah, I've got a few comments, and just to share a bit by what we're doing.

[Enrico Fermi Institute] 11:13:15
We do. Yep. Looks good

[David Southwick] 11:13:27
With Hp. See with the head benchmarking Last couple of years I've been collaborating with the hex benchmarking working group really to take the this replacement model, for he speculated 6.

[David Southwick] 11:13:49
And really developed this that so that it can work on Hpc.

[David Southwick] 11:13:52
As well, cause I'm I'm sure, sure many people are very familiar with this, and how it was done in the past.

[David Southwick] 11:13:57
It was meant to be as as similar as possible to the bare metal works.

[David Southwick] 11:14:04
Notes that Wcg. Was using. So it was Vms, or at some point nested containers.

[David Southwick] 11:14:11
And things like this that are no way compatible with Hbc: So, but in a bunch of work to to make this as lightweight and user friendly as possible, So it's totally rootless.

[Enrico Fermi Institute] 11:14:23
Okay.

[David Southwick] 11:14:27
Now, we're using switch to singularity images, and then also a bunch of quality of life things that that allow you to use it on sites that have, you know, don't have wide area in networking tool Okay, things like that So it's really, been a big big it for

[David Southwick] 11:14:46
the last year or so a bit about the sweet self.

[David Southwick] 11:14:51
I'm sure some of you are No, you're over.

[David Southwick] 11:14:52
Maybe I've run this already since we've been distributing say, proof of concept or release candidates here for the last couple of months.

[David Southwick] 11:15:04
basically it's well, I love this. I already shared, But it's now sort of flexible thing that, and you can run on any hardware.

[David Southwick] 11:15:14
You can see, I've got a small graphic on the right right here, or the suite itself is an orchestrator that will go and collect metadata of whatever hardware it's running on and control the array of benchmarks you want to use so on that the bottom

[David Southwick] 11:15:30
part is graphics. There's a couple different benchmarks had spec of 6 is one of them He score, which is the the the candidate replacement for it or I don't know if I can call it Canada anymore.

[David Southwick] 11:15:46
but you can easily plug in other mitch marks as well.

[David Southwick] 11:15:49
So this is the tool we've been using on Hbc: So a bit about that, This effort, like I said, started.

[David Southwick] 11:15:59
I guess more than a year ago. So I and the initial presentation of the Hpc.

[David Southwick] 11:16:07
Work that was during chat 21, and at that time we had just done large-scale deployments, so doing several 100,000 core campaigns, and at that time we were looking at comparing New Amd.

[David Southwick] 11:16:22
Mdc. Views that were available widely on Hpc.

[David Southwick] 11:16:26
Sites, but not yet widely accessible, and from so we did a comparison of that, and stability, studies and whatnot.

[David Southwick] 11:16:37
So that was interesting. The first step. What's happened since then is we've had a lot of software become available from the experiments.

[David Southwick] 11:16:48
obviously the first look at the run. 3 workloads, but along with that there's been a bunch of heterogeneous development from basically all of the experiments.

[David Southwick] 11:16:57
So and I mean heterogeneous, both in compile compile codes and in accelerators.

[David Southwick] 11:17:06
So we've got several workloads that are in development for power, and of course, Gpus So we've been using Hbc Then to take these these workloads, or let's say, snapshots.

[David Southwick] 11:17:23
Of them. They're sufficiently stable. I will containerize them in singularity, and then run them at scale on Hpc.

[David Southwick] 11:17:32
And that enables a lot of different interesting studies. So Gpu versus cpu, and then combined for some, we're close there, support that as well as more exotic combination.

[David Southwick] 11:17:46
So arm, plus gpu said Power plus gpu, and things like this.

[David Southwick] 11:17:54
And I know this was discussed. Excellent! Some point yesterday.

[David Southwick] 11:17:56
I think I was listening, in but in case these are now available via the benchmarking suites, and you can run it just as you would on a bare metal machine on Hbc.

[David Southwick] 11:18:12
I think you it was presentation yesterday from Eric about the in Lps: Yeah, I workload.

[David Southwick] 11:18:20
That's also containerized, and can be run at scale on each Pc.

[David Southwick] 11:18:25
however, the configuration they have at the moment is a the the snapshot of that is just single note.

[David Southwick] 11:18:31
At the moment. So if you do want to use that for Npi related, scaling, then you need to run just the workload container and not the the suite, because it well it's not the target at the moment of of Wcg to do Npi and as I mentioned I guess we've

[David Southwick] 11:18:54
got a lot of other quality of life things. So if you have local storage like Cdm Fs, you could take advantage of this instead of room, pull copies and whatnot.

[David Southwick] 11:19:03
But really, as has been set here many times there's a lot of interest in Gpus, and then you can see I got a small slice here in the number.

[David Southwick] 11:19:15
Of course you get from current and next generation Gpus.

[David Southwick] 11:19:21
So there's I love to going in that direction, and we know we're expecting close.

[David Southwick] 11:19:30
so, that being said, there isn't industry, standard Gpu benchmark yet, and at least from the benchmarking side of things.

[David Southwick] 11:19:43
We are kind of approaching it in the same way that we have Cpus.

[David Southwick] 11:19:48
Oh, you use workload set production workloads, or what will be production workloads, And we can generate the score in the same way that we do for head score which is some function of throughput or events.

[David Southwick] 11:19:59
Per second. So this is what we've been using so far with trying to understand the capabilities of a machine that is going to be running Gpu.

[David Southwick] 11:20:13
Only Cpu, and I mean there was a he score work last week, I think several people are probably part of, but a lot of discussion happened there as well of how to account for for these sorts of resources.

[David Southwick] 11:20:34
so how just concluded, saying that we we've been active on Hbc.

[David Southwick] 11:20:38
Benchmarking now for couple years. Use the suite because it's automated running and reporting of large scale.

[Enrico Fermi Institute] 11:20:39
okay.

[David Southwick] 11:20:48
You can do whole part or several partition benchmarks.

[David Southwick] 11:20:52
This includes exotic workloads both for machine learning.

[David Southwick] 11:20:56
Yeah, I as well as architectures as well as starting to look at.

[David Southwick] 11:21:03
Let's see, sort of the other services that you get on Hpc: I know it was mentioned yesterday that it issues with scaling.

[David Southwick] 11:21:14
Io bomb workloads. And how can you tell what what's good on a shared filer system that maybe you don't have any information about.

[David Southwick] 11:21:24
So we are starting to develop. And we've got a prototype.

[David Southwick] 11:21:27
Let's see Mark. I say benchmark kind of in quotes, because it's not benchmarking a a compute, unit, but testing the shared filer service.

[David Southwick] 11:21:37
And then from there giving you some feedback on both your workload and let's say, how many nodes you could scale that up to before where it starts locking up that the file system in some way.

[David Southwick] 11:21:53
So that's what's new with us, and a little bit of a peek into.

[David Southwick] 11:21:57
We're doing in where we're going and I'm happy to answer any questions

[Enrico Fermi Institute] 11:22:06
Okay, David. Thank you very much. So we have a couple of hands raised.

[Enrico Fermi Institute] 11:22:10
follow up! Good

[Paolo Calafiura (he)] 11:22:12
good morning, everyone. So the the first you said that you said that the there are no industry standard Ml.

[Paolo Calafiura (he)] 11:22:23
Benchmarks, and I think that is still accurate.

[Paolo Calafiura (he)] 11:22:26
But I want to meet. I want to be sure you guys are aware of Ml.

[Paolo Calafiura (he)] 11:22:30
Per which is becoming a little system, and it's not so.

[Paolo Calafiura (he)] 11:22:36
So okay. So that's that was the first. The first comment.

[David Southwick] 11:22:36
yeah.

[Paolo Calafiura (he)] 11:22:41
The second comment is that the and I mean, I I You guys have a very difficult job, because what we are seeing in Tc.

[Paolo Calafiura (he)] 11:22:52
Is that the same software? I mean the same out of it around on different software, on different platform, performs quite differently.

[Paolo Calafiura (he)] 11:23:04
So, if you have, a if you have a fast parameter or simulation, we should run that with alpaca, or with the kuda, or with the of well, cool, that is a different problem without Packer. With caucus, you'd get different different performance, of the same code in

[Enrico Fermi Institute] 11:23:13
Hmm.

[Paolo Calafiura (he)] 11:23:22
principle on day for different machines depending. What what portability layers to probability to you.

[Paolo Calafiura (he)] 11:23:31
So I wanted to ask you if you have a settled on a platform, for like parallelization platforms, but to use, or if you are taking the world, you know you're taking a mix so what what's your problem

[David Southwick] 11:23:45
so I don't think it's settled at the moment, and of course this is a a popular question of what sort of optimization targets you're using for your workloads.

[David Southwick] 11:23:56
And I mean not just for

[David Southwick] 11:24:01
Translating the Cross architecture. But even within the same families of of of units.

[David Southwick] 11:24:09
So at the moment, I I think the method we have is take the minimum compatibility, so that we don't having, you know, 1020 different versions of of the workplace Okay, want to have the same thing that we can run everywhere and and sort of take all of these variables out of the

[Enrico Fermi Institute] 11:24:18
Okay.

[David Southwick] 11:24:30
equation, But that being said, I don't think there's a clear answer on on the proper way to do that yet, and it's not not so.

[David Southwick] 11:24:42
Let's say.

[Enrico Fermi Institute] 11:24:48
Okay, hum. See, Steve has this hand raised

[Steven Timm] 11:24:51
Yeah, I was just wondering if the toolkit is available anywhere for download that have been

[David Southwick] 11:24:58
Oh, good, Yeah, Absolutely So the let's say I'll just go back.

[David Southwick] 11:25:04
I think I have a link to. And so the benchmarking suite itself, that is it. While all that is open source.

[David Southwick] 11:25:13
It's on git lab at certain. So flash, hip, benchmarks, and then the suite is the the project down there.

[Steven Timm] 11:25:15
oh, okay, I see it. Okay.

[David Southwick] 11:25:22
I have the link on the screen, the yeah, the other benchmark I was talking about, for let's say services.

[David Southwick] 11:25:30
This Io Benchmark. I don't have the link on there, but I can share this afterwards.

[David Southwick] 11:25:36
It's in a That's right. Prototype state right now.

[David Southwick] 11:25:40
We have been working on this year. It doesn't cover all the things you can throw at it yet.

[David Southwick] 11:25:47
But yeah, you can download it and play around with it.

[David Southwick] 11:25:51
And I should mention that the idea for this will really is from, I I think I don't know if he's in in the room now, but it I've seen, and the previous days so shut up. To.

[David Southwick] 11:26:04
Him, I guess.

[Enrico Fermi Institute] 11:26:07
Okay, Okay, go ahead.

[Steven Timm] 11:26:10
Hello! There was no different so they have Benchmark 3.

[Steven Timm] 11:26:14
That's no different than the when they were running on regular note.

[Steven Timm] 11:26:17
Then

[David Southwick] 11:26:18
Yeah, Yep: exactly.

[Steven Timm] 11:26:20
Okay? Good. And then there may be a newly special one.

[David Southwick] 11:26:25
Well, there, yeah, like I said, the there was a workshop last week on this discussing.

[David Southwick] 11:26:33
you know how to choose the final versions and the waiting and whatnot.

[David Southwick] 11:26:37
So there's a little bit more qualified people around, I think, to answer specific questions on on that.

[David Southwick] 11:26:43
But it's in progress. Yeah.

[Enrico Fermi Institute] 11:26:47
Okay.

[David Southwick] 11:26:48
There will be another version, I guess, with the the let's say, gold standard for okay for the Benchmark suite.

[David Southwick] 11:26:55
That's decided

[Enrico Fermi Institute] 11:27:02
Sure.

[Dirk Hufnagel] 11:27:04
yeah, I just had it quick. Question. You said, You're you're benchmarking cpu plus gpu and also, Gpu workloads.

[Dirk Hufnagel] 11:27:13
Now I can see. I mean hapscore, and have specs zeros on cpu that's like well established.

[Dirk Hufnagel] 11:27:20
You take a mix of experiment, specific workloads and average something, throws something together and get get some average.

[David Southwick] 11:27:26
Yep.

[Dirk Hufnagel] 11:27:26
I have. What do you do for the Gpu stuff?

[Dirk Hufnagel] 11:27:30
Because it's so early, and and the experiments I'll algorithm.

[Dirk Hufnagel] 11:27:34
I know Cms has something, but it's not complete.

[Dirk Hufnagel] 11:27:36
It's not a complete picture. Do you run synthetic stuff, or do you run the very early stuff?

[Dirk Hufnagel] 11:27:42
Because that's the only thing you can do

[David Southwick] 11:27:43
We this there. Yeah, So we're running very early stuff from Cms: There's an Lpf.

[David Southwick] 11:27:50
Which is a bit of a an exact bird, which Yeah, there was a talk yesterday on that.

[David Southwick] 11:27:56
we're we also are using hlt and then as well, the sort of rolling builds from check windows, from mangraph

[Dirk Hufnagel] 11:28:11
Okay, thanks.

[David Southwick] 11:28:13
I guess there's also some other exotic Gpu workloads, but these are from Beams Department.

[David Southwick] 11:28:21
So there's this simple track, or

[David Southwick] 11:28:26
I know ped track is in there as well, but I don't think we have a container for Patrack

[Dirk Hufnagel] 11:28:31
so it's it's early going. So the numbers you get might not necessarily be representative of ever we end up running in production later

[David Southwick] 11:28:40
Exactly so. I mean, there's a lot of results results already, since we you know, we use the the suite as a reporting tool as well.

[David Southwick] 11:28:48
So it's it all gets pushed up over Amq into Cabana, and then you can All the workloads are hashed so you can compared performance across every node that it's run on with the same version of the world Okay, so at least with.

[David Southwick] 11:29:04
You know that published version? If you have your own bill, let's say you can track You can compare device to device like this, but you're right.

[David Southwick] 11:29:13
I mean, these are really, or let's say, snapshot releases of some of these.

[David Southwick] 11:29:19
So they will change whenever it's decided that that's going to be a production.

[David Southwick] 11:29:25
Or let's say, a a final version validated in some way.

[David Southwick] 11:29:28
Yeah.

[Paolo Calafiura (he)] 11:29:29
so starting to jump in in Cc. We have assembled the kind of what we think is A is a cross-section of representative applications, and of course we have no, we have No.

[Paolo Calafiura (he)] 11:29:42
Standing is just like asking around. I wonder if we should compare now, and we can see if if we, if we pick the same, and under which configuration maybe we should have an offline discussion between our groups

[David Southwick] 11:29:54
Sure you're using workforce or like off the shell benchmarks

[Paolo Calafiura (he)] 11:30:02
No, no, we're using using a Hp workload.

[David Southwick] 11:30:06
Yeah.

[Paolo Calafiura (he)] 11:30:08
So simulation tracking. We're not doing machine learning workloads yet.

[Paolo Calafiura (he)] 11:30:11
So that's that's something that's something which is not.

[Enrico Fermi Institute] 11:30:13
Okay.

[Paolo Calafiura (he)] 11:30:14
But we should we should, we should compare notes also because of this dimension.

[Paolo Calafiura (he)] 11:30:18
This is not only the workload it's the is, the is the software software platform you use, which makes 1 one configuration different from enough.

[Paolo Calafiura (he)] 11:30:28
Anyway. Bye. Shut up

[David Southwick] 11:30:30
Yeah, I know we can click offline

[Enrico Fermi Institute] 11:30:35
A quick question, How long do these benchmarks take to run

[David Southwick] 11:30:38
So if there is some of the the Gp ones can be fast in the mat in the order of I don't know 20 to 60 min from the some of the Cpu ones are much longer depending on what the experiment code owners have put forward.

[David Southwick] 11:30:58
As well what they you, as a representative set. So I think the default block or Cbu only, and the current release candidate is something like forward to 6 h.

[David Southwick] 11:31:13
But that's many workloads they do back to back, and you run 3 3 iterations of each to get an average and get rid of outliers I'm not sure what that will look like or Gpu because all all the work was that talked about for Hpc that are

[David Southwick] 11:31:32
not sort of run 3 standard ones. These are optional things that you can elect to run on with the suites They're not included by deal.

[David Southwick] 11:31:45
They are available. You just have to use a little bit different configuration, which is included in the

[Enrico Fermi Institute] 11:31:53
Yeah, I guess the thing I'm wondering. And maybe this is just a broader general question for everybody here, I mean, is this the kind of thing that we want to start incorporating?

[Enrico Fermi Institute] 11:32:02
Integration process for Hpcs right? We go to, you know.

[Enrico Fermi Institute] 11:32:07
Stand up pro mudder, and then the next machine should make sure.

[Enrico Fermi Institute] 11:32:11
We run these benchmarks as part of that integration.

[Enrico Fermi Institute] 11:32:13
So we start getting, you know, the benchmark numbers in place, and then maybe that helps eventually with the pledging and that sort of thing

[Dirk Hufnagel] 11:32:23
I I can just say what we're doing right now, because even though they are optimistic, we still at the end of the year cannot compile the usage and and have spec with 6 to come just to have a comparison and to see what the what the big picture looks like and we just went through

[Dirk Hufnagel] 11:32:40
that exercise in 21 for all the Hpc.

[Dirk Hufnagel] 11:32:44
That we're using in the Us. And basically what I'm doing right now.

[Dirk Hufnagel] 11:32:47
I look at the Cpu, and compared to what?

[Dirk Hufnagel] 11:32:49
What others have, benchmark on the on, on, on, and usually you'll find a number where you can come up with a defensible.

[Enrico Fermi Institute] 11:32:56
The

[Dirk Hufnagel] 11:32:58
Have specs 6, but I mean, especially if you want to get to A If we have really pledging the resources with that, and it becomes relevant.

[Dirk Hufnagel] 11:33:07
I think we we need to run the benchmarks.

[Dirk Hufnagel] 11:33:09
Maybe maybe not right now. But but but if once we get to that point, I think we we should to get a better number.

[Enrico Fermi Institute] 11:33:17
Yeah I mean, it seems like we ought to have a plan for it. I mean, even if even if the content of the benchmarks themselves change over time just to have that kind of in our minds, and kind of in the pipeline for one we integrate these resources.

[Dirk Hufnagel] 11:33:17
Okay.

[Dirk Hufnagel] 11:33:30
We have a couple of race since Linkedin. Maybe we should get these comments

[Enrico Fermi Institute] 11:33:34
Yeah, Okay, Andrew set his hand up

[Andrew Melo] 11:33:39
yeah, I was just gonna point out, I mean, so so Dirk talked a little bit earlier about you know, like, How do you?

[Andrew Melo] 11:33:45
How do you you know account for the Gpus and in the Hep score?

[Andrew Melo] 11:33:52
And you know, he suggested, maybe you give like a 20% bonus, or something like that.

[Andrew Melo] 11:33:55
I think that I think that what makes sense, and I argued this at the Heps for meeting last week is that you can't really benchmark machines with just one single scalar anymore.

[Enrico Fermi Institute] 11:34:00
Cool.

[Andrew Melo] 11:34:04
Right? So I think you're just gonna have to be some sort of tuple, you know, per machine to have these different accelerators on it.

[Andrew Melo] 11:34:11
And I was gonna also point out that that while you know we're we're working on, I guess you can call it hip score 22 with with run 3 work close the the number that pops out of he score right now is it waited

[Andrew Melo] 11:34:26
after. It's just the feature. So, at least initially, what will be pledged You know the this head score unit that will be pledged will will only be taken into account the the Cpu which one.

[David Southwick] 11:34:29
Okay.

[David Southwick] 11:34:37
Thanks, Andrew, and I'd like to add on to that, since we do have this reporting automated reporting, and it gives you the the Json.

[David Southwick] 11:34:47
Okay, all of the work ones. Yes, it will give you a single say, Hepscore value, but it also gives you the value.

[Enrico Fermi Institute] 11:34:49
Yes.

[David Southwick] 11:34:55
Every workload, So if you once like, you're just interested in htt, or whatever it is, you can get the number on that were on that machine for that benchmark, and you can go compare just then the benchmarks you're interested in this is also already available.

[David Southwick] 11:35:14
So it is a bit, I guess, in that way

[David Southwick] 11:35:20
A bit more fine grain than than what we had in the past.

[Enrico Fermi Institute] 11:35:25
Okay, question or comment from Ian

[Ian Fisk] 11:35:29
yeah, it was only that I It was second, that I think it's valuable to be using the benchmarks as we begin to commission the multiple Hpc Sites also I think, in addition to having the the benchmark that tells you a number about how well they're

[Ian Fisk] 11:35:43
performing. I'm wondering, if it also starts a purpose as sort of what we used to think of as the site availability tests for some things There's a diversity in the workflows and that if they all succeed and give reasonable numbers.

[Ian Fisk] 11:35:54
You also have a reasonable expectation that the site is pretty well configure.

[Ian Fisk] 11:35:58
Again.

[Andrew Melo] 11:36:03
so so So a fun anecdote about that, email, you know, we we actually did see, some of this where someone was bench benchmarking Dean and saw the you know this machine.

[Andrew Melo] 11:36:15
That you know they They knew what the head score should be, for that machine was about half or you know, 75% of what they were seeing, and it turns out that the cooling of that racket failed and the machine, was actually power throttling.

[Andrew Melo] 11:36:24
So it was something that yeah, people are able to say, Hey, this machine isn't working right just from looking at these numbers

[Ian Fisk] 11:36:30
alright.

[David Southwick] 11:36:33
Yep.

[Ian Fisk] 11:36:33
Yeah, I think the the other thing that the as we in the Commission, probably more applicable to Hpc.

[Ian Fisk] 11:36:40
Than cloud is that these machines are much more complicated than we're.

[Ian Fisk] 11:36:44
They're not as sort of simple as a pile of essentially x 86 servers.

[Ian Fisk] 11:36:48
They tend to have more complex services, whether cooling or interconnect, or whatever; And so I'm a more detailed set of benchmarking meetings

[Enrico Fermi Institute] 11:36:59
Okay.

[Enrico Fermi Institute] 11:37:02
So you You have in one of your sites. You mentioned that you know you do the uploading of all the results, and you also mentioned that you have some kind of batch uploader for portal.

[Enrico Fermi Institute] 11:37:13
Secure, workers. Does. That mean that this would work on Lcfs where the workers don't have any, you know.

[Enrico Fermi Institute] 11:37:18
Outbound connectivity would just batch it and upload it from the login nodes.

[Enrico Fermi Institute] 11:37:21
Is that is that the idea

[David Southwick] 11:37:23
Exactly. You know. Sites like this are that's common, but they're not uncommon, either.

[David Southwick] 11:37:30
There's several that we've been working with here in Europe that have a similar configuration, and normally the default case for this is just running a you know the base case is to run it on single node for vendors.

[David Southwick] 11:37:43
Or whatever it is, and when the runs are finished it'll compile the report, and then send it over and queue.

[David Southwick] 11:37:50
But if you don't have connectivity on the machine that you benchmark, then you can collect these.

[David Southwick] 11:37:59
these Jason afterward and do a batch reporting basically Yeah, from a from a gateway note

[Enrico Fermi Institute] 11:38:06
Okay, great.

[Enrico Fermi Institute] 11:38:10
Other questions for David

[Enrico Fermi Institute] 11:38:18
Okay, Thank you, David.

[David Southwick] 11:38:19
Yep thanks.

SLIDES AGAIN - ACCOUNTING