Strategic considerations and Discussion
[Fernando Harald Barreiro Megino] 12:14:38
We still have a little bit of time. So, on the cost:
[Fernando Harald Barreiro Megino] 12:14:46
people discussed it a lot, but this is the opportunity to discuss any other worries about the cloud, for example egress costs, or any particular ideas of how we can make better use of the cloud, like exploiting elasticity
[Fernando Harald Barreiro Megino] 12:15:13
to use GPUs, or whatever else was discussed, like Lancium.
[Dirk] 12:15:32
We already talked about elasticity, and I just wanted to maybe focus on one of the points on the slide.
[Dirk] 12:15:42
It says that the different planning horizon versus our own equipment gives you a different layer of elasticity, because when you purchase equipment, it's not only that you have a certain number of deployed cores in your data center, it's also that when you purchase
[Dirk] 12:15:58
the equipment you usually make a commitment for the next 3, 4, or 5 years,
[Dirk] 12:16:04
whatever the retirement window now is for hardware that you buy.
[Dirk] 12:16:08
It's gone up a bit. With cloud, you don't have to make that commitment.
[Dirk] 12:16:15
Now, the thing is, though, in our science we usually have pretty stable workloads, so we can't really take full advantage of that.
[Dirk] 12:16:23
So usually we buy equipment for 4 years, and we expect, I mean, year to
[Dirk] 12:16:30
year we always have the work to keep it busy. But looking out,
[Dirk] 12:16:35
there's the dip before HL-LHC
[Dirk] 12:16:42
comes up. I don't know if that's something where cloud maybe could help.
[Dirk] 12:16:48
If at that point we were, say, 20% cloud, you could say: okay, for these off years, shutdown years, you just don't buy any cloud cycles.
[Dirk] 12:16:59
I'm not sure how that would play with subscriptions at renewal, like, if you are in a subscription model, whether you could just skip a renewal and then resume a year
[Dirk] 12:17:08
later. But that's a possibility, and you really don't have that with purchased equipment, because you kind of continuously keep buying equipment
[Dirk] 12:17:21
just so you don't have everything retired all at once.
[Dirk] 12:17:24
I mean, you kind of cycle over your data center overall.
[alexei klimentov] 12:17:41
I think this is a very simplistic approach.
[Enrico Fermi Institute] 12:17:41
We still have Alexei.
[alexei klimentov] 12:17:48
I think what we are trying to do, at least in ATLAS:
[alexei klimentov] 12:17:53
we are trying to integrate clouds in our computing model, and it is not new. I want to remind you that one of the first uses of clouds, at least that I remember, was by the Belle experiment,
[alexei klimentov] 12:18:14
not by Belle II, but by Belle, when they needed to conduct a Monte Carlo campaign, and the way they designed it,
[alexei klimentov] 12:18:23
for them it was cheaper just to buy cycles to run this Monte Carlo campaign.
[alexei klimentov] 12:18:32
So I think about just this comparison, and also what was mentioned before by several people:
[alexei klimentov] 12:18:40
is it a replacement of our own resources?
[alexei klimentov] 12:18:44
Of course not. It is not a replacement, but it is a resource which we can use, and elasticity for me
[alexei klimentov] 12:18:52
is one of the main features which we can use. And, as Paula also mentioned before, before we go to purchase something new that we don't have now, we can try it in the cloud.
[alexei klimentov] 12:19:07
I also kind of disagree with the statement that our workflows are
[alexei klimentov] 12:19:12
very standard, or whatever word you use, because what we see even now, and I think it will
[alexei klimentov] 12:19:19
continue in this direction, is that we have new, more complex workflows, which we, at least in ATLAS,
[alexei klimentov] 12:19:29
did not have during Run 2, and for high luminosity it will be more and more like that.
[alexei klimentov] 12:19:33
So that's why I think the problem is more complex, and we need to address it in a more complete way, and not try to, what I'm afraid of,
[alexei klimentov] 12:19:45
you know, split it into small pieces, because then, well, we all know.
[Eric Lancon] 12:20:02
Yes, sorry, I was muted. Do you agree with Alexei that
[Eric Lancon] 12:20:09
there are more complex workflows coming, and there is a need to adapt?
[Eric Lancon] 12:20:13
What I don't fully follow is the conclusion that the cloud is most suited for this:
[Eric Lancon] 12:20:22
the facility needs to work to adapt to the new requirements.
[Eric Lancon] 12:20:28
And that's what makes the comparison at the end.
[alexei klimentov] 12:20:46
If you heard my comment, I fully agree with you, and that's why
[alexei klimentov] 12:20:52
what we will try in PanDA, that is, the full chain, for me
[alexei klimentov] 12:21:00
is the bigger R&D, and for us it is also something to try.
[Enrico Fermi Institute] 12:21:29
So one comment I had: we've spent a lot of time talking about how the clouds hook into the existing workflow systems, PanDA and whatnot.
[Enrico Fermi Institute] 12:21:39
Does it make sense to further talk about, or explore, how clouds can either be used as analysis facilities or extend analysis facilities in some way? One of the things that users might want, for example, are exotic kinds
[Enrico Fermi Institute] 12:22:01
of things, or accelerators, you know, GPUs, things like that.
[Enrico Fermi Institute] 12:22:05
Can we use clouds to sort of pad out those kinds of resources at analysis facilities? Does it make sense to explore that?
[Fernando Harald Barreiro Megino] 12:22:14
So in all of the clouds there is always the possibility for the user to get an account and really do whatever they need.
[Fernando Harald Barreiro Megino] 12:22:31
If it's more of a central analysis facility,
[Fernando Harald Barreiro Megino] 12:22:39
the analysis facilities that we are usually talking about in ATLAS or CMS,
[Fernando Harald Barreiro Megino] 12:22:46
for that there will also be, in the ATLAS project, an R&D
[Fernando Harald Barreiro Megino] 12:22:51
to extend that, and some ideas to do that were presented in the last week or two.
[Enrico Fermi Institute] 12:23:14
To support that: something that was interesting, and I don't have it right at my fingertips.
[Enrico Fermi Institute] 12:23:20
Purdue University actually got a pretty big grant from Google to set up a system where basically their batch system can burst into the Google cloud. They also have all the VPNs and whatnot set up, and the images are the same images as their
[Enrico Fermi Institute] 12:23:42
compute farm's, and with the VPN setting up the networking and whatnot, the remote cloud hardware is, quote, the same as the regular batch they have there. So, outside of latency or whatever, you can basically just
[Enrico Fermi Institute] 12:23:57
slam in Condor jobs, or, I think they run Slurm there, you could slam in Slurm jobs and run whatever you want. So there's definitely work that's been done.
[Enrico Fermi Institute] 12:24:32
So maybe to bring up another topic from yesterday: we mentioned here a little bit about using cloud to run some kind of particular campaign or what have you. Does that have any effect on how we think about pledging clouds?
[Enrico Fermi Institute] 12:24:53
And in general, are there any discussions we want to have about pledging clouds?
[Enrico Fermi Institute] 12:25:04
Dirk, you want to jump in?
[Dirk] 12:25:06
Yeah, I think the cloud fits into the discussion we had yesterday about pledging.
[Dirk] 12:25:15
I think, under the current rules, to pledge a cloud you would have to pledge a certain minimum amount
[Dirk] 12:25:22
of cores. So if you replicate a site where you basically always keep 4,000 cores running,
[Enrico Fermi Institute] 12:25:23
Yeah.
[Dirk] 12:25:29
you could pledge the 4,000 cores, but you couldn't really take advantage of elasticity.
[Dirk] 12:25:35
So you kind of would have to pledge the lower boundary, within some limits, because even grid sites are allowed to go below the floor for a limited amount of time, I think. But it puts limits on how flexibly you can use the
[Dirk] 12:25:53
resources. It's the same problem we have with the scheduling on the HPCs:
[Dirk] 12:25:57
you basically can't just keep it off for 11 months of the year and then use up everything in a month. That wouldn't work with how the pledges are structured right
[Dirk] 12:26:08
now, and what the rules are.
[Enrico Fermi Institute] 12:26:09
We pledge HS06, not cores.
[Enrico Fermi Institute] 12:26:21
But the point is that we have to figure out, right,
[Enrico Fermi Institute] 12:26:27
if you're going to even consider pledging cloud resources, how to put it in a unit that is consistent with what we have, so it's apples to apples.
[Steven Timm] 12:26:59
Yes, I was going back to the question of exotic resources.
[Steven Timm] 12:27:04
And the comment was made yesterday that the exotic resources, such as the P instances on Amazon, the FPGAs, the tensor things or whatever, are always the highest-priced things you can get. But you still have to weigh that against having them sit on
[Steven Timm] 12:27:21
site, on premise, sitting there and drawing power all the time
[Steven Timm] 12:27:25
and not being used all the time. At least we don't yet have a
[Steven Timm] 12:27:31
use case for GPUs or TensorFlow inference, or whatever it would be, at that scale.
[Steven Timm] 12:27:37
So there is value there, and I've heard from management that they prefer that.
[Bockelman, Brian] 12:28:08
Yeah, I just wanted to maybe tackle something
[Bockelman, Brian] 12:28:13
that Doug said a little differently. I'm worried less about the HEPspec06
[Bockelman, Brian] 12:28:20
equivalent, and more about the fact that for cloud resources you probably need to pledge in HEPspec06-hours, right?
[Bockelman, Brian] 12:28:30
It's the difference between kilowatts versus kilowatt-hours; some aspect of the pledge,
[Bockelman, Brian] 12:28:39
again going to the power grid analogy, needs to be in kilowatt-hours.
[Bockelman, Brian] 12:28:45
What the benchmark is, I think, is less important.
[Bockelman, Brian] 12:28:49
But, you know, how do you come up with a proposal that balances the fact that you do need some base capacity? That's important.
[Bockelman, Brian] 12:28:59
But it's very unlikely that 100% of our hours need to be base capacity.
[Bockelman, Brian] 12:29:06
So, some combination of kilowatts and kilowatt-hours, and analogies of that sort, in our pledges.
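[Editor's note] The kilowatt versus kilowatt-hour distinction can be made concrete with a toy calculation. This is an illustrative sketch, not anything presented at the meeting; all numbers are invented, and "HS06" is just used as the benchmark unit label.

```python
# Comparing a "kilowatt"-style pledge (a fixed HS06 capacity held all year)
# with a "kilowatt-hour"-style pledge (integrated HS06-hours).

HOURS_PER_YEAR = 365 * 24  # 8760

def integrated_pledge(hs06_level: float) -> float:
    """HS06-hours delivered by a flat capacity held for a full year."""
    return hs06_level * HOURS_PER_YEAR

def elastic_delivery(base_hs06: float, burst_hs06: float, burst_hours: float) -> float:
    """HS06-hours from a small base capacity plus one elastic burst period."""
    return base_hs06 * HOURS_PER_YEAR + burst_hs06 * burst_hours

flat = integrated_pledge(40_000)                    # flat 40 kHS06 site
cloud = elastic_delivery(10_000, 250_000, 30 * 24)  # 10 kHS06 base + 1-month burst

# The two strategies deliver the same order of integrated work even though
# their instantaneous capacities differ wildly, which is why a pledge
# expressed only as a capacity floor cannot credit the elastic one.
print(f"flat:  {flat:.3e} HS06-hours")   # flat:  3.504e+08 HS06-hours
print(f"cloud: {cloud:.3e} HS06-hours")  # cloud: 2.676e+08 HS06-hours
```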
[Johannes Elmsheuser] 12:29:20
Right, a follow-up comment to this: at the end the pledges are always, as you say, a unit per year, right?
[Johannes Elmsheuser] 12:29:31
And we don't have a unique CPU architecture as well, right?
[Johannes Elmsheuser] 12:29:37
So over the years, with all the procurements, there are always different kinds of CPU architectures.
[Johannes Elmsheuser] 12:29:47
And as was said before, right, we have more or less the same problem also on the grid.
[Johannes Elmsheuser] 12:29:57
We are also averaging there. So we don't have the same unit over and over at the same site.
[Johannes Elmsheuser] 12:30:03
Right. So in principle we are solving here, on the cloud, the same problem.
[Johannes Elmsheuser] 12:30:08
So I don't see this really as problematic in that sense, because we have done exactly the same thing for the last 10 or 15 years on the grid.
[Bockelman, Brian] 12:30:18
Yep, I don't think I'm following, because what we pledge on the grid is a certain HEP-
[Bockelman, Brian] 12:30:27
spec06 capacity that is available starting at a given time period.
[Bockelman, Brian] 12:30:33
Right, let me say... well, but it's
[Johannes Elmsheuser] 12:30:34
Right? And that's for one year, right? It
[Johannes Elmsheuser] 12:30:40
is good for one year, and at the site you don't have a specific unit of one CPU, right?
[Johannes Elmsheuser] 12:30:47
You always have an average, and that was the argument before.
[Bockelman, Brian] 12:30:50
Oh!
[Bockelman, Brian] 12:30:55
Hmm, no, no, no, but that's very different. It's not the average, right, because
[Bockelman, Brian] 12:31:01
I can't come in and give you 12 times as much capacity
[Bockelman, Brian] 12:31:03
in January, and zero it out for the next 11 months.
[Bockelman, Brian] 12:31:07
That is most definitely not what the MoUs say.
[Bockelman, Brian] 12:31:12
It's a very specific HEPspec06 count available, you know, depending on whether you're a Tier-1 or Tier-2,
[Bockelman, Brian] 12:31:19
I forget what the number is, 85 or 95% of the time.
[Ian Fisk] 12:31:27
right.
[Johannes Elmsheuser] 12:31:28
Sure, but I agree that you give an average, basically, over a certain time period.
[Johannes Elmsheuser] 12:31:34
I think we agree here, right? And, as you say, we then have to say: okay, you provided this
[Johannes Elmsheuser] 12:31:41
for 4 months, or for 3 months, or something like this, and this is then the pledge.
[Ian Fisk] 12:31:49
No, I guess I'd also like to argue that our pledging model, as it is right now, is probably not ideal for that. We have a model which is based on the fact that we have dedicated facilities that have been purchased, and the experiments' responsibility is to
[Ian Fisk] 12:32:04
demonstrate that over the course of 12 months they can use them at some average rate, so that we both provision and schedule for average utilization. And whether it's HPC
[Ian Fisk] 12:32:14
or whether it's clouds, there's an opportunity to not do that, and we might find as collaborations that the ability to schedule 5 times more for some period of a month, rather than holding a constant rate for a year, was actually a much more efficient use of people's
[Ian Fisk] 12:32:32
time, and that our current pledging model is sort of limiting.
[Ian Fisk] 12:32:36
I believe Maria Girone, who's connected, presented this at CHEP Osaka,
[Ian Fisk] 12:32:41
probably 6 years ago: the concept of scheduling for peak. And it seems like, because we have dedicated resources, we have to show that they're well used.
[Dirk] 12:33:25
Yeah, and maybe one complication with scheduling for peak:
[Dirk] 12:33:30
you actually have to think about and justify what you want to use the peak for.
[Dirk] 12:33:36
So it's more complicated to plan this; in steady state you just keep it busy.
[Ian Fisk] 12:33:39
It is more complicated, yes. It's more complicated to plan.
[Ian Fisk] 12:33:44
It requires people to be better prepared. It requires people to...
[Dirk] 12:33:47
Yeah. But that's maybe why it hasn't happened yet.
[Ian Fisk] 12:33:49
Right, but at the same time it would allow... imagine that a 6-month Monte Carlo campaign was a one-month Monte Carlo campaign, and then you'd spend
[Ian Fisk] 12:33:58
5 months where people have the complete set for analysis. That might be much more efficient.
[Ian Fisk] 12:34:04
And that's also, I think, a motivation for why you might want to go to clouds, even if they were on paper more expensive, because you'd have to make some metric of how much of people's time you're saving.
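[Editor's note] The "schedule for peak" argument reduces to simple arithmetic. This is a hypothetical sketch; the core counts and campaign size are invented for illustration.

```python
# The same integrated compute, delivered steadily or as a burst, changes
# how long the collaboration waits for the complete sample.

def months_until_complete(total_core_months: float, cores: float) -> float:
    """Months before a campaign finishes at a fixed core count."""
    return total_core_months / cores

TOTAL = 60_000  # core-months needed for a hypothetical MC campaign

steady = months_until_complete(TOTAL, 10_000)  # dedicated farm, flat rate
burst = months_until_complete(TOTAL, 60_000)   # 6x elastic peak

# Same integrated cost in core-months, but the burst leaves 5 extra months
# in which the complete sample is already available for analysis.
print(steady, burst)  # 6.0 1.0
```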
[Enrico Fermi Institute] 12:34:17
Whose time are you saying you're saving?
[Ian Fisk] 12:34:22
I would claim the entire collaboration's time to physics, perhaps, is what I'm saying.
[Enrico Fermi Institute] 12:34:23
Which people? Which people's time?
[Enrico Fermi Institute] 12:34:34
How do you accurately measure that without drawing a false conclusion?
[Ian Fisk] 12:34:40
I don't... I think it's difficult to.
[Ian Fisk] 12:34:42
I think it's probably somewhat difficult to measure the inefficiency that we have right now, but I think you can.
[Enrico Fermi Institute] 12:34:48
Okay.
[Ian Fisk] 12:34:49
I think, without drawing a false conclusion, I can claim that the particular way it's set up right now is designed to optimize a specific thing, which is the utilization of particular resources.
[Ian Fisk] 12:35:14
And I guess I'm claiming that's not the...
[Ian Fisk] 12:35:18
If you assume that's the most important thing, because we spent all this money buying dedicated computers,
[Ian Fisk] 12:35:23
then yeah, that's a reasonable thing to say: we're not going to let these things sit idle,
[Ian Fisk] 12:35:27
we're not going to over-provision. But I think it's very difficult to state that the optimization designed to make use of this particular resource happens to also be exactly the perfect optimization
[Ian Fisk] 12:35:40
for these other kinds of metrics, like time to physics.
[Dirk] 12:35:56
Efficient use of resources, I mean, that's the one thing.
[Dirk] 12:36:02
That's the one main difference I see: you buy resources,
[Dirk] 12:36:07
you have them sitting on your floor, you might as well use them, because it's already paid for.
[Dirk] 12:36:10
So at that point use doesn't cost anything, okay, energy costs, whatever,
[Dirk] 12:36:14
but you kind of have to keep them busy. HPC
[Dirk] 12:36:16
and cloud, you kind of have to justify, because you're more elastic.
[Dirk] 12:36:19
So you get the allocation, and especially with cloud you want to make use of flexible, elastic scheduling.
[Dirk] 12:36:28
So at that point you have to justify each use, so it's more complicated to do that.
[Dirk] 12:36:34
But hopefully, if you do it right, you get a more efficient use of resources out of it.
[Enrico Fermi Institute] 12:36:43
But how do you measure that?
[Dirk] 12:36:46
It's I don't know.
[Enrico Fermi Institute] 12:36:50
Because, think of it this way: take a 10% cut of what we're doing now, and let's say that 10% diverts to the cloud. Then you have to see if that 10% diversion would give you more bang for the buck.
[Ian Fisk] 12:37:21
Well, we actually did this, in a crude way, for disaster
[Ian Fisk] 12:37:28
recovery, which would be: what would it cost you? The scenario is, I've messed up my reconstruction, I need to reprocess things, and I only have a month.
[Ian Fisk] 12:37:39
Is there a model, a reasonable insurance policy, which says: I'm going to use the cloud for that kind of thing?
[Ian Fisk] 12:37:45
And so in some sense you can make arguments for where this is valuable in very specific situations, like when there's been a problem.
[Johannes Elmsheuser] 12:38:25
I have a completely different comment, or a question, on the third point you have here, the bullet point on data
[Johannes Elmsheuser] 12:38:32
safeguarding. Is this something of concern or not?
[Johannes Elmsheuser] 12:38:40
Or do we just say that the team basically has to safeguard our data against users who are repeatedly downloading it,
[Johannes Elmsheuser] 12:38:54
and then we are safe? Or is there something else behind it?
[Fernando Harald Barreiro Megino] 12:38:56
what.
[Johannes Elmsheuser] 12:38:59
Is there something else behind this data safeguarding keyword
[Johannes Elmsheuser] 12:39:02
here?
[Fernando Harald Barreiro Megino] 12:39:03
Well, that's a comment that I sometimes hear: that you don't want to have the, like...
[Johannes Elmsheuser] 12:39:27
Okay, right. So it's the computing model question: that you always have, so to say, another unique copy of your raw data,
[Johannes Elmsheuser] 12:39:40
for example in the cloud. That would be behind that?
[Fernando Harald Barreiro Megino] 12:39:43
Yeah. So, like, what is the overall role? Can a cloud be a nucleus?
[Fernando Harald Barreiro Megino] 12:39:50
Or can cloud storage only be treated as temporary?
[Fernando Harald Barreiro Megino] 12:40:00
So the point is to let people express any worries regarding this.
[Ian Fisk] 12:40:12
I guess I would like to express a worry regarding that, which is that I don't think any reasonable funding agency is going to let you make a custodial copy of the data in the cloud, because there's no guarantee that they won't change the rates to become
[Ian Fisk] 12:40:28
prohibitively expensive to move things out, or prohibitively expensive to move things in.
[Ian Fisk] 12:40:33
And in the same way that the agency won't let you sign a 10-year lease on a fiber without tremendous amounts of negotiation,
[Ian Fisk] 12:40:40
They're not going to allow you to make a commitment in perpetuity for data storage.
[Ian Fisk] 12:40:44
So I think that actually, almost by definition, puts the clouds in a very particular place in terms of storage and processing: things that are transient, things that can be re-created, and things that are done at the end of the job. Because otherwise you're in the situation...
[Kaushik De] 12:41:16
Yeah, coming back to the question of how to make the most out of the clouds.
[Kaushik De] 12:41:20
One of the things that we have heard a lot about over the past many years are the AI/ML tools and capabilities and ecosystem on the cloud. Is that something we should continue to pursue? Is that something that should be added to the list? In terms of: are we missing out on something,
[Enrico Fermi Institute] 12:41:33
Okay.
[Kaushik De] 12:41:47
or is that something that we think we know how to do better with our own tools?
[Dirk] 12:41:55
There is a session in the afternoon, actually, on R&D,
[Dirk] 12:41:58
specifically on machine learning training, and we actually have an invited talk from Son.
[Dirk] 12:42:04
I think it's
[Dirk] 12:42:07
training on HPC, but it's similar, I mean, it's both HPC...
[Enrico Fermi Institute] 12:42:21
It's also the case that the clouds do have some kinds of proprietary exotic cards, right, that aren't available to the general public, that are really meant for machine learning applications.
[Dirk] 12:42:37
Yeah, but the bigger question is, then, what role will machine learning play
[Dirk] 12:42:46
in our computing operations going forward? And I don't know that we have the answer,
[Dirk] 12:42:50
neither CMS nor ATLAS has the final answer on that.
[Dirk] 12:42:53
So it's a bit hard to say: this is the way to go.
[Kaushik De] 12:43:02
I mean, the one thing, yeah, I think we are...
[Kaushik De] 12:43:11
You know, we have been trailblazers in many, many areas, but when it comes to the production use of AI/ML, the everyday use of AI/ML,
[Kaushik De] 12:43:26
I think cloud and business systems do so much of it.
[Kaushik De] 12:43:34
How do we pull that in and access that?
[Kaushik De] 12:43:40
And I'm not just being paranoid, but to me, for production-level activities: I notice that almost anything we look at nowadays that Google does, anything from their own products like Maps to the services that they provide, it's really heavily
[Kaushik De] 12:44:08
dominated by AI/ML. I mean, it's almost exclusively AI/ML. But are we?
[Dirk] 12:44:21
Let me maybe make a comment, because, like,
[Dirk] 12:44:25
yesterday a use case in CMS was shown where they basically ran a MiniAOD production, which is: you take the AOD, which is a larger analysis format, and slim it down and do some recomputations
[Dirk] 12:44:37
to get to a MiniAOD, which is smaller and actually usable for
[Dirk] 12:44:40
analysis. And they are pushing for a model where they use a machine learning algorithm,
[Dirk] 12:44:47
the algorithm does use machine learning, but during the production phase you run only the inference server. So you're not actually running the training.
[Dirk] 12:44:55
And that, for me, is the bigger question.
[Dirk] 12:44:58
Because if you do a one-time shot where you run your learning algorithms on a bunch of data that we have,
[Dirk] 12:45:04
you figure out what you want to do, and then you only run the inference
[Dirk] 12:45:08
during the heavy lifting, reconstruction, whatever else you do, then I'm not sure to what extent this is really impacting the overall computing operations.
[Kaushik De] 12:45:32
Yeah. And another aspect of this is that elasticity comes in when you talk about training, I mean, unless you go to the continuous training models people are trying to do.
[Dirk] 12:45:57
How much do these large training runs, how much capacity
[Dirk] 12:46:03
are we really talking about? Is that making an impact on our overall compute resource use?
[Kaushik De] 12:46:28
Yeah, and we already have that as a service, training as a service.
[Dirk] 12:46:28
Okay, So that.
[Ian Fisk] 12:46:43
I think that's probably one of the ideal applications, primarily for HPC,
[Dirk] 12:46:46
Yeah.
[Ian Fisk] 12:46:48
because they already have that kind of hardware, and it doesn't...
[Dirk] 12:47:03
The one thing, though, with this kind of application, and we will make a comment on it in the report,
[Dirk] 12:47:10
is that by design it kind of happens outside the current production systems and infrastructure. So it's kind of standalone, and I'm not sure to what extent it's really in scope
[Dirk] 12:47:22
for the report.
[Ian Fisk] 12:47:22
I think this is one of the places where the concept of scheduling for peak comes into play, because as you go to more machine learning things that require training and hyperparameter tuning before you start running, you change when the computing is spent: you spend the computing beforehand, and
[Dirk] 12:47:37
Yes.
[Ian Fisk] 12:47:39
then it's much faster on things like inference. And so it is a place where, like, the model that says we're going to use them all...
[Dirk] 12:47:56
And also, I mean, that's even where I see a mismatch:
[Dirk] 12:48:01
thinking about pledging such resources, if you assume that this resource use is significant, you want to be able to pledge it.
[Enrico Fermi Institute] 12:48:14
Okay.
[Dirk] 12:48:15
But it's a single-purpose pledge, which is completely outside the scope of what pledging currently is.
[Dirk] 12:48:22
But you want to get some kind of credit for such a use case. So that's even harder than what we discussed so far, which is basically just adjusting the pledging to be more
[Dirk] 12:48:37
like a time-integrated value, not just the instantaneous one; the AC
[Ian Fisk] 12:48:41
Right, and the kinds of resources we're talking about here are the most expensive things we have.
[Dirk] 12:48:41
versus DC argument.
[Enrico Fermi Institute] 12:48:54
So maybe that needs to be written in the final report, the idea to push for flexibility.
[Enrico Fermi Institute] 12:49:12
Because it is a different thing: for the training you really do want to use stuff that's designed for it, it works so much better.
[Enrico Fermi Institute] 12:49:22
Which makes it special, because it's specialized beyond what our code stack uses.
[Dirk] 12:49:40
I mean, we're trying that too.
[Dirk] 12:49:44
This is an active area of R&D, trying different approaches. I mean, in CMS we have the HLT,
[Dirk] 12:49:50
the tracking basically runs on GPU, and that gives a pretty significant speedup.
[Steven Timm] 12:50:13
A good point, just as with Lancium, but also for some of the other more exotic resources, even more probably on the HPCs, on the LCF
[Steven Timm] 12:50:23
systems: there are opportunities for things that can opportunistically go and grab a couple of hours of compute and come back with useful stuff.
[Steven Timm] 12:50:36
You may want to think about whether there is some redesign of the workloads that has to happen to best exploit those kinds of resources, because some workflows are more fragile:
[Steven Timm] 12:50:52
if you're preempted you lose everything, basically, if you're running for 10 hours and you've got 2 to go, or something like that.
[Steven Timm] 12:50:58
I mean, we hit on, for instance, that you could only get a 24-hour job length if you submitted at least 1,000 jobs.
[Steven Timm] 12:51:08
So it's something to consider. I don't have any answers for that, but it's something you should keep in mind when you're planning for non-conventional resources,
[Steven Timm] 12:51:20
to make sure you can get more stuff done.
[Dirk] 12:51:23
I think that's one of the differences between the approaches targeting HPC. But that mostly affects HPC, because cloud just allows you to schedule whatever you're paying for.
[Dirk] 12:51:35
So they don't...
[Steven Timm] 12:51:38
Well, Lancium can go down at any time, right?
[Dirk] 12:51:40
They can, but in practice, I mean, if they went down every 30 minutes it would probably become unusable for us. So we kind of rely on the fact that, in essence, even though in principle it can go down every 30 minutes, it doesn't actually happen all that often, and we cover
[Dirk] 12:52:00
whatever happens: we make it an efficiency problem. Basically, the failure handling code in our software
[Dirk] 12:52:06
stack can deal with it, and it just becomes an efficiency issue that goes into the cost
[Dirk] 12:52:10
calculation. I think if it gets more complicated than that, it becomes really problematic to use the resources. I know that ATLAS has the Harvester model; in principle you can survive,
[Dirk] 12:52:23
like, you can make use of very short time windows.
[Dirk] 12:52:28
But we don't have that in CMS, and I'm not sure how effective that is for ATLAS, either.
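[Editor's note] Dirk's "make it an efficiency problem" can be sketched with a standard toy model. These are our own assumptions for illustration, not numbers from CMS or Lancium: preemptions arrive at random (Poisson), and a preempted job loses all its work and restarts from scratch.

```python
import math

def efficiency(job_hours: float, mean_hours_between_preemptions: float) -> float:
    """Fraction of CPU-hours that end up in completed jobs.

    For Poisson preemptions with mean interval M, a job of length T survives
    with probability exp(-T/M), and the expected total time spent per
    successful completion is M*(exp(T/M) - 1), a standard restart-model
    result. So efficiency = T / (M*(exp(T/M) - 1)).
    """
    T, M = job_hours, mean_hours_between_preemptions
    return T / (M * math.expm1(T / M))

# Rare preemptions (one every ~2 weeks) barely hurt a 10-hour job...
print(round(efficiency(10, 336), 3))  # 0.985

# ...but a node that goes away every 30 minutes is effectively unusable:
print(f"{efficiency(10, 0.5):.2e}")
```

This is exactly the trade Dirk describes: infrequent preemption shows up as a few percent folded into the cost calculation, while frequent preemption makes long jobs hopeless unless the workload is redesigned for short time windows.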
[Fernando Harald Barreiro Megino] 12:52:46
Dirk, what do you think, should we close this session?
[Dirk] 12:52:56
Yeah, I mean, there's less than 10 minutes left.
[Dirk] 12:52:59
There was some talk about maybe pulling one of the talks earlier, but that's not enough time, and it would probably trigger discussion.
[Enrico Fermi Institute] 12:53:00
The
[Dirk] 12:53:07
So we can go with it first in the next session.
[Enrico Fermi Institute] 12:53:11
Yeah, I think the discussions we've been having the last 10 or 15 minutes lead nicely into the R&D
[Enrico Fermi Institute] 12:53:17
presentation.
[Enrico Fermi Institute] 12:53:25
Maybe we break here, unless anybody has any other cloud topics that they want to bring up.
[Enrico Fermi Institute] 12:53:30
I think this is the last session that's focused exclusively on cloud
[Enrico Fermi Institute] 12:53:37
Yeah, in the next session we'll talk about some R&D
[Enrico Fermi Institute] 12:53:43
things, and networking.
[Enrico Fermi Institute] 12:53:53
Okay, so maybe we break here, and we'll see everybody at one o'clock.