- Landscape of workflows: Cloud

[Enrico Fermi Institute] 11:41:08
Let's move on to cloud for the moment, I think, and I know we'll go through these slides.

[Fernando Harald Barreiro Megino] 11:41:14
Hi, yeah. So now it's a similar discussion, but for cloud: what are the workloads that can be executed on

[Fernando Harald Barreiro Megino] 11:41:26
cloud resources? And before getting there: what we have mostly been considering during our previous discussions for the blueprint process are the major commercial cloud providers, like Google, Amazon, and Microsoft, which are the ones that we have really been testing in the last couple of years.

[Fernando Harald Barreiro Megino] 11:41:45
And all of these have different service levels: they provide infrastructure as a service, where you would rent a machine and then install whatever you want; platform as a service, at a higher level; and software as a service. But nowadays all of these clouds also have emerging intermediate levels, in particular

[Fernando Harald Barreiro Megino] 11:42:05
containers as a service: for example, Kubernetes or other Kubernetes-like container execution services.

[Fernando Harald Barreiro Megino] 11:42:16
We build on these services, along with cloud-native approaches, to integrate our experiment

[Fernando Harald Barreiro Megino] 11:42:24
frameworks across the cloud providers, so that all of them look the same.

[Fernando Harald Barreiro Megino] 11:42:30
Yeah. And then the other cloud provider that is being tried lately is Lancium.

[Fernando Harald Barreiro Megino] 11:42:47
They differentiate themselves in particular through sustainability and the usage of renewable energy. They are also much more affordable than Google, but they are also not a full-blown cloud; they just have limited services, and also

[Fernando Harald Barreiro Megino] 11:43:09
reliability, probably depending on how much renewable energy there is at the moment.

[Fernando Harald Barreiro Megino] 11:43:17
And so CMS is trying to integrate them, or has integrated them once

[Enrico Fermi Institute] 11:43:18
Okay.

[Fernando Harald Barreiro Megino] 11:43:20
already, for some simple tests. So, next slide.

[Fernando Harald Barreiro Megino] 11:43:29
So, for ATLAS then, coming to the question: what are the workloads that are possible to execute on the cloud?

[Fernando Harald Barreiro Megino] 11:43:39
So lately we are integrating clouds like completely independent, self-managed sites, with a storage element and also a compute element that is integrated in PanDA. What we have the most experience

[Fernando Harald Barreiro Megino] 11:44:01
with is Google. And we started in middle of tonight to run

[Fernando Harald Barreiro Megino] 11:44:05
a cluster similar in size to our US Tier-2s, and we are now running

[Fernando Harald Barreiro Megino] 11:44:11
10,000 cores, and we are currently limited to production workloads.

[Fernando Harald Barreiro Megino] 11:44:17
But that's just because we are reorganizing the storage behind it, and we plan to enable analysis in a couple of weeks.

[Fernando Harald Barreiro Megino] 11:44:29
The one thing that maybe you want to control is the amount of egress, to bring down the cost.

[Fernando Harald Barreiro Megino] 11:44:39
If you want to do that, the obvious choice is to run simulation.

[Fernando Harald Barreiro Megino] 11:44:44
But we are also now starting to experiment with full chain, where you run all of the tasks within

[Fernando Harald Barreiro Megino] 11:44:55
the same simulation or production chain, and we don't export the intermediate products.

[Fernando Harald Barreiro Megino] 11:45:01
What I wanted to show in the plot is that, depending on the workload that you are running, your egress costs vary a lot, and that's what motivates trying to keep them down. And then the other thing

[Fernando Harald Barreiro Megino] 11:45:24
that we have been experimenting with in the cloud is analysis-facility type of setups

[Fernando Harald Barreiro Megino] 11:45:31
with elastic scaling: we set up an analysis facility with Jupyter and Dask, we keep the general components running on the cloud to a minimum, and we only scale out and add a lot of VMs when they are requested by a user to

[Fernando Harald Barreiro Megino] 11:45:49
run a Dask computation. This is also a very suitable setup for the cloud, because you just pay for the resources that you are using at the moment.
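
As a rough illustration of the pay-for-what-you-use argument, the following sketch compares keeping enough VMs for the peak always on with scaling workers out only when users request them. The function names, the hourly demand profile, and the price are all hypothetical values chosen for illustration; they are not numbers from the talk.

```python
def always_on_cost(demand_per_hour, price_per_vm_hour):
    """Cost when enough VMs for the busiest hour stay up for every hour."""
    peak = max(demand_per_hour)
    return peak * len(demand_per_hour) * price_per_vm_hour


def elastic_cost(demand_per_hour, price_per_vm_hour, base_vms=1):
    """Cost when only a small base (e.g. the notebook front end) stays up
    and worker VMs are billed only in the hours they are requested."""
    vm_hours = sum(base_vms + workers for workers in demand_per_hour)
    return vm_hours * price_per_vm_hour


if __name__ == "__main__":
    # Hypothetical day: idle most of the time, one burst of user activity.
    demand = [0] * 20 + [50, 50, 10, 0]
    price = 0.05  # hypothetical price per VM-hour
    print("always on:", always_on_cost(demand, price))
    print("elastic:  ", elastic_cost(demand, price))
```

The gap between the two numbers grows with how bursty the demand is, which is the situation described for interactive analysis.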

[Fernando Harald Barreiro Megino] 11:46:03
Then, on to the next slide.

[Fernando Harald Barreiro Megino] 11:46:07
So this is the landscape of clouds for CMS.

[Fernando Harald Barreiro Megino] 11:46:12
I don't know if Kenyi wants to talk about it, or whether I should

[Fernando Harald Barreiro Megino] 11:46:16
go through it.

[Kenyi Paolo Hurtado Anampa] 11:46:18
Yes. So, in essence, back in 2016, the Boston I haven't seen monthly.

[Kenyi Paolo Hurtado Anampa] 11:46:26
We did a little more to try different cloud providers to run production workloads, and what was run were GEN-SIM and DIGI-RECO workloads. But basically it shows that we can run any kind of production workflows in the cloud, and you can see

[Enrico Fermi Institute] 11:46:34
Okay.

[Kenyi Paolo Hurtado Anampa] 11:46:52
the diagram there on the right, and this is when the Fermilab facility was extended

[Kenyi Paolo Hurtado Anampa] 11:47:00
in order to get twice the number of resources that were initially available from the global pool.

[Kenyi Paolo Hurtado Anampa] 11:47:06
So this is showing something like 150,000 cores.

[Kenyi Paolo Hurtado Anampa] 11:47:11
There, on top. Basically, the resources were integrated through HEPCloud, which was also integrated with glideinWMS, and as of today there is some work

[Enrico Fermi Institute] 11:47:34
Yeah.

[Kenyi Paolo Hurtado Anampa] 11:47:39
ongoing to use this for, for example, specialized analysis workloads that depend on machine learning inference.

[Kenyi Paolo Hurtado Anampa] 11:47:48
So there is some work to

[Enrico Fermi Institute] 11:47:59
Okay.

[Kenyi Paolo Hurtado Anampa] 11:48:01
utilize cloud GPUs and to use them from different cloud providers.

[Kenyi Paolo Hurtado Anampa] 11:48:10
There is an inference server called Triton

[Kenyi Paolo Hurtado Anampa] 11:48:18
that was also integrated as part of SONIC, and with that

[Kenyi Paolo Hurtado Anampa] 11:48:25
you can start running the analysis pipeline, with the machine learning inference going through Triton on

[Kenyi Paolo Hurtado Anampa] 11:48:37
cloud providers there, while still using CPUs.

[Enrico Fermi Institute] 11:48:41
I can put some numbers in. I think they ran on 10,000 CPU cores.

[Enrico Fermi Institute] 11:48:50
That's 10,000 CPU cores, and they rented 100 GPUs and sped up the workflow that was running on the CPUs by 10%. So in that case you basically invest a little bit in GPUs just to speed up the calculation that runs on

[Enrico Fermi Institute] 11:49:05
the CPUs. So the ratio is 10,000 CPU cores to how many GPUs? 100. I mean, it's early work, so hopefully you can reduce that ratio, but that was what they were testing.
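
To make the trade-off in these numbers concrete, here is a minimal sketch of the break-even GPU price for the quoted setup (10,000 CPU cores, 100 GPUs, 10% speed-up). It rests on two assumptions that are not in the discussion: the 10% is read as a 10% reduction in wall-clock time, and both CPUs and GPUs are billed for the whole shortened run. The function name is hypothetical.

```python
def breakeven_gpu_price_ratio(cpu_cores, gpus, time_reduction):
    """Largest GPU-hour price, in units of the CPU-core-hour price,
    at which adding the GPUs does not increase the total bill.

    Assumed cost model (an illustration, not the speakers' model):
      without GPUs: cpu_cores * p_cpu * T
      with GPUs:    (cpu_cores * p_cpu + gpus * p_gpu) * T * (1 - time_reduction)
    Setting the two equal and solving for p_gpu / p_cpu gives the result.
    """
    if not 0.0 <= time_reduction < 1.0:
        raise ValueError("time_reduction must be in [0, 1)")
    return (cpu_cores / gpus) * time_reduction / (1.0 - time_reduction)


if __name__ == "__main__":
    ratio = breakeven_gpu_price_ratio(cpu_cores=10_000, gpus=100, time_reduction=0.10)
    print(f"break-even GPU price: {ratio:.2f} x the CPU-core price")
```

Under these assumptions a GPU-hour could cost roughly eleven times a CPU-core-hour before the GPUs stop paying for themselves; a different reading of "10%" or a different billing model would change that figure.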

[Enrico Fermi Institute] 11:49:20
Okay. Any other comments on the landscape of cloud?

[Enrico Fermi Institute] 11:49:30
Anything else to bring up? Otherwise we can move on to acquisition and operation.


[Fernando Harald Barreiro Megino] 11:49:35
okay.

[Ian Fisk] 11:49:36
Sorry, I have comments, and I was muted. I thought I was talking, sorry.

[Ian Fisk] 11:49:42
This is Ian. So the general comment was: we have this issue about the egress charges, which we don't ever seem to have a solution for, except not to export data.

[Enrico Fermi Institute] 11:49:43
Okay.

[Enrico Fermi Institute] 11:49:43
Okay, Got it.

[Steven Timm] 11:49:56
No, not so. There are agreements.

[Ian Fisk] 11:50:03
But the agreements are always things like: if it's 15% of the billing charges, we'll waive it.

[Ian Fisk] 11:50:09
There are ways to reduce it.

[Ian Fisk] 11:50:11
But fundamentally this is a business practice that they use to do vendor lock-in, and so

[Ian Fisk] 11:50:19
far, at least, no one's been proposing to not do it.

[Ian Fisk] 11:50:21
And so we're always okay.

[Enrico Fermi Institute] 11:50:23
Two things: Lancium does not have egress charges.

[Ian Fisk] 11:50:26
Okay.

[Enrico Fermi Institute] 11:50:27
With the limitation that we're still exploring them, and that's very early going.

[Steven Timm] 11:50:28
Pretty good.

[Enrico Fermi Institute] 11:50:32
But by design, at least from what they're saying now, they don't charge egress.

[Ian Fisk] 11:50:37
Right.

[Enrico Fermi Institute] 11:50:38
And then, Fernando, do you want to say something about the subscription,

[Enrico Fermi Institute] 11:50:41
what that model is? Because I

[Fernando Harald Barreiro Megino] 11:50:43
I wanted to discuss that tomorrow during the cloud session.

[Ian Fisk] 11:50:47
Okay.

[Fernando Harald Barreiro Megino] 11:50:48
But I mean, basically the agreement we have with Google is a subscription agreement.

[Fernando Harald Barreiro Megino] 11:50:57
And that's basically like a flat rate.

[Fernando Harald Barreiro Megino] 11:51:00
You agree on a price and on the amount of resources that are included,

[Fernando Harald Barreiro Megino] 11:51:03
and egress will not be charged. There is no meter on how much egress you have.

[Fernando Harald Barreiro Megino] 11:51:08
It is a fixed price for your 15 months of

[Ian Fisk] 11:51:14
Yeah, okay. So I guess the question is: at the end of your 15 months, if you want to use the last month only to export your data and get out of the cloud, that would be within the confines of the model. Is that a true statement?

[Fernando Harald Barreiro Megino] 11:51:32
Within.

[Fernando Harald Barreiro Megino] 11:51:33
Within. As you are running jobs, the output is always exported, so we are always running up the egress cost.

[Ian Fisk] 11:51:40
Okay, all right. I guess my point is that this is a fundamental problem, which is that we can essentially only use the cloud a lot like HPC, except that with HPC we propose for the data

[Enrico Fermi Institute] 11:51:42
Yeah.

[Steven Timm] 11:51:51
Yeah.

[Enrico Fermi Institute] 11:52:01
Good. Yeah, I mean, my opinion on the cloud is that workflow selection and capabilities are not the issue, because we can do anything we want on the cloud; it's just a machine you rent. The question comes down to: what's the cost?

[Steven Timm] 11:52:22
Great Great. Well, this one, you

[Enrico Fermi Institute] 11:52:23
And how do they structure the pricing, to price what they want you to be allowed to do, and in what way?

[Enrico Fermi Institute] 11:52:29
What are the restrictions?

[Ian Fisk] 11:52:30
And the other point I wanted to make was that one of the fundamental differences between HPC

[Ian Fisk] 11:52:37
and cloud is that HPC, at the leadership class, relies almost exclusively on accelerated, GPU-style

[Ian Fisk] 11:52:44
hardware, and it's not that the clouds

[Ian Fisk] 11:52:48
don't have them, but those are the most expensive elements on the cloud, and it's because they depreciate so fast that the cloud providers need to recoup that cost in a shorter period of time

[Ian Fisk] 11:52:59
than they do for CPU. And you find that the economics of the GPU and the CPU are different on the cloud.

[Enrico Fermi Institute] 11:53:09
It's also structural. No, I'll leave that comment, because we do have the cloud focus area tomorrow.

[Ian Fisk] 11:53:15
Okay, right

[Enrico Fermi Institute] 11:53:15
We should not try to have all the discussions now. There's a comment from

[Enrico Fermi Institute] 11:53:20
Johannes.

[Johannes Elmsheuser] 11:53:22
Yeah, just to follow up on the egress, right?

[Johannes Elmsheuser] 11:53:26
So if you go one slide back, to slide 11, Fernando has a little bit of a breakdown there

[Johannes Elmsheuser] 11:53:34
of the different costs, right? And there is always, I think, some fear that egress is really humongous compared to everything else. But from what we are seeing when running, for example, on AWS

[Johannes Elmsheuser] 11:53:47
and doing physics validation there, the egress is not the overall driver here, unless you do really crazy stuff, right?

[Johannes Elmsheuser] 11:53:57
So when you have a regular simulation task, egress is not dominant, and it's really the CPU

[Johannes Elmsheuser] 11:54:03
that you are scaling up, that is driving the cost.

[Johannes Elmsheuser] 11:54:06
Egress is obviously something on top that

[Johannes Elmsheuser] 11:54:13
you have to pay compared to HPC; there's no discussion there.

[Johannes Elmsheuser] 11:54:17
But it's also not humongous when you compare everything and fold everything in here, right? I just wanted to make that statement, and I think we can discuss this in more detail later in the dedicated cloud session.
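
The point about egress being a small or a large part of the bill depending on the workload can be sketched as a simple ratio. Everything below is an assumption made for illustration: the function name, the core-hour and per-GB prices, and the two workload profiles are hypothetical and are not taken from the slide being discussed.

```python
def egress_share(core_hours, price_per_core_hour, output_gb, price_per_gb_egress):
    """Fraction of the total (compute + egress) bill that is due to egress."""
    compute = core_hours * price_per_core_hour
    egress = output_gb * price_per_gb_egress
    total = compute + egress
    return egress / total if total > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    core_price = 0.01    # per core-hour
    egress_price = 0.08  # per GB exported

    # CPU-heavy task with small output (simulation-like profile).
    sim = egress_share(1_000_000, core_price, 10_000, egress_price)
    # Same compute, but exporting fifty times more data.
    data_heavy = egress_share(1_000_000, core_price, 500_000, egress_price)

    print(f"simulation-like egress share: {sim:.1%}")
    print(f"data-heavy egress share:      {data_heavy:.1%}")
```

With the same compute budget, the share moves from a minor item to the dominant one purely through the exported volume, which is why structured, CPU-bound production behaves so differently from unplanned data access.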

[Ian Fisk] 11:54:29
I would claim that it is not humongous as long as you're in a very structured environment

[Ian Fisk] 11:54:35
and you are behaving in a predictable way. When the data is opened up to analysis, at least for us,

[Johannes Elmsheuser] 11:54:38
Yeah.

[Ian Fisk] 11:54:41
we had a user who browsed some data that we weren't expecting and ran up a $75,000 export bill in a month.

[Johannes Elmsheuser] 11:54:50
Sure, I mean, that is then a question of how you structure your workflows. Absolutely, I fully agree.

[Johannes Elmsheuser] 11:54:57
So if you have an agreed workflow there, and here we are showing production, that's totally clear, right? And you don't want to have surprises from some unstructured user analysis. Fully agreed.

[Enrico Fermi Institute] 11:55:14
There is a comment from Paolo.

[Paolo Calafiura (he)] 11:55:16
Yes, I mean, I feel I'm becoming like a broken record.

[Paolo Calafiura (he)] 11:55:25
But once again, I think this slide shows you the benefits of committing versus taking, you know, an arm's-length approach.

[Paolo Calafiura (he)] 11:55:34
We have always said that the cloud is a great way to, you know, do

[Paolo Calafiura (he)] 11:55:39
elastic computing, like the slide at the bottom kind of suggests: you know, when we need something for doing analysis, we will use it.

[Enrico Fermi Institute] 11:55:39
Hmm.

[Paolo Calafiura (he)] 11:55:49
And then our loads will be elastic, and that's what's expensive.

[Paolo Calafiura (he)] 11:55:55
But of course, once again, take the point of view of the vendor. What the vendor wants, yes, is to lock you in, and not necessarily with some evil mechanism, but just by offering you a good subscription deal, so that you take some of the money that

[Enrico Fermi Institute] 11:55:55
Okay.

[Enrico Fermi Institute] 11:56:11
Yeah.

[Paolo Calafiura (he)] 11:56:13
you otherwise would spend on your own hardware and give it to them. And so

[Paolo Calafiura (he)] 11:56:19
there is a lock-in there because, of course, the price is constant for 12 months or 15 months, but it can change from one year to the next, and it will, as it should. So you are locked in because then you don't have anymore, let's say, all of your Tier-1 or

[Paolo Calafiura (he)] 11:56:38
Tier-2 hardware, and then you are locked in with them.

[Enrico Fermi Institute] 11:56:44
Kaushik.

[Kaushik De] 11:56:47
Yeah, coming back to the other point; I'm sure it will be discussed tomorrow

[Kaushik De] 11:56:55
during the dedicated session also, but since it came up: on the issue of heterogeneity in the cloud, the heterogeneity is actually extremely useful and extremely good.

[Kaushik De] 11:57:09
In the cloud we are using both Amazon and Google for studies with FPGAs, with ARM, with GPUs, and there is no cost in setting up those resources because they're already available in the cloud. So I think the usefulness of highly specialized

[Kaushik De] 11:57:41
hardware at minimal cost, because we don't pay for setting them up in the cloud;

[Kaushik De] 11:57:47
they're already there, and we can go in there and use them, and that is an enormous resource for experiments. Because, I mean, if we had to set up our own FPGA farm or ARM farm or GPU farm in order to do some of the studies,

[Kaushik De] 11:58:03
it would be prohibitively expensive.

[Ian Fisk] 11:58:07
Right, and I didn't mean to imply that there wasn't real value in the diversity of resources on the cloud.

[Ian Fisk] 11:58:14
I was only commenting that at production scales they can become very expensive.

[Enrico Fermi Institute] 11:58:25
A comment from Fernando.

[Fernando Harald Barreiro Megino] 11:58:27
Yeah, it's a question, and again about the egress cost.

[Fernando Harald Barreiro Megino] 11:58:34
There is always this legend that if there is a peering between, let's say, a cloud provider and,

[Fernando Harald Barreiro Megino] 11:58:43
for example, ESnet, you can bring down the egress cost, and I wanted to ask if that's really true, or just something that we heard

[Fernando Harald Barreiro Megino] 11:58:55
but no one really knows about.

[Enrico Fermi Institute] 11:59:01
Okay, I think we're definitely going to have some dedicated time to talk about that on Wednesday.

[Enrico Fermi Institute] 11:59:07
I know Dale is going to have a slide or two for us, and maybe we move that question to Wednesday specifically, unless somebody wants to jump in right now.

[Fernando Harald Barreiro Megino] 11:59:17
okay.

[Enrico Fermi Institute] 11:59:21
Comment from Alexei.

[Alexei Klimentov] 11:59:22
Okay. So my comment is related to the comments from Ian and Paolo, which were different comments.

[Alexei Klimentov] 11:59:30
I disagree that we use clouds as HPCs; we use clouds in completely different ways.

[Alexei Klimentov] 11:59:40
The whole idea of trying clouds is what was written on this slide: that we can elastically scale resources.

[Alexei Klimentov] 11:59:48
So we can have this diversity of resources, and we can build our own architecture, at least.

[Enrico Fermi Institute] 11:59:50
Yeah.

[Enrico Fermi Institute] 11:59:53
Excellent.

[Alexei Klimentov] 11:59:57
But with what we have, especially with the LCFs,

[Alexei Klimentov] 12:00:02
you have boundary conditions: these machines were built, as was mentioned correctly, not for HEP

[Alexei Klimentov] 12:00:09
but for some other domains. And for Paolo, my colleague: we have a cloud.

[Alexei Klimentov] 12:00:19
What we have is many years of our experience. I don't think it is the right way to mirror

[Alexei Klimentov] 12:00:26
our understanding of commercial companies onto what we are doing with clouds right now. So certainly they want to make money,

[Alexei Klimentov] 12:00:34
but we are not so stupid as to stop our Tier-2s

[Alexei Klimentov] 12:00:38
and use just clouds, and the whole idea of the 15-month project of ATLAS is just to learn it better.

[Alexei Klimentov] 12:00:47
So I think we are at a very early stage with clouds and with understanding the cost model and how it can be integrated with our grid model.

[Enrico Fermi Institute] 12:00:50
Okay.

[Alexei Klimentov] 12:00:59
Those were my two comments.