ANL slides


[Enrico Fermi Institute] 15:40:16
Yeah, so we're running a little late. Yes, let's move on. Taylor?

[Enrico Fermi Institute] 15:40:24
Do you have slides for us?

[Taylor Childers] 15:40:28
Yeah, I have a few slides.

[Enrico Fermi Institute] 15:40:30
Okay, great.

[Taylor Childers] 15:40:36
Hey!

[Enrico Fermi Institute] 15:40:40
Right.

[Taylor Childers] 15:40:41
So, hi. This is a disclaimer to make sure I don't do anything silly, but the point is that this is my own outlook

[Taylor Childers] 15:40:52
on the future. I'm not presenting any inside information. I don't even know what's coming after

[Taylor Childers] 15:41:00
Aurora. There are people at Argonne who do, but not me.

[Enrico Fermi Institute] 15:41:06
But Aurora is still coming, right? That's

[Taylor Childers] 15:41:08
Yeah, if anything is real, it's that Aurora is still coming. That's been the case for far too long.

[Enrico Fermi Institute] 15:41:21
Still coming.

[Taylor Childers] 15:41:22
Yeah, it's still coming. Okay, so I went back and updated this plot from a long time ago to provide a quick update on where things are in the US.

[Taylor Childers] 15:41:38
We've talked about this at length at this point, but I think it's also useful to look at it in the context of the LHC

[Taylor Childers] 15:41:48
runs, right? By the time the High-Lumi LHC turns on, we're gonna be dealing with machines where

[Taylor Childers] 15:41:54
we don't even know what they look like yet, and a lot can happen between now and then that can affect how those machines look.

[Taylor Childers] 15:42:05
So we now have Frontier deployed, so the US

[Taylor Childers] 15:42:11
has its first exascale machine. We'll have Aurora coming online by the end of the year. And the next-generation machines, like I said, we don't know what those are. Everything that we have is

[Taylor Childers] 15:42:26
sort of Intel, NVIDIA, AMD. I would expect these to follow similar trends, mainly because of the politics of it,

[Taylor Childers] 15:42:37
right? I mean, we're spending US taxpayer money, and they want that to go to US corporations.

[Taylor Childers] 15:42:44
So I expect those will stay static. But of course, the variation in combinations, as you can already see, is quite large, so those can still change.

[Taylor Childers] 15:43:00
Just to quickly put that in perspective, I included the recent Japanese machine that they deployed and the European machines that have been announced. I'm pretty sure this was confirmed in Andrea's slides, or the slides on EuroHPC,

[Enrico Fermi Institute] 15:43:20
Yeah.

[Taylor Childers] 15:43:25
that there's gonna be one more exascale

[Taylor Childers] 15:43:29
machine announced. So we know JUPITER is coming, and the plan was always to have 2 EuroHPC exascale machines before 2025.

[Taylor Childers] 15:43:42
I include China on here. In principle they already have 3 exascale machines, and they intend to have 10 by 2024 or 2025.

[Taylor Childers] 15:43:52
That's their goal. There's no reason they can't do that.

[Taylor Childers] 15:43:55
They seem to be willing to burn as much coal as possible to keep these machines at the exascale.

[Taylor Childers] 15:44:02
As I understand it, this one is just a giant...

[Taylor Childers] 15:44:05
Oh, no, the Tianhe-3 is a giant upgrade of the Tianhe-2.

[Taylor Childers] 15:44:09
So it's just a bunch of CPUs, and there is no energy budget there.

[Taylor Childers] 15:44:13
So it's, you know, a hot machine. The interesting thing about all of these is that they have architectures that are very different.

[Taylor Childers] 15:44:28
Europe has gone heavy into Arm and eventually will go into RISC-V

[Taylor Childers] 15:44:33
as an open-source accelerator format.

[Taylor Childers] 15:44:37
They're also, you know, into sovereign

[Taylor Childers] 15:44:42
technologies. Everybody wants, you know, their stuff built here.

[Taylor Childers] 15:44:47
So the Japanese are using Fujitsu chips.

[Taylor Childers] 15:44:51
The Europeans are trying to design their own. I wouldn't be surprised if the Arm and the RISC-V stuff changes in the EU, because I know Intel has already announced they're gonna open some foundries in Europe, and I think that's kind of to help their image in the

[Taylor Childers] 15:45:11
area, so we'll see.

[Taylor Childers] 15:45:16
So just a quick look at the distribution of architectures.

[Taylor Childers] 15:45:22
So I took the Top500. I made the cutoff that a machine had to be bigger than 10 petaflops.

[Taylor Childers] 15:45:28
That leaves me at about 50 machines, and I just summed the flops by architecture. Frontier really heavily dominates this now, so you can see the AMD CPUs and GPUs from an exascale machine compared to everyone else.
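
Editorial sketch, not part of the spoken remarks: the tally described above (apply a 10 petaflop cutoff, then add up flops per CPU and accelerator vendor) might look roughly like the following. The machine rows, the field layout, and the function name `flops_share_by_vendor` are all hypothetical placeholders, not real Top500 entries and not the speaker's actual spreadsheet.

```python
from collections import defaultdict

CUTOFF_PFLOPS = 10.0  # cutoff quoted in the talk

# Hypothetical placeholder rows, NOT real Top500 data:
# (name, cpu_vendor, accelerator_vendor or None, cpu_pflops, accel_pflops)
machines = [
    ("machine-a", "AMD", "AMD", 40.0, 960.0),
    ("machine-b", "IBM", "NVIDIA", 10.0, 140.0),
    ("machine-c", "Fujitsu", None, 440.0, 0.0),
    ("machine-d", "Intel", None, 6.0, 0.0),  # not above the cutoff, dropped
]


def flops_share_by_vendor(rows, cutoff=CUTOFF_PFLOPS):
    """Return (cpu_share, accel_share): each vendor's fraction of summed flops."""
    cpu_totals = defaultdict(float)
    accel_totals = defaultdict(float)
    for _name, cpu_vendor, accel_vendor, cpu_pf, accel_pf in rows:
        # Keep only machines strictly bigger than the cutoff.
        if cpu_pf + accel_pf <= cutoff:
            continue
        cpu_totals[cpu_vendor] += cpu_pf
        if accel_vendor is not None:
            accel_totals[accel_vendor] += accel_pf

    def normalize(totals):
        grand_total = sum(totals.values())
        if grand_total == 0:
            return {}
        return {vendor: value / grand_total for vendor, value in totals.items()}

    return normalize(cpu_totals), normalize(accel_totals)


cpu_share, accel_share = flops_share_by_vendor(machines)
```

Each returned dictionary corresponds to one pie chart; because both are normalized separately, they carry no information about how large the accelerator total is relative to the CPU total.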

[Taylor Childers] 15:45:47
So you can see right now, outside of Frontier, NVIDIA is really dominating the accelerators. There's a nice distribution of CPUs. And then I went ahead

[Taylor Childers] 15:46:03
to 2026 and tried to do the same plot

[Enrico Fermi Institute] 15:46:05
Okay.

[Taylor Childers] 15:46:10
for what I think is coming. So by 2026 the US

[Taylor Childers] 15:46:16
and Europe will both have 2 exascale machines. Like I said, China will have up to 10.

[Taylor Childers] 15:46:20
I didn't include the Chinese machines in this number, largely because I have no idea of the technicalities of what they're going to be running.

[Taylor Childers] 15:46:32
Europe has at least put out a roadmap, so their goal is to be using these Arm CPUs and the RISC-V accelerators.

[Taylor Childers] 15:46:41
So if I include those at, you know, sort of over an exaflop,

[Taylor Childers] 15:46:48
then you start seeing this distribution. So you see there's Arm, AMD, Intel on the CPU side, and AMD,

[Taylor Childers] 15:47:00
Intel, and then this is essentially that RISC-V processor.

[Taylor Childers] 15:47:04
So if the Europeans decide to move to NVIDIA or Intel or AMD,

[Taylor Childers] 15:47:11
this green blob here will shift. So you can see the variation is, you know, nearly equal.

[Taylor Childers] 15:47:24
So then there's specialty hardware. The DOE has always been strong in partnering with industry.

[Taylor Childers] 15:47:33
We really like pushing collaborations with industry. ALCF

[Taylor Childers] 15:47:40
hosts the DOE AI Testbed, and currently we have 5 machines that are all custom silicon designed for running large machine learning jobs. And so we've been working with those developers, testing out their software and whatnot. There's definitely an interest in identifying one or

[Enrico Fermi Institute] 15:47:44
Okay.

[Taylor Childers] 15:48:04
two that scientists like best, and then moving along with maybe making those sidecars to some future supercomputer, right?

[Taylor Childers] 15:48:18
So you could imagine having, you know, a couple of racks of these specialized chips available to you, to run your AI much, much faster than on a traditional GPU or CPU. The other thing I wanted to say moving forward... I'm going to close my door, my kids are coming home from

[Taylor Childers] 15:48:43
school. The other thing I wanted to mention was, of course, AI for Science, in the context of ECP. Many of you will be familiar with ECP, the Exascale Computing Project.

[Enrico Fermi Institute] 15:49:01
Cool.

[Taylor Childers] 15:49:01
It was a large funded project on the ASCR side. The last number I heard is that, in principle,

[Taylor Childers] 15:49:13
it funded about 1,000 FTEs across the... and it was all geared toward preparing for exascale machines.

[Taylor Childers] 15:49:24
Now, with the landing of our 2 exascale systems, this project's going to be ramping down, and there's a lot of work to figure out what's going to come next.

[Taylor Childers] 15:49:39
And it really looks like AI for Science is the next big push. So there's already,

[Taylor Childers] 15:49:46
it's already been 2 years' worth of workshops

[Taylor Childers] 15:49:50
on the ASCR side, where we are trying to lay out the groundwork for what such a project would look like, how it would be managed, and what its goals would be. So I expect that in the next, you know, 5 years this is gonna be sort of a dominating

[Taylor Childers] 15:50:13
force, just like ECP was. So, just something to be aware of.

[Enrico Fermi Institute] 15:50:15
Thank you.

[Taylor Childers] 15:50:19
I think that's going to have a big impact on

[Taylor Childers] 15:50:24
how our systems look in this next round of deployments.

[Taylor Childers] 15:50:31
So, the takeaways. I would say the future of architectures at HPC facilities is quite diverse.

[Taylor Childers] 15:50:40
I expect it to remain so. There might be some custom hardware for AI, but I expect it will be very niche, and you'll just be picking up TensorFlow and PyTorch and running your software the way you would anywhere else.

[Taylor Childers] 15:50:54
I would say the software implications there are that using portable frameworks will be a benefit. And of course, the more we can complain and voice our interest in standard support through the C++ standard

[Taylor Childers] 15:51:16
to companies, I think that's a good thing. But until everyone supports something like stdpar

[Taylor Childers] 15:51:23
out of the C++ standard, using these third-party libraries like Kokkos and SYCL and Alpaka is probably gonna be the best way to go for the moment. Let's see: the current exascale machines

[Taylor Childers] 15:51:38
were largely decided before AI became a real focus

[Taylor Childers] 15:51:43
in DOE science, and I expect that to be a bigger driver for the next round of systems that are coming. That might, again, of course, with the energy budgets and competitive nature of these machines, probably drive them in the direction of accelerators again. But things

[Taylor Childers] 15:52:07
shift quickly. It's hard to predict. So yeah, that's where I'll leave that.

[Enrico Fermi Institute] 15:52:19
Taylor, I had a quick question. I think it's on slide 3, where you kinda made the pie charts.

[Enrico Fermi Institute] 15:52:26
Yeah, if you would try to make a single pie chart, right?

[Enrico Fermi Institute] 15:52:33
The problem with pie charts is you can't tell the relative size. How much larger are the GPU flops currently versus the CPU flops?

[Enrico Fermi Institute] 15:52:42
Is there a way to get it all into a single one?

[Taylor Childers] 15:52:48
Yeah, I mean, any system that has accelerators is gonna be dominated by them, right? The last time I calculated that was probably for Summit, and it was, you know, on the level of 5 to 10 times the CPU flops.

[Enrico Fermi Institute] 15:52:54
Yeah.

[Taylor Childers] 15:53:06
And it got even worse when I did the calculation for Frontier and Aurora.

[Taylor Childers] 15:53:13
But it's been a long time since I looked at those.

[Enrico Fermi Institute] 15:53:17
So I guess the point is, if it was drawn to scale, the GPU pie chart would be 5 to 10 times larger than the CPU one, not the same size, right?

[Taylor Childers] 15:53:23
That's right.

[Taylor Childers] 15:53:29
For sure, for sure.
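
Editorial sketch, not part of the spoken remarks: one way to draw the two pie charts to scale is to make each pie's area proportional to its total flops, which means the radius grows with the square root of the total. The helper name `scaled_radius` and its parameters are hypothetical; the 5 to 10 times range is the rough figure quoted above for Summit.

```python
import math


def scaled_radius(total, reference_total, reference_radius=1.0):
    """Radius that makes pie area proportional to `total`.

    Area scales with radius squared, so a pie holding k times the flops
    of the reference pie gets a radius sqrt(k) times larger.
    """
    if total < 0 or reference_total <= 0:
        raise ValueError("totals must be non-negative, reference must be positive")
    return reference_radius * math.sqrt(total / reference_total)


# With the CPU pie as the reference (radius 1.0), a GPU total that is
# 5x to 10x larger gets a radius between sqrt(5) and sqrt(10).
gpu_radius_low = scaled_radius(5.0, 1.0)
gpu_radius_high = scaled_radius(10.0, 1.0)
```

Scaling the radius directly by the flops ratio would overstate the difference, since a reader compares areas rather than radii.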

[Enrico Fermi Institute] 15:53:32
And what is the "other" among the GPUs here?

[Taylor Childers] 15:53:36
So

[Enrico Fermi Institute] 15:53:38
Is that the

[Taylor Childers] 15:53:40
Yeah, so in this case that would be the Fujitsu.

[Enrico Fermi Institute] 15:53:47
Okay.

[Taylor Childers] 15:53:50
I can look back in my spreadsheet, too.

[Enrico Fermi Institute] 15:54:01
That probably also explains why Barb is a larger piece than Kelvin.

[Taylor Childers] 15:54:06
Oh, no, sorry. In this one, the other is the Tianhe-2, which is on the Top500, and it's one of these. It's this one.

[Enrico Fermi Institute] 15:54:13
Okay.

[Enrico Fermi Institute] 15:54:21
Okay, if you told me that was a 386 chip, I'd also believe you.

[Enrico Fermi Institute] 15:54:26
Okay, so Taylor, on performance portability: does that mean that if they decide on a system design, the LCF,

[Enrico Fermi Institute] 15:54:39
or whoever funds it, makes sure that it's supported by the performance

[Enrico Fermi Institute] 15:54:44
portability libraries?

[Taylor Childers] 15:54:46
Well, I think that's the benefit of something like Kokkos,

[Taylor Childers] 15:54:51
which is really third-party support, right?

[Taylor Childers] 15:54:55
So Kokkos came out of the ECP project, and I imagine it will continue to be supported.

[Taylor Childers] 15:55:06
And since it's third party, they can just come in and write a new plugin for whatever, you know, new one comes along, and so as long as you use it, you gain the benefit from that. When we first were working with Intel and SYCL, I was

[Taylor Childers] 15:55:31
very skeptical of SYCL. I mean, in general

[Taylor Childers] 15:55:35
I'm skeptical of, especially, telling scientists to invest their time in a solution that's being pushed by one of the manufacturers,

[Taylor Childers] 15:55:47
right? I mean, CUDA is a mess. As, you know, someone who came up in the sciences writing code,

[Taylor Childers] 15:55:55
I would never wish anyone to write code in CUDA, and so I approached SYCL in the same respect.

[Taylor Childers] 15:56:07
But I mean, it's getting good performance, and it allows you to write your code once, and so far we've been able to run it on all 3 systems.

[Taylor Childers] 15:56:17
We run it, at least with MadGraph. We have a SYCL implementation, and it runs on the AMD, the Intel, and the NVIDIA GPUs without any problem, and does very well. And Kokkos is the same. And like you said, the nice thing about those 2 is that you write

[Enrico Fermi Institute] 15:56:32
See.

[Taylor Childers] 15:56:37
your code once. But the CUDA implementation of MadGraph right now is riddled with precompiler ifdefs everywhere, because if you're not on a CUDA device you need to run the C++, and, you know, it just becomes really hard to

[Taylor Childers] 15:56:56
maintain for someone who's not the dedicated software

[Enrico Fermi Institute] 15:57:07
We still have to cover the HPC cost. I would like to at least attempt to go through the slides, and we'll have to see.

[Enrico Fermi Institute] 15:57:14
Okay, if it's too long, eventually we might have to cut it off and move it to tomorrow or something.

[Enrico Fermi Institute] 15:57:18
Yeah, we could could start a little earlier tomorrow I don't know how people feel about that.

[Enrico Fermi Institute] 15:57:24
Yeah, thanks, Taylor. Appreciate it. So let's try to go to the HPC