## Profiling ATLAS CPU architectures

Jyoti Prakash Biswal Rutherford Appleton Laboratory

### GridPP49 & SWIFT-HEP05

The Cosener's House, Abingdon

29 March 2023



- Since 2007, the CPUs on the grid have shared a baseline set of features.
- Anything newer is not used except via special libraries, *e.g.*, **Intel Math Library function multi-versioning**.
- Therefore, the ability to run vectorised code is passed over, and the speed boost from the latest features goes unused.

 $\Rightarrow$  Familiarity with the CPU architectures on the grid should pave the way for more effective use of computing resources.

- This study aims to gather specifics on CPU architectures via CPU flags using information from the jobs running on the grid.

- By and large, ATLAS codes are dominated by basic maths, C++ STL operations, and memory allocation/deallocation.
- These codes are compiled for the baseline because, if they are compiled for a higher level and the CPU feature is unavailable, Athena crashes with an "illegal instruction" error.
- The compiler can do auto-vectorisation on appropriate loops if the relevant architecture is enabled.
  - The baseline compiler can do some auto-vectorisation, but the more recent CPUs allow more parallel operations.
- Some codes will figure out at runtime what is available!
- In reality, there is no information on how much of the grid belongs to each kind of CPU architecture, either in terms of which grid sites have them or how important they are.
- There are potentially significant benefits to be gained from compiler optimisation if the compiler is allowed to use more vectorisation.
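The runtime-selection idea in the list above can be sketched as follows. This is only an illustration under stated assumptions, not how Athena or any particular library does it: the three kernels and the `pick_kernel` helper are hypothetical stand-ins, and the flag names are the lowercase ones Linux prints in /proc/cpuinfo.

```python
# Hypothetical kernels: one baseline build plus two tuned variants.
# In a real library these would be separately compiled code paths.
def sum_baseline(values):
    return sum(values)


def sum_avx2(values):  # stand-in for a build compiled with AVX2 enabled
    return sum(values)


def sum_avx512(values):  # stand-in for a build compiled with AVX-512 enabled
    return sum(values)


# Ordered from most to least demanding: (required flags, implementation).
VARIANTS = [
    ({"avx512f"}, sum_avx512),
    ({"avx2", "fma"}, sum_avx2),
    (set(), sum_baseline),
]


def pick_kernel(cpu_flags):
    """Return the most demanding variant whose required flags are all present."""
    available = set(cpu_flags)
    for required, kernel in VARIANTS:
        if required <= available:
            return kernel
    raise RuntimeError("unreachable: the baseline requires no flags")


kernel = pick_kernel({"sse2", "avx", "avx2", "fma"})
print(kernel.__name__)
```

Because the baseline variant requires no flags, the selection never falls through to an unsupported code path, which is the property that avoids the "illegal instruction" crash.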

- CPU features:
  - Define a number of different processor attributes, *e.g.*, the presence of a floating-point unit (FPU).
  - Reflect on CPU operations at the current time; architecture dependent.
  - Description of CPU features: <u>link-1</u>; <u>link-2</u>.
- CPU instructions:
  - Patterns of bits, digits, or characters that correspond to machine commands.
  - The instruction set is specific to a class of processors using (mostly) the same architecture.
  - Successor or derivative processor designs often include the instructions of a predecessor and may add new ones.
- CPU architectures:
  - System design: the physical computer system, *i.e.*, all hardware parts of a computer.
  - Instruction set architecture (ISA): the functions and capabilities of the CPU; what programming it can perform or process.
  - *Microarchitecture*: computer organisation; defines the data-processing and storage elements and how they implement the ISA.
    - Four variations: x86-64-v1, x86-64-v2, x86-64-v3, and x86-64-v4.

## CPU: features, instructions, and architectures

• References: <u>Wiki</u>; Application Binary Interface (ABI).

CPU microarchitecture levels:

| Architecture | Features   | Example instructions |
|--------------|------------|----------------------|
| x86-64-v1    | CMOV       | cmov                 |
|              | CX8        | cmpxchg8b            |
|              | FPU        | fld                  |
|              | FXSR       | fxsave               |
|              | MMX        | emms                 |
|              | OSFXSR     | fxsave               |
|              | SCE        | syscall              |
|              | SSE        | cvtss2si             |
|              | SSE2       | cvtpi2pd             |
| x86-64-v2    | CMPXCHG16B | cmpxchg16b           |
|              | LAHF-SAHF  | lahf                 |
|              | POPCNT     | popcnt               |
|              | SSE3       | addsubpd             |
|              | SSE4_1     | blendpd              |
|              | SSE4_2     | pcmpestri            |
|              | SSSE3      | phaddd               |
| x86-64-v3    | AVX        | vzeroall             |
|              | AVX2       | vpermd               |
|              | BMI1       | andn                 |
|              | BMI2       | bzhi                 |
|              | F16C       | vcvtph2ps            |
|              | FMA        | vfmadd132pd          |
|              | LZCNT      | lzcnt                |
|              | MOVBE      | movbe                |
|              | OSXSAVE    | xgetbv               |
| x86-64-v4    | AVX512F    | kmovw                |
|              | AVX512BW   | vdbpsadbw            |
|              | AVX512CD   | vplzcntd             |
|              | AVX512DQ   | vpmullq              |
|              | AVX512VL   |                      |

### A case in point: vectorisation

- A key tool to improve performance on modern CPUs.
- Converts an algorithm from operating on a single value at a time to operating on a set of values simultaneously.
- Modern CPUs provide direct support for vector operations where a single instruction is applied to multiple data (SIMD).
- Advantages:
  - A 512-bit vector register can hold sixteen 32-bit single-precision floats (or eight 64-bit doubles) and operate on all of them with a single instruction.
    - $\Rightarrow$  Up to 16 times faster than processing one value per instruction.
    - $\Rightarrow$  Combination with threading and multi-core CPUs leads to enormous performance gains.
  - By contrast, in a serial (scalar) calculation the individual vector (array) elements are added in sequence.



[Scalar mode: unused additional spaces in CPU.]

[Vector mode]

#### Vectorisation means optimising the algorithm to utilise SIMD instructions in the processors.

#### Instruction: **SSE4** (*Streaming SIMD Extensions 4*)

- Architecture: x86-64-v2
- Processor: 128-bit
- Simultaneous operations on: four 32-bit single-precision floating point numbers / two 64-bit double-precision floating point numbers.

#### Instruction: AVX2 (Advanced Vector Extensions 2)

- Architecture: x86-64-v3
- Processor: 256-bit
- Simultaneous operations on: eight 32-bit single-precision floating point numbers / four 64-bit double-precision floating point numbers.

#### Instruction: AVX512 (Advanced Vector Extensions 512)

- Architecture: x86-64-v4
- Processor: 512-bit
- Simultaneous operations on: sixteen 32-bit single-precision floating point numbers / eight 64-bit double-precision floating point numbers.

#### AVX512 $\sim$ 2×AVX2; AVX2 $\sim$ 2×SSE4.
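The lane counts quoted for SSE4, AVX2, and AVX512 follow from dividing the register width by the element size. A minimal sketch of that arithmetic is given below; the helper name `simd_lanes` and the `REGISTER_BITS` table are introduced here for illustration only.

```python
def simd_lanes(register_bits, element_bits):
    """Number of elements of a given size that fit in one vector register."""
    if element_bits <= 0 or register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of the element size")
    return register_bits // element_bits


# Register widths as listed on the slide.
REGISTER_BITS = {"SSE4": 128, "AVX2": 256, "AVX512": 512}

for name, bits in REGISTER_BITS.items():
    floats = simd_lanes(bits, 32)   # single precision
    doubles = simd_lanes(bits, 64)  # double precision
    print(f"{name}: {floats} floats or {doubles} doubles per instruction")
```

The lane count is an upper bound on the speedup from vectorisation alone; memory access and the non-vectorised parts of the code reduce what is seen in practice.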

## Impact of newer CPU features (a recent study)

- Speeding up Madgraph5\_aMC@NLO through data parallelism: CPU vectorisation (A. Valassi's talk).
- On CPUs, in vectorised C++, the maximum x8/x16 (double/float) SIMD speedup is reached for Matrix Elements (MEs) alone.
  - The speedups achieved for the overall workflow are slightly lower due to Amdahl's law, but not much.
  - e.g., current overall speedup is x6/x10 (double/float) for  $gg \rightarrow t\bar{t}gg$  on one CPU core.

Results shown at ACAT2022 for $gg \rightarrow t\bar{t}gg$:

| Build              | MEs precision | $t_{\text{TOT}} = t_{\text{Mad}} + t_{\text{MEs}}$ [sec] | madevent: $N_{\text{events}}/t_{\text{TOT}}$ [events/sec] | madevent: $N_{\text{events}}/t_{\text{MEs}}$ [MEs/sec] | standalone [MEs/sec] |
|--------------------|---------------|----------------------------------------------------------|-----------------------------------------------------------|--------------------------------------------------------|----------------------|
| Fortran (scalar)   | double        | 37.3 = 1.7 + 35.6                                        | 2.20E3 (=1.0)                                             | 2.30E3 (=1.0)                                          | -                    |
| C++/none (scalar)  | double        | 37.8 = 1.7 + 36.0                                        | 2.17E3 (x1.0)                                             | 2.28E3 (x1.0)                                          | 2.37E3               |
| C++/sse4 (128-bit) | double        | 19.4 = 1.7 + 17.8                                        | 4.22E3 (x1.9)                                             | 4.62E3 (x2.0)                                          | 4.75E3               |
| C++/avx2 (256-bit) | double        | 9.5 = 1.7 + 7.8                                          | 8.63E3 (x3.9)                                             | 1.05E4 (x4.6)                                          | 1.09E4               |
| C++/512y (256-bit) | double        | 8.9 = 1.8 + 7.1                                          | 9.29E3 (x4.2)                                             | 1.16E4 (x5.0)                                          | 1.20E4               |
| C++/512z (512-bit) | double        | 6.1 = 1.8 + 4.3                                          | 1.35E4 (x6.1)                                             | 1.91E4 (x8.3)                                          | 2.06E4               |
| C++/none (scalar)  | float         | 36.6 = 1.8 + 34.9                                        | 2.24E3 (x1.0)                                             | 2.35E3 (x1.0)                                          | 2.45E3               |
| C++/sse4 (128-bit) | float         | 10.6 = 1.7 + 8.9                                         | 7.76E3 (x3.6)                                             | 9.28E3 (x4.1)                                          | 9.21E3               |
| C++/avx2 (256-bit) | float         | 5.7 = 1.8 + 3.9                                          | 1.44E4 (x6.6)                                             | 2.09E4 (x9.1)                                          | 2.13E4               |
| C++/512y (256-bit) | float         | 5.3 = 1.8 + 3.6                                          | 1.54E4 (x7.0)                                             | 2.30E4 (x10.0)                                         | 2.43E4               |
| C++/512z (512-bit) | float         | 3.9 = 1.8 + 2.1                                          | 2.10E4 (x9.6)                                             | 3.92E4 (x17.1)                                         | 3.77E4               |

- 512y = AVX512, ymm registers; 512z = AVX512, zmm registers.
- The latter is only better on nodes with 2 FMA units (here an Intel Gold 6148).

[Figure: SIMD register lanes in scalar, SSE4, AVX2, and AVX512 modes for FLOAT and DOUBLE precision, annotated with the ME speedup, the overall speedup over scalar Fortran, and Amdahl's law.]
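The gap between the ME-only speedup and the overall speedup can be checked with Amdahl's law. The sketch below takes the scalar Fortran timings from the table (1.7 s outside the MEs, 35.6 s in the MEs) and the quoted ME speedups for the 512z builds; the function name `overall_speedup` is introduced here for illustration. It estimates an overall speedup of about 6.2 for double precision, close to the measured x6.1.

```python
def overall_speedup(t_serial, t_accelerated, speedup):
    """Amdahl's law: only the t_accelerated part of the runtime gets faster."""
    t_before = t_serial + t_accelerated
    t_after = t_serial + t_accelerated / speedup
    return t_before / t_after


# Scalar Fortran row of the table: t_TOT = t_Mad + t_MEs = 1.7 + 35.6 seconds.
T_MAD, T_MES = 1.7, 35.6

# ME speedups quoted for the 512z builds (double and float).
for label, me_speedup in [("double", 8.3), ("float", 17.1)]:
    estimate = overall_speedup(T_MAD, T_MES, me_speedup)
    print(f"{label}: ME speedup x{me_speedup} -> overall about x{estimate:.1f}")

# Limit for an infinitely fast ME calculation: the serial part alone remains.
print(f"upper bound: x{(T_MAD + T_MES) / T_MAD:.1f}")
```

The small differences from the measured values come from the non-ME time not being exactly constant across builds (1.7 s versus 1.8 s in the table).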

Jyoti Prakash Biswal (RAL)

ATLAS CPUs

- CPU flags: CPU features + CPU instructions.
  - Utilised in this study for making the decision on CPU architectures.
- The flags are obtained via /proc/cpuinfo, *e.g.*,

flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant\_tsc rep\_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4\_1 sse4\_2 x2apic movbe popcnt tsc\_deadline\_timer aes xsave avx f16c rdrand hypervisor lahf\_lm abm 3dnowprefetch invpcid\_single ssbd rsb\_ctxsw ibrs ibpb stibp fsgsbase tsc\_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat md\_clear spec\_ctrl intel\_stibp

- The same info is also stored in the pilot logs.
- The pilot logs for all jobs are accessible via Big PanDA as well as apfmon Cloud.
  - The pilot logs are available on <u>Big PanDA</u> only for a limited period; the best approach is to download them.
- Also, Kibana has quite some statistics available (sans CPU flags as of now).
- Historical data obtained from Big PanDA and Kibana are used for this work.
- A Python script was written for this purpose.
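As an illustration of the first step, the flags line can be extracted from /proc/cpuinfo-style text as sketched below. This is not the script used in the study (which works from pilot logs); the function names `parse_flags` and `read_local_flags` are hypothetical, and the sample text is a shortened stand-in for real output.

```python
from pathlib import Path


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no flags line found


def read_local_flags(path="/proc/cpuinfo"):
    """Read the flags of the local machine (Linux only)."""
    return parse_flags(Path(path).read_text())


# Shortened sample in the same layout as /proc/cpuinfo.
sample = "processor\t: 0\nmodel name\t: Example CPU\nflags\t\t: fpu sse sse2 avx avx2\n"
print(sorted(parse_flags(sample)))
```

Returning a set makes the later membership and subset checks against a list of required flags straightforward.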

## Procedure to decide on CPU architectures

- Deciding on the definition of CPU architectures based on CPU flags is not trivial.
- The following recommendations on CPU flags were considered before making a decision:
  - Naive/simplified:
    - SSE4\_2.\*  $\rightarrow$  x86-64-v2.
    - AVX2.\*  $\rightarrow$  x86-64-v3.
    - AVX512.\*  $\rightarrow$  x86-64-v4.
  - GNU Compiler Collection (GCC): <u>x86-64-v2</u>, <u>x86-64-v3</u>, <u>x86-64-v4</u>.

- The finalised one: modified GCC lists, *i.e.*, LAHF\_SAHF  $\longrightarrow$  LAHF\_LM; LZCNT  $\longrightarrow$  ABM; removal of SSE3:
  - x86-64-v2 = [ MMX, SSE, SSE2, LAHF\_LM, POPCNT, SSE4\_1, SSE4\_2, SSSE3 ]
  - x86-64-v3 = [ MMX, SSE, SSE2, LAHF\_LM, POPCNT, SSE4\_1, SSE4\_2, SSSE3, AVX, AVX2, F16C, FMA, ABM, MOVBE, XSAVE ]
  - x86-64-v4 = [ MMX, SSE, SSE2, LAHF\_LM, POPCNT, SSE4\_1, SSE4\_2, SSSE3, AVX, AVX2, F16C, FMA, ABM, MOVBE, XSAVE, AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL ]
- AMD CPUs do not qualify for x86-64-v4 under the above criteria!
- ARM CPUs and High Performance Computing (HPC) resources are NOT examined.
- Presently, the results are the same as those of the naive/simplified criteria.
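The finalised flag lists translate directly into subset checks. The sketch below is one possible reading of the criteria, not the study's actual script: the names `V2`, `V3`, `V4`, and `classify` are hypothetical, flags are written in the lowercase form used by /proc/cpuinfo, and any x86-64 CPU that fails the x86-64-v2 list is assumed to count as x86-64-v1.

```python
# Required flags per level, following the finalised lists on the slide.
V2 = {"mmx", "sse", "sse2", "lahf_lm", "popcnt", "sse4_1", "sse4_2", "ssse3"}
V3 = V2 | {"avx", "avx2", "f16c", "fma", "abm", "movbe", "xsave"}
V4 = V3 | {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}

# Checked from the highest level down, so the first match is the best one.
LEVELS = [("x86-64-v4", V4), ("x86-64-v3", V3), ("x86-64-v2", V2)]


def classify(flags):
    """Return the highest level whose required flags are all present."""
    present = {flag.lower() for flag in flags}
    for name, required in LEVELS:
        if required <= present:
            return name
    return "x86-64-v1"  # assumption: everything below v2 is reported as v1


# Subset of the example flags line shown earlier in the talk.
example = (
    "fpu cx8 cmov mmx fxsr sse sse2 syscall pni ssse3 fma cx16 sse4_1 sse4_2 "
    "movbe popcnt xsave avx f16c lahf_lm abm bmi1 avx2 bmi2"
).split()
print(classify(example))
```

Defining each level as a superset of the previous one keeps the lists consistent and means a CPU cannot be assigned a level without also satisfying all lower ones.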

## CPU popularity architecture-wise (all grid sites)

- Metric: ∑(HS06\_day) ⇒ sum of HEP-SPEC06 per day.
- HEP-SPEC06: the HEP-wide benchmark for measuring CPU performance.
- Duration: January-December 2022 (yearly).
- [Numbers are in the backup]



The order of dominance: x86-64-v3, x86-64-v4, x86-64-v2, x86-64-v1.
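The ∑(HS06\_day) share per architecture amounts to a grouped sum followed by a normalisation. A minimal sketch is shown below; the record layout, the function name `hs06_share`, and the numbers are made up for illustration and are not data from the study.

```python
from collections import defaultdict


def hs06_share(records):
    """Sum HS06_day per architecture and return each share in percent.

    records: iterable of (architecture, hs06_day) pairs, one per job.
    """
    totals = defaultdict(float)
    for architecture, hs06_day in records:
        totals[architecture] += hs06_day
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {arch: 100.0 * total / grand_total for arch, total in totals.items()}


# Made-up job records purely for illustration; not results from the study.
jobs = [("x86-64-v3", 60.0), ("x86-64-v4", 25.0), ("x86-64-v3", 15.0)]
for arch, percent in sorted(hs06_share(jobs).items()):
    print(f"{arch}: {percent:.1f}%")
```

Weighting by HS06\_day rather than counting jobs or hosts means that long jobs on fast CPUs contribute proportionally more to an architecture's share.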


## CPU popularity architecture-wise (all grid sites)

Metric: ∑(HS06\_day).
Duration: January-December 2022 (monthly). [Numbers are in the backup]



A clear and steady trend of x86-64-v3 dominance followed by x86-64-v4, x86-64-v2, and x86-64-v1.

## CPU popularity architecture-wise (UKI sites only)

 Metric: ∑(HS06\_day). Durations: January 2022 and January 2023. [Numbers are in the backup]



In January 2022, the dominant one is x86-64-v3, but during January 2023, it's x86-64-v4.

## Intel and AMD CPUs (all grid sites)

- Metric: ∑(HS06\_day) [top] and ∑(HS06\_day) share [bottom]. Duration: January-December 2022 (yearly). [Numbers are in the backup]





## Intel and AMD CPUs (UKI sites only)

- Metric: ∑(HS06\_day) [top] and ∑(HS06\_day) share [bottom]. Durations: January 2022 (top/bottom-left) and January 2023 (top/bottom-right). [Numbers are in the backup]










## CPU popularity over one year (Kibana tree map)

- Metric:  $\sum$ (HS06\_day).
- Duration: January-December 2022 (yearly).
- Search link; top 200 CPUs.

[Figure: Kibana tree map of the top 200 CPU models by ∑(HS06\_day) share. Largest legible tiles: AMD EPYC 7452 32-Core (6.29%), AMD EPYC 7302 16-Core (4.99%), Intel Xeon Platinum 8160 @ 2.10GHz (4.58%), Intel Xeon Gold 5320 @ 2.20GHz (3.66%), Intel Xeon E5-2640 v4 @ 2.40GHz (3.52%), AMD EPYC 7702 64-Core (3.37%), AMD EPYC 7742 64-Core (3.37%).]

#### AMD usage is concentrated over fewer different models.

#### Wrap-up

- On the grid, x86-64-v3 ( $\sim$  60%) is the most popular CPU architecture, followed by x86-64-v4 ( $\sim$  30%).
  - x86-64-v1's presence is negligible (< 0.5%).
  - x86-64-v2 (  $\sim$  10%) is somewhere in the middle!
- UKI sites seem to be shifting towards x86-64-v4 (starting 2023)!
- Among the Intel CPUs, x86-64-v4 is beginning to dominate.
- > 90% of AMD CPUs are x86-64-v3.
  - A list of flags from AMD corresponding to x86-64-v4 would help in redefining the architectures.
- The HS06\_day share is Intel-dominated ( $\sim \frac{2}{3}$ ).
- Ways to improve this study further:
  - Individual site-wise analysis.
  - National grid-wise analysis.
  - A catalogue of different Intel and AMD models.

#### Record of architectures and flags will be obtainable for all future jobs on kibana!

- What happens if a certain architecture, e.g., x86-64-v1 is completely dropped from the grid?
- Is it viable to have architecture-specific sites? *E.g.*, X site has at least x86-64-v3 CPUs.
- GPUs on the grid may face similar challenges.
- Is there a way to match types of jobs to particular architectures?
- Next steps:
  - Investigation of non-x86-64 architectures, e.g., ARM CPUs.
  - The Python script will eventually be made available on the CernVM File System.
- There is enough data to discuss and propose the architectures ATLAS could require on the grid.

## Backup

All grid sites; January-December 2022 (yearly)

| CPU architecture | HS06_day fraction [%] |
|------------------|-----------------------|
| x86-64-v1        | 0.44                  |
| x86-64-v2        | 9.76                  |
| x86-64-v3        | 59.77                 |
| x86-64-v4        | 30.03                 |

All grid sites; January-December 2022 (monthly); HS06_day fraction [%]

| Month-Year     | x86-64-v1 | x86-64-v2 | x86-64-v3 | x86-64-v4 |
|----------------|-----------|-----------|-----------|-----------|
| January-2022   | 0.90      | 14.06     | 57.34     | 27.70     |
| February-2022  | 0.77      | 12.50     | 57.23     | 29.50     |
| March-2022     | 0.70      | 12.65     | 53.82     | 32.83     |
| April-2022     | 0.74      | 10.14     | 58.78     | 30.34     |
| May-2022       | 0.49      | 10.15     | 58.75     | 30.61     |
| June-2022      | 0.53      | 10.18     | 59.88     | 29.41     |
| July-2022      | 0.30      | 9.26      | 61.00     | 29.44     |
| August-2022    | 0.32      | 9.81      | 61.38     | 28.49     |
| September-2022 | 0.37      | 8.58      | 61.82     | 29.23     |
| October-2022   | 0.29      | 8.14      | 60.37     | 31.20     |
| November-2022  | 0.10      | 7.14      | 62.79     | 29.97     |
| December-2022  | 0.09      | 6.66      | 61.47     | 31.78     |

UKI sites; the same month after one year; HS06_day fraction [%]

| CPU architecture | January 2022 | January 2023 |
|------------------|--------------|--------------|
| x86-64-v1        | 0.04         | 0.0          |
| x86-64-v2        | 12.44        | 5.23         |
| x86-64-v3        | 47.75        | 41.20        |
| x86-64-v4        | 39.77        | 53.57        |

All grid sites; January-December 2022; HS06_day fraction [%]

| CPU architecture | Intel | AMD   |
|------------------|-------|-------|
| x86-64-v1        | 0.06  | 0.14  |
| x86-64-v2        | 9.03  | 2.50  |
| x86-64-v3        | 41.03 | 97.36 |
| x86-64-v4        | 49.88 | 0.0   |

## Intel and AMD CPUs (UKI sites only)

UKI sites; January 2022; HS06_day fraction [%]

| CPU architecture | Intel | AMD   |
|------------------|-------|-------|
| x86-64-v1        | 0.05  | 0.0   |
| x86-64-v2        | 16.58 | 1.90  |
| x86-64-v3        | 28.00 | 98.10 |
| x86-64-v4        | 55.37 | 0.0   |

UKI sites; January 2023; HS06_day fraction [%]

| CPU architecture | Intel | AMD   |
|------------------|-------|-------|
| x86-64-v1        | 0.0   | 0.0   |
| x86-64-v2        | 6.74  | 1.73  |
| x86-64-v3        | 16.46 | 98.27 |
| x86-64-v4        | 76.80 | 0.0   |