

# Introduction to Field Programmable Gate Arrays

Hannes Sakulin CERN / EP-CMD

International School of Trigger and Data Acquisition 2024

USTC Hefei, China, 20 June 2024



# What is a Field Programmable Gate Array? A quick answer for the impatient

- An FPGA is an integrated circuit
  - Mostly digital electronics
- An FPGA is programmable in the field (i.e. outside the factory), hence the name "field programmable"
  - Circuit design is specified with a hardware description language or schematics
  - Tools compute a programming file for the FPGA (bitstream)
  - The FPGA is configured with the design (gateware / firmware)
  - Your electronic circuit is ready to use

With an FPGA you can build electronic circuits ...

- ... without using a breadboard or soldering iron
- ... without plugging together NIM modules
- ... without having a chip produced at a factory



# Outline

- Quick look at digital electronics
- FPGAs and their features
- Programming techniques
- Design flow
- Example Applications in the Trigger and DAQ domain

# The basic elements of digital electronics

# The building blocks: logic gates

Each gate is shown with its truth table and the equivalent C expression.

AND gate



| A | B | A AND B |
|---|---|---------|
| 0 | 0 | 0       |
| 0 | 1 | 0       |
| 1 | 0 | 0       |
| 1 | 1 | 1       |

C equivalent: `q = a && b;`

OR gate



| A | B | A OR B (A+B) |
|---|---|--------------|
| 0 | 0 | 0            |
| 0 | 1 | 1            |
| 1 | 0 | 1            |
| 1 | 1 | 1            |

C equivalent: `q = a || b;`

Exclusive OR gate (XOR gate)

| A | B | A XOR B |
|---|---|---------|
| 0 | 0 | 0       |
| 0 | 1 | 1       |
| 1 | 0 | 1       |
| 1 | 1 | 0       |

C equivalent: `q = a != b;`

# Combinatorial logic (asynchronous)



Outputs are determined by the inputs only.

Example: Full adder with carry-in, carry-out

| A | B | Cin | S | Cout |
|---|---|-----|---|------|
| 0 | 0 | 0   | 0 | 0    |
| 1 | 0 | 0   | 1 | 0    |
| 0 | 1 | 0   | 1 | 0    |
| 1 | 1 | 0   | 0 | 1    |
| 0 | 0 | 1   | 1 | 0    |
| 1 | 0 | 1   | 0 | 1    |
| 0 | 1 | 1   | 0 | 1    |
| 1 | 1 | 1   | 1 | 1    |

Combinatorial logic may be implemented using Look-Up Tables (LUTs)
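To illustrate the last point, here is a small C sketch (not from the lecture) of the full adder written twice: once from gates, and once as two 3-input LUTs, i.e. two 8-bit truth tables indexed by the inputs. The index convention (bit number Cin·4 + B·2 + A) is our own choice; it happens to match the row order of the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Full adder from gates: S = A xor B xor Cin, Cout = A.B + Cin.(A xor B) */
void full_adder_gates(int a, int b, int cin, int *s, int *cout) {
    assert((a | b | cin) <= 1);                 /* inputs are single bits */
    *s    = a ^ b ^ cin;
    *cout = (a & b) | (cin & (a ^ b));
}

/* The same function as two 3-input LUTs. Each LUT is an 8-bit truth table;
   bit number (Cin*4 + B*2 + A) holds the output for that input combination. */
static const uint8_t LUT_S    = 0x96;           /* S column:    0,1,1,0,1,0,0,1 */
static const uint8_t LUT_COUT = 0xE8;           /* Cout column: 0,0,0,1,0,1,1,1 */

void full_adder_lut(int a, int b, int cin, int *s, int *cout) {
    assert((a | b | cin) <= 1);
    unsigned idx = (unsigned)(cin << 2 | b << 1 | a);
    *s    = (LUT_S    >> idx) & 1;              /* "evaluation" is just a table read */
    *cout = (LUT_COUT >> idx) & 1;
}
```

The LUT version never computes anything: the function is fully described by its stored truth table, which is why a LUT can implement any function of its inputs.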

# (Synchronous) sequential logic



Outputs are determined by the inputs and their history (sequence): the logic has an internal state.

Example: 2-bit binary counter

https://www.zeepedia.com/read.php?b=9&c=32&d\_flip-flop\_based\_implementation\_digital\_logic\_design

#### Element that keeps the state: Flip-flop



D Flip-flop (D=data, delay): samples the data at the rising (or falling) edge of the clock

The output will be equal to the last sampled input until the next rising (or falling) clock edge


| Time                | Flip-flop 1: D | Flip-flop 1: Q | Flip-flop 1: Q' | Flip-flop 2: $D = Q_0 \text{ xor } Q_1$ | Flip-flop 2: Q | $Q_1$ | $Q_0$ |
|---------------------|----------------|----------------|-----------------|-----------------------------------------|----------------|-------|-------|
| Before clock edge 1 | 1              | 0              | 1               | 0                                       | 0              | 0     | 0     |
| After clock edge 1  | 0              | 1              | 0               | 1                                       | 0              | 0     | 1     |
| After clock edge 2  | 1              | 0              | 1               | 1                                       | 1              | 1     | 0     |
| After clock edge 3  | 0              | 1              | 0               | 0                                       | 1              | 1     | 1     |
| After clock edge 4  | 1              | 0              | 1               | 0                                       | 0              | 0     | 0     |
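The counter in this table can be simulated cycle by cycle. This C sketch assumes, as the table implies, that flip-flop 1 holds $Q_0$ with $D = Q_0'$ and flip-flop 2 holds $Q_1$ with $D = Q_0 \text{ xor } Q_1$. The key point it models is that both D inputs are computed from the old state before any flip-flop updates, just as all flip-flops sample at the same clock edge in hardware.

```c
#include <assert.h>

/* State of the two D flip-flops: q0 = least significant bit, q1 = most significant bit */
typedef struct { int q0, q1; } counter2;

/* One rising clock edge: compute all D inputs from the old state first,
   then update all flip-flops at once. */
void counter2_clock(counter2 *c) {
    int d0 = !c->q0;             /* flip-flop 1: D = Q0' */
    int d1 = c->q0 ^ c->q1;      /* flip-flop 2: D = Q0 xor Q1 */
    c->q0 = d0;
    c->q1 = d1;
}

/* Counter value as a number 0..3 */
int counter2_value(const counter2 *c) {
    assert((c->q0 | c->q1) <= 1);
    return 2 * c->q1 + c->q0;
}
```

Starting from 0, four clock edges give the sequence 1, 2, 3, 0, matching the rows above.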

# Synchronous sequential logic



Using look-up tables and flip-flops, any kind of digital electronics may be implemented, for example:

- Signal processor
- Trigger logic
- Data compression logic
- Network interface card
- Neural net classifier

Of course, electronics design is an art in itself ...

### What is inside an FPGA?

# Basic elements of an FPGA



Fine-grained: tens of thousands up to millions of logic blocks

Programmable Input / Output pins

# **LUT-based Fabrics**



# Typical LUT-based Logic Cell



Xilinx: logic cell, Altera: logic element

- LUT may implement any function of its inputs
- Flip-flop registers the LUT output
- May use only the LUT or only the flip-flop
- LUT may alternatively be configured as a shift register
- Additional elements (not shown): fast carry logic
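A logic cell can be modelled in a few lines. The sketch below is a hypothetical, simplified cell with a 4-input LUT (16 configuration bits), a D flip-flop, and one configuration bit that selects the LUT output directly (combinatorial) or the flip-flop output (registered). Real cells have more inputs (typically 6) and more options; the field names are our own.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t lut_config;   /* truth table: bit i = output for input pattern i */
    int      use_ff;       /* 1: registered output, 0: LUT output used directly */
    int      ff_q;         /* flip-flop state */
} logic_cell;

int lut_eval(const logic_cell *cell, unsigned inputs) {
    assert(inputs < 16);                        /* four input bits */
    return (cell->lut_config >> inputs) & 1;
}

/* Current cell output for the given inputs (between clock edges) */
int cell_output(const logic_cell *cell, unsigned inputs) {
    return cell->use_ff ? cell->ff_q : lut_eval(cell, inputs);
}

/* Rising clock edge: the flip-flop samples the LUT output */
void cell_clock(logic_cell *cell, unsigned inputs) {
    cell->ff_q = lut_eval(cell, inputs);
}
```

Only the configuration changes between functions: `0x8000` makes the LUT a 4-input AND (only pattern 15 gives 1), `0x6996` a 4-input XOR (parity).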

# General-Purpose Input/Output (GPIO)



- Today: more than 1000 user I/O pins
- Input and/or output
- Voltages from (1.0) 1.2 to 3.3 V
- Many I/O standards
  - Single-ended: LVTTL, LVCMOS, ...
  - Differential pairs: LVDS, ...

# A toy example

# Toy example: trigger on energy cluster

Say we have a 3x3 pixel detector with pixels numbered 1 to 9. Each pixel can measure the deposited energy with 2-bit resolution. Trigger condition: the sum of the energies deposited in a 2x2 pixel area exceeds 5 counts.
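Before looking at the firmware, the trigger condition can be written as a plain C reference model, which is also handy later for checking simulation results. It assumes the pixels are numbered row by row, so the four 2x2 areas are {1,2,4,5}, {2,3,5,6}, {4,5,7,8} and {5,6,8,9}; the first matches the generics `1, 2, 4, 5` used for the first cluster check in the VHDL code.

```c
#include <assert.h>

/* e[1..9]: 2-bit pixel energies, numbered row by row (e[0] unused) */
static const int CLUSTERS[4][4] = {
    {1, 2, 4, 5}, {2, 3, 5, 6}, {4, 5, 7, 8}, {5, 6, 8, 9}
};

/* Returns 1 if any 2x2 area has a summed energy above 5 counts */
int toy_trigger(const int e[10]) {
    for (int p = 1; p <= 9; p++)
        assert(e[p] >= 0 && e[p] <= 3);        /* 2-bit resolution */
    for (int c = 0; c < 4; c++) {
        int sum = 0;
        for (int k = 0; k < 4; k++)
            sum += e[CLUSTERS[c][k]];
        if (sum > 5)                            /* trigger condition */
            return 1;
    }
    return 0;
}
```

A CPU evaluates the four areas one after the other; in the FPGA all four checks run in parallel.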



### Toy example: VHDL code



#### Toy example: constraints



# Toy example: timing and floorplan



Toy example: floorplan, with the pixel energy inputs $e_1$ to $e_9$ marked (zoomed view: slice SLICE\_X0Y79, type SLICEL)

#### Toy example: Register Transfer Level (RTL) design



#### Toy example: Full example, 4 possible clusters



#### Toy example: resource usage and floorplan



#### Toy example: floorplan



# Toy example: RTL design



Looking closely, we can see that adders shared between adjacent 2x2 areas are implemented only once.
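How much sharing is possible can be counted directly. In this sketch each cluster check needs two first-stage adders, `sum1 = e(i1) + e(i2)` and `sum2 = e(i3) + e(i4)`, as in the cluster energy check module; the cluster list again assumes row-by-row pixel numbering. Identical input pairs only need one adder.

```c
#include <assert.h>

static const int CLUSTERS[4][4] = {
    {1, 2, 4, 5}, {2, 3, 5, 6}, {4, 5, 7, 8}, {5, 6, 8, 9}
};

/* Number of first-stage pair adders, with or without sharing identical pairs */
int count_pair_adders(int share) {
    int pairs[8][2];
    int n = 0;
    for (int c = 0; c < 4; c++) {
        for (int half = 0; half < 2; half++) {
            int a = CLUSTERS[c][2 * half], b = CLUSTERS[c][2 * half + 1];
            int duplicate = 0;
            for (int j = 0; share && j < n; j++)
                if (pairs[j][0] == a && pairs[j][1] == b)
                    duplicate = 1;
            if (!duplicate) {
                assert(n < 8);
                pairs[n][0] = a;
                pairs[n][1] = b;
                n++;
            }
        }
    }
    return n;
}
```

Without sharing, 8 adders are needed; with sharing, 6, because the pairs (4,5) and (5,6) each appear in two clusters. The synthesis tool finds such duplicates automatically.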



### Toy example: Timing






# Doing the same with a microcontroller






# Microcontroller vs FPGA





|                        | μC / CPU                                                              | FPGA                                                                                                 |
|------------------------|-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------|
| Principle of operation | Source code is translated to machine instructions and executed by CPU | Hardware Description Language is translated to configuration of FPGA, defining an electronic circuit |
| Processing time        | Microseconds                                                          | Tens of nanoseconds                                                                                  |

#### A closer look at FPGAs

#### Additional elements in an FPGA

- Besides logic cells and interconnect (distributed logic), an FPGA contains additional elements:
  - Either to provide functions that cannot be implemented with distributed logic (e.g. because the logic would be too slow)
    - Clock resources, clock Managers
    - Gigabit transceivers
    - ...
  - Or to provide functionality that could also be implemented with distributed logic, but is more efficiently(\*) implemented as a hard macro (in silicon)
    - Multipliers, DSPs
    - RAM
    - Processors

# Clock Trees



- Typical FPGA designs use one or multiple clocks
- Clock trees ensure that the clock arrives at all flip-flops at (nearly) the same time
- Typical fabric clock frequencies: tens to hundreds of MHz, up to ~1 GHz

# Clock Managers



# Our toy example with clock









# Resource usage & floorplan

Numbers in parentheses: resources available in the device.

| Name                                                         | Slice LUTs<br>(20800) | Slice Registers<br>(41600) | Slice<br>(8150) | LUT as Logic<br>(20800) | Bonded IOB<br>(210) | ILOGIC<br>(210) | OLOGIC<br>(210) | BUFGCTRL<br>(32) | MMCME2_ADV<br>(5) |
|--------------------------------------------------------------|-----------------------|----------------------------|-----------------|-------------------------|---------------------|-----------------|-----------------|------------------|-------------------|
| top                                                          | 27                    | 30                         | 10              | 27                      | 20                  | 18              | 1               | 2                | 1                 |
| check_cluster_energy_1 (check_cluster_energy)                | 9                     | 9                          | 5               | 9                       | 0                   | 0               | 0               | 0                | 0                 |
| check_cluster_energy_2 (check_cluster_energy_parameterized0) | 5                     | 6                          | 2               | 5                       | 0                   | 0               | 0               | 0                | 0                 |
| check_cluster_energy_3 (check_cluster_energy_parameterized1) | 8                     | 9                          | 3               | 8                       | 0                   | 0               | 0               | 0                | 0                 |
| check_cluster_energy_4 (check_cluster_energy_parameterized2) | 5                     | 6                          | 2               | 5                       | 0                   | 0               | 0               | 0                | 0                 |
| clk_wiz_0_1 (clk_wiz_0)                                      | 0                     | 0                          | 0               | 0                       | 0                   | 0               | 0               | 2                | 1                 |



#### Floorplan: clock resources





# VHDL code

```
entity top is
  Port ( energies : in  t_energy_array;
         accept   : out STD_LOGIC;
         clk_pin  : in  std_logic);
end top;

architecture Behavioral of top is
  component clk_wiz_0 is
    port (
      clk_out1 : out STD_LOGIC;
      locked   : out STD_LOGIC;
      clk_in1  : in  STD_LOGIC);
  end component clk_wiz_0;

  component check_cluster_energy is
    generic (
      i1, i2, i3, i4 : integer);
    port (
      energies : in  t_energy_array;
      accept   : out STD_LOGIC;
      clk      : in  std_logic);
  end component check_cluster_energy;

  signal clk, locked : std_logic;
  signal energies_i  : t_energy_array;
  signal accept1, accept2, accept3, accept4 : std_logic;
begin
  -- 1) Instantiate clocking logic: a customized firmware block produced by
  --    the FPGA design tool, also called an IP (Intellectual Property) core
  clk_wiz_0_1 : entity work.clk_wiz_0
    port map (
      clk_out1 => clk,
      locked   => locked,
      clk_in1  => clk_pin);

  -- 2) Clocked process: register inputs
  reg_inputs : process (clk) is
  begin
    if rising_edge(clk) then
      energies_i <= energies;
    end if;
  end process reg_inputs;
```

```
  -- 3) Instantiate cluster energy check modules
  check_cluster_energy_1 : entity work.check_cluster_energy
    generic map (
      i1 => 1,
      i2 => 2,
      i3 => 4,
      i4 => 5)
    port map (
      energies => energies_i,
      accept   => accept1,
      clk      => clk);
  check_cluster_energy_2 : entity work.check_cluster_energy ...
  check_cluster_energy_3 : entity work.check_cluster_energy ...
  check_cluster_energy_4 : entity work.check_cluster_energy ...

  -- 4) Clocked process: register OR of cluster checks
  reg_output : process (clk) is
  begin
    if rising_edge(clk) then
      accept <= accept1 or accept2 or accept3 or accept4;
    end if;
  end process reg_output;
end Behavioral;
```

#### VHDL code – cluster energy check

```
entity check_cluster_energy is
  generic (i1, i2, i3, i4 : integer);
  port (energies : in  t_energy_array;
        accept   : out STD_LOGIC;
        clk      : in  std_logic);
end entity check_cluster_energy;

architecture Behavioral of check_cluster_energy is
  signal sum1, sum2 : UNSIGNED(2 downto 0);
  signal sum        : UNSIGNED(3 downto 0);
begin  -- architecture Behavioral
  -- Inside a clocked process: each <= assignment creates flip-flops
  p1 : process (clk) is
  begin  -- process p1
    if clk'event and clk = '1' then  -- rising clock edge
      sum1 <= UNSIGNED('0' & energies(i1)) + UNSIGNED('0' & energies(i2));
      sum2 <= UNSIGNED('0' & energies(i3)) + UNSIGNED('0' & energies(i4));
      sum  <= UNSIGNED('0' & sum1) + UNSIGNED('0' & sum2);
    end if;
  end process p1;

  -- Outside a process: asynchronous (combinatorial) logic
  accept <= '1' when sum > 5 else '0';
end architecture Behavioral;
```

#### Constraints

```
# Assign pins (including the clock pin)
set_property PACKAGE_PIN V12 [get_ports {energies[9][1]}]
set_property PACKAGE_PIN E3  [get_ports clk_pin]

# Use the input/output flip-flops in the I/O blocks
set_property IOB true [get_ports {energies[1][0]}]
set_property IOB true [get_ports {energies[1][1]}]
set_property IOB true [get_ports {energies[2][0]}]
set_property IOB true [get_ports {energies[2][1]}]
set_property IOB true [get_ports {energies[3][0]}]
set_property IOB true [get_ports {energies[3][1]}]
set_property IOB true [get_ports {energies[4][0]}]
set_property IOB true [get_ports {energies[4][1]}]
set_property IOB true [get_ports {energies[5][0]}]
set_property IOB true [get_ports {energies[5][1]}]
set_property IOB true [get_ports {energies[6][0]}]
set_property IOB true [get_ports {energies[6][1]}]
set_property IOB true [get_ports {energies[7][0]}]
set_property IOB true [get_ports {energies[7][1]}]
set_property IOB true [get_ports {energies[8][0]}]
set_property IOB true [get_ports {energies[8][1]}]
set_property IOB true [get_ports {energies[9][0]}]
set_property IOB true [get_ports {energies[9][1]}]
set_property IOB true [get_ports accept]
```

Timing

100 MHz clock at pin

```
clk_pin (100.00 MHz) (drives 50 loads)
└─ clk_wiz_0_1/inst/clkin1_ibufg (IBUF)
   └─ clk_wiz_0_1/inst/mmcm_adv_inst (MMCME2_ADV), input CLKIN1
      └─ CLKOUT0: clk_out1_clk_wiz_0 (200.00 MHz) (drives 49 loads)
         └─ clk_wiz_0_1/inst/clkout1_buf (BUFG)
            └─ clk_out1 -> FDRE (49 loads)
```

200 MHz internal clock (we set this as a parameter to the clock manager)
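As a sketch of what the clock manager does with that parameter: a Xilinx 7-series MMCM multiplies the input clock up to an internal VCO frequency and divides it down again, $f_\text{out} = f_\text{in} \cdot M / (D \cdot O)$. The values $M = 10$, $D = 1$, $O = 5$ below are one possible choice and our own assumption; the clocking wizard selects its own values, keeping the VCO within the range allowed by the data sheet (roughly 600 to 1200 MHz for the slowest speed grade).

$$
f_\text{VCO} = f_\text{in}\cdot\frac{M}{D} = 100\ \text{MHz}\cdot\frac{10}{1} = 1000\ \text{MHz},
\qquad
f_\text{out} = \frac{f_\text{VCO}}{O} = \frac{1000\ \text{MHz}}{5} = 200\ \text{MHz}
$$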

The design can check for clusters at 200 MHz (a new set of inputs every 5 ns), but needs 4 clock cycles (20 ns) to compute each trigger decision.
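Where the 4 cycles come from can be seen in a cycle-level C model of the pipeline. This is our reading of the VHDL above: stage 1 is `reg_inputs`, stage 2 the `sum1`/`sum2` registers, stage 3 the `sum` register (all per cluster), stage 4 the `reg_output` register that stores the OR of the `sum > 5` comparisons. All registers take their new values from the old state at the same clock edge. The row-by-row cluster list is again an assumption.

```c
#include <assert.h>
#include <string.h>

static const int CL[4][4] = {{1, 2, 4, 5}, {2, 3, 5, 6}, {4, 5, 7, 8}, {5, 6, 8, 9}};

typedef struct {
    int e_i[10];            /* stage 1: registered inputs (index 0 unused) */
    int sum1[4], sum2[4];   /* stage 2: pair sums, per cluster */
    int sum[4];             /* stage 3: cluster sums */
    int accept;             /* stage 4: registered trigger decision */
} pipeline;

/* One rising edge of the 200 MHz clock */
void clock_edge(pipeline *p, const int energies[10]) {
    for (int k = 1; k <= 9; k++)
        assert(energies[k] >= 0 && energies[k] <= 3);
    pipeline next = *p;
    int acc = 0;
    for (int c = 0; c < 4; c++) {
        acc |= p->sum[c] > 5;                          /* compare, then register */
        next.sum[c]  = p->sum1[c] + p->sum2[c];
        next.sum1[c] = p->e_i[CL[c][0]] + p->e_i[CL[c][1]];
        next.sum2[c] = p->e_i[CL[c][2]] + p->e_i[CL[c][3]];
    }
    next.accept = acc;
    memcpy(next.e_i, energies, sizeof next.e_i);
    *p = next;
}
```

An event applied before clock edge 1 produces its decision after edge 4 (20 ns), while a new event can enter at every edge (every 5 ns): the latency is 4 cycles, the throughput one decision per cycle.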



#### Other elements in FPGAs

#### Embedded RAM blocks



Can be used in many ways:

- Look-up of mathematical functions
- Buffer memory

Today: up to ~500 Mbit of RAM
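The first use, look-up of a mathematical function, amounts to filling a memory with pre-computed results. The C sketch below models a 256-entry block RAM used as a ROM; the function stored (x²/256 for 8-bit x) is only an arbitrary example. In an FPGA the table contents would be computed by the tools and loaded with the configuration, and a result then costs one synchronous memory read instead of a multiplier.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 8
#define DEPTH (1u << ADDR_BITS)

static uint8_t rom[DEPTH];

/* Done once, at design time: compute the table contents */
void rom_init(void) {
    for (unsigned x = 0; x < DEPTH; x++)
        rom[x] = (uint8_t)((x * x) >> 8);      /* x*x/256 fits in 8 bits */
}

/* At run time the "computation" is a single memory read */
uint8_t rom_lookup(unsigned x) {
    assert(x < DEPTH);
    return rom[x];
}
```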

#### Embedded Multipliers & DSPs



### Digital Signal Processor (DSP)



DSP block (Xilinx 7-series): up to several thousand per chip

#### Soft and Hard Processor Cores

- Soft core
  - Design implemented with the programmable resources (logic cells) in the chip
- Hard core
  - Processor core that is available in addition to the programmable resources
  - E.g. PowerPC, ARM



#### High-Speed Serial Interconnect

- Using differential pairs
- Standard I/O pins limited to about 1 Gbit/s
- Latest serial transceivers: typically 25 Gb/s
  - up to 112 Gb/s with Pulse Amplitude Modulation (PAM)
- FPGAs with multi-Tbit/s IO bandwidth



#### Components in a modern FPGA



#### **Programming techniques**

#### Fusible Links (not used in FPGAs)



# Antifuse Technology



# **EPROM Technology**

Erasable Programmable Read Only Memory



Intel, 1971

#### **EEPROM** and FLASH Technology

Electrically Erasable Programmable Read Only Memory



EEPROM: erasable word by word

FLASH: erasable by block or by device

#### **SRAM-Based Devices**



Multi-transistor SRAM cell

#### Programming a 3-bit wide LUT



# Summary of Technologies

| Technology   | Predominantly associated with |
|--------------|-------------------------------|
| Fusible-link | SPLDs                         |
| Antifuse     | FPGAs                         |
| EPROM        | SPLDs and CPLDs               |
| E²PROM/FLASH | SPLDs, CPLDs, and FPGAs       |
| SRAM         | FPGAs (some CPLDs)            |







#### Major Manufacturers

- AMD Xilinx (formerly Xilinx)
  - First company to produce FPGAs in 1985
  - About 55% market share today
  - SRAM based CMOS devices
- Intel FPGA (formerly Altera)
  - About 35% market share
  - SRAM based CMOS devices
- Microchip (Microsemi, Actel)
  - Anti-fuse FPGAs
  - Flash based FPGAs
  - Mixed Signal
- Lattice Semiconductor
  - SRAM based with integrated Flash PROM
  - Low power









#### **Trends**

# Ever-decreasing feature size



#### **Trends**

- Speed of logic keeps increasing
- Look-up-tables with more inputs (5 or 6)
- Speed of serial links increasing (multiple Gb/s)
- More integrated memory
  - Integrated High Bandwidth Memory (HBM) in-package
    - About 10x the bandwidth of DDR4 (Xilinx: up to 8 GB, Intel: up to 16 GB)
- More and more hard macro cores on the FPGA
  - PCI Express
    - Gen2: 5 Gb/s per lane
    - Gen3: 8 Gb/s per lane (typically up to 16 lanes)
    - Gen4: 16 Gb/s per lane
  - 10 Gb/s, 40 Gb/s, 100 Gb/s Ethernet, 150 Gb/s Interlaken
- Sophisticated soft macros
  - CPUs
  - Gb/s MACs
  - Memory interfaces (DDR2/3/4)
- Processor-centric architectures (see System-On-a-Chip FPGAs below)
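The PCI Express per-lane rates above are raw line rates. As a worked example of the usable bandwidth, take the line encoding into account: Gen2 uses 8b/10b encoding and Gen3 uses 128b/130b. The figures below are per direction and before packet (protocol) overhead, so real DMA throughput is somewhat lower.

$$
\text{Gen2 x16:}\quad 16 \times 5\ \text{Gb/s} \times \frac{8}{10} = 64\ \text{Gb/s} = 8\ \text{GB/s}
$$

$$
\text{Gen3 x16:}\quad 16 \times 8\ \text{Gb/s} \times \frac{128}{130} \approx 126\ \text{Gb/s} \approx 15.75\ \text{GB/s}
$$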

#### System-On-a-Chip (SoC) FPGAs



Xilinx Zynq

Intel Stratix 10 SoC

# Adaptive Compute Acceleration Platform (ACAP)



Xilinx Versal

https://www.electronicdesign.com/markets/automation/video/21234012/electronic-design-versal-card-streamlines-acap-fpga-ai-development

CPU(s) + Peripherals + FPGA + AI (Adaptable Intelligence) Engines in one package

#### FPGA – ASIC comparison

#### **FPGA**

- A chip (the FPGA) is configured to represent a digital circuit
- May be reprogrammed in the field (gateware upgrade)
  - New features
  - Bug fixes
- Rapid development cycle (minutes / hours)
- Only digital designs are possible
- Low development cost
  - You can get started with a development board (< \$100) and free software
- High-end FPGAs are rather expensive



#### ASIC(\*)

- A chip is produced in a foundry for a specific purpose
- Design cannot be changed once it is produced
- Long development cycle (weeks / months)
- Analog designs possible
- Higher performance
  - Speed, Area, Power
- Better radiation hardness
- Extremely high development cost
  - ASICs are produced at a semiconductor fabrication facility ("fab") according to your design
- Lower cost per device compared to FPGA, when large quantities are needed



#### **FPGA** design flow

# Design entry

#### **Schematics**



- Graphical overview
- Can draw entire design
- Use pre-defined blocks

rarely used today

#### Hardware description language VHDL, Verilog

- Can generate blocks using loops
- Can synthesize algorithms
- Independent of design tool
- May use tools used in SW development (SVN, git ...)

#### Hardware Description Language

- Looks similar to a programming language
  - BUT be aware of the difference
    - Programming language => translated into machine instructions that are executed by a CPU
    - HDL => translated into gateware (logic gates & flip-flops)
- Common HDLs
  - VHDL
  - Verilog
  - AHDL (Altera specific)
- Newer trends
  - C-like languages (Handel-C, SystemC)
  - LabVIEW
  - High Level Synthesis (HLS) from C/C++

# Example: VHDL

```
architecture behavioral of VMEReg is
  signal vme_en_i : std_logic;
  signal Q        : std_logic_vector(15 downto 0);
begin  -- behavioral

  -- Asynchronous logic: all signals in the sensitivity list
  vme_addr_decode : process (vme_addr, vme_en) is
    variable my_addr_vec : std_logic_vector(vme_addr'high downto 0);
    variable selected    : boolean;
  begin  -- process vme_addr_decode
    my_addr_vec := std_logic_vector(TO_UNSIGNED(my_vme_base_address, vme_addr'high+1));
    selected    := my_addr_vec(vme_addr'high downto 1) = vme_addr(vme_addr'high downto 1);
    vme_en_i <= '0';
    if selected then
      vme_en_i <= vme_en;
    end if;
  end process vme_addr_decode;

  -- Synchronous logic: only clock (and reset) in the sensitivity list
  reg : process (vme_clk, reset) is
  begin  -- process reg
    if reset = '1' then                            -- asynchronous reset
      Q          <= init_val;
      vme_en_out <= '0';
    elsif vme_clk'event and vme_clk = '1' then     -- rising clock edge
      vme_en_out <= vme_en_i;
      if vme_en_i = '1' and vme_wr = '1' then
        Q <= vme_data;
      end if;
    end if;
  end process reg;

  data         <= Q;
  vme_data_out <= Q;
end behavioral;
```

Looks like a programming language, but all statements are executed in parallel, except inside processes (where they are evaluated sequentially).
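To see what this VHDL describes, here is a C model of the same register, as a sketch. The 16-bit address and data widths and the function names are our assumptions. The decode process becomes a pure function of its inputs (it compares all address bits except bit 0 with the base address), and the clocked process becomes a function that is called once per rising edge of `vme_clk`.

```c
#include <assert.h>
#include <stdint.h>

#define ADDR_BITS 16

typedef struct {
    uint16_t q;           /* the register Q */
    int      vme_en_out;  /* registered enable */
} vme_reg;

/* Combinatorial part (process vme_addr_decode): address bit 0 is ignored */
int decode_en(uint32_t vme_addr, uint32_t base, int vme_en) {
    assert(vme_addr < (1u << ADDR_BITS) && base < (1u << ADDR_BITS));
    int selected = (vme_addr >> 1) == (base >> 1);
    return selected ? vme_en : 0;
}

/* Asynchronous reset branch of process reg */
void vme_reg_reset(vme_reg *r, uint16_t init_val) {
    r->q = init_val;
    r->vme_en_out = 0;
}

/* Clocked branch of process reg: one rising edge of vme_clk */
void vme_reg_clock(vme_reg *r, uint32_t vme_addr, uint32_t base,
                   int vme_en, int vme_wr, uint16_t vme_data) {
    int en = decode_en(vme_addr, base, vme_en);
    r->vme_en_out = en;
    if (en && vme_wr)
        r->q = vme_data;
}
```

The difference from the VHDL is that in hardware the decode logic is always active, whereas in C it only runs when called.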

# Schematics & HDL combined





# Floorplan



# Manual Floor planning



For large designs, manual floor planning may be necessary



Routing congestion
Xilinx Virtex 7 (Vivado)

# Simulation



# Embedded Logic Analyzers



A great tool for debugging your design

# FPGA applications in the Trigger & DAQ domain

# First-Level Trigger at Collider



# Pipelined Logic



# Pipelined Logic – a clock cycle later



### Why are FPGAs ideal for First-Level Triggers?

- They are fast
  - Much faster than discrete electronics (shorter connections)
- Many inputs
  - Data from many parts of the detector has to be combined

Low latency

- All operations are performed in parallel
  - Can build pipelined logic
- They can be re-programmed
  - Trigger algorithms can be optimized

High performance

### Trigger algorithms implemented in FPGAs

- Trigger
  - Peak finding
  - Pattern Recognition
  - Track Finding
  - Clustering / Energy summing
  - Topological Algorithms (invariant mass)
  - Vertex Finding
  - Particle flow (reconstruction of jets, etc. from individual particle tracks)
  - Inference with Neural Networks
  - Many more ...
- Trigger Control system
  - Fast (busy) signal merging & monitoring
  - Generation of random triggers
  - Generation of calibration sequences
  - Automatic recovery sequences
  - Monitoring (dead times, rates, ...)

# Neural Networks in Trigger





By Glosser.ca - Own work, Derivative of File:Artificial neural network.svg, CC BY-SA 3.0. https://commons.wikimedia.org/w/index.php?curid=24913461

#### Principle

- Each node is assigned a value based on the weighted sum of the nodes in the previous layer (passed through an activation function)
- Maps well to DSP resources in FPGA (multiplier + adder)
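One node of such a network, in the fixed-point arithmetic an FPGA would use, can be sketched as follows. Each term of the weighted sum is one multiply-add, i.e. one use of a DSP block. The 8-bit weights with 6 fractional bits and the ReLU activation are assumptions chosen for illustration; tools such as hls4ml let you choose these precisions per layer.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 6   /* weights are integers scaled by 2^6: 64 means 1.0 */

/* Weighted sum of n inputs plus bias (bias in the same scaled units),
   followed by a ReLU activation */
int32_t node(const int16_t *inputs, const int8_t *weights, int n, int32_t bias) {
    assert(n > 0);
    int32_t acc = bias;
    for (int i = 0; i < n; i++)
        acc += (int32_t)inputs[i] * weights[i];   /* one multiply-add per input */
    if (acc <= 0)
        return 0;                                 /* ReLU */
    return acc >> FRAC_BITS;                      /* rescale to input units */
}
```

For example, inputs 10 and 4 with weights 1.0 and 0.5 (64 and 32) give 10 + 2 = 12.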

#### Applications:

- Jet classification
- Assignment of transverse momentum based on many measurements
- Topological trigger
- ...

#### Tools

- Many commercial tools
- hls4ml (optimized for latency)
  - Firmware generation from high-level model using Vivado HLS



### CMS Global Muon Trigger



- The CMS Global Muon trigger received 16 muon candidates from the three muon systems of CMS
  - It merged different measurements for the same muon and found the best 4 over-all muon candidates

- VME card (9U)
- Input: ~1000 bits @ 40 and 80 MHz
- Output: ~50 bits @ 80 MHz
- Processing time: 250 ns
- Pipelined logic: one new result every 25 ns
- 10 Xilinx Virtex-II FPGAs
- Up to 500 user I/Os per chip
- Up to 25000 LUTs per chip used
- Up to 96 x 18kbit RAM used
- In use in the CMS trigger 2008-2015

# CMS Global Muon Trigger main FPGA



# μTCA board for Run 2&3 CMS trigger based on Virtex 7



MP7, Imperial College

Virtex 7 with 690k logic cells
80 x 10 Gb/s transceivers bi-directional
72 of them as optical links on front panel
0.75 Tb/s in + 0.75 Tb/s out

Being used in the CMS trigger since 2015

Input/output: up to 14k bits per 40 MHz clock
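A back-of-the-envelope check of the 14k figure, assuming that the 72 front-panel links run at 10 Gb/s with 8b/10b line encoding (the encoding is our assumption, not stated on the slide):

$$
\frac{72 \times 10\ \text{Gb/s}}{40\ \text{MHz}} = 18\,000\ \text{bits per clock},
\qquad
18\,000 \times \frac{8}{10} = 14\,400\ \text{bits} \approx 14\text{k}
$$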

Same board used for different functions (different gateware)
Separation of framework and algorithm firmware

### CMS ATCA Trigger boards for HL-LHC (2029+)

120 x 25 Gb/s



APX, US

#### Serenity, UK

- Few types of generic boards, ATCA standard
- Xilinx Virtex/Kintex Ultrascale+ FPGAs (> 3 million logic cells / FPGA)
- 25-28 Gb/s optical links
- SoC FPGAs used for board control (on some boards)
- Advanced firmware algorithms
  - Vertex finding
  - Particle flow
  - Neural network classifiers

## FPGAs in Data Acquisition

- Frontend Electronics
  - Pedestal subtraction
  - Zero suppression
  - Compression
  - Buffering ...
- Custom data links
  - E.g. SLINK-64 over copper
    - Several serial LVDS links in parallel
    - Up to 400 MB/s
  - SLINK/SLINK-express over optical
- Interface from custom hardware to commercial electronics
  - PCI/PCIe, VME bus, Myrinet, 10/40/100 Gb/s Ethernet etc.
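As an illustration of the first group, here is a C sketch of pedestal subtraction followed by zero suppression: only channels whose pedestal-subtracted value exceeds a threshold are passed on, as (channel, value) pairs. The data format and threshold are hypothetical. In front-end firmware the same operation is typically applied to all channels in parallel or streamed at one sample per clock.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint16_t channel; int16_t value; } hit;

/* Returns the number of hits written to out */
size_t zero_suppress(const int16_t *adc, const int16_t *pedestal, size_t n,
                     int16_t threshold, hit *out, size_t max_out) {
    size_t k = 0;
    for (size_t ch = 0; ch < n; ch++) {
        int v = adc[ch] - pedestal[ch];          /* pedestal subtraction */
        if (v > threshold) {                     /* zero suppression */
            assert(k < max_out);
            out[k].channel = (uint16_t)ch;
            out[k].value = (int16_t)v;
            k++;
        }
    }
    return k;
}
```

With mostly empty channels the output is far smaller than the input, which is the point of doing this as early as possible in the readout chain.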

### C-RORC (ALICE) / RobinNP (ATLAS) for Run-2

#### Xilinx Virtex-6 FPGA



Commercial PCIe link out (DMA to host memory)

#### CMS Front-end Readout Link (Run-1)

- SLINK Sender Mezzanine Card: 400 MB / s
  - 1 FPGA (Altera)
  - CRC check
  - Automatic link test
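A CRC check of this kind is cheap in an FPGA because it is only shifts and XORs. The C sketch below is a bit-serial CRC; in firmware the inner loop is typically unrolled so that a whole data word is processed per clock. The polynomial (CRC-16-CCITT, $x^{16} + x^{12} + x^5 + 1$, initial value 0xFFFF) is an assumption for illustration; the lecture does not say which CRC SLINK uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

uint16_t crc16_ccitt(const uint8_t *data, size_t len) {
    assert(data != NULL || len == 0);
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)(data[i] << 8);          /* feed the next byte, MSB first */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}
```

Sender and receiver compute the CRC over the same data; a mismatch flags a transmission error. The standard check value of this variant for the ASCII string "123456789" is 0x29B1.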

Commercial Myrinet Network Interface Card on internal PCI bus



- 1 main FPGA (Altera)
- 1 FPGA as PCI interface
- Custom Compact PCI card
- Receives 1 or 2 SLINK64
- 2nd CRC check
- Monitoring, Histogramming
- Event spy



# CMS Readout Link for Run-2&3 in use since 2015



Myrinet NIC replaced by custom-built card ("FEROL")

Cost-effective solution (many boards needed): a rather inexpensive FPGA plus a commercial chip to combine 3 Gb/s links into 10 Gb/s

#### **FEROL (Front End Readout Optical Link)**

- Input: 1x or 2x SLINK (copper); 1x or 2x 5 Gb/s optical; 1x 10 Gb/s optical
- Output: 10 Gb/s Ethernet optical
- TCP/IP sender in FPGA


Figure labels: commercial data link out (10 Gb/s TCP/IP); 10 Gb/s SLINK Express; 5 Gb/s SLINK Express (x2); custom data link in (SLINK-64 input, LVDS / copper)




# PCIe40 – LHCb and ALICE Run-3



- 48 bidirectional links running at up to 10 Gbits/s each (minipods)
- 2 bidirectional links running at up to 10 Gbits/s devoted to time distribution (can use SFP+ or 10G PON devices)
- Sustained 112 Gbits/s interface with CPU through PCIe

### CMS DTH (DAQ and Timing Hub) for HL-LHC (2029+)

Photo of DTH prototype 2, with labels: custom data link in; commercial data link (TCP/IP) out; clock & control main board uplink; DAQ FPGA; Zynq SoC FPGA for control; rear transition module; clock & control distribution via backplane

- **ATCA** board using Xilinx Virtex UltraScale+ FPGAs
- One or two DAQ units per board
  - Up to 24 inputs at 25 Gb/s
  - 5x 100 Gb/s Ethernet to commercial network
  - TCP/IP in FPGA
- Board contains a switch for the control network

# FPGAs in other domains

- Machine learning inference
- Automotive driver assist (image processing)
- 5G wireless
- Medical imaging
- Speech recognition
- Cryptography
- Bioinformatics (genome sequencing)
- Aerospace / defense
- (Bitcoin mining)
- ASIC prototyping
- Compute accelerators
  - Accelerator cards
  - Server processors with FPGA
- Financial
- Video transcoding
- ...

# Lab Session 5: Programming an FPGA





You are going to design the digital electronics inside this FPGA!


## Lab Session 13: System-on-a-chip FPGA



Design the digital electronics and software in this SoC FPGA!

# Thank you

# Acknowledgement

 Parts of this lecture are based on material by Clive Maxfield, author of several books on FPGAs. Many thanks for his kind permission to use his material!

# Re-use

 Re-use of the material is permitted only with the written authorization of both Hannes Sakulin (<u>Hannes.Sakulin@cern.ch</u>) and Clive Maxfield.

## **Reference Material**

#### Top-of-the-line Xilinx devices

| Device name                         | VU3P  | VU5P  | VU7P  | VU9P  | VU11P | VU13P  | VU27P | VU29P  | VU31P | VU33P | VU35P | VU37P |
|-------------------------------------|-------|-------|-------|-------|-------|--------|-------|--------|-------|-------|-------|-------|
| System Logic Cells (K)              | 862   | 1,314 | 1,724 | 2,586 | 2,835 | 3,780  | 2,835 | 3,780  | 962   | 962   | 1,907 | 2,852 |
| CLB Flip-Flops (K)                  | 788   | 1,201 | 1,576 | 2,364 | 2,592 | 3,456  | 2,592 | 3,456  | 879   | 879   | 1,743 | 2,607 |
| CLB LUTs (K)                        | 394   | 601   | 788   | 1,182 | 1,296 | 1,728  | 1,296 | 1,728  | 440   | 440   | 872   | 1,304 |
| Max. Dist. RAM (Mb)                 | 12.0  | 18.3  | 24.1  | 36.1  | 36.2  | 48.3   | 36.2  | 48.3   | 12.5  | 12.5  | 24.6  | 36.7  |
| Total Block RAM (Mb)                | 25.3  | 36.0  | 50.6  | 75.9  | 70.9  | 94.5   | 70.9  | 94.5   | 23.6  | 23.6  | 47.3  | 70.9  |
| UltraRAM (Mb)                       | 90.0  | 132.2 | 180.0 | 270.0 | 270.0 | 360.0  | 270.0 | 360.0  | 90.0  | 90.0  | 180.0 | 270.0 |
| HBM DRAM (GB)                       | -     | -     | -     | -     | -     | -      | -     | -      | 4     | 8     | 8     | 8     |
| HBM AXI Interfaces                  | -     | -     | -     | -     | -     | -      | -     | -      | 32    | 32    | 32    | 32    |
| Clock Mgmt Tiles (CMTs)             | 10    | 20    | 20    | 30    | 12    | 16     | 16    | 16     | 4     | 4     | 8     | 12    |
| DSP Slices                          | 2,280 | 3,474 | 4,560 | 6,840 | 9,216 | 12,288 | 9,216 | 12,288 | 2,880 | 2,880 | 5,952 | 9,024 |
| Peak INT8 DSP (TOP/s)               | 7.1   | 10.8  | 14.2  | 21.3  | 28.7  | 38.3   | 28.7  | 38.3   | 8.9   | 8.9   | 18.6  | 28.1  |
| PCIe® Gen3 x16                      | 2     | 4     | 4     | 6     | 3     | 4      | 1     | 1      | 0     | 0     | 1     | 2     |
| PCIe Gen3 x16 / Gen4 x8 / CCIX (1)  | -     | -     | -     | -     | -     | -      | -     | -      | 4     | 4     | 4     | 4     |
| 150G Interlaken                     | 3     | 4     | 6     | 9     | 6     | 8      | 6     | 8      | 0     | 0     | 2     | 4     |
| 10                                                                                  | 0G Ethernet w        | / KR4 RS-FEC                | 3            | 4            | 6            | 9            | 9            | 12           | 11           | 15           | 2            | 2            | 5            | 8         |
| Max. Single-Ended HP I/Os                                                           |                      | 520                         | 832          | 832          | 832          | 624          | 832          | 520          | 676          | 208          | 208          | 416          | 624          |           |
| (                                                                                   | STY 32.75Gb/s        | Transceivers                | 40           | 80           | 80           | 120          | 96           | 128          | 32           | 32           | 32           | 32           | 64           | 96        |
| MT                                                                                  | 58Gb/s PAM4          | Transceivers                |              |              |              |              |              |              | 32           | 48           |              |              |              |           |
|                                                                                     | 100G/                | 50G KP4 FEC                 |              |              |              |              |              |              | 16/32        | 24 / 48      |              |              |              |           |
|                                                                                     |                      | Extended <sup>(2)</sup>     | -1 -2 -2L -3 | -1 -2 -2L |
|                                                                                     |                      | Industrial                  | -1 -2        | -1 -2        | -1 -2        | -1 -2        | -1 -2        | -1 -2        | -1 -2        | -1 -2        | -            | -            | -            | -         |
| F                                                                                   | ootprint(3,4,5)      | Dim. (mm)                   |              |              | HP I/0       | O, GTY       |              |              | HP I/O, 0    | STY, GTM     |              | HP I/C       | ), GTY       |           |
|                                                                                     | C1517                | 40x40                       | 520, 40      |              |              |              |              |              |              |              |              |              |              |           |
| er                                                                                  | F1924 <sup>(6)</sup> | 45x45                       |              |              |              |              | 624, 64      |              |              |              |              |              |              |           |
| ntifi                                                                               | A2104                | 47.5x47.5                   |              | 832, 52      | 832,52       | 832, 52      |              |              |              |              |              |              |              |           |
| nm<br>it ide                                                                        | ALIOT                | 52.5x52.5 <sup>(7)</sup>    |              |              |              |              |              | 832, 52      |              |              |              |              |              |           |
| n 20                                                                                | B2104                | 47.5x47.5                   |              | 702, 76      | 702,76       | 702, 76      | 572, 76      |              |              |              |              |              |              |           |
| e foc                                                                               | DZIO4                | 52.5x52.5 <sup>(7)</sup>    |              |              |              |              |              | 702, 76      |              |              |              |              |              |           |
| sam                                                                                 | C2104                | 47.5x47.5                   |              | 416, 80      | 416,80       | 416, 104     | 416, 96      |              |              |              |              |              |              |           |
| with                                                                                | C2104                | 52.5x52.5 <sup>(7)</sup>    |              |              |              |              |              | 416, 104     |              |              |              |              |              |           |
| ices                                                                                | D2104                | 47.5x47.5                   |              |              |              | 676, 76      | 572, 76      |              |              |              |              |              |              |           |
| Dev                                                                                 | 02104                | 52.5x52.5 <sup>(7)</sup>    |              |              |              |              |              | 676, 76      | 676, 16, 30  | 676, 16, 30  |              |              |              |           |
| roatprint compatible with 20nm<br>UltraScale Devices with same footprint identifier | A2577                | 52.5x52.5                   |              |              |              | 448, 120     | 448, 96      | 448, 128     | 448, 32, 48  | 448, 32, 48  |              |              |              |           |
| Iltras                                                                              | H1924                | 45x45                       |              |              |              |              |              |              |              |              | 208, 32      |              |              |           |
| ٦                                                                                   | H2104                | 47.5x47.5                   |              |              |              |              |              |              |              |              |              | 208, 32      | 416, 64      |           |
|                                                                                     | H2892                | 55x55                       |              |              |              |              |              |              |              |              |              |              | 416, 64      | 624, 9    |




#### Intel Stratix 10

#### INTEL® STRATIX® 10 GX/SX PRODUCT TABLE

|  | Product line | GX 400<br>SX 400 | GX 650<br>SX 650 | GX 850<br>SX 850 | GX 1100<br>SX 1100 | GX 1650<br>SX 1650 | GX 2100<br>SX 2100 | GX 2500<br>SX 2500 | GX 2800<br>SX 2800 | GX 4500<br>SX 4500 | GX 5500<br>SX 5500 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Resources | Logic elements (LEs)<sup>1</sup> | 378,000 | 612,000 | 841,000 | 1,092,000 | 1,624,000 | 2,005,000 | 2,422,000 | 2,753,000 | 4,463,000 | 5,510,000 |
|  | Adaptive logic modules (ALMs) | 128,160 | 207,360 | 284,960 | 370,080 | 550,540 | 679,680 | 821,150 | 933,120 | 1,512,820 | 1,867,680 |
|  | ALM registers | 512,640 | 829,440 | 1,139,840 | 1,480,320 | 2,202,160 | 2,718,720 | 3,284,600 | 3,732,480 | 6,051,280 | 7,470,720 |
|  | Hyper-Registers from Intel® HyperFlex™ FPGA architecture | Millions of Hyper-Registers distributed throughout the monolithic FPGA fabric |  |  |  |  |  |  |  |  |  |
|  | Programmable clock trees synthesizable | Hundreds of synthesizable clock trees |  |  |  |  |  |  |  |  |  |
|  | M20K memory blocks | 1,537 | 2,489 | 3,477 | 4,401 | 5,851 | 6,501 | 9,963 | 11,721 | 7,033 | 7,033 |
|  | M20K memory size (Mb) | 30 | 49 | 68 | 86 | 114 | 127 | 195 | 229 | 137 | 137 |
|  | MLAB memory size (Mb) | 2 | 3 | 4 | 6 | 8 | 11 | 13 | 15 | 23 | 29 |
|  | Variable-precision digital signal processing (DSP) blocks | 648 | 1,152 | 2,016 | 2,520 | 3,145 | 3,744 | 5,011 | 5,760 | 1,980 | 1,980 |
|  | 18 x 19 multipliers | 1,296 | 2,304 | 4,032 | 5,040 | 6,290 | 7,488 | 10,022 | 11,520 | 3,960 | 3,960 |
|  | Peak fixed-point performance (TMACS)<sup>2</sup> | 2.6 | 4.6 | 8.1 | 10.1 | 12.6 | 15.0 | 20.0 | 23.0 | 7.9 | 7.9 |
|  | Peak floating-point performance (TFLOPS)<sup>3</sup> | 1.0 | 1.8 | 3.2 | 4.0 | 5.0 | 6.0 | 8.0 | 9.2 | 3.2 | 3.2 |
| Features | Secure device manager | AES-256/SHA-256 bitstream encryption/authentication, physically unclonable function (PUF), ECDSA 256/384 boot code authentication, side channel attack protection |  |  |  |  |  |  |  |  |  |
|  | Hard processor system<sup>4</sup> | Quad-core 64-bit ARM Cortex-A53 up to 1.5 GHz with 32 KB I/D cache, NEON coprocessor, 1 MB L2 cache, direct memory access (DMA), system memory management unit, cache coherency unit, hard memory controllers, USB 2.0 x2, 1G EMAC x3, UART x2, SPI x4, I<sup>2</sup>C x5, general-purpose timers x7, watchdog timer x4 |  |  |  |  |  |  |  |  |  |
|  | Maximum user I/O pins | 392 | 400 | 736 | 736 | 704 | 704 | 1160 | 1160 | 1640 | 1640 |
|  | Maximum LVDS pairs 1.6 Gbps (RX or TX) | 192 | 192 | 360 | 360 | 336 | 336 | 576 | 576 | 816 | 816 |
|  | Total full duplex transceiver count | 24 | 48 | 48 | 48 | 96 | 96 | 96 | 96 | 24 | 24 |
|  | GXT full duplex transceiver count (up to 30 Gbps) | 16 | 32 | 32 | 32 | 64 | 64 | 64 | 64 | 16 | 16 |
|  | GX full duplex transceiver count (up to 17.4 Gbps) | 8 | 16 | 16 | 16 | 32 | 32 | 32 | 32 | 8 | 8 |
|  | PCI Express (PCIe) hard intellectual property (IP) blocks (Gen3 x16) | 1 | 2 | 2 | 2 | 4 | 4 | 4 | 4 | 1 | 1 |
|  | Memory devices supported | DDR4, DDR3, DDR2, DDR, QDR II, QDR II+, RLDRAM II, RLDRAM 3, HMC, MoSys |  |  |  |  |  |  |  |  |  |
|  | Package Options and I/O Pins: General-Purpose I/O (GPIO) Count, High-Voltage I/O Count, LVDS Pairs, and Transceiver Count<sup>5</sup> |  |  |  |  |  |  |  |  |  |  |
|  | F1152 pin (35 mm x 35 mm, 1.0 mm pitch) | 392,8,192,24 | 392,8,192,24 | - | - | - | - | - | - | - | - |
|  | F1760 pin (42.5 mm x 42.5 mm, 1.0 mm pitch) | - | 400,16,192,48 | - | - | - | - | - | - | - | - |
|  | F1760 pin (42.5 mm x 42.5 mm, 1.0 mm pitch) | - | - | 688,16,336,48 | 688,16,336,48 | 688,16,336,48 | 688,16,336,48 | 688,16,336,48 | 688,16,336,48 | - | - |
|  | F2112 pin (47.5 mm x 47.5 mm, 1.0 mm pitch) | - | - | 736,16,360,48 | 736,16,360,48 | - | - | - | - | - | - |
|  | F2397 pin (50 mm x 50 mm, 1.0 mm pitch) | - | - | - | - | 704,32,336,96 | 704,32,336,96 | 704,32,336,96 | 704,32,336,96 | - | - |
|  | F2912 pin (55 mm x 55 mm, 1.0 mm pitch) | - | - | - | - | - | - | 1160,8,576,24 | 1160,8,576,24 | 1640,8,816,24 | 1640,8,816,24 |

#### Intel Stratix 10

296,8,144,120,24

#### INTEL® STRATIX® 10 TX PRODUCT TABLE

|  | Product line | TX 1650 |  | TX 2100 |  | TX 2500 |  |  | TX 2800 |  |  |  |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Resources | Logic elements (LEs)<sup>1</sup> | 1,679,000 |  | 2,073,000 |  | 2,422,000 |  |  | 2,753,000 |  |  |  |
|  | Adaptive logic modules (ALMs) | 569,200 |  | 702,720 |  | 821,150 |  |  | 933,120 |  |  |  |
|  | ALM registers | 2,276,800 |  | 2,810,880 |  | 3,284,600 |  |  | 3,732,480 |  |  |  |
|  | Hyper-Registers from Intel® HyperFlex™ FPGA architecture | Millions of Hyper-Registers distributed throughout the monolithic FPGA fabric |  |  |  |  |  |  |  |  |  |  |
|  | Programmable clock trees synthesizable | Hundreds of synthesizable clock trees |  |  |  |  |  |  |  |  |  |  |
|  | eSRAM memory blocks | 2 |  | 2 |  | - |  |  | - |  |  |  |
|  | eSRAM memory size (Mb) | 90 |  | 90 |  | - |  |  | - |  |  |  |
|  | M20K memory blocks | 6,162 |  | 6,847 |  | 9,963 |  |  | 11,721 |  |  |  |
|  | M20K memory size (Mb) | 120 |  | 134 |  | 195 |  |  | 229 |  |  |  |
|  | MLAB memory size (Mb) | 9 |  | 11 |  | 13 |  |  | 15 |  |  |  |
|             | Variable-precision digital signal processing (DSP) blocks                                                         | 3,326                                                                         |                              | 3,960                               |                             | 5,011                                                                    |                              |                           | 5,760                                                 |                    |                     |  |
| s           | 18 x 19 multipliers                                                                                               | 6,652                                                                         |                              | 7,920                               |                             | 10,022                                                                   |                              |                           | 11,520                                                |                    |                     |  |
|             | Peak fixed-point performance (TMACS) <sup>2</sup>                                                                 | 13.3                                                                          |                              | 15                                  | 15.8                        |                                                                          | 20.0                         |                           |                                                       | 23.0               |                     |  |
|             | Peak floating-point performance (TFLOPS) <sup>3</sup>                                                             | 5.3                                                                           |                              | 6.3                                 |                             | 8.0                                                                      |                              |                           | 9.2                                                   |                    |                     |  |
|             | Hard processor system                                                                                             | management unit, cache cohere                                                 |                              | -                                   |                             | rs, USB 2.0 x2, 1G EMAC x3, UART x2, SPI x4, I <sup>2</sup> C x5,<br>Yes |                              |                           | , general purpose timers x7, watchdog timer x4<br>Yes |                    |                     |  |
| Featur      | Maximum user I/O pins                                                                                             | 544                                                                           | 440                          | 544                                 | 440                         | 544                                                                      | 440                          | 296                       | 544                                                   | 440                | 296                 |  |
|             | Maximum LVDS pairs 1.6 Gbps (RX or TX)                                                                            | 264                                                                           | 216                          | 264                                 | 216                         | 264                                                                      | 216                          | 144                       | 264                                                   | 216                | 144                 |  |
| ural        | Total full duplex transceiver count                                                                               | 72                                                                            | 96                           | 72                                  | 96                          | 72                                                                       | 96                           | 144                       | 72                                                    | 96                 | 144                 |  |
| Architectur | GXE transceiver count - PAM-4 (up to 58 Gbps) or NRZ (up to 30 Gbps)                                              | 12 PAM-4<br>24 NRZ                                                            | 36 PAM-4<br>72 NRZ           | 12 PAM-4<br>24 NRZ                  | 36 PAM-4<br>72 NRZ          | 12 PAM-4<br>24 NRZ                                                       | 36 PAM-4<br>72 NRZ           | 60 PAM-4<br>120 NRZ       | 12 PAM-4<br>24 NRZ                                    | 36 PAM-4<br>72 NRZ | 60 PAM-4<br>120 NRZ |  |
|             | GXT transceiver count - NRZ (up to 28.3 Gbps)                                                                     | 32                                                                            | 16                           | 32                                  | 16                          | 32                                                                       | 16                           | 16                        | 32                                                    | 16                 | 16                  |  |
| and         | GX transceiver count - NRZ (up to 17.4 Gbps)                                                                      | 16                                                                            | 8                            | 16                                  | 8                           | 16                                                                       | 8                            | 8                         | 16                                                    | 8                  | 8                   |  |
|             |                                                                                                                   |                                                                               |                              |                                     |                             |                                                                          | 2.5                          | 2,00                      |                                                       |                    | 0                   |  |
| I/O and     | PCI Express* (PCIe*) hard intellectual property (IP) blocks (Gen3 x16)                                            | 2                                                                             | 1                            | 2                                   | 1                           | 2                                                                        | 1                            | 1                         | 2                                                     | 1                  | 1                   |  |
|             | PCI Express* (PCIe*) hard intellectual property (IP) blocks (Gen3 x16)  100G Ethernet MAC (no FEC) hard IP blocks | 2                                                                             | 1                            | 2                                   | 1                           | 2                                                                        | 1                            | 1                         | 2                                                     | 1                  | _                   |  |
|             |                                                                                                                   |                                                                               |                              |                                     | -                           |                                                                          |                              | 100                       |                                                       |                    | 1                   |  |
|             | 100G Ethernet MAC (no FEC) hard IP blocks                                                                         | 2                                                                             | 1                            | 2 4                                 | 1 12                        | 2                                                                        | 1 12                         | 1 20                      | 2                                                     | 1                  | 1                   |  |
| 0/1         | 100G Ethernet MAC (no FEC) hard IP blocks 100G Ethernet MAC + FEC hard IP blocks                                  | 2 4                                                                           | 1 12                         | 2<br>4<br>DDR4                      | 1<br>12<br>4, DDR3, DDR2, D | 2<br>4<br>DR, QDR II, QDR II                                             | 1<br>12<br>I+, RLDRAM II, RL | 1 20                      | 2                                                     | 1                  | 1                   |  |
| Pac         | 100G Ethernet MAC (no FEC) hard IP blocks 100G Ethernet MAC + FEC hard IP blocks Memory devices supported         | 2 4                                                                           | 1<br>12<br>DS pairs, GXE (E- | 2<br>4<br>DDR4<br>Tile) Transceiver | 1<br>12<br>4, DDR3, DDR2, D | 2<br>4<br>DR, QDR II, QDR II                                             | 1<br>12<br>I+, RLDRAM II, RL | 1<br>20<br>DRAM 3, HMC, M | 2<br>4<br>oSys                                        | 1                  | 1                   |  |




#### Intel Stratix 10

### INTEL® STRATIX® 10 MX (DRAM SYSTEM-IN-PACKAGE) PRODUCT TABLE

| PRODUCT LINE |  | MX 1100 | MX 1650 | MX 1650 | MX 1650 | MX 2100 | MX 2100 | MX 2100 | MX 2100 |  |
|---|---|---|---|---|---|---|---|---|---|---|
| Logic elements (LEs) <sup>1</sup> |  | 1,092,000 | 1,679,000 | 1,679,000 | 1,679,000 | 2,073,000 | 2,073,000 | 2,073,000 | 2,073,000 |  |
| Adaptive logic modules (ALMs) |  | 370,080 | 569,200 | 569,200 | 569,200 | 702,720 | 702,720 | 702,720 | 702,720 |  |
| ALM registers |  | 1,480,320 | 2,276,800 | 2,276,800 | 2,276,800 | 2,810,880 | 2,810,880 | 2,810,880 | 2,810,880 |  |
| Hyper-Registers from Intel® Hyperflex™ FPGA architecture |  | Millions of Hyper-Registers distributed throughout the monolithic FPGA fabric |  |  |  |  |  |  |  |  |
| Programmable clock trees synthesizable |  | Hundreds of synthesizable clock trees |  |  |  |  |  |  |  |  |
| HBM2 high-bandwidth DRAM memory (GBytes) |  | 3.25 | 8 | 16 | 8 | 8 | 8 | 16 | 8 |  |
| eSRAM memory blocks |  | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |  |
| eSRAM memory size (Mb) |  | 45 | 90 | 90 | 90 | 90 | 90 | 90 | 90 |  |
| M20K memory blocks |  | 4,401 | 6,162 | 6,162 | 6,162 | 6,847 | 6,847 | 6,847 | 6,847 |  |
| M20K memory size (Mb) |  | 86 | 120 | 120 | 120 | 134 | 134 | 134 | 134 |  |
| MLAB memory size (Mb) |  | 6 | 9 | 9 | 9 | 11 | 11 | 11 | 11 |  |
| Variable-precision digital signal processing (DSP) blocks |  | 2,520 | 3,326 | 3,326 | 3,326 | 3,960 | 3,960 | 3,960 | 3,960 |  |
| 18 x 19 multipliers |  | 5,040 | 6,652 | 6,652 | 6,652 | 7,920 | 7,920 | 7,920 | 7,920 |  |
| Peak fixed-point performance (TMACS) <sup>2</sup> |  | 10.1 | 13.3 | 13.3 | 13.3 | 15.8 | 15.8 | 15.8 | 15.8 |  |
| Peak floating-point performance (TFLOPS) <sup>3</sup> |  | 4.0 | 5.3 | 5.3 | 5.3 | 6.3 | 6.3 | 6.3 | 6.3 |  |
| Hard processor system |  | … management unit, cache coherency unit, hard memory controllers, USB 2.0 x2, 1G EMAC x3, UART x2, serial peripheral interface (SPI) x4, I<sup>2</sup>C x5, general-purpose timers x7, watchdog timer x4 |  |  |  |  |  |  |  |  |
|                                                                                                                                                   |                                                                                                                                                                                                                                                                                                                                                                                                                                    | Yes                                                                                                                                                                                                      | -                                      |                                                                                          |                                                                 |                                                                               |                                                       | eripheral interface                         | (SPI) x4,                                   |  |
| Maximu                                                                                                                                            | um user I/O pins                                                                                                                                                                                                                                                                                                                                                                                                                   | Yes<br>448                                                                                                                                                                                               | - 656                                  |                                                                                          |                                                                 |                                                                               |                                                       | eripheral interface                         |                                             |  |
| Maximu<br>LVDS pa                                                                                                                                 |                                                                                                                                                                                                                                                                                                                                                                                                                                    | 11777                                                                                                                                                                                                    |                                        | l <sup>2</sup> C x5, ge                                                                  | eneral-purpose time<br>–                                        | rs x7, watchdog time<br>–                                                     | er x4<br>-                                            | -                                           | _                                           |  |
| Maximu<br>LVDS pa                                                                                                                                 | um user I/O pins                                                                                                                                                                                                                                                                                                                                                                                                                   | 448                                                                                                                                                                                                      | 656                                    | l <sup>2</sup> C x5, gr                                                                  | eneral-purpose time<br>–<br>584                                 | rs x7, watchdog time<br>-<br>640                                              | er x4<br>-<br>656                                     | -<br>656                                    | - 584                                       |  |
| Maximu LVDS pa Total fu GXE tra (up to 3                                                                                                          | um user I/O pins<br>airs 1.6 Gbps (RX or TX)                                                                                                                                                                                                                                                                                                                                                                                       | 448<br>216                                                                                                                                                                                               | 656<br>312                             | I <sup>2</sup> C x5, gr<br>-<br>656<br>312                                               | eneral-purpose time<br>-<br>584<br>288                          | rs x7, watchdog time<br>-<br>640<br>312                                       | er x4<br>-<br>656<br>312                              | -<br>656<br>312                             | -<br>584<br>288                             |  |
| Maximu LVDS pa Total fu GXE tra (up to 3                                                                                                          | um user I/O pins pairs 1.6 Gbps (RX or TX) ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ                                                                                                                                                                                                                                                                                                              | 448<br>216<br>48                                                                                                                                                                                         | 656<br>312<br>96                       | 1 <sup>2</sup> C x5, gr<br>-<br>656<br>312<br>96                                         | eneral-purpose time<br>-<br>584<br>288<br>96                    | 640<br>312<br>48                                                              | 656<br>312<br>96                                      | -<br>656<br>312<br>96                       | -<br>584<br>288<br>96                       |  |
| Maximu LVDS pa Total fu GXE tra (up to 3 GXT tra GX tran                                                                                          | um user I/O pins pairs 1.6 Gbps (RX or TX) ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ 30 Gbps)                                                                                                                                                                                                                                                                                                     | 448<br>216<br>48<br>0                                                                                                                                                                                    | 656<br>312<br>96                       | 1 <sup>2</sup> C x5, g0<br>-<br>656<br>312<br>96                                         | - 584<br>288<br>96                                              | rs x7, watchdog time<br>-<br>640<br>312<br>48<br>0                            | 656<br>312<br>96                                      | -<br>656<br>312<br>96                       | 584<br>288<br>96                            |  |
| o or truit                                                                                                                                        | um user I/O pins pairs 1.6 Gbps (RX or TX)  ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ 30 Gbps) ansceiver count - NRZ (up to 28.3 Gbps) ansceiver count - NRZ (up to 17.4 Gbps) bress* (PCIe*) hard intellectual property (IP) blocks                                                                                                                                                              | 448<br>216<br>48<br>0                                                                                                                                                                                    | 656<br>312<br>96<br>0                  | 1 <sup>2</sup> C x5, g0<br>-<br>656<br>312<br>96<br>0                                    | 96<br>72<br>16                                                  | - 640<br>312<br>48<br>0                                                       | 656<br>312<br>96<br>0                                 | -<br>656<br>312<br>96<br>0                  | -<br>584<br>288<br>96<br>72                 |  |
| PCI Exp<br>(Gen3 x                                                                                                                                | um user I/O pins pairs 1.6 Gbps (RX or TX)  ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ 30 Gbps) ansceiver count - NRZ (up to 28.3 Gbps) ansceiver count - NRZ (up to 17.4 Gbps) bress* (PCIe*) hard intellectual property (IP) blocks                                                                                                                                                              | 448<br>216<br>48<br>0<br>32<br>16                                                                                                                                                                        | 656<br>312<br>96<br>0<br>64<br>32      | 1°C x5, g0<br>-<br>656<br>312<br>96<br>0<br>64<br>32                                     | 96<br>72<br>16<br>8                                             |                                                                               | - 656<br>312<br>96<br>0<br>64<br>32                   | -<br>656<br>312<br>96<br>0<br>64<br>32      | -<br>584<br>288<br>96<br>72<br>16<br>8      |  |
| PCI Exp<br>(Gen3 x<br>100G Et                                                                                                                     | um user I/O pins pairs 1.6 Gbps (RX or TX) ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ 30 Gbps) ansceiver count - NRZ (up to 28.3 Gbps) ansceiver count - NRZ (up to 17.4 Gbps) bress* (PCIe*) hard intellectual property (IP) blocks x16)                                                                                                                                                          | 448<br>216<br>48<br>0<br>32<br>16                                                                                                                                                                        | 656<br>312<br>96<br>0<br>64<br>32      | 1°C x5, g0<br>-<br>656<br>312<br>96<br>0<br>64<br>32<br>4                                | 288<br>96<br>72<br>16<br>8                                      | rs x7, watchdog time - 640 312 48 0 32 16                                     | 96<br>0<br>64<br>32<br>4                              | -<br>656<br>312<br>96<br>0<br>64<br>32      | -<br>584<br>288<br>96<br>72<br>16<br>8      |  |
| PCI Exp<br>(Gen3 x<br>100G Et                                                                                                                     | um user I/O pins pairs 1.6 Gbps (RX or TX) ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ BO Gbps) ansceiver count - NRZ (up to 28.3 Gbps) ansceiver count - NRZ (up to 17.4 Gbps) bress* (PCIe*) hard intellectual property (IP) blocks k16) Ethernet MAC (no FEC) hard IP blocks                                                                                                                     | 448<br>216<br>48<br>0<br>32<br>16<br>2                                                                                                                                                                   | 656<br>312<br>96<br>0<br>64<br>32<br>4 | 1°C x5, g0<br>-<br>656<br>312<br>96<br>0<br>64<br>32<br>4                                | 96<br>72<br>16<br>8<br>1                                        | 7, watchdog time 640 312 48 0 32 16 2 0                                       | 96<br>0<br>64<br>32<br>4<br>0                         | -<br>656<br>312<br>96<br>0<br>64<br>32<br>4 | -<br>584<br>288<br>96<br>72<br>16<br>8      |  |
| PCI Exp<br>(Gen3 x<br>100G Et<br>100G Et<br>Memory                                                                                                | um user I/O pins pairs 1.6 Gbps (RX or TX) ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ 30 Gbps) ansceiver count - NRZ (up to 28.3 Gbps) ansceiver count - NRZ (up to 17.4 Gbps) bress* (PCIe*) hard intellectual property (IP) blocks x16) Ethernet MAC (no FEC) hard IP blocks                                                                                                                     | 448<br>216<br>48<br>0<br>32<br>16<br>2                                                                                                                                                                   | 656<br>312<br>96<br>0<br>64<br>32<br>4 | 1°C x5, gr<br>-<br>656<br>312<br>96<br>0<br>64<br>32<br>4<br>4<br>0<br>DDR4, DDR3, DDR2, | 96<br>72<br>16<br>8<br>1                                        | 7, watchdog time 640 312 48 0 32 16 2 0                                       | 96<br>0<br>64<br>32<br>4<br>0                         | -<br>656<br>312<br>96<br>0<br>64<br>32<br>4 | -<br>584<br>288<br>96<br>72<br>16<br>8      |  |
| PCI Exp<br>(Gen3 x<br>100G Et<br>100G Et<br>Memory                                                                                                | um user I/O pins pairs 1.6 Gbps (RX or TX) ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ 30 Gbps) ansceiver count - NRZ (up to 28.3 Gbps) ansceiver count - NRZ (up to 17.4 Gbps) bress* (PCIe*) hard intellectual property (IP) blocks (x16) Ethernet MAC (no FEC) hard IP blocks Ethernet MAC + FEC hard IP blocks (xy) devices supported                                                           | 448<br>216<br>48<br>0<br>32<br>16<br>2                                                                                                                                                                   | 656<br>312<br>96<br>0<br>64<br>32<br>4 | 1°C x5, gr<br>-<br>656<br>312<br>96<br>0<br>64<br>32<br>4<br>4<br>0<br>DDR4, DDR3, DDR2, | 96<br>72<br>16<br>8<br>1                                        | 7, watchdog time 640 312 48 0 32 16 2 0                                       | 96<br>0<br>64<br>32<br>4<br>0                         | -<br>656<br>312<br>96<br>0<br>64<br>32<br>4 | -<br>584<br>288<br>96<br>72<br>16<br>8      |  |
| LVDS part Total fur to 3 GXE tran (up to 3 GXT tran GX tran GE Total fur to 3 GXT tran GX tran 100G Et 100G Et Memory ackage Optical 1760 pin (42 | um user I/O pins pairs 1.6 Gbps (RX or TX) ull duplex transceiver count ansceiver count - PAM4 (up to 58 Gbps) or NRZ 30 Gbps) ansceiver count - NRZ (up to 28.3 Gbps) ansceiver count - NRZ (up to 17.4 Gbps) bress* (PCle*) hard intellectual property (IP) blocks (x16) Ethernet MAC (no FEC) hard IP blocks Ethernet MAC + FEC hard IP blocks (xy devices supported (ions and I/O Pins: General-Purpose I/O (GPIO) Count, High | 448<br>216<br>48<br>0<br>32<br>16<br>2<br>2<br>0                                                                                                                                                         | 656<br>312<br>96<br>0<br>64<br>32<br>4 | 1°C x5, gr<br>-<br>656<br>312<br>96<br>0<br>64<br>32<br>4<br>4<br>0<br>DDR4, DDR3, DDR2, | 96<br>72<br>16<br>8<br>1<br>1<br>1<br>2<br>DDR, QDR II, QDR II+ | - 48<br>0<br>312<br>48<br>0<br>32<br>16<br>2<br>2<br>0<br>-, RLDRAM II, RLDRA | 96<br>0<br>64<br>32<br>4<br>4<br>0<br>M 3, HMC, MoSys | -<br>656<br>312<br>96<br>0<br>64<br>32<br>4 | -<br>584<br>288<br>96<br>72<br>16<br>8<br>1 |  |

## FPGA count in CMS trigger for HL-LHC



#### History

#### Long long time ago ...



## Simple Programmable Logic Devices (sPLDs) a) Programmable Read-Only Memories (PROMs)





Late 1960s

Unprogrammed PROM (Fixed AND Array, Programmable OR Array)

## Simple Programmable Logic Devices (sPLDs) b) Programmable Logic Arrays (PLAs)



**Unprogrammed PLA (Programmable AND and OR Arrays)** 

Most flexible, but slower (signals pass through two programmable arrays)

## Simple Programmable Logic Devices (sPLDs) c) Programmable Array Logic (PAL)





Unprogrammed PAL (Programmable AND Array, Fixed OR Array)

### Complex PLDs (CPLDs)



Coarse-grained

(EE)PROM-based



#### FPGAs ...



#### Design Considerations (SRAM Config.)



#### Configuration at power-up



Typical FPGA configuration time: milliseconds

#### Programming via JTAG

JTAG: Joint Test Action Group (the interface is standardized as IEEE 1149.1)



#### Remote programming



#### Timing in FPGA design is critical



#### Data paths must respect setup and hold times



Setup time is the minimum time for which the data input of a flip-flop must be stable before the active clock edge. Hold time is the minimum time for which it must remain stable after that clock edge.




- If signals do not arrive at their destination on time
  - Catastrophic consequences: flip-flops latch wrong (or even metastable) values and the design malfunctions

- Always use dedicated clock networks to distribute clocks
  - Ensures that the clock edge reaches all FFs at (almost) the same time, i.e. with minimal skew
  - Other clocking resources
    - Clock-capable pins
    - Clock buffers
    - Clock multiplexers
    - Phase-locked loops (PLLs)
    - Digital clock managers (DCMs)



Do not gate or derive clocks in logic; use clock enables or the dedicated clocking resources instead



#### Meeting timing closure



- The place & route step tries to position registers (flip-flops) and logic so that data path delays respect setup and hold times
- Options to meet timing
  - Instruct place & route to use a higher effort level
  - Add register stages and reduce the amount of logic between registers (increases latency)
  - Choose the location of inputs and outputs (during board design, or through an optical patch panel)
  - Placement (area) constraints (give hints to the place & route step)
- Good practice
  - Whenever possible, use I/O flip-flops (i.e. FFs inside the input/output cells)
    - Ensures that timing with respect to external components is met