

Managed by Fermi Research Alliance, LLC for the U.S. Department of Energy Office of Science

# ProtoPRM: An FPGA-Based High Performance Pattern Recognition Memory Track Finder Mezzanine

#### J. Olsen<sup>1</sup>, T. Liu<sup>1</sup>, J. Wu<sup>1</sup>, Z. Hu<sup>1</sup>, Z. Xu<sup>2</sup>

<sup>1</sup> Fermi National Accelerator Laboratory, Batavia, Illinois U.S.A. <sup>2</sup> Peking University, Beijing, CHINA

#### 29 September 2016



Data Processing and Electronics

# Outline

- L1 Tracking Trigger Introduction and Challenges
- Pattern Recognition with Associative Memory
- Pulsar II Hardware Components
- Demonstration System
- Data Delivery
- ProtoPRM Mezzanine Firmware
  - Data Organizer
  - PRAM
- Conclusion



# Level-1 Track Trigger Challenges



#### Data Formatting and Delivery

- Partitioning

#### **Data Processing**

- Pattern Recognition (AM)
- Track Fitting (FPGA)

#### We use the divide and conquer approach...




### **Detector Partitioning**

[Figure: the detector is partitioned into regions in η and in r-phi (8 regions in r-phi).]







- 48 Trigger Towers
- ~400 front end modules/tower
- For demonstration purposes assume one processor shelf per tower



# **Time Multiplexing**

- A trigger tower processor consists of an array of independent engines
  - Pattern Recognition (AM)
  - Track Fitting (FPGA)
- High speed, low latency, non-blocking communication channels for efficient data delivery
- 20x time multiplexed
- Event period seen by each engine: 25 ns $\rightarrow$ 500 ns
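
The 20x time multiplexing above can be illustrated with a short Python sketch. It assumes a simple round-robin assignment (bunch crossing `bx` goes to engine `bx mod 20`), which the slides do not spell out; the function names are hypothetical.

```python
BX_PERIOD_NS = 25    # LHC bunch crossing period (from the slide)
TMUX_FACTOR = 20     # 20x time multiplexing (from the slide)


def engine_for_bx(bx: int, n_engines: int = TMUX_FACTOR) -> int:
    """Assumed round-robin routing: event bx is sent to engine bx mod n_engines."""
    return bx % n_engines


def engine_event_period_ns(n_engines: int = TMUX_FACTOR) -> int:
    """Time between two consecutive events delivered to the same engine."""
    return BX_PERIOD_NS * n_engines


if __name__ == "__main__":
    print(engine_event_period_ns())                           # 500
    print([engine_for_bx(bx) for bx in (0, 1, 19, 20, 41)])   # [0, 1, 19, 0, 1]
```

With 20 engines, each engine has 500 ns to accept one event before the next one assigned to it arrives.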





# Pattern Recognition Associative Memory (PRAM)



- Factor of ~10x occupancy reduction
- More importantly, the hits/stubs are organized in found roads ("hits of interest") which makes the track fitting easier


### **Pulsar II Hardware**







TWEPP-16 Karlsruhe

# **Pulsar IIb Front Board**

- Xilinx Virtex 7 FPGA
  - XC7VX690T -2 FFG1927 C
- 80 GTH serial transceivers
  - up to 11.3 Gbps (-2)
  - 40 for RTM
  - 28 for Full Mesh Fabric
  - 12 for Mezzanines
- Four FMC Mezzanine Cards
  - High Pin Count (HPC)
  - 35W, up to 60W possible
  - 34 pair LVDS/slot
  - 3 GTH lanes/slot
- Intelligent RTM / PICMG 3.8
- IPMC Mezzanine Card
- TTC timing and control over ATCA backplane






# **Rear Transition Module (RTM)**



- 10 QSFP+ transceivers
- 400 Gbps full duplex
- ATCA/PICMG 3.8 spec
- MMC is an ARM Cortex-M3 microcontroller
  - Read sensors, access QSFP registers
  - Basic IPMI functionality: hot swap, LEDs, handle, etc.





# Pattern Recognition Mezzanine (PRM)

- Designed to explore high performance and low latency PRAM architectures
- Single PRAM channel, pipelined readout
- Kintex UltraScale KU060 FPGAs
- Master FPGA
  - Formatting, Data Organizer, Combiner, and Track Fitters
- Slave FPGA
  - PRAM emulation
  - 1k to 4k patterns
  - Develop new high speed FPGA-PRAM interfaces
  - Local bus is LVDS + 8 x 16 Gbps lanes
- AM ASIC (VIPRAM\_L1CMS)



#### VIPRAM\_L1CMS (130nm, two-tier) ASIC wafers in 3D processing now




### **ProtoPRM Board**

Board components (figure labels):

- QSFP+, 4 x 10 Gbps
- Slave FPGA: Kintex UltraScale KU040 or KU060
- 2 x FMC HPC connectors; each has 24 LVDS pairs and 4 GTH lanes (up to 16 Gbps)
- VIPRAM\_L1CMS ASIC (TQFP176)
- Master FPGA: Kintex UltraScale KU040 or KU060
- Static RAM: Cypress DDR II+, 400 MHz, 4 MB



# **Demonstration System**



- One trigger tower = 1 ATCA shelf
- This demonstration system supports different PRMs
  - FNAL ProtoPRM
  - INFN AM05/AM06 PRM
- Time multiplexed transfers to track finder engines, each of which processes one event at a time
- Up to 20 PRMs per shelf
- This system architecture uses the ATCA full mesh backplane to its fullest extent



# **Demonstration System Data Flow**

#### Pattern Recognition Board (PRB) shelf

- One Trigger Tower
- 10 Pulsar IIb
- Some boards with PRM Mezzanines



#### Data Source Board (DSB) shelf

- Emulates the output of ~400 modules
- 10 Pulsar IIb
- 100 QSFP+ fibers
- > 4Tbps








## **Data Delivery**



## Data Transfers on the Full Mesh Backplane



All ten boards do this continuously. The first stubs reach the ProtoPRM by 1.5 µs.

- Each Pulsar2b receives stubs on 40 links
- Stubs arrive in a "train" which contains stubs for up to 8 BX
- New train every 200ns
- Pulsar2b FPGA sorts stubs by BX and sends to 7 or 8 neighbors over the backplane
- Backplane transfers must complete in 200ns
- Each board can send up to ~100 stubs to each neighbor
- Full mesh channels are 2 x 10 Gbps, non-blocking
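
The numbers above imply a per-stub bit budget on each backplane channel. The sketch below works it out from the raw line rate. The `encoding_efficiency` parameter is an assumption (the slides do not state the line encoding), so the 64b/66b case is only an example.

```python
LANES_PER_CHANNEL = 2     # "2 x 10 Gbps" per full mesh channel (from the slide)
LANE_RATE_GBPS = 10.0
WINDOW_NS = 200.0         # transfers must complete in 200 ns (from the slide)
MAX_STUBS = 100           # "up to ~100 stubs" per neighbor (from the slide)


def raw_bits_per_window() -> float:
    """Raw line bits one channel carries in one window (Gbps x ns = bits)."""
    return LANES_PER_CHANNEL * LANE_RATE_GBPS * WINDOW_NS


def payload_bits_per_stub(encoding_efficiency: float = 1.0) -> float:
    """Bits available per stub; encoding_efficiency is an assumed parameter."""
    return raw_bits_per_window() * encoding_efficiency / MAX_STUBS


if __name__ == "__main__":
    print(raw_bits_per_window())            # 4000.0
    print(payload_bits_per_stub())          # 40.0
    print(payload_bits_per_stub(64 / 66))   # slightly under 40 if 64b/66b were used
```

At the raw line rate, about 40 bits are available per stub in each 200 ns window; protocol overhead reduces this.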



# **ProtoPRM Firmware Overview**



- Conversion/Lookup Functions
  - Local stub to SSID
  - Road-ID to SSID
  - Local stub to global stub
- PRAM Bank
  - 6 layers
  - Pipelined Readout
  - ASIC and FPGA emulation
- Data Organizer
  - Pipelined to match AM
  - Stores stubs at the address pointed to by the SSID
  - Stores multiple stubs per SSID
  - Writes like FIFO
  - Reads like RAM
- Combiner generates multiple stub combinations
- Track Fitter


# **Data Organizer: Overview**

- "smart database" stores stubs at the address pointed to by the SSID
  - SSID = 12 bits  $\rightarrow$  4k memory locations
  - Store up to 4 stubs per SSID
- The existing DO architecture is fundamentally geared towards read-modify-write operations
- A redesign of the DO was needed because:
  - Our VIPRAM/PRAM readout is pipelined
  - The data organizer must concurrently store stubs for event N while recalling stubs for event N-1
  - DO must "ping pong" dual RAM banks
- RAM "scrubbing" functions are implemented
  - Periodic clearing of the RAM is done with writes (no global reset)
  - Prevent stubs from old events from being read out ("masking")
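
The "ping pong" requirement (store event N while recalling event N-1) can be sketched as a behavioral Python model. This is only an illustration under stated assumptions: the class and method names are hypothetical, stubs beyond the fourth are simply dropped, and the bank is cleared in one step at the swap, whereas the firmware clears the RAM with writes.

```python
from collections import defaultdict

SSID_BITS = 12            # 12-bit SSID -> 4k locations (from the slide)
MAX_STUBS_PER_SSID = 4    # up to 4 stubs per SSID (from the slide)


class PingPongDataOrganizer:
    """Two banks: one is written for event N while the other is read for event N-1."""

    def __init__(self):
        self.banks = [defaultdict(list), defaultdict(list)]
        self.write_bank = 0

    def store(self, ssid, stub):
        """Store a stub for the current event at the location addressed by its SSID."""
        if not 0 <= ssid < (1 << SSID_BITS):
            raise ValueError("SSID out of range")
        slot = self.banks[self.write_bank][ssid]
        if len(slot) < MAX_STUBS_PER_SSID:   # assumption: overflow stubs are dropped
            slot.append(stub)

    def end_of_event(self):
        """Swap banks and clear the new write bank so stale stubs are never recalled."""
        self.write_bank ^= 1
        self.banks[self.write_bank].clear()

    def recall(self, ssid):
        """Return the stubs stored under this SSID for the previous event."""
        return list(self.banks[self.write_bank ^ 1].get(ssid, []))
```

A road found by the PRAM for event N-1 is turned into SSIDs, and `recall` fetches the matching stubs while `store` keeps filling the other bank with event N.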

# **Data Organizer: Operation**



- New design eliminates read-modify-write cycles
- Use 7-Series/UltraScale BlockRAM "read first mode"
  - As data is written into BlockRAM the *previous data* at that location is pushed to the output
- Four cascaded RAMs hold the stub information
- Simple, fast, and efficient configuration resembles an "array of FIFOs"
- Read latency is low, the same as a BlockRAM read, and all stubs for an SSID are output in parallel
- Dual port BlockRAMs simplify the "ping pong" mechanism
  - Port A is used for writing event N stubs
  - Port B is used for reading event N-1 stubs
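
One way to picture the read-first cascade is the behavioral Python model below. It is a sketch of one plausible reading of the bullets above, not the firmware: each write returns the previous word at that address, and that word is written into the next RAM, so every address behaves like a 4-deep shift register. Class names are hypothetical, and ping-pong banking and stale-event masking are not modeled.

```python
DEPTH = 1 << 12   # 12-bit SSID -> 4k locations
N_RAMS = 4        # four cascaded RAMs -> up to 4 stubs per SSID


class ReadFirstRam:
    """RAM whose write returns the word previously stored at that address."""

    def __init__(self, depth=DEPTH):
        self.mem = [None] * depth

    def write(self, addr, data):
        previous = self.mem[addr]   # read-first: old data appears at the output
        self.mem[addr] = data
        return previous

    def read(self, addr):
        return self.mem[addr]


class CascadedDataOrganizer:
    def __init__(self, n_rams=N_RAMS, depth=DEPTH):
        self.rams = [ReadFirstRam(depth) for _ in range(n_rams)]

    def store(self, ssid, stub):
        """Write the stub into the first RAM; displaced words ripple down the cascade."""
        word = stub
        for ram in self.rams:
            word = ram.write(ssid, word)
        # The word displaced from the last RAM is discarded.

    def recall(self, ssid):
        """Read all RAMs at the same address in parallel, newest stub first."""
        words = (ram.read(ssid) for ram in self.rams)
        return [w for w in words if w is not None]
```

No read-modify-write cycle is needed: a store is a single write per RAM, and a recall is a single read per RAM.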



# **Data Organizer: Design**



As shown, the DO can store up to 4 stubs per SSID. This can be increased easily by adding additional BlockRAMs.

One DO per layer is required. BlockRAMs are wide enough to store global stubs.

# **PRAM in FPGA**

- Fully synthesizable VHDL model of current VIPRAM\_L1CMS
  - Multi-tier pipelined readout
  - "CAM tier" processes stubs for the current event
  - "I/O tier" captures road flags and outputs road addresses for previous event
- Design optimized for 7-Series/UltraScale architecture
- Fairly close to "cycle accurate" timing
- 6 input layer buses, 12 bits per layer
- 1k to 4k patterns, fully programmable
- Option for "don't care bits"
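
The matching function of an associative memory with "don't care" bits can be sketched in Python as follows. This is a purely functional illustration with hypothetical names: it assumes a road requires a match in all six layers, and it does not model the two-tier pipelined readout or the timing of the VHDL model.

```python
N_LAYERS = 6                      # 6 input layer buses (from the slide)
SSID_BITS = 12                    # 12 bits per layer (from the slide)
SSID_MASK = (1 << SSID_BITS) - 1


class Pattern:
    def __init__(self, ssids, dont_care=None):
        self.ssids = list(ssids)                           # one SSID per layer
        self.dont_care = list(dont_care or [0] * N_LAYERS)  # 1 bits are ignored

    def layer_matches(self, layer, ssid):
        """Compare only the bits that are not marked as don't care."""
        care = ~self.dont_care[layer] & SSID_MASK
        return (ssid & care) == (self.ssids[layer] & care)


def find_roads(patterns, event_ssids):
    """Return the addresses of patterns matched in every layer.

    event_ssids holds one collection of SSIDs per layer for a single event.
    """
    roads = []
    for address, pattern in enumerate(patterns):
        if all(
            any(pattern.layer_matches(layer, s) for s in event_ssids[layer])
            for layer in range(N_LAYERS)
        ):
            roads.append(address)
    return roads
```

In hardware every pattern is compared in parallel as the stubs stream in; the loop over patterns here only reproduces the result, not the speed.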




## **PRAM in FPGA: Road Serialization Logic**




# **PRAM in FPGA: Interface**

- The ProtoPRM Master-Slave local bus consists of:
  - 8 GTH lanes, up to 16.3 Gbps / lane
  - 24 LVDS pairs, up to 1 Gbps / pair
- We plan to use this local bus to develop high performance interfaces for future PRAM ASICs
- Low latency is critical for this path
  - Roads must get back to the Data Organizer before next event arrives
  - GTH transceiver latency is a bit too high (~150ns)
  - Source synchronous DDR/serial has the lowest latency (~12ns with 240MHz clock)
- This interface is similar to VIPRAM\_L1CMS ASIC
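
The latency and bandwidth figures quoted above can be put on a common footing with a few lines of Python. The sketch converts the two latencies to cycles of the 240 MHz clock and sums the raw link rates; the helper names are hypothetical.

```python
CLOCK_MHZ = 240.0   # clock quoted for the source synchronous path


def ns_to_cycles(latency_ns: float, clock_mhz: float = CLOCK_MHZ) -> float:
    """Express a latency in periods of the given clock."""
    return latency_ns * clock_mhz / 1000.0


def aggregate_gbps(n_links: int, gbps_per_link: float) -> float:
    """Raw aggregate rate of n identical links."""
    return n_links * gbps_per_link


if __name__ == "__main__":
    print(ns_to_cycles(150.0))        # GTH path: 36.0 cycles
    print(ns_to_cycles(12.0))         # source synchronous path: just under 3 cycles
    print(aggregate_gbps(8, 16.3))    # GTH lanes, raw
    print(aggregate_gbps(24, 1.0))    # LVDS pairs, raw
```

The GTH lanes offer more raw bandwidth, but the LVDS path returns roads in roughly a tenth of the time, which is what matters for getting roads back to the Data Organizer.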

# **ProtoPRM timing**

[Figure: simulation waveform of the ProtoPRM firmware with a 240 MHz clock. Traces include CLK\_reg, PRM\_EN\_reg, LOC\_reg, SS\_reg, and GlobalStubsArray\_reg; annotations mark "Local Stubs in", "Road from" (label truncated), "Global Stubs", and an interval of 21 CLK.]

The overhead of conversion functions, lookup tables, data organizer, and PRAM is on the order of 100 ns.



# **ProtoPRM Latency Estimate**



- Target: total latency ≤ 4 µs
- Data delivery =  $1.5\mu s$ 
  - First stubs arrive at ProtoPRM input
- Stubs into Combiner/Track fitter @ 2.2µs
- Combiner/Track fitter latency on the order of 1µs
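
The estimate above can be checked against the target with simple arithmetic. The sketch treats the "order of 1 µs" Combiner/Track Fitter latency as exactly 1.0 µs, which is an assumption made only for illustration.

```python
TARGET_US = 4.0             # total latency target
DATA_DELIVERY_US = 1.5      # first stubs arrive at the ProtoPRM input
STUBS_INTO_FITTER_US = 2.2  # stubs enter the Combiner/Track Fitter
FITTER_LATENCY_US = 1.0     # assumed exact value for "on the order of 1 µs"


def total_latency_us() -> float:
    """Time at which fitted tracks leave the Combiner/Track Fitter."""
    return STUBS_INTO_FITTER_US + FITTER_LATENCY_US


def margin_us() -> float:
    """Headroom left with respect to the target."""
    return TARGET_US - total_latency_us()


if __name__ == "__main__":
    print(f"ProtoPRM front end: {STUBS_INTO_FITTER_US - DATA_DELIVERY_US:.1f} us")
    print(f"Total: {total_latency_us():.1f} us, margin: {margin_us():.1f} us")
```

Under these assumptions the total is about 3.2 µs, leaving roughly 0.8 µs of margin.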



# Summary

- Our L1CMS track trigger demonstration system is based around time multiplexed data transfers over the ATCA full mesh backplane
- Pulsar IIb front boards make up the backbone of the system
- ProtoPRM mezzanine boards are the track finder engines
- New firmware designs have been optimized for high performance, pipelined, low latency operation
  - Data Organizer
  - PRAM in FPGA
- Demonstration system integration is underway
- We look forward to sharing new results at TWEPP-17!



# **Backup Slides**




# Link Performance: Full Mesh Backplane

- Many different full mesh backplanes were tested at Fermilab
- In late 2014, we purchased the next generation 100G Air-/-Plane backplane from COMTEL
- ALL of the 56 bidirectional links among 8 Pulsar2b boards were tested at 10.0 Gbps (PRBS7)
- The best and most consistent link performance to date









# Link Performance: RTM




# Link Performance: Pulsar IIb to protoPRM

- 10.0 Gb/s: BER < 1e-14



- 6.25Gb/s and 8.0 Gb/s also tested
- All 6 channels are error free
- Pulsar2b assigns 3 GTH to each FMC, while protoPRM has 4 available
- The 10 Gb/s limit is set by the FMC connector





# Link Performance: protoPRM Local Bus

- 16.3 Gb/s, PRBS7: BER < 1e-14





- 6.25, 8.0, 10.0 and 12.5 Gb/s also tested
- All 8 channels are error free



# **System Synchronization**

#### Intra-Shelf

- User clocks on ATCA backplane connect to Pulsar2b FPGA
- Any Pulsar2b board can be master
- 40MHz LHC clock
- TTC A/B Channel Data
- M-LVDS tested to 100MHz (Northwestern Univ.)

#### Inter-Shelf

- TTC receiver FMC mezzanines installed on Pulsar2b boards
- Source is TTCci VME board
- Passive fiber splitter








## **Pulsar2b Backplane Synchronization Logic**


