# **Pulsar IIb Design, System Integration and Next-Generation Full Mesh ATCA Backplane Test Results**



Zijun Xu<sup>1</sup>, Zhen Hu<sup>2</sup>, Jamieson Olsen<sup>2</sup>, Tiehui Ted Liu<sup>2</sup>, Lucas Arruda Ramalho<sup>3</sup>, Vitor Finotti Ferreira<sup>3</sup>

<sup>1</sup> Peking University, Beijing, China; <sup>2</sup> Fermi National Accelerator Laboratory, Batavia, Illinois, USA; <sup>3</sup> São Paulo Research and Analysis Center, São Paulo, Brazil

# Abstract


The Pulsar IIb is a custom, full-mesh-enabled, FPGA-based ATCA processor board designed to provide a scalable architecture rich in flexible, non-blocking, high-bandwidth interconnections. The design is motivated by the needs of silicon-based tracking triggers for LHC experiments. Here we describe the Pulsar II hardware and its demonstrated interconnection capabilities, including test results with the ATCA 40G+ full mesh backplane. In addition, we present the ProtoPRM mezzanine board, which can serve as the core pattern recognition and track fitting engine for CMS L1 Tracking Trigger R&D using the associative memory approach.




# **Pulsar IIb Architecture**



#### **Pulsar IIb Front Board Features**

- Xilinx Virtex 7 FPGA
  - XC7VX690T -2 FFG1927C
  - 690,000 logic cells
  - 52 Mbit dual port BlockRAM
  - 80 GTH transceivers up to 11.3 Gbps (-2 speed grade)
    - 40 GTH for Rear Transition Module (RTM)
    - 28 GTH for Full Mesh Fabric Interface
    - 12 GTH for Mezzanines
- 256 MB DDR3-1066
- Four FMC Mezzanine Cards
  - 35 W per card, up to 60 W possible
  - 34 unidirectional LVDS pairs per card
  - 3 SERDES (GTH) lanes per card
- IPMC Mezzanine Card
  - Basic IPMI protocol support including hot swap for front board and RTM
  - Monitors over 30 temperature, voltage, and current sensors with data records and thresholds
- 100BASE-T Ethernet on the Base Interface for slow controls and JTAG programming (XVC protocol)
- M-LVDS clock distribution on ATCA backplane
- Programmable low-jitter reference clocks
- Zone-3 connectors are PICMG 3.8 compliant
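The transceiver counts in the feature list can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and the ten-board shelf and 10 Gbps lane rate are taken from the demonstration system described elsewhere on this poster rather than from the board specification itself.

```python
# GTH transceiver counts as given in the Pulsar IIb feature list.
GTH_TOTAL = 80
GTH_RTM = 40
GTH_FABRIC = 28
GTH_MEZZANINE = 12
FMC_CARDS = 4

# Values from the demonstration system description (used here as assumptions).
BOARDS_PER_SHELF = 10
LANE_RATE_GBPS = 10.0


def gth_budget_is_consistent():
    """Check that the three GTH groups add up to the quoted total."""
    return GTH_RTM + GTH_FABRIC + GTH_MEZZANINE == GTH_TOTAL


def gth_lanes_per_fmc():
    """Mezzanine GTH lanes shared evenly among the FMC cards."""
    return GTH_MEZZANINE // FMC_CARDS


def shelf_rtm_lanes(boards=BOARDS_PER_SHELF):
    """Total RTM-facing GTH lanes for a shelf of identical boards."""
    return boards * GTH_RTM


def shelf_rtm_bandwidth_tbps(boards=BOARDS_PER_SHELF, lane_rate_gbps=LANE_RATE_GBPS):
    """Aggregate RTM bandwidth of a shelf in Tbps (raw line rate)."""
    return shelf_rtm_lanes(boards) * lane_rate_gbps / 1000.0


if __name__ == "__main__":
    print("GTH budget consistent:", gth_budget_is_consistent())
    print("GTH lanes per FMC card:", gth_lanes_per_fmc())
    print("RTM lanes per shelf:", shelf_rtm_lanes())
    print("RTM bandwidth per shelf (Tbps):", shelf_rtm_bandwidth_tbps())
```

Under these assumptions, the 12 mezzanine lanes divide into the 3 lanes per FMC card listed above, and a ten-board shelf exposes 400 RTM lanes, or about 4 Tbps of raw line rate at 10 Gbps per lane.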

#### **ProtoPRM Features**

- Dual Kintex UltraScale FPGAs
  - KU040 or KU060, -2 speed grade
  - Up to 580k logic cells
  - Up to 38 Mbit dual port BlockRAM
  - 16.3 Gbps GTH serial transceivers
    - Up to 8 lanes for communication with Pulsar IIb
    - 8 lane Master-Slave FPGA local bus
    - 4 lanes per FPGA for QSFP+ optical modules
- 36 Mbit low latency DDR II+ static RAM
- Socket for VIPRAM ASIC (TQFP176)
- Dual high pin count FMC connectors

Photo: Reidar Hahn, Fermilab VMS

## **Demonstrated Interconnection Bandwidth**

- ✓ Pulsar IIb FPGA to full mesh backplane channels **10 Gbps**
- ✓ Pulsar IIb FPGA to FMC Mezzanine (GTH) **10 Gbps**
- ✓ Pulsar IIb FPGA to FMC Mezzanine (parallel LVDS) **1 Gbps/pair**
- ✓ Pulsar IIb FPGA to Rear Transition Module QSFP+ over fiber **10 Gbps**

The achieved performance meets all expectations and satisfies the needs of the Tracking Trigger Demonstration.



#### **Prototype Pattern Recognition Mezzanine Board**

- Slave FPGA can be used for implementing PRAM ASIC functionalities for performance and optimization studies.
- ✓ ProtoPRM FPGA to QSFP+ over fiber **14 Gbps**
- ✓ ProtoPRM Master-Slave FPGA interconnections **16.3 Gbps**
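To make the associative memory idea concrete, here is a minimal software sketch of coarse-resolution road finding. It is not the VIPRAM or ProtoPRM firmware design: the pattern bank layout, the `superstrip_width` parameter, the function names, and the all-layers matching rule are assumptions chosen for illustration.

```python
from collections import defaultdict


def to_superstrip(strip, superstrip_width):
    """Map a fine strip number to a coarse superstrip ID."""
    return strip // superstrip_width


def find_roads(pattern_bank, stubs, superstrip_width=16, min_matched_layers=None):
    """Return (road_id, stubs_by_layer) for every pattern that fires.

    pattern_bank: list of tuples holding one superstrip ID per detector layer.
    stubs: iterable of (layer, strip) pairs for one event.
    min_matched_layers: layers that must match; defaults to all layers.
    """
    # Group the event's stubs by (layer, superstrip) once, up front.
    stubs_by_superstrip = defaultdict(list)
    for layer, strip in stubs:
        key = (layer, to_superstrip(strip, superstrip_width))
        stubs_by_superstrip[key].append(strip)

    roads = []
    for road_id, pattern in enumerate(pattern_bank):
        needed = len(pattern) if min_matched_layers is None else min_matched_layers
        # A layer matches when the event has a stub in the pattern's superstrip.
        matched = [
            layer for layer, superstrip in enumerate(pattern)
            if (layer, superstrip) in stubs_by_superstrip
        ]
        if len(matched) >= needed:
            road_stubs = {
                layer: stubs_by_superstrip[(layer, pattern[layer])]
                for layer in matched
            }
            roads.append((road_id, road_stubs))
    return roads


if __name__ == "__main__":
    bank = [(0, 1, 2), (3, 3, 3)]
    event = [(0, 5), (1, 20), (2, 40), (0, 50)]
    for road_id, road_stubs in find_roads(bank, event):
        print(road_id, road_stubs)
```

A hardware associative memory compares every stored pattern against the incoming data in parallel, whereas this loop visits the patterns one at a time. The sketch only shows the matching logic and why a found road hands the track fitter stubs that are already grouped by candidate.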

# **Tracking Trigger Demonstration System**




Our CMS L1 Tracking Trigger demonstration system requires flexible, high-bandwidth, non-blocking communication channels between boards at the shelf level. The upper ATCA shelf consists of ten Pulsar IIb boards which are configured as Pattern Recognition Boards (PRB). These ten PRBs form an array of processing engines for a single $\eta$-$\phi$ trigger tower. Each PRB receives data over fiber from up to 40 front end modules at rates up to 10 Gbps per lane. High speed, low latency data transfers between PRBs take place on the full mesh backplane channels. A simple time-multiplexed data [...] advantage of the full mesh ATCA fabric interface. Once the complete event has been received by the PRB, the stubs are pushed up to one of two ProtoPRM boards. The ProtoPRM uses Pattern Recognition Associative Memory (PRAM) devices to quickly find coarse resolution roads. The found roads make downstream track fitting easier because the stubs of interest are already organized for track candidates. Track parameters are computed by multiple parallel track fitting engines implemented in the master ProtoPRM UltraScale FPGA. The total latency budget from input stubs arriving at the RTM to found tracks is 4 $\mu$s.

The lower ATCA shelf (above) consists of ten Pulsar IIb boards configured as Data Source Boards (DSB). DSBs transmit simulated module data at up to 10 Gbps per lane through the RTM boards and over 100 QSFP+ fibers as shown to the right. The aggregate data rate between the shelves in the system is >4 Tbps.

The VME crate in the short rack contains a TTCci board from CERN which generates the machine clock and emulated beam crossing TTC control signals. The TTCci optical output is passively split and sent to FMC TTC receiver mezzanine cards (below) mounted on one of the Pulsar IIb boards in each shelf. TTC timing signals are decoded in the Pulsar IIb FPGA and re-broadcast on the backplane to allow all Pulsar IIb boards in the system to synchronize to a common 40 MHz clock and bunch crossing signals.

# **High Speed Serial Link Performance**

High speed serial links used in the demonstration system need to sustain a data rate of 7.68 Gbps per lane. Using the Pulsar II hardware we have explored and characterized GTH serial transceiver performance using 8b/10b and 64b/66b based encoding methods.

#### Link Encoding

64b/66b encoding has very low overhead (~3%) compared to 8b/10b encoding (25%). Using 64b/66b encoding we can support the target data rate with a line rate of 8 Gbps, whereas with 8b/10b encoding a line rate of 10 Gbps is required. We have achieved stable error-free operation with both 8b/10b encoding (modeled using the PRBS-7 test pattern) at 10 Gbps and 64b/66b encoding (using the PRBS-31 pattern) at 8 Gbps on all channels in the system. The eye diagrams shown below were generated with default transceiver parameters (TX output swing, TX pre/post emphasis, RX termination, RX DFE/LPM equalization, etc.) on a typical backplane fabric link.

#### Link Latency

In general, link latency decreases as the line rate increases, as shown in the chart (right). These latency figures have been determined by measuring 400 GTH links between the DSB and PRB boards. For a given line rate, latency variation is caused by the RX buffer and clock correction logic. Further optimization studies are in progress.
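The overhead and line rate figures quoted under Link Encoding follow directly from the ratio of coded bits to payload bits. The short sketch below reproduces that arithmetic; the function names are illustrative, and overhead is expressed relative to the payload, which is the convention that yields the 25% and ~3% figures.

```python
TARGET_PAYLOAD_GBPS = 7.68  # per-lane data rate required by the demonstration system

# (payload bits, coded bits) per block for each encoding.
ENCODINGS = {
    "8b/10b": (8, 10),
    "64b/66b": (64, 66),
}


def overhead_percent(payload_bits, coded_bits):
    """Extra bits on the wire, as a percentage of the payload bits."""
    return 100.0 * (coded_bits - payload_bits) / payload_bits


def required_line_rate_gbps(payload_gbps, payload_bits, coded_bits):
    """Minimum line rate needed to carry the payload rate with this encoding."""
    return payload_gbps * coded_bits / payload_bits


if __name__ == "__main__":
    for name, (payload_bits, coded_bits) in ENCODINGS.items():
        overhead = overhead_percent(payload_bits, coded_bits)
        line_rate = required_line_rate_gbps(TARGET_PAYLOAD_GBPS, payload_bits, coded_bits)
        print(f"{name}: overhead {overhead:.3f}%, minimum line rate {line_rate:.2f} Gbps")
```

The minimum line rates come out at 9.6 Gbps for 8b/10b and 7.92 Gbps for 64b/66b, which is why operating points of 10 Gbps and 8 Gbps are sufficient for the 7.68 Gbps target.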







## Full Mesh Backplane Performance



We have been working with COMTEL to characterize the full mesh backplane. In late 2014 COMTEL delivered to us their latest "Air/Plane" 100G full mesh ATCA backplane. This new backplane design, which uses an advanced low-loss substrate material and careful layout to minimize slot dependence, has to date yielded the best and most consistent link performance, with all eight Pulsar IIb prototype boards running all GTH transceivers simultaneously (56 lanes) at 10 Gbps (PRBS-7 pattern). After a few days, all channels reported no errors, corresponding to a BER below 10<sup>-15</sup>.
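A run of a few days at 10 Gbps is roughly what an error-free test needs before it can support a bit error rate limit of this order. The sketch below uses the standard zero-error estimate, in which observing no errors in $N$ bits bounds the BER below $-\ln(1 - CL)/N$ at confidence level $CL$. The 95% confidence level and the function names are assumptions for illustration; the poster does not state how the limit was derived.

```python
import math


def bits_for_ber_limit(ber_limit, confidence=0.95):
    """Error-free bits needed to claim BER < ber_limit at the given confidence."""
    return -math.log(1.0 - confidence) / ber_limit


def days_to_reach_limit(ber_limit, line_rate_gbps, confidence=0.95):
    """Error-free running time, in days, for a single lane at the given line rate."""
    bits = bits_for_ber_limit(ber_limit, confidence)
    seconds = bits / (line_rate_gbps * 1e9)
    return seconds / 86400.0


if __name__ == "__main__":
    days = days_to_reach_limit(ber_limit=1e-15, line_rate_gbps=10.0)
    print(f"Error-free days per lane for BER < 1e-15 at 95% CL: {days:.2f}")
```

Under these assumptions a single 10 Gbps lane needs about 3.5 days of error-free running, in line with the "few days" quoted above.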