CMS Views for the *Off-Detector* L1 Tracking Trigger Electronics

### Common ATLAS CMS Electronics Workshop

### Ted Liu (FNAL) March 19, 2014



3/19/2014

### CMS L1 Tracking Trigger:

Will need to reconstruct charged particle trajectories "on-the-fly" for every beam crossing (every 25 ns, i.e. 40 million beam crossings per second), from an ocean of input data (bandwidth required to transfer it: up to ~50-100 Tb/s).

This requires extremely fast, high-bandwidth data communication as well as massive pattern recognition power, with many known patterns compared against the multiple input data streams simultaneously and with near-zero latency (~ a few µs).

This is challenging!
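To put these numbers side by side, here is a minimal back-of-the-envelope sketch. It uses only the figures quoted above (25 ns bunch spacing, 50-100 Tb/s); the function name `bits_per_crossing` is just an illustrative helper.

```python
# Back-of-the-envelope numbers using only the figures quoted on the slide.
BUNCH_SPACING_NS = 25.0
crossing_rate_hz = 1e9 / BUNCH_SPACING_NS  # 40 million crossings per second


def bits_per_crossing(bandwidth_tbps: float) -> float:
    """Average number of bits that arrive per beam crossing."""
    return bandwidth_tbps * 1e12 / crossing_rate_hz


for bw in (50.0, 100.0):
    megabits = bits_per_crossing(bw) / 1e6
    print(f"{bw:.0f} Tb/s -> {megabits:.2f} Mb per crossing")
```

In other words, the quoted bandwidth corresponds to roughly 1.25 to 2.5 Mb of input data per beam crossing, averaged over the whole tracker.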



Ted Liu, CMS Views on L1 Tracking Trigger



# The AM approach

### Pattern Recognition Associative Memory

- Based on CAM cells to match and majority logic to associate hits in different detector layers to a set of pre-determined hit patterns (simple working unit, yet massively parallel)
- Pattern Recognition finishes right after all hits arrive (fast data delivery important)
- Potentially good approach for L1 application (require custom ASIC)
- A PR engine naturally handles a given region: divide & conquer
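The matching and majority logic described above can be emulated in software. This is only a sketch under stated assumptions: each pattern is taken to hold one coarse hit address per layer, and the names `fired_roads`, `bank` and `min_layers` are hypothetical. A real AM chip compares all patterns at once in CAM cells; the loop below only mimics that sequentially.

```python
from typing import Dict, List, Set, Tuple

# Assumption: a pattern stores one coarse hit address per detector layer.
Pattern = Tuple[int, ...]


def fired_roads(patterns: List[Pattern],
                hits_per_layer: Dict[int, Set[int]],
                min_layers: int) -> List[int]:
    """Return indices of patterns matched in at least `min_layers` layers."""
    fired = []
    for idx, pattern in enumerate(patterns):
        # Hardware performs this comparison for every pattern in parallel;
        # this loop emulates it one pattern at a time.
        n_matched = sum(
            1 for layer, address in enumerate(pattern)
            if address in hits_per_layer.get(layer, set())
        )
        if n_matched >= min_layers:  # majority logic
            fired.append(idx)
    return fired


bank = [(3, 7, 1, 4), (3, 8, 1, 5), (0, 0, 0, 0)]
hits = {0: {3}, 1: {7, 8}, 2: {1}, 3: {5}}
print(fired_roads(bank, hits, min_layers=3))  # [0, 1]
```

With a threshold of 3 out of 4 layers, the first pattern fires despite a missing hit in the last layer, which is the purpose of the majority logic.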



# Tracklet Based Track Finding → new concept, being explored at CMS

- Form track seeds (tracklets) from pairs of stubs in neighboring layers
- Match stubs on the road defined by the tracklet and the IP constraint
- Fit the hits matched to the tracklet using a linearized fit
- Seeding is done in parallel in different layers
- Duplicate tracks are removed if they share 2 or more stubs
- Being explored for the current geometry
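The duplicate-removal rule in the list above (tracks sharing 2 or more stubs) can be sketched as follows. The data model is an assumption made for illustration: each track is a set of integer stub IDs, and the first track seen is the one kept. The slide does not say which of two duplicates survives.

```python
from typing import FrozenSet, List


def remove_duplicates(tracks: List[FrozenSet[int]],
                      max_shared: int = 1) -> List[FrozenSet[int]]:
    """Keep a track only if it shares at most `max_shared` stubs
    with every track already kept (assumption: first one seen wins)."""
    kept: List[FrozenSet[int]] = []
    for stubs in tracks:
        if all(len(stubs & other) <= max_shared for other in kept):
            kept.append(stubs)
    return kept


tracks = [
    frozenset({1, 2, 3, 4}),
    frozenset({3, 4, 5, 6}),  # shares stubs 3 and 4 with the first track
    frozenset({4, 7, 8, 9}),  # shares only stub 4 with the first track
]
print(len(remove_duplicates(tracks)))  # 2
```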



### Slide from Anders Ryd (Cornell/USA)

## Comparison of the two approaches

|            | AM + TF approach | Tracklet + TF approach |
|------------|------------------|------------------------|
| Advantages | <ul><li>Proven approach for silicon based track finding</li><li>AM pattern recognition algorithm: simple, fast and flexible</li></ul> | <ul><li>New approach for <i>hardware</i> silicon based track finding</li><li>Software simulation promising</li><li>Can be implemented in FPGA in principle: no need for custom designed chips</li></ul> |
| Challenges | <ul><li>Requires custom ASIC: high performance AMchip</li><li>Track Fitting in FPGA to be demonstrated for L1</li><li>New architecture (see below)</li></ul> | <ul><li>It is new</li><li>Feasibility to be demonstrated in hardware (FPGA)</li></ul> |

Common to both: fast data delivery/sharing to Pattern Recognition Engines



#### First take a look at: Data formatting challenges for Atlas FTK at L2



Input data from all silicon detector modules has to be formatted into 64 $\eta$-$\phi$ trigger towers after reformatting and sharing, ready for downstream pattern recognition







ATLAS FTK: see Alberto Annovi's FTK talk











How to use ATCA backplane for data sharing (1 FPGA per trigger tower, 2 trigger towers per board, 32 boards needed over 4 ATCA shelves)
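The board count above follows from simple arithmetic, sketched below. The even split of boards over shelves is an assumption, and the pair count is the standard number of point-to-point connections in a full mesh, not a figure from the slide.

```python
from math import comb

N_TOWERS = 64          # one FPGA per trigger tower
TOWERS_PER_BOARD = 2
N_SHELVES = 4

boards = N_TOWERS // TOWERS_PER_BOARD      # 32 boards
boards_per_shelf = boards // N_SHELVES     # assumes an even split: 8 per shelf
# A full mesh gives every pair of boards in a shelf a direct connection.
board_pairs_per_shelf = comb(boards_per_shelf, 2)

print(boards, boards_per_shelf, board_pairs_per_shelf)  # 32 8 28
```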

### ATCA Full-mesh backplane


Only data sharing links shown, not inputs/outputs.

Figure 8: A 3D representation of FPGA interconnects in the Data Formatter system. 64 FPGAs (green) are connected through the ATCA backplane Fabric Interface (blue), local buses (purple) and inter-shelf links (orange). Each FPGA uses one inter-shelf link.

### Appendix P Unconstrained Data Volume Study

As previously mentioned the inner detector readout system was not originally designed for a track trigger. Modules were connected to RODs to minimize data rates and balance bandwidth. In this section we consider Data Formatter performance assuming an idealized module-ROD and ROD-DF mapping.

P.1 Data Sharing

The cabling was done to optimize for DAQ readout, not trigger.

Refer to Figure 15 to compare these idealized results with the "real world" module-ROD cabling constraints.



From "Data Formatter Design Specification", Fermilab-TM-2553-E-PPD, page 78. Available at: http://www-ppd.fnal.gov/EEDOffice-w/Projects/ATCA/ (Pulsar IIa design spec)

### CMS Tracker Layout and Trigger Tower (6 in eta x 8 in phi)

- 15K modules (see talk by Stefano Mersi this morning)



### CMS Trigger Tower Data Sharing



Study done by Giovanni Bianchi (CERN)

Only with immediate neighbors



### CMS L1 tracking trigger for Phase II: 6 (in eta) x 8 (in phi) = 48 trigger towers & their interconnections





Data coming from a given trigger tower may need to be delivered to multiple trigger towers. This happens when a stub comes from a detector element that is close to the border between trigger towers, due to the finite curvature of charged particles in the magnetic field and the finite size of the beam luminous region along the beam axis.
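As an illustration of sharing with immediate neighbors only, the sketch below lists the neighbors of a tower in a 6 x 8 grid. The indexing, the wrap-around in phi, the lack of wrap-around in eta, and the inclusion of diagonal neighbors are all assumptions made for this example, not details taken from the study.

```python
from typing import List, Tuple

N_ETA, N_PHI = 6, 8  # 48 trigger towers


def immediate_neighbors(eta: int, phi: int) -> List[Tuple[int, int]]:
    """Towers adjacent to (eta, phi), including diagonals.

    Assumptions: phi is periodic (wraps around), eta is not.
    """
    neighbors = []
    for d_eta in (-1, 0, 1):
        for d_phi in (-1, 0, 1):
            if d_eta == 0 and d_phi == 0:
                continue  # skip the tower itself
            n_eta = eta + d_eta
            if not 0 <= n_eta < N_ETA:
                continue  # no tower beyond the ends of the eta range
            neighbors.append((n_eta, (phi + d_phi) % N_PHI))
    return neighbors


print(len(immediate_neighbors(2, 3)))  # 8 for a central tower
print(len(immediate_neighbors(0, 0)))  # 5 for a tower at the edge in eta
```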

### Comparison: ATLAS L2 FTK and CMS L1 Track Trigger




### General considerations for the tower processor Platform for silicon based tracking trigger system

- The tower processor platform must support large numbers of fiber transceivers, used for receiving input links and data sharing
- A flexible, high bandwidth backplane is desirable to quickly transfer data between boards
- The boards should be large enough to support pattern recognition engines and fiber connections, in a comfortable way
- A Full Mesh, 14 slot ATCA shelf is a natural fit as the platform with 12 slots available for processor or payload blades
- This applies to both Atlas FTK and CMS L1 TT, but architecturally they are very different:
  - Atlas FTK: full-mesh used for data sharing
  - CMS L1 TT: full-mesh mostly used for time-multiplexing



14 slot *full mesh* ATCA backplane:



CMS Tracking Trigger Towers

For simplicity, let's assume one crate is assigned to one trigger tower.

[Figure: CMS event display (CMS Experiment at LHC, CERN. Data recorded: Thu Apr 5 01:18:00 2012 CEST. Run/Event: 190389 / 107592030. Lumi section: 138) with photo (June 2013, by Michael Hoch, CERN); label: ATCA]

AM or other track finding approaches implemented on mezzanine (PR engine)










Ted Liu, L1 Track Finding Demo Proposal

### Pattern Recognition Board (PRB) data flow



### More advanced configuration

Ten Processors and the Gateway send the event to the target Processor Blade in a round robin scheme.
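A round robin scheme of this kind can be sketched in a few lines. The function name `target_blade` and the choice of 10 target blades are hypothetical; the slide does not specify how the target is derived from the event.

```python
from collections import Counter


def target_blade(event_number: int, n_targets: int) -> int:
    """Round robin: consecutive events go to consecutive target blades."""
    return event_number % n_targets


# Hypothetical example: 40 consecutive events spread over 10 target blades.
load = Counter(target_blade(event, 10) for event in range(40))
print(dict(load))  # every blade receives 4 of the 40 events
```

Because every sender applies the same rule, all data belonging to one event converges on the same blade without any negotiation between boards.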



The full mesh based architecture is highly flexible.

Many performance and bandwidth bottlenecks can be solved/avoided/relaxed simply by better configurations.

This also makes an early technical demonstration feasible using today's technology. The flexible architecture is a good platform for a vertical slice demonstration and beyond.





System size shrinks with better AMchip performance:

With 2x higher AM pattern density, or 2x higher AM speed, the system size is halved (48 crates $\rightarrow$ 24 crates)

### Pattern Recognition Mezzanine (PRM)

## Relaxed Performance Requirements (in the case of 10 PRBs with ~40 PRMs):

- 40MHz input handled by 40 PRM mezzanines in round robin, each handles ~1MHz event input rate
- Event Processing >= 1MHz (out of 40MHz)
- Input BW >= 16Gbps
- In the case of AM approach:
  - ~10 AM chips / PRM
  - ~200k patterns / AMchip
  - ~ 2M patterns / tower
  - (2M x 48 towers ~ 100M patterns)

The relaxed performance requirement would make early technical demonstration easier for different track finding approaches.
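The numbers in the list above can be cross-checked with a short calculation. All inputs are the approximate values quoted on the slide. Reading "~2M patterns / tower" as the pattern bank held by each PRM is an assumption of this sketch.

```python
CROSSING_RATE_MHZ = 40.0
N_PRM = 40
AM_CHIPS_PER_PRM = 10
PATTERNS_PER_CHIP = 200_000
N_TOWERS = 48
INPUT_BW_GBPS = 16.0

# Round robin over 40 PRMs: each one sees 1 in 40 events.
rate_per_prm_mhz = CROSSING_RATE_MHZ / N_PRM              # 1.0 MHz
# Assumption: each PRM holds the complete pattern bank of its tower.
patterns_per_tower = AM_CHIPS_PER_PRM * PATTERNS_PER_CHIP  # 2,000,000
total_patterns = patterns_per_tower * N_TOWERS             # 96,000,000
# Input data volume one PRM can receive per event at 16 Gbps and 1 MHz.
bits_per_event = INPUT_BW_GBPS * 1e9 / (rate_per_prm_mhz * 1e6)

print(rate_per_prm_mhz, patterns_per_tower, total_patterns, bits_per_event)
```

The total of 96M patterns matches the "~100M patterns" quoted for 48 towers.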

### First Pulsar 2a prototypes work well: "plug & play" (summer 2013)

Fermilab

**PPD** engineer



http://www-ppd.fnal.gov/EEDOffice-w/Projects/ATCA/

#### Pulsar 2b:

- Virtex-7 FPGA (XC7VX690T)
- 80 GTH lines
- Compatible with LAPP IPMC module
- FMC TTC compatible, backplane clock distribution
- Plan to use CMS IPBus user interface
- General purpose design
- I/O: ~1 Tbps







### Pulsar 2b Block Diagram



I/O: ~1 Tbps



### Existing Pulsar2 ATCA Teststands (Atlas/CMS Common Electronics)

- Northwestern (CMS)
- CERN B186 (CMS)
- Fermilab (CMS/Generic R&D)
- CERN TDAQ LAB4 (Atlas FTK)
- UofChicago (Atlas), postdoc: Yasu Okumura
- Waseda, Japan (Atlas FTK)

RTI



#### Slide from Mark Pesaresi

#### Alternative demonstrator for TM track trigger:

takes advantage of hardware & expertise developed for the L1 global calorimeter trigger upgrade: **MP7 processing card** 



- Xilinx Virtex 7 690T FPGA
- mature system & link firmware, ready for drop-in algos
- 72 input / 72 output optical links => 0.9 Tb/s total bandwidth

#### example implementation: divide tracker up into 5 regions in phi



processors (purple) could build tracks in the FPGA, or data can be forwarded to AM ASICs




#### **~230 FEDs**

- input data from tracker
- output trigger data is formatted & time multiplexed

#### **120 Processors**

- each receives data over 24 BX
- each processes one phi sector per event
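One way to read these numbers is 24 time slices x 5 phi regions = 120 processors. The sketch below shows that mapping. Both the factorization and the index layout in `processor_for` are assumptions made for illustration; the slide does not spell out how events are assigned to processors.

```python
N_PHI_SECTORS = 5   # example implementation: 5 regions in phi
TM_PERIOD_BX = 24   # each processor receives data over 24 BX


def processor_for(bx: int, phi_sector: int) -> int:
    """Hypothetical mapping of (bunch crossing, phi sector) to a processor.

    Events from consecutive crossings go to different processors, so each
    processor has TM_PERIOD_BX crossings to receive one event's data.
    """
    return (bx % TM_PERIOD_BX) * N_PHI_SECTORS + phi_sector


n_processors = len({processor_for(bx, sector)
                    for bx in range(TM_PERIOD_BX)
                    for sector in range(N_PHI_SECTORS)})
print(n_processors)  # 120
```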

For more details, see the talk by Andrew Rose tomorrow morning.

#### architecture option

make full use of the **time multiplexing technique (TMT)**, extending an architecture choice currently being implemented in the L1 calorimeter trigger

#### design motivations

- *simplifies data flow* problems, no need for track finding/fitting algorithms to share data across regional boundaries, no merging/ removal of duplicates required, no reduction of data volume by stages (maximises efficiency)

- allows **spatial pipelining** of data: essential for designing algorithms implementable in FPGAs & for optimising build times by reducing combinatorial logic requirements

- future L1 trigger may also be time multiplexed, which could simplify input of tracking data (tracks) into the global calorimeter & muon triggers




## Vertical Slice System Demonstration over next few years

Can and will be implemented in stages: mezzanine, board, crate and multi-crate level (ATCA & uTCA)

With the goals:

- Performance study (latency, efficiency etc.)
- Identify issues/bottlenecks
- Guide future R&D, find solutions

A common platform to explore new ideas/algorithms/approaches

An important step towards TDR and beyond

A major undertaking!

CMS people involved: Lyon/INFN/Cornell/Northwestern/Florida/Purdue/KIT/UK/CERN/FNAL ...

Data Source stage

## With some Atlas & CMS common electronics ...


*Core trigger tower* 

# Backup

- Pulsar 2b related
- Some background materials on Fermilab VIPRAM project (in case asked).

#### http://www-ppd.fnal.gov/EEDOffice-w/Projects/ATCA/



### Design philosophy: Less is more ...

# Rear Transition Module (RTM)



- Fiber optic transceivers
  - 8 QSFP+
  - 6 SFP+
- Up to 380 Gbps bi-directional
- Hot Swap
- PICMG 3.8 "Zone-3A" compliant



## Pulsar II ATCA Mini power adaptor

The Mini Backplane can power an ATCA board on the bench: A 48VDC power supply is required.

The Base Interface Ethernet port is brought out to an RJ45 connector. The I2C IPMB bus signals are brought out to a terminal block.



full-mesh backplane connector with ALL connections in loop-back mode (Tx  $\rightarrow$  Rx).

Used extensively for table-top testing/debug/firmware development



# Test stand at FNAL



Achieved maximum speed:

| Link      | Maximum speed                          |
|-----------|----------------------------------------|
| Local bus | 10 Gb/s                                |
| RTM       | 10 Gb/s                                |
| Fabric BP | 6.25 Gb/s (limited by this backplane)  |

Upper limit of error rate (zero errors observed):

| Link      | Upper limit          |
|-----------|----------------------|
| Local bus | 1.4E-15 (~10 hours)  |
| RTM       | 1.9E-16              |
| Fabric BP | 4.2E-17 (1 week)     |
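For context, an upper limit from an error-free test is commonly derived as follows: if N bits are transferred with zero errors, the bit error rate is below -ln(1 - CL) / N at confidence level CL. The sketch below applies that standard formula to a hypothetical one-hour, single-lane test. The slide does not state the number of lanes, the test durations for every entry, or the confidence level used, so this does not attempt to reproduce the quoted values.

```python
import math


def ber_upper_limit(bits_transferred: float, confidence: float = 0.95) -> float:
    """Upper limit on the bit error rate when zero errors were observed.

    From P(0 errors) = (1 - p)^N ~ exp(-p * N) >= 1 - confidence.
    """
    return -math.log(1.0 - confidence) / bits_transferred


# Hypothetical example: one lane at 10 Gb/s running error-free for one hour.
bits = 10e9 * 3600
print(f"BER < {ber_upper_limit(bits):.2e} at 95% CL")
```

The limit scales inversely with the number of bits, so longer runs or more lanes in parallel tighten it proportionally.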



## Pulsar IIa Full crate testing at Fermilab



activate all the GTX at 6.25 Gb/s on 7 boards on full-mesh backplane



Achieved maximum speed: Local bus: 10 Gb/s; RTM: 10 Gb/s

## Fermilab Tracking Trigger R&D Roadmap: → from generic R&D to system demonstration



"A New Concept of Vertically Integrated Pattern Recognition Associative Memory" TIPP 2011 Proceedings http://www.sciencedirect.com/science/article/pii/S1875389212019165

### fired road





## Design involved Control/interface

### In 130nm







## Different level of design simulation and optimization process




Figure 8 - protoVIPRAM pad arrangement.

People involved so far from Fermilab:

- Engineers: G. Deptuch, J. Hoff, S. Joshi, J. Olsen, M. Trimpl
- Physicists: S. Jindariani, T. Liu, N. Tran

*Power/thermal analysis by* W. Xia, P. Gui (SMU EE)




[Figure: ChipScope waveform capture (device XC7K160T, ILA unit) showing RowAdr stepping through 12 to 17, ColAdr, runMode, Data_Output and DataPort[85] to DataPort[105]. Waveform captured Feb 6, 2014 11:53:37 AM.]

Recent Full Chip Test example (chipscope sampling results): pattern scanning. *Plan to present full results at TWEPP 2014* 



Original SVT system had ~400K patterns total.

Aim to reach ~500K patterns per chip for VIPRAM (long term goal)...

## **High Performance Computing**

→ from US "Report to the President and Congress" by President's Council of Advisors on Science and Technology, Dec. 2010 (page 65)

- Compute-intensive
  - massively parallel computation involving very large number of processing elements;
- Communication-intensive
  - high-speed transfer of data among processing elements;
- Data-intensive
  - high-speed manipulation of very large quantities of data

HL-LHC L1 Tracking Trigger is High Performance Computing (Non-von Neumann approach) but with very Low Latency and in Real Time.

HL-LHC requires the most advanced Real Time processing technology.



