# Fermilab, U.S. Department of Energy, Office of Science



## DAQ and Level-1 Track Finding for the CMS HL-LHC Upgrade

Fabio Ravera on behalf of the CMS Collaboration

The 28th International Workshop on Vertex Detectors

17 October 2019

# Outline

- Overview
- Inner tracker DAQ
- Outer tracker DAQ
- L1 track finding
- Summary



# **CMS Tracker upgrade motivations**

- Standard Model and Beyond Standard Model processes statistically limited → HL-LHC upgrade
- Hit rate and radiation damage not manageable by the current tracker
  - Full replacement of the CMS Tracker
- Keeping today's L1 thresholds at 200 PU would give an L1 rate of ~4 MHz
  - Tracks needed for trigger decision
  - Lepton threshold improvement
  - Possibility for new triggers (e.g. displaced or disappearing tracks)
- Dedicated material on Phase II Tracker:
  - Inner Tracker (IT): P. Luukka talk
  - Outer Tracker (OT): A. Rossi talk
  - Serial Powering: D. Koukola talk
  - IT modules: B. Ristic poster



# **DAQ Overview for the CMS tracker at HL-LHC**

- Main DAQ challenges:
  - PU up to 200
  - L1-trigger rate increased from 100 kHz to 750 kHz
  - Track reconstruction at L1-trigger
  - Many more channels to handle:
    - Inner Tracker: ~120 million → ~2 billion
    - Outer Tracker: ~10 million → ~200 million



# **Inner Tracker**



- Hit rates up to **3 GHz/cm<sup>2</sup>** in innermost layers (R ~3 cm)
  - CMS Phase I Pixel detector maximum rate ~ 600 MHz/cm<sup>2</sup>
- Triggered readout only, with rate up to 1 MHz
  - higher than the L1 rate, so that extra data can be streamed for luminosity monitoring

# IT chip and on-module data flow

- Chip developed by the **RD53 collaboration** (ATLAS-CMS common development)
- 50 × 50 µm<sup>2</sup> ROC pixel size, 336 × 432 matrix $\rightarrow$ 2 billion pixels for the CMS IT
- Pixel hit: address + 4 bits for Time-over-Threshold
- Zero-suppressed readout with data compression (~2x data volume reduction)
- Data shipped out on up to four output lines per ROC at 1.28 Gb/s each
- In-module ROC data merging (2→1 and 4→1) with 320 MHz lines (640 MHz under investigation) to reduce data lines for modules with low occupancy



ROC data are sent to the Low-power GigaBit Transceiver (LpGBT) (up to 6 links per module), converted to optical signals by the Versatile Link+ (VL+) and sent to the back-end electronics at 10.24 Gb/s.
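As a rough cross-check of the numbers quoted for the IT chip, the sketch below derives the pixel count and active area per ROC, the number of ROCs implied by the round "~2 billion pixels" figure, and the peak output bandwidth of a ROC using all four lines. The constant and function names are illustrative, the hit-rate estimate assumes the 3 GHz/cm<sup>2</sup> rate is uniform over the chip, and the ROC count is only what the quoted round numbers imply, not an official figure.

```python
# Back-of-the-envelope numbers for one readout chip (ROC), from the slide values.
ROWS, COLS = 336, 432            # ROC pixel matrix
PIXEL_PITCH_UM = 50.0            # 50 x 50 um^2 pixels
TOTAL_PIXELS = 2e9               # "~2 billion" (round number)
LINE_RATE_GBPS = 1.28            # per output line
MAX_LINES_PER_ROC = 4
PEAK_HIT_RATE_PER_CM2 = 3e9      # innermost layers

def pixels_per_roc():
    return ROWS * COLS

def roc_area_cm2():
    # um^2 -> cm^2: 1 cm = 1e4 um, so 1 cm^2 = 1e8 um^2
    return pixels_per_roc() * PIXEL_PITCH_UM ** 2 / 1e8

def implied_roc_count():
    return TOTAL_PIXELS / pixels_per_roc()

def max_roc_bandwidth_gbps():
    return MAX_LINES_PER_ROC * LINE_RATE_GBPS

def peak_hits_per_second_per_roc():
    # Assumption: the peak rate is uniform across the whole chip.
    return PEAK_HIT_RATE_PER_CM2 * roc_area_cm2()

print("pixels per ROC:", pixels_per_roc())
print("ROC active area [cm^2]:", round(roc_area_cm2(), 2))
print("implied number of ROCs:", round(implied_roc_count()))
print("max ROC output [Gb/s]:", round(max_roc_bandwidth_gbps(), 2))
print("peak hits/s per ROC:", f"{peak_hits_per_second_per_roc():.2e}")
```

The order of magnitude (roughly 10<sup>10</sup> hits per second on a single innermost chip against a few Gb/s of output) illustrates why zero suppression, compression and triggered readout are all needed.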




# Data, Trigger and Control (DTC) board schematic for the IT



# **Outer Tracker**

























#### **Strip-strip Module (2S)**

The CMS Binary Chip (CBC) reads both sensors and identifies stubs.

#### **Pixel-Strip Module (PS)**

The Short-strip ASIC (SSA) sends strip clusters and L1 data to the Macro Pixel ASIC (MPA), which combines them with pixel information and creates stubs.

#### **CIC concentrator chip**

Receives stubs and L1 data and packs them.


- Data from the 2 CICs $\rightarrow$ LpGBT $\rightarrow$ VL+ $\rightarrow$ DTC @ 5.12 (10.24) Gb/s
- Clock, fast commands and programming: DTC $\rightarrow$ module @ 2.56 Gb/s

# Data, Trigger and Control (DTC) board schematic for the OT



# **ATCA boards under development**

Two main ATCA board prototypes for CMS:

- Apollo: http://www.apollo-blade.info
- Serenity: https://serenity.web.cern.ch/serenity/overview/



# **Evolution of the DAQ SW for the final detector**

- Aim: detector control and calibration procedures run on the board SoC
  - High level of parallelization and system scalability ensured
  - Requires robust communication protocols, application monitoring system, efficient deployment, resource management



**CMS central DAQ** 




- L1 tracking will provide extra handles for the L1 trigger
- Goal: reconstruct tracks with p<sub>T</sub> > 2 GeV at 40 MHz
  → Particle Flow at 40 MHz



# L1-tracking constraints and requirements

- ~15,000 stubs per bunch crossing @ 200 PU → Stub bandwidth O(20) Tb/s
- ~4 μs available for track finding (12.5 μs total L1 latency)
- Present solution derived from two all-FPGA developments:
  - Time-Multiplexed Track Trigger (TMTT)
  - Tracklet algorithm
- Both tested on HW demonstrators to measure latency and to estimate resource utilization and performance
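The constraints above can be turned into a quick estimate. The sketch below multiplies the stub multiplicity by the 40 MHz bunch-crossing rate and a stub word size to recover the O(20) Tb/s figure, and counts how many bunch crossings are in flight during the ~4 µs track-finding budget. The 32-bit stub word is a hypothetical value chosen only to show the order of magnitude; the slide does not state the stub size at this stage, and all names are illustrative.

```python
# Order-of-magnitude estimate of the L1 stub bandwidth and latency budget.
BX_RATE_HZ = 40e6                # LHC bunch-crossing rate
STUBS_PER_BX = 15_000            # at 200 PU (from the slide)
ASSUMED_BITS_PER_STUB = 32       # hypothetical word size, not from the slide
TRACK_FINDING_LATENCY_S = 4e-6   # ~4 us available for track finding

def stub_bandwidth_tbps(stubs_per_bx=STUBS_PER_BX,
                        bits_per_stub=ASSUMED_BITS_PER_STUB,
                        bx_rate_hz=BX_RATE_HZ):
    """Aggregate stub bandwidth in Tb/s."""
    return stubs_per_bx * bits_per_stub * bx_rate_hz / 1e12

def bunch_crossings_in_flight(latency_s=TRACK_FINDING_LATENCY_S,
                              bx_rate_hz=BX_RATE_HZ):
    """Number of bunch crossings that arrive during the latency budget."""
    return round(latency_s * bx_rate_hz)

print("stub bandwidth [Tb/s]:", round(stub_bandwidth_tbps(), 1))
print("bunch crossings in flight:", bunch_crossings_in_flight())
```

With these assumptions the system has to keep on the order of 160 events in the pipeline at once, which is the motivation for the time-multiplexed, all-FPGA architectures described next.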





# L1 track finding - Time-Multiplexed Track Trigger (TMTT)

- Time multiplexing factor (TMF): 18
- Geometrical divisions: 8 octants in  $\phi$
- Each TF board receives stubs from 2 adjacent octants: data duplication but full parallelization
- Track Finding: Hough Transform (HT)
  - A stub (r, $\varphi$) maps to a straight line in the (q/p<sub>T</sub>, $\varphi_0$) plane
  - Division into 2 ($\phi$) × 18 ($\eta$) sub-sectors
  - 4 or more lines intersect → track candidate
- Track fitting: Kalman filter
  - Common iterative algorithm
  - Initial estimate of track parameters from HT seed
  - Repeat until all stubs are added
  - $\chi^2$ used to reject false candidates
- Measured latency ~ 3.5 µs
- Resources compatible with < 2 Kintex UltraScale
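The Hough-transform step can be illustrated with a toy accumulator. The sketch below is not the TMTT firmware: it assumes the small-angle relation φ<sub>0</sub> = φ + K·r·(q/p<sub>T</sub>) with K = 0.0015·B (r in cm, p<sub>T</sub> in GeV), a 3.8 T field, a 32 × 64 array, and illustrative layer radii. For each stub it walks along the corresponding straight line in the (q/p<sub>T</sub>, φ<sub>0</sub>) plane and records which layers touch each cell; cells touched by at least 4 distinct layers become track candidates. All function and parameter names are hypothetical.

```python
import math
from collections import defaultdict

B_FIELD_T = 3.8            # assumed solenoid field (not stated on the slide)
K = 0.0015 * B_FIELD_T     # rad*GeV/cm, small-angle approximation

def hough_candidates(stubs, n_qpt=32, n_phi0=64, qpt_max=0.5,
                     phi0_range=(0.0, 1.0), min_layers=4):
    """stubs: iterable of (layer, r_cm, phi_rad).
    Returns a list of (q_over_pt, phi0, n_layers) for cells that pass."""
    phi_lo, phi_hi = phi0_range
    qpt_width = 2.0 * qpt_max / n_qpt
    phi_width = (phi_hi - phi_lo) / n_phi0
    cells = defaultdict(set)                     # (i, j) -> set of layers
    for layer, r, phi in stubs:
        for i in range(n_qpt):                   # one line per stub
            qpt = -qpt_max + (i + 0.5) * qpt_width
            phi0 = phi + K * r * qpt
            j = math.floor((phi0 - phi_lo) / phi_width)
            if 0 <= j < n_phi0:
                cells[(i, j)].add(layer)
    candidates = []
    for (i, j), layers in cells.items():
        if len(layers) >= min_layers:            # "4 or more lines intersect"
            candidates.append((-qpt_max + (i + 0.5) * qpt_width,
                               phi_lo + (j + 0.5) * phi_width,
                               len(layers)))
    return candidates

# Toy usage: one particle with q/pT = 0.2 /GeV (pT = 5 GeV) and phi0 = 0.3 rad.
true_qpt, true_phi0 = 0.2, 0.3
radii_cm = [25.0, 37.0, 52.0, 69.0, 89.0, 108.0]   # illustrative radii
stubs = [(layer, r, true_phi0 - K * r * true_qpt)
         for layer, r in enumerate(radii_cm)]
candidates = hough_candidates(stubs)
for qpt, phi0, n_layers in sorted(candidates):
    print(f"q/pT = {qpt:+.3f}  phi0 = {phi0:.3f}  layers = {n_layers}")
```

In this toy setup more than one neighboring cell can pass the threshold for a single particle, which is one reason a fit and a quality cut follow the HT stage.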








# L1 track finding - Tracklet method

- Time multiplexing factor (TMF): 6
- Geometrical divisions: 28  $\phi$  sectors
- No stub duplication; tracks with $p_T > 2$ GeV span at most 2 sectors, so boards exchange data only with their nearest neighbors
- Track Finding: Road search
  - Pair of adjacent layers used to form seed called a tracklet
  - Seeding done in multiple disk/layer pairs  $\rightarrow$  redundancy
  - Tracklets + IP projected to other layers to add matching stubs, residual calculated
- Track fitting: Linearized $\chi^2$ fit
  - Complex calculations pre-computed and stored in look-up tables
  - Remove duplicates by checking for shared stubs and retain track with the lowest  $\chi^2/ndf$
- Measured latency ~ 3.3 µs
- Resources compatible with 1 Kintex UltraScale
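The seeding and projection steps of the road search can be sketched in the r-φ plane. The example below is a simplified illustration, not the tracklet firmware: it assumes the small-angle relation φ(r) = φ<sub>0</sub> − K·r·(q/p<sub>T</sub>) with K = 0.0015·B and a 3.8 T field, uses the interaction point only through that origin-constrained model, ignores the z coordinate, and uses a hypothetical matching window. Function names are illustrative.

```python
B_FIELD_T = 3.8            # assumed solenoid field (not stated on the slide)
K = 0.0015 * B_FIELD_T     # rad*GeV/cm, small-angle approximation

def make_tracklet(stub_a, stub_b):
    """Seed from two stubs (r_cm, phi_rad) in adjacent layers.
    The track is assumed to come from the interaction point."""
    (r1, phi1), (r2, phi2) = stub_a, stub_b
    q_over_pt = (phi1 - phi2) / (K * (r2 - r1))
    phi0 = phi1 + K * r1 * q_over_pt
    return q_over_pt, phi0

def project(tracklet, r):
    """Predicted phi of the tracklet at radius r."""
    q_over_pt, phi0 = tracklet
    return phi0 - K * r * q_over_pt

def match_stubs(tracklet, stubs, window=0.005):
    """Keep stubs whose phi residual is inside the (hypothetical) window."""
    matches = []
    for r, phi in stubs:
        residual = phi - project(tracklet, r)
        if abs(residual) < window:
            matches.append((r, phi, residual))
    return matches

def on_track(r, q_over_pt=0.25, phi0=0.4):
    """Helper generating an ideal stub for the toy example."""
    return (r, phi0 - K * r * q_over_pt)

# Toy usage: seed in the two innermost layers, match in the outer ones.
seed = make_tracklet(on_track(25.0), on_track(37.0))
other_stubs = [on_track(52.0), on_track(69.0), (52.0, 0.9)]  # last one is noise
matched = match_stubs(seed, other_stubs)
print("seed (q/pT, phi0):", seed)
print("matched stubs:", len(matched))
```

The residuals returned by the matching step are the quantities a linearized fit would use to correct the seed parameters.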





# **HW demonstrator performance**



# **Common solution under development**

- Combining the two approaches:
  - Track finding: tracklet approach
  - Track fitting: Kalman filter (KF)
- 9  $\phi$  sectors x TMF 18 = 162 DTC
- Further improvements:
  - Pre-fit duplicate removal
  - No need to fit seeding stubs, KF integration
- Very high efficiency for ttbar (~ 95%) and muon (> 97%)
- p<sub>T</sub> and z<sub>0</sub> resolutions compatible with those of the two separate approaches
- Fake rate ~10% (can be reduced by tighter selection cuts)
- Displaced track fit under investigation
  - No beam spot constraint
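The 9 × 18 = 162 figure quoted above comes from combining the geometric split in φ with time multiplexing. The sketch below shows one possible way to route an event to a processing node, assuming a plain round-robin assignment in which the time slot is the bunch-crossing number modulo the TMF. The node numbering and function names are hypothetical and are not taken from the slide.

```python
N_PHI_SECTORS = 9
TMF = 18                      # time-multiplexing factor
BX_PERIOD_NS = 25.0           # 40 MHz bunch-crossing rate

def n_nodes():
    """Total number of processing nodes: one per (sector, time slot)."""
    return N_PHI_SECTORS * TMF

def node_for(bx, sector):
    """Hypothetical round-robin routing: the same node sees the same
    phi sector again only every TMF bunch crossings."""
    if not 0 <= sector < N_PHI_SECTORS:
        raise ValueError("sector out of range")
    time_slot = bx % TMF
    return sector * TMF + time_slot

def time_between_events_ns():
    """Interval between consecutive events arriving at one node."""
    return TMF * BX_PERIOD_NS

print("nodes:", n_nodes())
print("node for bx=40, sector=3:", node_for(40, 3))
print("ns between events on a node:", time_between_events_ns())
```

Under this assumption each node has 18 bunch crossings, i.e. 450 ns, to accept a new event, instead of the 25 ns available without time multiplexing.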




# Summary

- Full replacement of the CMS tracker to address the harsh HL-LHC environment conditions
- DAQ designed to address the challenges
  - Inner tracker will face up to 3 GHz/cm<sup>2</sup> hit rate and will also serve as luminosity monitor
  - Outer tracker will provide stubs from high p<sub>T</sub> tracks to the back-end boards at 40 MHz
- Back-end electronics
  - Based on ATCA boards, two UltraScale FPGAs, System on Chip
- L1 tracking required for maintaining high performance
  - Currently considered solution derives from two independent full FPGA solutions
    - Time-Multiplexed Track Trigger
    - Tracklet method
  - A combined approach showed high efficiency and very good track parameter resolutions
- A lot of work is ongoing in the CMS Collaboration, stay tuned!



# **Backup slides**




# Luminosity monitor with the IT

- Tracker Endcap Pixel Detector (TEPX):
  - operated during Van der Meer scans and in all safe beam conditions
  - no data taking: all bandwidth available for lumi triggers (up to ~10 MHz)
  - during data taking: 75 kHz of special triggers added to physics triggers → total TEPX rate: ~130 Gb/s at PU 200
- TEPX Disk 4 Ring 1:
  - fully dedicated to BRIL (Beam Radiation Instrumentation and Luminosity)
  - beam background, luminosity during all unsafe beam conditions
  - Availability = 100%
- Hermetic coverage not required $\rightarrow$ failures tolerable
- Online pixel clustering done on CPUs, FPGAs or both (Zynq)



# Luminosity monitor - DAQ system architecture



For luminosity measurement:

SW: ~ 5 x 32 CPU servers

- (+) Common languages
- (+) Reuse of current algorithms
- (-) High latency

HW: ~ 8 ATCA blades with FPGAs

- (+) Low latency
- (+) Common CMS developments
- (-) "Expensive" firmware development


# **PS and 2S Modules**

#### **PS modules: Macro Pixel + Strip**

- Macro Pixel: 1.5 mm $\times$ 100 µm
- Strip: 2.4 cm $\times$ 100 µm
- Module area: ~5 $\times$ 10 cm<sup>2</sup>



#### 2S modules: Strip + Strip

- Strip: 5 cm $\times$ 90 µm (both sides)
- Module area: ~10 $\times$ 10 cm<sup>2</sup>




## **On-module Data flow - PS module**





### **On-module Data flow - 2S module**





# **Concentrator Integrated Circuit (CIC)**





# **CIC max stub outputs**

| FE_Config         | CBC (2S)  |     |              |     | MPA (PS)  |     |     |     |              |     |     |     |
|-------------------|-----------|-----|--------------|-----|-----------|-----|-----|-----|--------------|-----|-----|-----|
| Output Format     | With Bend |     | Without Bend |     | With Bend |     |     |     | Without Bend |     |     |     |
| Stub_Width        | 18        | 18  | 14           | 14  | 21        | 21  | 21  | 21  | 18           | 18  | 18  | 18  |
| Output_Freq (MHz) | 320       | 320 | 320          | 320 | 320       | 320 | 640 | 640 | 320          | 320 | 640 | 640 |
| N_Output_Lines    | 5         | 6   | 5            | 6   | 5         | 6   | 5   | 6   | 5            | 6   | 5   | 6   |
| N_Output_bits     | 320       | 384 | 320          | 384 | 320       | 384 | 640 | 768 | 320          | 384 | 640 | 768 |
| N_Usable_bits     | 292       | 356 | 292          | 356 | 292       | 356 | 612 | 740 | 292          | 356 | 612 | 740 |
| N_MaxStubs        | 16        | 19  | 20           | 25  | 13        | 16  | 29  | 35  | 16           | 19  | 34  | 40  |
| N_padding_bits    | 4         | 14  | 12           | 6   | 19        | 20  | 3   | 5   | 4            | 14  | 0   | 20  |

CIC stub processing capabilities in 8 BX with regard to the configuration
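The rows of the table are related by simple arithmetic, which the sketch below reproduces. Two constants are inferred from the table itself rather than stated in the source: a fixed 28-bit overhead (the difference between N_Output_bits and N_Usable_bits in every column) and a ceiling of 40 stubs (needed to match the last column, where 740 usable bits would otherwise fit 41 stubs of 18 bits). The bits per line per bunch crossing are taken as the output frequency divided by the 40 MHz bunch-crossing rate. Function and constant names are illustrative.

```python
BX_PER_BLOCK = 8        # the table covers 8 bunch crossings
BX_RATE_MHZ = 40        # bunch-crossing rate
OVERHEAD_BITS = 28      # inferred: N_Output_bits - N_Usable_bits in every column
MAX_STUBS_CAP = 40      # inferred from the last column of the table

def cic_stub_capacity(stub_width, output_freq_mhz, n_output_lines):
    """Return (n_output_bits, n_usable_bits, n_max_stubs, n_padding_bits)."""
    bits_per_line_per_bx = output_freq_mhz // BX_RATE_MHZ
    n_output_bits = n_output_lines * BX_PER_BLOCK * bits_per_line_per_bx
    n_usable_bits = n_output_bits - OVERHEAD_BITS
    n_max_stubs = min(n_usable_bits // stub_width, MAX_STUBS_CAP)
    n_padding_bits = n_usable_bits - n_max_stubs * stub_width
    return n_output_bits, n_usable_bits, n_max_stubs, n_padding_bits

# (front-end, format, stub width, output frequency in MHz, output lines)
configs = [
    ("CBC", "with bend", 18, 320, 5), ("CBC", "with bend", 18, 320, 6),
    ("CBC", "without bend", 14, 320, 5), ("CBC", "without bend", 14, 320, 6),
    ("MPA", "with bend", 21, 320, 5), ("MPA", "with bend", 21, 320, 6),
    ("MPA", "with bend", 21, 640, 5), ("MPA", "with bend", 21, 640, 6),
    ("MPA", "without bend", 18, 320, 5), ("MPA", "without bend", 18, 320, 6),
    ("MPA", "without bend", 18, 640, 5), ("MPA", "without bend", 18, 640, 6),
]
results = [cic_stub_capacity(w, f, n) for _, _, w, f, n in configs]
for (fe, fmt, w, f, n), row in zip(configs, results):
    print(fe, fmt, w, f, n, row)
```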


# **DAQ developments for prototype test and production**

FW development for prototype tests, based on FC7 CMS µTCA boards:

- 2S modules
  - FW for CBC and CIC tested, slow control via optical link and GBT readout under development
- PS modules
  - FW available for MPA and SSA electrical readout
- RD53A
  - Single chip FW available for electrical readout

Common SW development for the whole tracker to address common features:

- 2S modules
  - SW well advanced for CBC-only module testing, CIC control under development
- PS modules
  - private SW for tests available, porting to the official framework ongoing
- RD53A
  - Most of the main calibration procedures available




# **Demonstrator systems**

Both solutions validated on hardware and compared with software emulator

#### **Time-Multiplexed Track Trigger**

- 1 time multiplexed (TM) slice, 1 octant
- MP7-EX boards (µTCA with Virtex 7)
- Boards: 2 sources, 1 + 2 Hough Transform, 2 Kalman filter, 1 to receive final tracks
- Measured latency ~ 3.5 μs
- Resources expected to be compatible with < 2 Kintex UltraScale

#### MP7-based demonstrator @ CERN/UK



#### Tracklet method

- 1 (TM) slice, two implementations of different z portions to validate the emulator
- CTP7 boards (µTCA with Virtex 7)
- 3 boards (1 φ sector + 2 nearest neighbors) and 1 board to source the stubs and receive final tracks
- Measured latency ~ 3.3 µs
- Resources expected to be compatible with 1 Kintex UltraScale

CTP7-based demonstrator @ CERN/Cornell

