

# An overview of the CMS HGCAL backend electronics and vertical integration system tests

S. Mallios, Imperial College London, on behalf of the CMS HGCAL BE group



# The High Granularity Calorimeter readout overview





### More details on the detector architecture in the talk by Thomas French: https://indi.to/WhPPf

#### S. MALLIOS, Imperial College London



# An overview of the CMS HGCAL backend electronics

# **The Detector Readout Chain**





### The back-end electronics is an ATCA-based system, using "Serenity" boards:

- DAQ:
  - Distributes the slow control (configuration) and fast control (Clock and L1A distribution) to the front-end
  - Receives, buffers and forwards fully built events (fine-detailed) for every L1 accept (750 kHz average rate)
- TPG Stage-1: Receives the raw data from the front-end, selects trigger cells and adds individual module sums to partially form tower energies
- TPG Stage-2: Performs trigger cell clustering (3D cluster objects) and calculates their positions, energies, and shape
- DTH: (Common to CMS) Distributes the clock and the fast control commands and forwards DAQ and validation data to the central DAQ

\* CMS Standard Protocol (CSP) and SlinkRocket are high-speed link protocols developed for the needs of the CMS BE readout systems.

# **Detector readout challenges**







Example of configurable routing necessary for load balancing in the BE DAQ firmware

#### Variable radiation levels at the endcaps of the CMS detector:

**Highly inhomogeneous detector**: different regions require different types of readout electronics and different geometries, making every detector layer unique.

### High granularity of the detector:

Approximately 6M channels for the DAQ and 1M for the Trigger, resulting in a very high data volume.

### Tight Financial and Allocated Space Budget

Detector readout is costly, and the space in the underground electronics cavern for HGCAL racks is limited.

### Impact on Backend Readout System Requirements:

→ Minimise number of fibers to the back-end (use non-uniform cabling, minimise "dark" fibers etc.)

→ Provide **load balancing** through carefully arranged fibers and highly configurable firmware.

→ Utilize full capabilities of backend processing boards (throughput, FPGA resource utilisation) to minimize their number.

The design of the back-end readout system therefore requires non-trivial optimization to meet these challenges.
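The load-balancing requirement can be illustrated with a minimal sketch. It assumes that each front-end link has a known average load and that any link can be routed to any back-end processing unit; the greedy heuristic, the `balance_links` name and the load values are illustrative assumptions, not the routing used in the HGCAL firmware.

```python
import heapq

def balance_links(link_loads, n_units):
    """Greedily assign links to processing units, heaviest link first.

    link_loads: dict mapping a (hypothetical) link name to its average load.
    Returns a list with one (total_load, [link names]) entry per unit.
    """
    # Min-heap of (current load, unit index): the least-loaded unit pops first.
    heap = [(0.0, unit) for unit in range(n_units)]
    heapq.heapify(heap)
    assigned = [[] for _ in range(n_units)]
    for name, load in sorted(link_loads.items(), key=lambda kv: -kv[1]):
        total, unit = heapq.heappop(heap)
        assigned[unit].append(name)
        heapq.heappush(heap, (total + load, unit))
    totals = [0.0] * n_units
    for total, unit in heap:
        totals[unit] = total
    return [(totals[u], assigned[u]) for u in range(n_units)]

# Illustrative loads in arbitrary units (not measured HGCAL values).
example = {"link_a": 8.0, "link_b": 7.0, "link_c": 6.0, "link_d": 5.0, "link_e": 4.0}
result = balance_links(example, 2)
```

In a real system the assignment is constrained by the fiber arrangement, so a heuristic like this would only be a starting point for the configurable routing.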


# Structure of the Trigger BE system for one endcap


The HGCAL structure has a 120° azimuthal symmetry in each endcap and the two endcaps are identical → the FE consists of six identical 120° sectors.
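As a small illustration of the six-sector picture (two endcaps times three 120° sectors), the sketch below maps an endcap and azimuthal angle to a sector number. The sector boundaries at 0°, 120° and 240° and the numbering scheme are assumptions made for the example; the real sector boundaries may be offset.

```python
def sector_index(endcap, phi_deg):
    """Return a sector number 0..5 for an endcap (0 or 1) and azimuth in degrees.

    Assumes (hypothetically) that sectors start at phi = 0, 120 and 240 degrees.
    """
    if endcap not in (0, 1):
        raise ValueError("endcap must be 0 or 1")
    # Wrap the angle into [0, 360) and find the 120-degree slice it falls in.
    sector_in_endcap = min(2, int((phi_deg % 360.0) // 120.0))
    return 3 * endcap + sector_in_endcap
```

Because the sectors are identical, firmware and configuration developed for one sector can in principle be reused for the other five.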




# Structure of the DAQ BE system for one endcap

The HGCAL structure has a 120° azimuthal symmetry in each endcap and the two endcaps are identical → the FE consists of six identical 120° sectors.



S. MALLIOS, Imperial College London

CALOR2024 / Tsukuba May 24th 2024




# Vertical integration system tests

# A Vertical slice of the final system



Vertical system HGCAL electronics

The Vertical test system brought together at CERN consists of: FE hardware with real ASICs, a BE board running custom firmware and software, a DAQ and Clock distribution Hub (DTH) board, and a DAQ PC for run control, configuration and storing data to disk.

### Front-end:

- 2 LD Silicon Modules:
  - 6 HGCROCs
  - 4 ECONTs (2 per module)
  - 4 ECONDs (2 per module)
  - **4 lpGBTs** (2 TPG + 2 DAQ)
- Scintillator signal sampling (external trigger)

### Back-end:

- 1 back-end ATCA board (Serenity)
  - FE configuration (Slow control)
  - DAQ packet processing, buffering
  - TPG unpackers and TC processors
  - TPG and DAQ readout
  - Clock distribution and L1A generator
- 1 DTH Board
- 1 DAQ PC
  - Data storage
  - Run control and configuration
  - DAQ software


## Beam test setup at CERN Prevessin site






# Test beam summary



# Recorded more than 2 TB of TPG and DAQ data with electron, pion and muon beams through the full readout chain!



Our high-tech whiteboard that kept us motivated and focused.

- Acquired data with self trigger (scintillator)
- Acquired data with two different ECON-T algorithms:
  - Super Trigger Cell-4: Creates sums of neighboring TCs
  - Best Choice algorithm forwards only the highest energy TCs
- Tested different ECON-D data formats:
  - Standard (zero-suppression) mode with adjustable thresholds
  - Passthrough mode (no signal threshold)
- Synchronized the detector (L1A offsets, adjusted trigger data latency etc.)
- Dynamic front-end configuration with slow control
- Acquired data with the central DAQ board (DTH prototype)
  - DTH firmware bug was discovered and later fixed
- Data Quality Monitoring (DQM) for online data analysis
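The two ECON-T algorithms listed above can be sketched in a few lines. This is a simplified illustration: it assumes that "neighboring" trigger cells (TCs) are consecutive entries in a list and that the number of cells kept by Best Choice is a free parameter; the function names are hypothetical and the real ASIC groups cells by detector geometry.

```python
def super_trigger_cells(tc_energies, group_size=4):
    """Sum consecutive groups of trigger cells (STC-4 style for group_size=4)."""
    return [sum(tc_energies[i:i + group_size])
            for i in range(0, len(tc_energies), group_size)]

def best_choice(tc_energies, n_keep):
    """Keep the n_keep highest-energy trigger cells as (index, energy) pairs."""
    ranked = sorted(enumerate(tc_energies), key=lambda pair: -pair[1])
    # Return the survivors in index order so they are easy to compare by eye.
    return sorted(ranked[:n_keep])

# Illustrative energies in arbitrary units.
tcs = [1, 5, 0, 2, 9, 0, 0, 3]
stc = super_trigger_cells(tcs)   # two sums of four cells each
best = best_choice(tcs, 3)       # the three most energetic cells
```

The trade-off is visible even in this toy: the super trigger cells keep the total energy at coarser granularity, while Best Choice keeps full granularity for a subset of cells.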

# **Results (Selected plots)**





Correlation of the signal between 1 channel from Si module 1 and Si module 2 from different events (left) and from same events (right)

#### Seeing the MIP peak



Example of signal distribution in 2 channels for Module 1 with $200\,\mu\mathrm{m}$ thick sensor (left) and module 2 with $300\,\mu\mathrm{m}$ thick sensor (right).

## Front end setup

Diagram labels: scintillators, Si module 1, Si module 2, absorber.

### Effects of the ECON-D Zero-Suppression algorithm
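To make the difference between the two ECON-D data formats concrete, here is a minimal sketch that assumes a single per-channel threshold applied to one sample per channel. The real zero-suppression condition in the ASIC is more involved; the function names and values are illustrative only.

```python
def zero_suppress(samples, threshold):
    """Keep only (channel, value) pairs whose value exceeds the threshold."""
    return [(ch, value) for ch, value in enumerate(samples) if value > threshold]

def passthrough(samples):
    """Keep every channel, as in a mode without a signal threshold."""
    return list(enumerate(samples))

# Illustrative ADC-like values for five channels.
samples = [2, 15, 3, 40, 1]
kept = zero_suppress(samples, threshold=10)
everything = passthrough(samples)
```

Raising the threshold shrinks the event size at the cost of dropping small signals, which is why the thresholds are adjustable.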




# **Summary and Outlook**



The successful validation of the vertical system during September 2023 marks a significant milestone for the BE readout system, demonstrating its readiness to tackle the demanding requirements of the new CMS endcap calorimeter.

## → Time to scale up horizontally!

### Next beam test (August/September 2024)

- Include in the readout chain:
  - two plastic scintillator tiles
  - one high density and three low density silicon modules
  - ATCA BE Board (Serenity) version that is foreseen to be used in the final system.

→ Preparatory stage for the crucial upcoming "cassette" validation starting early next year.

### Cassette prototyping and production testing (early 2025):

- An HGCAL "cassette" is a 60° slice of one HGCAL layer, requiring a significant expansion of our current test readout system (e.g. one CE-E cassette comprises ~20 high density and ~70 low density silicon modules).
- Requires a fully functional, near-final version of the BE firmware for both the Trigger and DAQ systems, running on the final hardware configuration.



An HGCAL "cassette": a 60° slice of one HGCAL endcap layer.





# **THANK YOU!**

# **Questions?**







# Back-end firmware for the vertical test system



### "mini" back-end firmware overview (Serenity board)



### Beam test firmware encapsulates important elements from both the DAQ and the Stage-1 Trigger systems:

- miniDAQ: The basic ECOND packet receiver unit. Validates the incoming packets (timestamp, CRC) and buffers them
- Basic elements of TPG Stage-1:
  - TPG Stage-1 Data Unpackers
  - TPG Stage-1 TC processors
- **TCDS2 emulator:** Provides internal (random or regular) or external (scintillator, unpacker self-trigger) triggers and fast commands calibration sequences
- Readout interface for the DAQ and the TPG paths: 2x 25 Gb/s SlinkRocket, the baseline CMS DAQ link protocol (requires a DTH board)
- Slink and lpGBT link implementations are part of the infrastructure firmware provided with the Serenity board.
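The packet validation performed by the miniDAQ (timestamp and CRC checks) can be sketched as follows. The packet layout here is entirely hypothetical: a 16-bit bunch-crossing timestamp, a payload, and a 32-bit CRC trailer, with `zlib.crc32` standing in for whatever CRC the ECOND actually uses.

```python
import struct
import zlib

HEADER = struct.Struct(">H")   # hypothetical 16-bit timestamp (bunch-crossing counter)
TRAILER = struct.Struct(">I")  # hypothetical 32-bit CRC trailer

def build_packet(bx, payload):
    """Build a toy packet: timestamp header + payload + CRC over both."""
    body = HEADER.pack(bx) + payload
    return body + TRAILER.pack(zlib.crc32(body))

def validate_packet(packet, expected_bx):
    """Return (ok, reason) after checking length, CRC and timestamp."""
    if len(packet) < HEADER.size + TRAILER.size:
        return False, "truncated"
    body, trailer = packet[:-TRAILER.size], packet[-TRAILER.size:]
    (crc,) = TRAILER.unpack(trailer)
    if zlib.crc32(body) != crc:
        return False, "crc"
    (bx,) = HEADER.unpack(body[:HEADER.size])
    if bx != expected_bx:
        return False, "timestamp"
    return True, "ok"

good = build_packet(100, b"\x01\x02")
corrupted = bytes([good[0] ^ 1]) + good[1:]  # flip one bit in the header
```

Checking the CRC before the timestamp means a corrupted header is reported as a link problem rather than as a synchronisation problem.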






# Beam test FE setup diagram





### DTH (central DAQ) firmware bug discovered:

- Problem appeared during the September 2023 beam tests:
  → after some time of running (max 4 hours), the DTH would "freeze", stalling data acquisition
- Recovery attempts (soft reset of DTH and Serenity) were unsuccessful
- After investigation, the root cause was identified as a DTH firmware bug
  → DTH team informed and firmware issue was later fixed and tested at the lab

### Front-end optical receiver (VTRx+) saturation issue

- Problem appeared during both August and September beam runs:
  → high number of ECOND packet losses observed, indicating a problem on the uplink

- Investigating in the lab:
  - Noticed high Bit Error Rate (BER) on the uplink (as bad as 10E-8) causing packet loss
  - No packet loss when we attenuated the VTRx+ input (downlink) by loosening the LC connector
Packets received vs lost, as a function of ECON-D status, reported by the BE DAQ system before (left) and after (right) applying attenuation in the downlink

- Problem was reported to the electronics team at CERN that designed the VTRx+ module:
  - Further investigation revealed that the high optical power of the transmitter of the BE optical module (FireFly) was outside the dynamic range of the VTRx+ receiver and was corrupting the downlink
  - This affected the lpGBT (Rx/Tx) output serializer and corrupted the uplink, causing the high BER
- → The BE optical module manufacturer was contacted and provided instructions on how to configure the optical power of the TX (requires I2C access to the FireFlies). With the attenuated downlink the uplink packet loss dropped from 10E-8 to <10E-15!
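The link between a bit error rate and packet loss can be made explicit with a short calculation. The sketch assumes independent bit errors and that any single corrupted bit causes the packet to be lost; the packet size is a hypothetical parameter, and the example rates read the quoted "10E-8" and "10E-15" as 1e-8 and 1e-15, which is an interpretation rather than something stated on the slide.

```python
import math

def packet_error_probability(ber, packet_bits):
    """Probability that at least one bit of a packet is corrupted.

    Computes 1 - (1 - ber)**packet_bits in a numerically stable way,
    assuming independent bit errors.
    """
    return -math.expm1(packet_bits * math.log1p(-ber))

PACKET_BITS = 10_000  # hypothetical packet size, for illustration only

# Assumed readings of the quoted rates (see the note above).
p_before = packet_error_probability(1e-8, PACKET_BITS)
p_after = packet_error_probability(1e-15, PACKET_BITS)
```

For small rates the result is close to `ber * packet_bits`, so at a fixed packet size the loss probability scales directly with the BER.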





# Overcoming the Challenges (1)


### Configuring the FE ASICs

- The front-end has four types of ASICs (e.g. HGCROC, ECONs, lpGBTs), each with many tunable parameters
  - Navigating the different slow control interfaces behind the lpGBT ASICs was crucial
  - Selecting the right parameters for the ASICs was an iterative process
- Each ASIC sends its processed data over a 1.28 Gb/s serial link to the next ASIC
  - Correctly aligning these links and selecting a procedure (using different phase tracking modes) that keeps the alignment intact also proved to be an important lesson
  - The link alignment also impacts which ECON-T event is 'tagged' as BC0. Correctly tagging the event while maintaining the overall timing of the system (between DAQ and trigger) was tricky



Distribution of scintillator trigger arrival time and ECON-T derived self-trigger arrival time

### Timing in the DAQ data with trigger – matching L1As

- The internal calibration pulse along with self triggering mechanism was key to timing in the system – fixed self trigger delay and HGCROC buffer depth
- The scintillator delays were adjusted to match that of the self trigger, completing the whole chain
- The HGCROC sampling delays were adjusted to match both module timings
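One simple way to picture the delay matching described above is to compare the peaks of the two trigger arrival-time distributions. The sketch assumes arrival times recorded in bunch crossings (BX) and takes the delay correction as the difference between the most common values; the function names and the procedure are illustrative and not necessarily what was done during the beam test.

```python
from collections import Counter

def most_common_bx(arrival_bx):
    """Return the most frequent BX value, breaking ties towards the smallest BX."""
    counts = Counter(arrival_bx)
    return min(counts, key=lambda bx: (-counts[bx], bx))

def delay_correction(scint_bx, self_trigger_bx):
    """BX shift to add to the scintillator trigger so both peaks coincide."""
    return most_common_bx(self_trigger_bx) - most_common_bx(scint_bx)

# Illustrative arrival times in BX units.
scint = [10, 11, 11, 11, 12]
self_trig = [14, 15, 15, 16]
shift = delay_correction(scint, self_trig)
```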

### Developing and running a custom readout solution

- A custom 10G UDP firmware was developed to allow us to capture data without a DTH board
- The UDP packet was formatted so that the payload would look like an Slink packet (Slink header/trailer), providing consistency between DTH and UDP runs
- Challenges and optimisations:
  - → Added a 500 Hz heartbeat (empty packet) to keep the link alive when idle
  - → Optimised the DAQ PC UDP buffer sizes to minimise packet loss
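On the DAQ PC side, the ideas above (framed payloads, empty heartbeat packets and a larger socket buffer) could look roughly like this. The header and trailer layout is a made-up stand-in and not the real Slink encoding; the marker values, the function names and the default buffer size are assumptions. Only `SO_RCVBUF` is a standard socket option, and the operating system may cap the requested value.

```python
import socket
import struct

HEADER_MARK = 0xAA   # hypothetical marker bytes, not the real Slink encoding
TRAILER_MARK = 0x55
FRAME = struct.Struct(">BI")  # one marker byte + one 32-bit field

def frame_payload(event_id, payload):
    """Wrap a payload in a header (event ID) and a trailer (payload length)."""
    return (FRAME.pack(HEADER_MARK, event_id) + payload
            + FRAME.pack(TRAILER_MARK, len(payload)))

def unframe(datagram):
    """Return (event_id, payload); an empty payload marks a heartbeat."""
    if len(datagram) < 2 * FRAME.size:
        raise ValueError("datagram too short")
    head_mark, event_id = FRAME.unpack(datagram[:FRAME.size])
    tail_mark, length = FRAME.unpack(datagram[-FRAME.size:])
    payload = datagram[FRAME.size:-FRAME.size]
    if head_mark != HEADER_MARK or tail_mark != TRAILER_MARK or length != len(payload):
        raise ValueError("malformed frame")
    return event_id, payload

def open_receiver(port, rcvbuf_bytes=64 * 1024 * 1024):
    """Open a UDP socket and request a large kernel receive buffer."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, rcvbuf_bytes)
    sock.bind(("", port))
    return sock
```

Keeping the same framing for both readout paths means the downstream unpacking code does not need to know whether a run was taken through the DTH or over UDP.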

### Event tag mismatches across the system

- Captured events come with a unique event ID defined by the Event number, the Bunch Crossing counter and the Orbit Counter
- These IDs are used to match events coming from the front-end to L1As at the back-end to keep the system in sync

→ Mismatches on the counters were observed between the front-end and the back-end

→ Matching the event IDs across the system was eventually achieved after understanding some subtle differences in the behaviour of the FC decoder across the ASICs
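A check of this kind can be sketched as a pairwise comparison of event tags. The `EventTag` structure and the function name are hypothetical; the sketch assumes that front-end and back-end tags arrive in the same order and simply reports which counters disagree.

```python
from typing import NamedTuple

class EventTag(NamedTuple):
    event: int   # event number
    bx: int      # bunch crossing counter
    orbit: int   # orbit counter

def find_mismatches(fe_tags, be_tags):
    """Compare FE and BE tags pairwise; return (position, differing fields)."""
    mismatches = []
    for pos, (fe, be) in enumerate(zip(fe_tags, be_tags)):
        fields = [name for name in EventTag._fields
                  if getattr(fe, name) != getattr(be, name)]
        if fields:
            mismatches.append((pos, fields))
    return mismatches

# Illustrative tags: the second event disagrees in the orbit counter only.
fe = [EventTag(1, 100, 5), EventTag(2, 200, 5)]
be = [EventTag(1, 100, 5), EventTag(2, 200, 6)]
report = find_mismatches(fe, be)
```

Reporting the differing field, rather than just a pass/fail flag, helps to separate a counter offset from a genuinely missing event.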



Mismatches in the Orbit counter between the BE DAQ and the ECOND


- The HGCAL BE DAQ system mainly moves event data from the on-detector (front-end, FE) electronics to the central DAQ (DTH board)
- BE DAQ communicates with the FE electronics via optical links with lpGBT ASICs at the FE and lpGBT-specific firmware running at the BE
- Each BE Board (Serenity) will carry one VU13P FPGA; each FPGA receives data from the FE through 108 lpGBT-10G links and sends data to the cDAQ through 12 SLINK-25G links (Uplink datapath)
- BE DAQ is also responsible for distributing the clock and fast control signals from the central timing, control and distribution system (TCDS2) (Fast Control) and for configuring the FE electronics (Slow control)
- Fast and slow control signals are distributed to the FE electronics through **108 lpGBT-2.5G** control links (**Downlink datapath**)
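The link counts above translate into raw line-rate capacities per FPGA as follows. The calculation uses the nominal line rates quoted elsewhere in these slides (10.24 Gb/s uplink, 2.56 Gb/s downlink, 25 Gb/s Slink) and ignores protocol overhead and actual link occupancy, so it is an upper bound on capacity rather than a measured throughput.

```python
N_UPLINKS = 108        # lpGBT-10G links into one FPGA
UPLINK_GBPS = 10.24    # nominal lpGBT uplink line rate
N_SLINKS = 12          # SLINK-25G links to the cDAQ
SLINK_GBPS = 25.0      # nominal Slink line rate
N_DOWNLINKS = 108      # lpGBT-2.5G control links
DOWNLINK_GBPS = 2.56   # nominal lpGBT downlink line rate

input_gbps = N_UPLINKS * UPLINK_GBPS         # raw capacity from the front-end
output_gbps = N_SLINKS * SLINK_GBPS          # raw capacity towards the cDAQ
control_gbps = N_DOWNLINKS * DOWNLINK_GBPS   # raw capacity towards the front-end
capacity_ratio = input_gbps / output_gbps

print(f"FE -> BE: {input_gbps:.2f} Gb/s, BE -> cDAQ: {output_gbps:.2f} Gb/s, "
      f"ratio {capacity_ratio:.2f}")
```

The input capacity exceeds the output capacity by a factor of a few, which fits the emphasis on buffering and load balancing in the BE DAQ firmware.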



# The Detector Readout Chain (detailed)




### The on detector electronics are custom radiation-hard ASICs:

- ~120K HGCROCs: Interface with the sensors and create data streams for the DAQ and TPG
- **~30K ECONDs:** Perform most of the digital processing of sensor data for events passing the L1 trigger at 750 kHz. Apply zero suppression and generate reset requests on error conditions
- **~30K ECONTs:** Select or compress HGCROC trigger data for transmission off detector at 40 MHz
- **~10K low-power GBTs (lpGBTs)**: Serialise the aggregated ECON data and transmit it to the back-end (BE) system through optical transceivers (VTRx+) (10.24 Gb/s uplink). Receive and distribute the Slow and Fast Control commands from the BE (2.56 Gb/s downlink)

### The back-end electronics is an ATCA-based system, using "Serenity" boards:

- 96 DAQ Boards:
  - Distributes the slow control (configuration) and fast control (Clock and L1A distribution) to the front-end
  - Receives, buffers and forwards fully built events (fine-detailed) for every L1 accept at an average rate of 750 kHz
- 84 TPG Stage-1 Boards: Receives the raw data from the front-end, does any residual calibration needed, selects trigger cells and adds individual module sums to partially form tower energies
- 108 TPG Stage-2 Boards: Performs trigger cell clustering to produce 3D cluster objects and calculates their positions, energies, and shape properties
- 36 DTH boards: (Not HGCAL specific) The DAQ and TCDS Hub: Distributes the clock and fast control commands through the ATCA backplane (TCDS2) and provides the event data collection interface to the cDAQ

# Discovering issues affecting all CMS

### DTH firmware bug discovered:

- Problem appeared during September 2023 beam tests:
  → after some time of running (max 4 hours), the DTH would "freeze", stalling data acquisition
- Recovery attempts (soft reset of DTH and Serenity) were unsuccessful
- After investigation, the root cause was identified as a DTH firmware bug
  → DTH team informed and firmware issue was later fixed and tested at the lab

### Front-end optical receiver (VTRx+) saturation issue

- Problem appeared during both August and September beam runs:
  → high number of ECOND packet losses observed, indicating a problem on the uplink
- Investigation in the lab revealed that the high optical power of the transmitter of the BE optical module (FireFly) was outside the dynamic range of the VTRx+ receiver and was corrupting the uplink.
- → The BE optical module manufacturer was contacted and provided instructions on how to configure the optical power of the TX (requires I2C access to the FireFlies). With the attenuated downlink the uplink packet loss dropped from 10E-8 to <10E-15!




Packets received vs lost reported by the BE DAQ system before (left) and after (right) applying attenuation in the downlink



| Component | LD Hexaboard                    | HD Hexaboard                    | Tileboard / Motherboard / WB                |
|-----------|---------------------------------|---------------------------------|---------------------------------------------|
| HGCROC    | 3 per LD Hexaboard              | 6 per HD Hexaboard              | 1 for most Geometries / 2 for B12 Tileboard |
| GBT-SCA   | N/A                             | N/A                             | 1 GBT-SCA per Tileboard                     |
| ECONs     | ECON Mezzanine on the Hexaboard | ECON Mezzanine on the Hexaboard | 2 ECON-T + 1 ECON-D on the Motherboard      |
| RAFAEL    | 1 per Hexaboard                 | 1 per Hexaboard                 | 1 per Motherboard                           |
| lpGBT     | 3 per LD Engine                 | 6 per HD Engine                 | 2 per Motherboard (DAQ + Trigger)           |
| VTRx+     | 1 per LD Engine                 | 2 per HD Engine                 | 1 per Motherboard                           |
| linPol12  | Engine                          | Engine                          | Motherboard                                 |
| LDO       | Hexaboard and Engine            | Hexaboard and Engine            | 1 on Motherboard, 2 per Tileboard           |
| bPol12    | DCDC mezzanine on the Hexaboard | DCDC mezzanine on the Hexaboard | 1 per Motherboard, 2 per Tileboard          |
| ALDO      | N/A                             | N/A                             | 2 per Tileboard                             |

# Data Analysis/Results





Example of signal distribution in 2 channels for Module 1 with $200\,\mu\mathrm{m}$ thick sensor (left) and module 2 with $300\,\mu\mathrm{m}$ thick sensor (right).



Correlation of the signal between 1 channel from the 1st and from the 2nd module from different events (left) and from same events (right)



Trigger time of arriving particles with a 1 BX window



Comparison between trigger data from ECON-T and emulated data.

Effects of the ECON-D Zero-Suppression