# Progress in design and testing of the DAQ and data-flow control for the Phase-2 upgrade of the CMS experiment







#### Outline

• The CMS experiment at the CERN LHC

• The CMS Phase-2 DAQ system and the DAQ and Timing Hub

• Design once, use in multiple places?

# The CMS experiment at the CERN LHC

# CMS is one of the experiments at the CERN LHC






#### The life and times of CMS and the LHC




|                                                | Run 2 LHC (2018 max.) | HL-LHC (ultimate)     |
|------------------------------------------------|-----------------------|-----------------------|
| Beam energy [TeV]                              | 13.6                  | 13.6 - 14             |
| Bunch charge [protons]                         | $1.15 \times 10^{11}$ | $2.20 \times 10^{11}$ |
| Number of bunches                              | 2556                  | 2760                  |
| β* [cm]                                        | 30                    | 15                    |
| Emittance [µm]                                 | 2.50                  | 2.50                  |
| Bunch length [cm]                              | 8.3                   | 7.6                   |
| Luminosity [cm<sup>-2</sup> s<sup>-1</sup>]    | $2.0 \times 10^{34}$  | $7.5 \times 10^{34}$  |
| Events / crossing                              | 55                    | 195                   |
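The "Events / crossing" row can be roughly cross-checked against the luminosity and the number of bunches. The sketch below is a back-of-the-envelope estimate, not the calculation behind the table: it assumes an inelastic proton-proton cross-section of about 80 mb (an assumption, not a number from these slides) and uses the LHC revolution frequency of about 11.245 kHz.

```python
# Rough cross-check of "Events / crossing" from luminosity and bunch count.
LHC_REVOLUTION_FREQUENCY_HZ = 11_245.0  # LHC revolution frequency, ~11.245 kHz
SIGMA_INELASTIC_CM2 = 80e-27            # ASSUMPTION: ~80 mb (1 mb = 1e-27 cm^2)


def events_per_crossing(luminosity_cm2_s: float, n_bunches: int) -> float:
    """Mean number of inelastic collisions per bunch crossing."""
    crossing_rate_hz = n_bunches * LHC_REVOLUTION_FREQUENCY_HZ
    return luminosity_cm2_s * SIGMA_INELASTIC_CM2 / crossing_rate_hz


run2 = events_per_crossing(2.0e34, 2556)
hl_lhc = events_per_crossing(7.5e34, 2760)
print(f"Run 2 LHC (2018 max.): {run2:.1f} events/crossing")
print(f"HL-LHC (ultimate):     {hl_lhc:.1f} events/crossing")
```

With the assumed cross-section, both estimates land within a few percent of the quoted 55 and 195.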

# The Phase-2 upgrade of the CMS experiment

Complete overhaul of the CMS detector:

- Full redesign and rebuild of pixel and strip trackers
- Addition of MIP Timing Detector, between tracker and calorimeter

- Replacement of end-cap calorimeters with high-granularity (silicon + scintillator) ones

- Level-1 trigger latency increases from 4.3 µs to 12.4 µs
- Replacement of barrel calorimeter front-end electronics
- All muon systems receive 'minor' upgrades to stay in step with latency and technology



# The CMS Phase-2 trigger-DAQ system and the DAQ and Timing Hub

## CMS Phase-2 DAQ and trigger control overview






# The DTH-400 DAQ and Timing Hub

- The DTH is the portal between the back-end electronics and the central DAQ, timing, and control and monitoring systems
- One DTH per back-end crate
- The DTH is equipped to drive standalone, single-crate data-taking runs for commissioning, calibration, etc.
- DTH-400 DAQ throughput: 400 Gbit/s



#### The DAQ-800 node board

- Per crate, one or more DAQ-800 'companion boards' can be added to increase the DAQ throughput
- DAQ-800 DAQ throughput: 800 Gbit/s
- Can accommodate per-crate DAQ needs ranging from 10 Gbit/s (some muon systems) to 2.2 Tbit/s (inner tracker)
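As a rough illustration of how the per-crate range maps onto boards, the sketch below counts the DAQ-800 companion boards needed next to the single DTH-400 in a crate. It assumes the nominal throughputs are fully usable and ignores headroom and link granularity; the function name is illustrative only.

```python
import math

DTH400_GBIT_S = 400  # DAQ throughput of the DTH-400 (from the slides)
DAQ800_GBIT_S = 800  # DAQ throughput of one DAQ-800 (from the slides)


def daq800_boards_needed(crate_need_gbit_s: float) -> int:
    """DAQ-800 boards needed on top of the one DTH-400 in the crate.

    ASSUMPTION: nominal throughputs are fully usable, with no headroom.
    """
    remaining = max(0.0, crate_need_gbit_s - DTH400_GBIT_S)
    return math.ceil(remaining / DAQ800_GBIT_S)


for label, need in [("muon crate", 10), ("inner tracker crate", 2200)]:
    print(f"{label}: {need} Gbit/s -> {daq800_boards_needed(need)} DAQ-800 board(s)")
```

Under these assumptions, a 10 Gbit/s crate needs no companion board, while a 2.2 Tbit/s crate needs three.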



#### Flashback to Real Time 2018



# Design and prototyping of DTH-400 & DAQ-800



- The P2 merges all prototyping lines, and switches FPGAs from KU15P to VU35P
- Adopted in-FPGA High-Bandwidth Memory for Ethernet buffering
- The DAQ-800 is a 'creative copy-paste' of the DTH-400

#### Current state-of-the-art: the DTH-P2



Comfortably meets clock quality and DAQ throughput requirements for Phase-2 CMS




# Design once, use in multiple places?

# A new kind of optimisation challenge

#### Driven by a wish to

- reduce design effort,
- reduce maintenance effort, and
- reduce engineering and prototyping cost,

we were prompted to consider designing the Phase-2 DAQ hardware such that it could also serve for the Trigger and Timing Control and Distribution System (TCDS).

Note: this challenge was posed at the right time, i.e., during the design phase.

### CMS Phase-2 trigger control architecture






#### The DTHs:

Connect all CMS back-end crates to the central trigger, DAQ, and control systems




#### The TCDS2 captain:

- Houses several firmware 'run controllers' to drive data-taking runs
- Contains a configurable 'switch' to assign groups of CMS back-ends to these runs

#### A switch or a tree?



- Simultaneous runs with different subdetectors are necessary for commissioning, calibration, etc.
- Only the top-level run controller can reach all end-points
- Each sub-level run controller can reach a *fixed* subset of end-points
- Ad hoc changes in subsets require recabling

#### A switch or a tree?



- Each top-level run controller can reach all end-points, in any arbitrary combination
- Subset assignment is now 'just configuration'
- This achieves full flexibility for many simultaneous data-taking runs

So a switch it is, then!
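To make 'just configuration' concrete, here is a minimal sketch of a switch assignment as plain data. All names (`run_controller_0`, `tracker`, etc.) are hypothetical, and the rule that an end-point group belongs to at most one run at a time is an assumption of this sketch, not a statement about the TCDS2 firmware.

```python
from typing import Dict, Set


def endpoint_owners(assignment: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each end-point group to the run controller driving it.

    ASSUMPTION: a group may be driven by at most one run controller.
    Raises ValueError if two run controllers claim the same group.
    """
    owners: Dict[str, str] = {}
    for controller, groups in assignment.items():
        for group in sorted(groups):
            if group in owners:
                raise ValueError(
                    f"{group!r} assigned to both {owners[group]!r} and {controller!r}"
                )
            owners[group] = controller
    return owners


# Hypothetical configuration: two simultaneous data-taking runs.
assignment = {
    "run_controller_0": {"tracker", "ecal"},
    "run_controller_1": {"muon"},
}
owners = endpoint_owners(assignment)

# Moving 'ecal' to the other run is a configuration change, not a recabling.
assignment["run_controller_0"].discard("ecal")
assignment["run_controller_1"].add("ecal")
owners_after = endpoint_owners(assignment)
```

In the tree layout, the same change would require the fixed subset of a sub-level run controller to be rewired.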

# Using the DAQ-800 to implement the TCDS



- Two layers of DAQ-800: one with run controllers, one as 'distributed switch'
- Use the 'back-end data' Fireflys to mesh-interconnect the controller boards and the switch boards
- Use the 'DAQ QSFPs' to connect the switch to the DTHs
- The number of run controllers scales with the number of controller boards
- The number of end-points scales with the number of switch boards

The determining scale factor appears to be the FPGA resources required to implement each N × M (sub)switch
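The scaling can be sketched with a small bookkeeping model. Everything numeric here is hypothetical, and so is the reading that each switch board implements an N × M sub-switch with N the total number of run controllers and M the end-points served by that board. The crosspoint count N × M is used only as a crude stand-in for FPGA resources.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class TcdsLayout:
    controller_boards: int
    controllers_per_board: int  # hypothetical parameter
    switch_boards: int
    endpoints_per_board: int    # hypothetical parameter

    @property
    def run_controllers(self) -> int:
        return self.controller_boards * self.controllers_per_board

    @property
    def end_points(self) -> int:
        return self.switch_boards * self.endpoints_per_board

    @property
    def mesh_links(self) -> int:
        # Full mesh: every controller board connects to every switch board.
        return self.controller_boards * self.switch_boards

    @property
    def subswitch_shape(self) -> Tuple[int, int]:
        # ASSUMPTION: each switch board routes any run controller (N)
        # to any of its own end-points (M).
        return (self.run_controllers, self.endpoints_per_board)

    @property
    def crosspoints_per_switch_board(self) -> int:
        n, m = self.subswitch_shape
        return n * m  # crude proxy for FPGA resource usage


layout = TcdsLayout(controller_boards=2, controllers_per_board=4,
                    switch_boards=3, endpoints_per_board=12)
print(layout.run_controllers, layout.end_points,
      layout.mesh_links, layout.crosspoints_per_switch_board)
```

In this model, adding controller boards grows N on every switch board, which is why the per-board sub-switch size becomes the limiting factor.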


#### The good (which is beyond question)

Removes the need for a separate design, production, spares, etc.

#### The 'bad' (which complicates life)

The needs of a DAQ system are largely orthogonal to those of a timing/control system

- The DAQ functionality hinges on the High-Bandwidth Memory, whereas a control system benefits more from logic resources
- The DAQ profits from high-density optics, e.g., CWDM QSFP28s, whereas a timing distribution system consists entirely of single point-to-point links

Reusing back-end or trigger boards has similar trade-offs

#### The ugly (which makes it possible)

- Optics connectivity can be addressed with break-out fibres
- Firmware can use narrow(er) counters, latching them and buffering the values in the HBM; the software can then gather and post-process them
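One form the software post-processing could take is unwrapping latched snapshots of a narrow counter into a wide running total. This is a sketch under stated assumptions, not the TCDS software: it assumes the counter advances by less than 2^width between consecutive snapshots and that the first snapshot counts from zero.

```python
from typing import Iterable, List


def unwrap_counter(snapshots: Iterable[int], width_bits: int) -> List[int]:
    """Turn latched values of a wrapping counter into running totals.

    ASSUMPTION: fewer than 2**width_bits increments occur between two
    consecutive snapshots, so at most one wrap-around can be hidden.
    """
    modulus = 1 << width_bits
    totals: List[int] = []
    total = 0
    previous = None
    for raw in snapshots:
        if not 0 <= raw < modulus:
            raise ValueError(f"{raw} does not fit in {width_bits} bits")
        if previous is None:
            total = raw  # first snapshot: counter assumed to start at zero
        else:
            total += (raw - previous) % modulus  # handles a single wrap
        previous = raw
        totals.append(total)
    return totals


# An 8-bit counter that wraps twice between the four snapshots.
print(unwrap_counter([250, 3, 130, 2], width_bits=8))
```

The narrower the counter, the more frequently it must be latched for the single-wrap assumption to hold.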

# **Closing words**

- The CMS central DAQ hardware, both the DTH-400 and DAQ-800, is well on its way towards Phase-2
- The DTH-400 prototypes meet clock quality and DAQ throughput requirements
- First studies look promising for the re-use of the DAQ hardware for the implementation of the trigger control system
  - Greatly reduces the engineering effort, as well as the engineering and development cost
  - Does require some small un-DAQ-like additions
  - Will involve some level of compromise on the TCDS side. Studies should show how much.





### Phase-2 CMS DAQ in numbers

#### Bottom line: high rate and enormous throughput

| CMS detector                  | Phase-1    | Phase-2   | Phase-2   |
|-------------------------------|------------|-----------|-----------|
| Peak average pileup           | 60         | 140       | 200       |
| L1 accept rate (max.)         | 100 kHz    | 500 kHz   | 750 kHz   |
| Event size at HLT input       | 2.0 MB     | 7.8 MB    | 9.9 MB    |
| Event network throughput      | 1.6 Tbit/s | 31 Tbit/s | 60 Tbit/s |
| Event network buffer (60 s)   | 12.0 TB    | 234 TB    | 445 TB    |
| HLT accept rate               | 1.0 kHz    | 5.0 kHz   | 7.5 kHz   |
| HLT compute power             | 0.8 MHS06  | 17 MHS06  | 37 MHS06  |
| Storage throughput            | 2 GB/s     | 31 GB/s   | 61 GB/s   |
| Storage capacity needed (1 d) | 0.2 PB     | 2.0 PB    | 3.9 PB    |
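The event network rows follow from the L1 accept rate and the event size. The sketch below reproduces that arithmetic, assuming decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the storage rows are not recomputed because they depend on factors not given in the table.

```python
def network_throughput_tbit_s(l1_rate_hz: float, event_size_bytes: float) -> float:
    """Event network throughput in Tbit/s (decimal units assumed)."""
    return l1_rate_hz * event_size_bytes * 8 / 1e12


def network_buffer_tb(l1_rate_hz: float, event_size_bytes: float,
                      seconds: float = 60.0) -> float:
    """Buffer needed to hold `seconds` of data, in TB (decimal units assumed)."""
    return l1_rate_hz * event_size_bytes * seconds / 1e12


# (L1 accept rate in Hz, event size in bytes), taken from the table.
scenarios = {
    "Phase-1": (100e3, 2.0e6),
    "Phase-2 (pileup 140)": (500e3, 7.8e6),
    "Phase-2 (pileup 200)": (750e3, 9.9e6),
}

for name, (rate, size) in scenarios.items():
    print(f"{name}: {network_throughput_tbit_s(rate, size):.1f} Tbit/s, "
          f"{network_buffer_tb(rate, size):.1f} TB for 60 s")
```

The results agree with the table to within rounding.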

# CMS Phase-2 DAQ and Timing Hub (DTH)

- ATCA baseboard handling power, IPMC, etc., including on-board controller
- Managed Ethernet switch to all node slots and both shelf managers
- Timing and control unit handling clock recovery, cleaning, and distribution
- DAQ unit converting from custom back-end links to commercial Ethernet




