#### Commissioning and operation of the upgraded Belle II DAQ system with PCI-express-based high-speed readout

Yun-Tsung Lai\* (Univ. of Tokyo, Kavli IPMU),

M. Bessner, D. Biswas, D. Charlet, O. Hartbrich, T. Higuchi, R. Itoh, E. Jules, P. Kapusta, T. Kunigo, T.-S. Lau,

D. Levit, M. Nakao, K. Nishimura, E. Plaige, S.-H. Park, H. Purwar, P. Robbe, R. Sugiura, S.Y. Suzuki, M. Taurigna,

G. Varner, S. Yamada, and Q.-D. Zhou

IEEE Real Time 2022

1 August 2022





### **SuperKEKB**

- SuperKEKB: upgrade of KEKB.
  - Design luminosity more than 30 times that of KEKB, using the nano-beam scheme.
- Asymmetric-energy collider:
  - 7.0 GeV $e^-$ and 4.0 GeV $e^+$ for $\Upsilon(4S) \rightarrow B\overline{B}$.



- Luminosity achieved:
  - $L_{peak} = 4.65 \times 10^{34}~\text{cm}^{-2}\text{s}^{-1}$: world record, about twice the KEKB record.
  - $L_{int} \approx 427~\text{fb}^{-1}$ up to June 2022.



#### Belle II detector

- Belle II: newly designed sub-detectors to improve detection performance.
  - Tracking: PXD, SVD, CDC.
  - PID: TOP and ARICH.
  - Calorimeter: ECL.
  - $K_{L}$ and muon detector (KLM).
  - L1 trigger and DAQ.
- Physics targets of Belle II:
  - Rare B, τ and charm physics, dark matter searches, CP violation.
- DAQ requirements at high luminosity:
  - High L1 trigger rate (~30 kHz) and high background.



#### Belle II DAQ system

- Pipeline readout system, common to all sub-detectors
  - Except PXD, which uses a data reduction system based on regions of interest (ROI).

Performance targets: 30 kHz L1 trigger rate, ~1% dead time, and a raw event size of 1 MB.
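For scale, multiplying the two headline targets gives the aggregate data rate the DAQ must be able to absorb. This is a back-of-the-envelope figure that ignores compression, the PXD data reduction and protocol overhead:

$$
R = f_{\mathrm{L1}} \times S_{\mathrm{event}} = 3 \times 10^{4}\,\mathrm{s}^{-1} \times 10^{6}\,\mathrm{B} = 3 \times 10^{10}\,\mathrm{B/s} = 30\,\mathrm{GB/s}.
$$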

See later talks by **Seokhee Park** on **Storage and Express-Reconstruction system**, and by **Takuto Kunigo** on **global system control.** 

- FPGA (FEE readout): uses the common "Belle2Link" protocol over optical links.
- **Back-end servers**: readout PC, HLT and storage for online processing/reconstruction.



- TTD (trigger & timing distribution): distributes the L1 trigger signal to all FPGAs.

### Readout system and its upgrade



- **Copper (original system):**
  - 4 Xilinx Virtex-5 receiver boards.
  - PrPMC: data processing, pre-event building.
  - In total, 203 Coppers were used in Belle II.
- **PCIe40 (new system):**
  - 48 optical links.
  - PCIe Gen3, 2 × 8 lanes.
  - In total, 21 PCIe40 boards will be used in Belle II.

#### **Considerations for upgrade:**

- Difficulty of maintenance:
  - Increasing number of malfunctioning units.
  - Many different board types in the system.
  - Parts already out of production.

- Limits of the system for further improvement:
  - Output throughput over GbE: 1 Gbps.
  - CPU usage: ~60% at 30 kHz trigger rate.
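To see why GbE becomes the bottleneck, divide the 1 Gbps output by the 30 kHz trigger rate. The result is the largest average data fragment one Copper could ship per event, assuming no protocol overhead (an optimistic simplification):

$$
S_{\max} = \frac{1\,\mathrm{Gbps}}{30\,\mathrm{kHz}} = \frac{10^{9}\,\mathrm{bit/s}}{3 \times 10^{4}\,\mathrm{s}^{-1}} \approx 3.3 \times 10^{4}\,\mathrm{bit} \approx 4.2\,\mathrm{kB}.
$$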

# Upgrade project

- PCIe40 firmware:
  - Interfaces to other systems: FEE, TTD, readout PC.
  - Data processing logic for formatting and first-level event building.
- Changes to other systems should be kept minimal.

- Software on the readout PC:
  - Event building.
  - Slow control for detector systems.
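Event building amounts to collecting one fragment per data source for each event number and releasing the event once every source has reported. A minimal sketch, assuming fragments arrive as (source id, event number, payload) triples; the `EventBuilder` class and its method are hypothetical and do not reproduce the Belle II readout software:

```python
from typing import Dict, List, Optional


class EventBuilder:
    """Hypothetical sketch: merge per-source fragments that share an event number."""

    def __init__(self, source_ids: List[int]) -> None:
        self.sources = set(source_ids)
        # event number -> {source id: fragment payload}
        self.pending: Dict[int, Dict[int, bytes]] = {}

    def add(self, source_id: int, event_number: int, payload: bytes) -> Optional[bytes]:
        """Store one fragment; return the built event once every source has reported."""
        if source_id not in self.sources:
            raise ValueError(f"unknown source {source_id}")
        fragments = self.pending.setdefault(event_number, {})
        if source_id in fragments:
            raise ValueError(f"duplicate fragment: source {source_id}, event {event_number}")
        fragments[source_id] = payload
        if fragments.keys() != self.sources:
            return None  # still waiting for other sources
        del self.pending[event_number]
        # Concatenate in a fixed source order so the output is deterministic.
        return b"".join(fragments[s] for s in sorted(self.sources))


builder = EventBuilder([0, 1])
print(builder.add(1, 7, b"BB"))  # None
print(builder.add(0, 7, b"AA"))  # b'AABB'
```

Keying on the event number rather than on arrival order lets fragments from different sources arrive out of order without being mixed up.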



# Upgrade project (cont'd)



- Interface to TTD: trigger & timing distribution, masking, run control.



# Optical link with Belle2Link protocol

- Belle2Link protocol:
  - Line rate: 2.54 Gbps.
  - Frame delimiting with distinct 8B/10B K characters.
  - Runs over optical links between the high-speed transceivers of the FEE and PCIe40.
  - Two major functionalities:
    - Transferring detector FEE data for each L1 trigger, with CRC16 and CRC32 checksums included.
    - **Slow control**: exchanging register contents between FEE and readout as address/payload pairs.



Developed by IHEP: D. Sun et al., Phys. Procedia, vol. 37, pp. 1933-1939, 2012.
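The framing-plus-checksum idea can be modelled in a few lines. This is a sketch only: the specific K characters Belle2Link uses and its CRC16 polynomial are not given here, so the K28.1/K28.7 delimiters and the CRC-16/CCITT-FALSE polynomial (0x1021, initial value 0xFFFF) are assumptions, and the CRC32 is omitted. Symbols are modelled as ("K", code) or ("D", byte) pairs to stand in for the control/data distinction that 8B/10B encoding provides:

```python
from typing import List, Tuple

Symbol = Tuple[str, int]  # ("K", code) for control characters, ("D", byte) for data

# Hypothetical frame delimiters (standard 8B/10B K-character codes).
K_START = 0x3C  # K28.1
K_END = 0xFC    # K28.7


def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16/CCITT-FALSE (poly 0x1021, init 0xFFFF)."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def encode_frame(payload: bytes) -> List[Symbol]:
    """Wrap the payload between K characters and append a big-endian CRC16."""
    body = payload + crc16_ccitt(payload).to_bytes(2, "big")
    return [("K", K_START)] + [("D", b) for b in body] + [("K", K_END)]


def decode_frame(symbols: List[Symbol]) -> bytes:
    """Check delimiters and CRC; return the payload or raise ValueError."""
    if len(symbols) < 4 or symbols[0] != ("K", K_START) or symbols[-1] != ("K", K_END):
        raise ValueError("bad framing")
    inner = symbols[1:-1]
    if any(kind != "D" for kind, _ in inner):
        raise ValueError("unexpected control character inside frame")
    body = bytes(value for _, value in inner)
    payload, received = body[:-2], int.from_bytes(body[-2:], "big")
    if crc16_ccitt(payload) != received:
        raise ValueError("CRC16 mismatch")
    return payload
```

Because K characters cannot appear as data, the receiver can resynchronise on the next start delimiter after a corrupted frame.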

- Improvement: auto-reset of the link.
  - Link recovery via CUI/GUI is time-consuming.
  - Automatic transceiver reset triggered by monitoring status flags: PLL lock, decoding errors, disparity errors, etc.
  - Reliable recovery: ~100% readiness.
  - The update is adapted to the transceiver type of each FEE:

| Detector FEE | Transceiver                                        |
|--------------|----------------------------------------------------|
| SVD          | Spartan-6 GTP                                      |
| CDC          | Virtex-5 GTP                                       |
| TOP          | Kintex-7 GTX                                       |
| ARICH        | Virtex-5 GTP                                       |
| ECL          | Spartan-6 GTP                                      |
| KLM          | Virtex-6 GTX                                       |
| TRG          | UT3: Virtex-6 GTX, GTH<br>UT4: UltraScale GTH, GTY |
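The auto-reset is essentially a watchdog on the transceiver status flags. Below is a behavioural sketch in Python; the real logic lives in firmware, and the flag names, the three-poll threshold and the `reset` callback are assumptions for illustration only:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class LinkStatus:
    """Hypothetical snapshot of transceiver status flags."""
    pll_locked: bool
    decode_errors: int     # 8B/10B code violations since the last poll
    disparity_errors: int  # running-disparity errors since the last poll


class AutoResetWatchdog:
    """Issue a transceiver reset when the link stays unhealthy for several polls."""

    def __init__(self, reset: Callable[[], None], max_bad_polls: int = 3) -> None:
        self.reset = reset
        self.max_bad_polls = max_bad_polls
        self.bad_polls = 0
        self.resets_issued = 0

    @staticmethod
    def healthy(s: LinkStatus) -> bool:
        return s.pll_locked and s.decode_errors == 0 and s.disparity_errors == 0

    def poll(self, status: LinkStatus) -> bool:
        """Feed one status sample; return True if a reset was issued."""
        if self.healthy(status):
            self.bad_polls = 0
            return False
        self.bad_polls += 1
        if self.bad_polls >= self.max_bad_polls:
            self.reset()
            self.resets_issued += 1
            self.bad_polls = 0  # restart counting so the link gets time to re-lock
            return True
        return False
```

Requiring several consecutive bad samples avoids resetting a link on a single transient error.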


*Figure: FEE to PCIe40 optical links; PCIe40 firmware update: auto-reset.*

# Slow control of the system

- Belle2Link: FEE and PCIe40 exchange information as address/data pairs.
- Slow control software:
  - Runs on the readout PC and controls Belle2Link.
  - NSM2 (Network Shared Memory v2): defines address/data as variables.
  - Configuration and monitoring for each detector.
- Integrated into the Belle II global run control.
- Logged by EPICS.

*Figure: GUI for Belle II global run control (diagram labels: Detector system, Belle2Link, FEE).*
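Slow control over Belle2Link reduces to register reads and writes expressed as address/data pairs, which the software exposes as named variables. A minimal sketch, assuming a hypothetical in-memory register map with 16-bit registers standing in for the FEE; it does not reproduce the NSM2 API:

```python
from typing import Dict


class FeeRegisters:
    """Hypothetical FEE register file reached over the slow control channel."""

    def __init__(self) -> None:
        self.regs: Dict[int, int] = {}

    def write(self, address: int, data: int) -> None:
        self.regs[address] = data & 0xFFFF  # assume 16-bit registers

    def read(self, address: int) -> int:
        return self.regs.get(address, 0)


class SlowControl:
    """Map human-readable variable names to register addresses."""

    def __init__(self, fee: FeeRegisters, variables: Dict[str, int]) -> None:
        self.fee = fee
        self.variables = variables

    def set(self, name: str, value: int) -> None:
        self.fee.write(self.variables[name], value)

    def get(self, name: str) -> int:
        return self.fee.read(self.variables[name])

    def configure(self, settings: Dict[str, int]) -> Dict[str, bool]:
        """Write each setting, read it back, and report which ones verified."""
        result = {}
        for name, value in settings.items():
            self.set(name, value)
            result[name] = self.get(name) == (value & 0xFFFF)
        return result
```

The write-then-read-back step mirrors how a configuration can be checked for correctness before a run starts.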

### Interface to TTD system



- PCIe40: information from all 48 channels needs to be reported:
  - New address scheme to merge the information of all 48 channels.
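One way to report the state of all 48 links through a small number of addresses is to pack one status bit per link into a bit mask. The sketch assumes one "link up" bit per channel split across three 16-bit registers; the actual TTD address scheme and register width are not specified here:

```python
from typing import List

N_LINKS = 48
WORD_BITS = 16  # assumed register width


def pack_link_status(link_up: List[bool]) -> List[int]:
    """Pack one flag per link into words; bit i of word w describes link WORD_BITS*w + i."""
    if len(link_up) != N_LINKS:
        raise ValueError(f"expected {N_LINKS} flags, got {len(link_up)}")
    words = [0] * (N_LINKS // WORD_BITS)
    for link, up in enumerate(link_up):
        if up:
            words[link // WORD_BITS] |= 1 << (link % WORD_BITS)
    return words


def unpack_link_status(words: List[int]) -> List[bool]:
    """Inverse of pack_link_status."""
    return [bool((words[i // WORD_BITS] >> (i % WORD_BITS)) & 1) for i in range(N_LINKS)]
```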

- The clock and the signals driven by the same clock source are distributed by the TTD system to PCIe40:
  - Stability is affected by external noise in the Electronics Hut.
- Improvement: Intel SerDes IP core with an on-board clock.
  - Stable under external noise.
  - Soft CDR (clock and data recovery) to handle jitter.
  - Reduced operation downtime.





# Upgrade and commissioning with PCIe40 in Belle II



#### Belle II detectors with PCIe40

- PCIe40 upgrade for each Belle II detector.
- 1 PCIe40 is connected to 1 readout PC.
- TRG has a fixed data size.
- ECL, CDC, SVD: event size obtained from cosmic runs in summer 2022.

| Detector | Total # of links | # of PCIe40<br>(= # of readout PCs) | Event size (kB) | Source of event size       |
|----------|------------------|-------------------------------------|-----------------|----------------------------|
| TOP      | 72               | 2                                   | 19.1            | 2022ab physics data taking |
| KLM      | 32               | 1                                   | 1.3             | 2022ab physics data taking |
| ARICH    | 72               | 2                                   | 5.9             | 2022ab physics data taking |
| TRG      | ~20              | 1                                   | 3.4             | Fixed data size            |
| ECL      | 53               | 3                                   | 4.4             | Cosmic run (summer 2022)   |
| CDC      | 300              | 7                                   | 1.5             | Cosmic run (summer 2022)   |
| SVD      | 52               | 5                                   | 7.1             | Cosmic run (summer 2022)   |
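The table can be turned into a rough estimate of the load on each readout PC. The sketch assumes the full 30 kHz L1 rate and that each detector's data are shared evenly among its PCIe40 boards (both simplifications), and compares the result with the 630 MB/s per readout PC quoted for the new system:

```python
# Per-detector figures from the table: (links, PCIe40 boards, event size in kB).
DETECTORS = {
    "TOP": (72, 2, 19.1),
    "KLM": (32, 1, 1.3),
    "ARICH": (72, 2, 5.9),
    "TRG": (20, 1, 3.4),  # "~20" links in the table
    "ECL": (53, 3, 4.4),
    "CDC": (300, 7, 1.5),
    "SVD": (52, 5, 7.1),
}

L1_RATE_KHZ = 30.0     # target L1 trigger rate
PC_LIMIT_MB_S = 630.0  # quoted throughput per readout PC


def per_pc_rate_mb_s(event_kb: float, n_boards: int, rate_khz: float = L1_RATE_KHZ) -> float:
    """kB/event x kHz = MB/s, split evenly over the boards (one readout PC each)."""
    return event_kb * rate_khz / n_boards


for name, (links, boards, size_kb) in DETECTORS.items():
    rate = per_pc_rate_mb_s(size_kb, boards)
    print(f"{name:6s} {links / boards:5.1f} links/board {rate:7.1f} MB/s per PC "
          f"({rate / PC_LIMIT_MB_S:.0%} of {PC_LIMIT_MB_S:.0f} MB/s)")
```

Under these assumptions the TOP readout PCs carry the most data, roughly 290 MB/s each, and every board stays within its 48 links.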

### Detector-specific upgrade: ARICH

- ARICH system:
  - 5 to 6 FEBs $\rightarrow$ 1 Merger $\rightarrow$ Belle2Link $\rightarrow$ PCIe40.
  - The FEB JTAG is controlled by the Merger.
- Special slow control design of Belle2Link: transferring an entire file.
  - The FEB firmware bitstream is transferred to the Merger one byte at a time.
  - The Merger then downloads the firmware to the FEB via JTAG.

- Original Copper readout: 4 Mergers $\rightarrow$ 1 Copper.
  - The FEB configuration was done for each Merger one by one.
  - Time required: ~1.5 min.
- PCIe40 readout: 36 Mergers $\rightarrow$ 1 PCIe40.
  - Slow control processes run in parallel for all 36 Mergers.
  - Time required stays the same, ~1.5 min, despite 9 times more Mergers per board.
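The constant configuration time comes from running the per-Merger transfers concurrently instead of sequentially. A minimal sketch with Python threads; `MergerLink.send_byte` is a hypothetical stand-in for one Belle2Link slow control write and only records the bytes, so real hardware timing is not modelled:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class MergerLink:
    """Hypothetical stand-in for one Belle2Link slow control channel to a Merger."""

    def __init__(self, merger_id: int) -> None:
        self.merger_id = merger_id
        self.received = bytearray()

    def send_byte(self, value: int) -> None:
        # On real hardware: one address/data write per byte over Belle2Link.
        self.received.append(value)


def upload_bitstream(link: MergerLink, bitstream: bytes) -> int:
    """Transfer the FEB bitstream one byte at a time; the Merger then drives JTAG."""
    for value in bitstream:
        link.send_byte(value)
    return len(link.received)


def configure_all(links: List[MergerLink], bitstream: bytes) -> Dict[int, int]:
    """Upload to every Merger in parallel; return the bytes received per Merger."""
    with ThreadPoolExecutor(max_workers=len(links)) as pool:
        counts = pool.map(lambda link: upload_bitstream(link, bitstream), links)
        return {link.merger_id: n for link, n in zip(links, counts)}
```

Threads suit this case because each transfer is dominated by waiting on the link rather than by CPU work, so 36 transfers can overlap almost completely.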



# Validation for ARICH PCIe40 system

- The thresholds of the ASIC chips on the FEBs are set by the slow control software on the readout PC.
  - The hit rate per channel is measured at each threshold step.

- The threshold scan validates both:
  - Slow control: checks that the ASIC configuration is done correctly by PCIe40 and the software.
  - Data taking: checks whether the data from Copper and PCIe40 are consistent.
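The consistency check can be expressed as verifying that, at every threshold step, the per-channel hit rates from the two readout paths agree within a tolerance. A sketch under assumptions: scans are given as {threshold: [rate per channel]} dictionaries, and the 5% relative tolerance is an arbitrary illustrative choice:

```python
import math
from typing import Dict, List, Tuple

Scan = Dict[int, List[float]]  # threshold step -> hit rate per channel


def compare_scans(copper: Scan, pcie40: Scan,
                  rel_tol: float = 0.05, abs_tol: float = 1e-3) -> List[Tuple[int, int]]:
    """Return (threshold, channel) pairs where the two readout paths disagree."""
    if copper.keys() != pcie40.keys():
        raise ValueError("scans use different threshold steps")
    mismatches = []
    for threshold in sorted(copper):
        a, b = copper[threshold], pcie40[threshold]
        if len(a) != len(b):
            raise ValueError(f"channel count differs at threshold {threshold}")
        for channel, (rate_a, rate_b) in enumerate(zip(a, b)):
            if not math.isclose(rate_a, rate_b, rel_tol=rel_tol, abs_tol=abs_tol):
                mismatches.append((threshold, channel))
    return mismatches
```

An empty result means both the configuration (same thresholds applied) and the data path (same hit rates recorded) agree.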









# Detector-specific upgrade: TRG

- L1 TRG: unlike the other detector systems, which each use a single type of FEE and firmware, TRG has:
  - 2 Universal Trigger board types: UT3 and UT4.
  - 4 types of transceivers.
  - Several complex firmware designs for the trigger logic.

| TRG module                  | Board | Transceiver         |
|-----------------------------|-------|---------------------|
| 2D tracker (x4)             | UT4   | UltraScale GTY      |
| 3D tracker (x4)             | UT3   | Virtex-6 GTH        |
| Neural 3D tracker (x4)      | UT3   | Virtex-6 GTH        |
| Event Timing Finder         | UT4   | UltraScale GTY      |
| Track Segment Finder (x9)   | UT4   | UltraScale GTY, GTH |
| Global Reconstruction Logic | UT3   | Virtex-6 GTX        |
| Global Decision Logic       | UT3   | Virtex-6 GTX        |
| TOP Trigger (x2)            | UT3   | Virtex-6 GTX        |





*Figure: UT3 (Xilinx Virtex-6 GTX, GTH) and UT4 (Xilinx UltraScale GTH, GTY) boards.*

- Because the transceiver IP core interfaces and properties differ:
  - Belle2Link and the transceiver auto-reset scheme had to be adapted for each.
- Firmware updates for all TRG modules were completed in June 2022.
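Adapting one Belle2Link/auto-reset design to several transceiver families can be organised as a thin per-family adapter that maps family-specific status signals onto the generic flags the common reset logic needs. The sketch is a hypothetical software model of that idea: the bit positions in `LAYOUTS` are invented for illustration, and real transceiver IP cores expose these signals differently:

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StatusLayout:
    """Hypothetical bit positions of status flags in a transceiver status word."""
    pll_lock_bit: int
    decode_err_bit: int
    disparity_err_bit: int


# Illustrative layouts only, one per transceiver family used by TRG.
LAYOUTS: Dict[str, StatusLayout] = {
    "Virtex-6 GTX": StatusLayout(0, 1, 2),
    "Virtex-6 GTH": StatusLayout(0, 2, 3),
    "UltraScale GTH": StatusLayout(1, 4, 5),
    "UltraScale GTY": StatusLayout(1, 4, 6),
}


def link_healthy(family: str, status_word: int) -> bool:
    """Generic health check shared by the auto-reset logic for every family."""
    layout = LAYOUTS[family]

    def bit(n: int) -> bool:
        return bool((status_word >> n) & 1)

    return (bit(layout.pll_lock_bit)
            and not bit(layout.decode_err_bit)
            and not bit(layout.disparity_err_bit))
```

Keeping the family-specific details in a table means the shared reset logic does not change when a new transceiver type is added.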

# Validation for TRG PCIe40 system

- To validate the TRG system, we took cosmic runs with both Copper and PCIe40.
  - The data were processed by the Data Quality Monitor (DQM) programs.
  - We checked whether the histograms from Copper and PCIe40 are consistent.
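A common way to quantify "consistent histograms" is a chi-square test between two binned distributions with different total statistics. The sketch uses the standard two-sample chi-square for unweighted histograms with identical binning; the choice of test is an assumption, not necessarily what the Belle II DQM programs do:

```python
from typing import Sequence, Tuple


def chi2_two_histograms(h1: Sequence[int], h2: Sequence[int]) -> Tuple[float, int]:
    """Chi-square between two unweighted histograms with the same binning.

    chi2 = sum_i (N2*n1_i - N1*n2_i)^2 / (N1*N2*(n1_i + n2_i)), ndf = used bins - 1,
    where N1 and N2 are the total entries of each histogram.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same binning")
    n1_tot, n2_tot = sum(h1), sum(h2)
    if n1_tot == 0 or n2_tot == 0:
        raise ValueError("empty histogram")
    chi2, used = 0.0, 0
    for a, b in zip(h1, h2):
        if a + b == 0:
            continue  # bins empty in both histograms carry no information
        chi2 += (n2_tot * a - n1_tot * b) ** 2 / (n1_tot * n2_tot * (a + b))
        used += 1
    return chi2, used - 1
```

A reduced chi-square (chi2/ndf) near 1 indicates the Copper and PCIe40 distributions agree within statistical fluctuations; the normalisation makes the test insensitive to different run lengths.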



### Performance of the new system in 2022ab

- PCIe40 upgrade:
  - TOP, KLM: from 2021c.
  - ARICH: from 2022ab.
- Overall running time fraction in 2022ab physics data taking: **92.6%**.
  - Run restarts: 3%.
  - Detector or HV problems: ~4%.
  - No major downtime due to PCIe40.
- PCIe40 system:
  - PCIe40 $\rightarrow$ readout PC via PCI Express: ~3.9 GB/s.
  - Throughput in the Belle II DAQ: 630 MB/s per readout PC.
  - Much improved over the original Copper system.
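To put the throughput in per-event terms, divide each output rate by the 30 kHz target trigger rate, ignoring protocol overhead. Note that the PCIe40 figure is per readout PC and the GbE figure is per Copper, so this compares output nodes rather than whole systems:

$$
S_{\max}^{\mathrm{PCIe40}} = \frac{630\,\mathrm{MB/s}}{30\,\mathrm{kHz}} = \frac{6.3 \times 10^{8}\,\mathrm{B/s}}{3 \times 10^{4}\,\mathrm{s}^{-1}} = 21\,\mathrm{kB},
\qquad
S_{\max}^{\mathrm{GbE}} = \frac{10^{9}\,\mathrm{bit/s}}{3 \times 10^{4}\,\mathrm{s}^{-1}} \approx 4.2\,\mathrm{kB}.
$$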






#### Summary and plan in LS1

- The upgrade of the Belle II DAQ readout system to a new system based on PCIe40 is in progress.

- Development of the PCIe40 firmware and software, including the parts specific to each sub-detector, has been completed and validated.

- TOP, KLM, and ARICH have been running stably in physics data taking.
  - The remaining sub-detectors also completed the replacement this summer, and commissioning is ongoing.

- Commissioning with the entire Belle II system will be done during LS1 (until autumn 2023).
  - Further improvements will also be made: TTD link stability, doubled PCIe bandwidth, a new event builder scheme, etc.
  - A plan for PXD to use PCIe40 to rescue slow pions is also under discussion.