### **Imperial College** London

# TRIGGER AND DATA-ACQUISITION: PART II

UK Advanced Instrumentation Course 2022

Andrew W. Rose, Imperial College London awr01@imperial.ac.uk

## A SIMPLE TRIGGER SYSTEM: DIGITAL TRIGGERS







## A TRIGGER SYSTEM: MULTILAYER TRIGGERS





- Each stage reduces the rate, so later stages can afford a longer latency
  - Complexity of algorithms increases at each level
  - Dead-time is the sum of the trigger dead-time, summed over the trigger levels, and the readout dead-time

### MULTILAYER TRIGGERS

- Adopted in large experiments
  - More and more complex algorithms are applied on lower and lower data rates
- Efficiency for the desired physics must be kept high AT ALL LEVELS, since rejected events are lost for ever
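The rate and efficiency bookkeeping for a multilevel trigger can be sketched as below. This is only an illustration, not a description of any real experiment: the `TriggerLevel` class, the rejection factors and the per-level efficiencies are hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TriggerLevel:
    name: str
    rejection: float   # input rate / output rate (hypothetical value)
    efficiency: float  # fraction of the desired physics kept (hypothetical value)


def cascade(input_rate_hz, levels):
    """Return (output rate, overall signal efficiency) after all levels."""
    rate = input_rate_hz
    efficiency = 1.0
    for level in levels:
        rate /= level.rejection          # each stage reduces the rate
        efficiency *= level.efficiency   # losses multiply and are never recovered
    return rate, efficiency


levels = [
    TriggerLevel("L1", rejection=400.0, efficiency=0.95),
    TriggerLevel("HLT", rejection=100.0, efficiency=0.90),
]
rate, eff = cascade(40e6, levels)
print(f"output rate = {rate:.0f} Hz, signal efficiency = {eff:.3f}")
```

Because the efficiencies multiply, a modest loss at an early level caps what every later level can achieve.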


### Level-1















- Low latency
- Full event rate
- Small event fragment size
- Lower algorithmic complexity
- Access to coarse granularity information

LHC experiments @ Run 1

| Experiment | Number of levels (excl. analysis) |
|------------|-----------------------------------|
| ATLAS      | 3                                 |
| CMS        | 2                                 |
| LHCb       | 3                                 |
| ALICE      | 4                                 |

- Longer latency
- Lower event rate
- Larger event fragment size
- Higher algorithmic complexity
- Access to higher granularity information







## A TRIGGER SYSTEM: MULTILAYER TRIGGERS



If your input rate is low enough



### A TRIGGER SYSTEM: MULTILAYER TRIGGERS



 And this is exactly what the CMS Trigger does

### OF COURSE, "LOW ENOUGH" IS RELATIVE...







### SYNCHRONOUS OR ASYNCHRONOUS?

- Synchronous: operates phase-locked with master clock
  - Data move in lockstep with the clock through the trigger chain
  - Fixed latency
  - The data, held in storage pipelines, are either sent forward or discarded
  - Used for L1 triggers in collider experiments, exploiting the accelerator bunch crossing clock

**Pros**: dead-time free (just a few clock cycles to protect buffers)

**Cons**: cost (high-frequency, stable electronics, sometimes custom-made); synchronicity must be maintained throughout the entire system; alignment procedures become complicated if the system is large (software, hardware, human...)
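A fixed-latency storage pipeline can be sketched in a few lines. This is a toy model under stated assumptions: `LATENCY_TICKS`, `run_pipeline` and the threshold cut standing in for the trigger logic are all hypothetical, and real systems do this in hardware rather than software.

```python
from collections import deque

LATENCY_TICKS = 4  # hypothetical fixed trigger latency, in clock cycles


def run_pipeline(samples, threshold):
    """Hold each sample for LATENCY_TICKS cycles, then forward or discard it.

    `samples` holds one value per clock tick; the "trigger" is a toy
    threshold cut standing in for the real decision logic.
    """
    pipeline = deque()   # storage pipeline for the data
    decisions = deque()  # trigger decisions, in flight for the same latency
    accepted = []
    for tick, value in enumerate(samples):
        pipeline.append((tick, value))
        decisions.append(value > threshold)
        if len(pipeline) > LATENCY_TICKS:
            # The decision for the oldest sample arrives exactly now.
            old_tick, old_value = pipeline.popleft()
            if decisions.popleft():
                accepted.append((old_tick, old_value, tick))
    return accepted


accepted = run_pipeline([1, 9, 2, 3, 8, 1, 1, 1, 1, 1], threshold=5)
for bx, value, readout_tick in accepted:
    print(f"sample from tick {bx} (value {value}) read out at tick {readout_tick}")
```

Every accepted sample leaves the pipeline exactly `LATENCY_TICKS` cycles after it entered, which is what lets the readout match a decision to its data without any time-stamp.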



### SYNCHRONOUS OR ASYNCHRONOUS?

- Asynchronous: operations start at given conditions (when data ready or last processing is finished)
  - Used for larger time windows
  - Average latency (with large buffers to absorb fluctuations)
  - If buffer size  $\neq$  dead-time  $\rightarrow$  lost events
  - Used for HLT

**Pros**: more resilient to data bursts; runs on conventional CPUs

**Cons**: needs a timing signal synchronised to the FE to latch the data; needs a time-marker stored in the data; the data-transfer protocol is more complex
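How a buffer absorbs fluctuations can be illustrated with a small queue simulation. The sketch below assumes Poisson arrivals, a single processor with a fixed processing time, and arbitrary time units; the function name `simulate` and all numbers are hypothetical.

```python
import random
from collections import deque


def simulate(buffer_depth, n_events=50_000, mean_gap=1.0, service_time=0.8, seed=1):
    """Fraction of events lost when a finite buffer feeds one fixed-speed processor.

    Arrival gaps are exponential (Poisson arrivals). `buffer_depth` counts
    every event held in the system, including the one being processed.
    """
    rng = random.Random(seed)
    now = 0.0
    finish_times = deque()  # completion times of events still in the system
    lost = 0
    for _ in range(n_events):
        now += rng.expovariate(1.0 / mean_gap)
        # Free the slots of events whose processing finished before this arrival.
        while finish_times and finish_times[0] <= now:
            finish_times.popleft()
        if len(finish_times) >= buffer_depth:
            lost += 1  # buffer full: the event is lost
            continue
        start = max(now, finish_times[-1]) if finish_times else now
        finish_times.append(start + service_time)
    return lost / n_events


for depth in (1, 4, 16, 64):
    print(f"buffer depth {depth:2d}: lost fraction = {simulate(depth):.4f}")
```

The average input rate is below the processing rate in every case; only the buffer depth changes how many bursts are lost.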



### SYNCHRONOUS OR ASYNCHRONOUS? WHY NOT BOTH?

- Pseudo-synchronous: operates locally phase-locked
  - Data move in lockstep through the trigger chain from a set of local clocks
  - Buffering required whenever you move between clocks
  - Clocks run slightly faster than source data to prevent overflow
  - Realignment to global clock only after the final trigger stage
  - Fixed latency

**Pros**: dead-time free (just a few clock cycles to protect buffers); no need for an expensive globally-distributed clock; simpler alignment procedure

**Cons**: timing information must be propagated with the data; buffering is required to handle each clock-domain change
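Why the local clocks run slightly faster than the source data can be seen from a toy clock-domain crossing. The sketch assumes two free-running clocks with integer periods in picoseconds and a FIFO between them; the function name and the 1% frequency offsets are hypothetical.

```python
def max_fifo_occupancy(write_period_ps, read_period_ps, n_words=10_000):
    """Largest FIFO fill level seen while crossing between two free-running clocks.

    One word is written on every write-clock edge; one word is read on every
    read-clock edge if the FIFO is not empty. Periods are integers (ps) so
    the ordering of edges is exact.
    """
    occupancy = 0
    max_occupancy = 0
    written = 0
    next_write = 0
    next_read = read_period_ps // 2  # arbitrary phase offset between the clocks
    while written < n_words:
        if next_write <= next_read:
            occupancy += 1
            written += 1
            max_occupancy = max(max_occupancy, occupancy)
            next_write += write_period_ps
        else:
            if occupancy > 0:
                occupancy -= 1
            next_read += read_period_ps
    return max_occupancy


print("reader 1% faster:", max_fifo_occupancy(25_000, 24_750))
print("reader 1% slower:", max_fifo_occupancy(25_000, 25_250))
```

With the faster reader the FIFO never holds more than one word; with the slower reader the fill level grows steadily and any finite buffer would eventually overflow.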




### A NOTE ON TIMESCALES





- At LEP, BC interval 22 µs: complex trigger processing was possible between BXs
- Modern colliders chasing statistics
  - High Luminosity by high rate of BX
  - BX spacing too short for final trigger decision!
  - No mechanism to throttle data

- Trigger logic must be pipelined
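The arithmetic behind pipelining can be sketched as follows. The 25 ns value is the nominal LHC bunch spacing; the 4 µs latency, the 160 stages and the function names are hypothetical and chosen only to make the numbers easy to follow.

```python
import math


def pipeline_depth(latency_ns, bx_spacing_ns):
    """Number of crossings that must be held in flight while one decision is made."""
    return math.ceil(latency_ns / bx_spacing_ns)


def time_to_process(n_events, n_stages, stage_time_ns, pipelined):
    """Total time for n_events to pass through n_stages of equal duration."""
    if pipelined:
        # A new event enters every stage_time; the first result needs all stages.
        return (n_stages + n_events - 1) * stage_time_ns
    # Without pipelining each event occupies the whole chain before the next starts.
    return n_events * n_stages * stage_time_ns


BX_SPACING_NS = 25   # nominal LHC bunch spacing
LATENCY_NS = 4000    # hypothetical trigger latency, for illustration only

print("pipeline depth:", pipeline_depth(LATENCY_NS, BX_SPACING_NS))
print("sequential:", time_to_process(1000, 160, 25, pipelined=False), "ns")
print("pipelined: ", time_to_process(1000, 160, 25, pipelined=True), "ns")
```

Pipelining does not shorten the latency of a single decision; it lets a new crossing enter the logic on every clock tick while earlier ones are still being processed.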








### PIPELINED PROCESSING

[Figure sequence: timelines with hourly columns from 10pm to 03am. One frame carries the caption "That would just be stupid..."]

### BUT THIS IS PRECISELY WHAT A CPU DOES...

- To first order, the ALU of a CPU handles one instruction at a time

Shameless advertising for my FPGA lecture





Dynamic clustering



Jet building with pileup subtraction



Shape veto, H/E, isolation, calibration




### CONVENTIONAL ARCHITECTURE



Many, many details on time-multiplexing and conventional architectures in sections 1-3 of https://cds.cern.ch/record/1421552/files/IN2011_022.pdf (although please note that the systems proposed in sections 4-9 are very outdated and should be ignored)

- Each subsystem is regionally segmented
- Each region must talk to its neighbour
  - This is the root cause of requiring specialized boards for a given task!
- Each region of each processing layer compresses, suppresses, summarizes or otherwise reduces its data and passes it on to the next level which is less regionally segmented

### TIME-MULTIPLEXED ARCHITECTURE

- Buffer data and stream it out optimized for processing
- Spread processing over time
  - Stream-processing rather than combinatorial-logic
  - Maximise reuse of logic resources
  - Easiest for FPGA design tools to route and meet timing
- Costs you latency, bought back by more efficient processing
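The routing idea behind time-multiplexing can be sketched as a round-robin over processing nodes. This is a simplified illustration: `route_time_multiplexed` and the choice of nine nodes are hypothetical, and 25 ns is the nominal LHC bunch spacing.

```python
def route_time_multiplexed(n_crossings, n_nodes):
    """Round-robin: each node in turn receives *all* regions of one crossing."""
    assignment = {node: [] for node in range(n_nodes)}
    for bx in range(n_crossings):
        assignment[bx % n_nodes].append(bx)
    return assignment


N_NODES = 9          # hypothetical number of processing nodes
BX_SPACING_NS = 25   # nominal LHC bunch spacing

assignment = route_time_multiplexed(n_crossings=27, n_nodes=N_NODES)
for node, crossings in assignment.items():
    print(f"node {node}: crossings {crossings}")

# Each node sees a new event only every N_NODES crossings, so it has this long
# to stream the event in and process it before the next one arrives.
time_per_event_ns = N_NODES * BX_SPACING_NS
print("time per event per node:", time_per_event_ns, "ns")
```

Since every node sees the whole detector for its crossing, no node needs to exchange data with a regional neighbour, and all nodes can run identical logic.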








### HIGH LEVEL TRIGGER ARCHITECTURE

- LEP: 40 Mbyte/s
  - VME bus sufficient for bandwidth needs
- LHC: cutting-edge processors, high-speed network interfaces, high-speed optical links
- Different approaches possible
  - Network-based event building (CMS)
  - Seeded reconstruction (ATLAS)

|       | Levels | L1 rate                 | Event size | Readout<br>bandwidth                       | HL |
|-------|--------|-------------------------|------------|--------------------------------------------|----|
| LEP   | 2/3    | 1 kHz                   | 100 kB     | few 100 kB/s                               | ~  |
| ATLAS | 2/3    | 100 kHz<br>(L2: 10 kHz) | 1.5 MB     | 30 GB/s<br>(Incremental<br>Event Building) | ~  |
| CMS   | 2      | 100 kHz                 | 1.5 MB     | 100 GB/s                                   | ~  |



## HIGH LEVEL TRIGGER DESIGN PRINCIPLES

- Offline reconstruction too slow to be used directly
  - Takes >10 s per event
  - HLT usually needs << 1 s
- Instead, step-wise processing with early rejection
  - Stop processing as soon as one step fails
  - Event accepted if any of the triggers passes
  - Add a time-out to kill the Poisson tail!
- Fast reconstruction & L1-guided regional reconstruction first
- Precision reconstruction as full detector data becomes available
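Step-wise processing with early rejection might look roughly like the sketch below. The path names, the event fields and the 0.1 s budget are hypothetical. Treating a timed-out path as failed is also an assumption made for brevity; a real system may handle time-outs differently.

```python
import time


def run_path(event, steps, deadline):
    """Run filter steps in order; stop at the first failure or at the deadline."""
    for step in steps:
        if time.monotonic() > deadline:
            return False  # time-out: give up on this path (assumed policy)
        if not step(event):
            return False  # early rejection: later, slower steps never run
    return True


def hlt_decision(event, paths, budget_s=0.1):
    """Accept the event if any trigger path passes within the time budget."""
    deadline = time.monotonic() + budget_s
    return any(run_path(event, steps, deadline) for steps in paths.values())


# Hypothetical paths: cheap L1-seeded checks first, more expensive ones last.
paths = {
    "single_muon": [lambda e: e["l1_muon"], lambda e: e["muon_pt"] > 20.0],
    "dijet": [lambda e: e["l1_jet"], lambda e: e["n_jets"] >= 2],
}

event = {"l1_muon": False, "muon_pt": 0.0, "l1_jet": True, "n_jets": 3}
print("accepted" if hlt_decision(event, paths) else "rejected")
```

Ordering the steps from cheapest to most expensive means most rejected events cost very little CPU time.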



## HIGH LEVEL TRIGGER DESIGN PRINCIPLES

- Event-level parallelism
  - Process more events in parallel
  - Multi-processing and/or multi-threading
- Algorithm-level parallelism
  - **GPUs** are effective whenever a large amount of data can be processed concurrently (although bandwidth can be a limiting factor)

- Algorithms developed and optimized offline
- Common HLT-reconstruction software framework reduces maintenance and increases reliability
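Event-level parallelism relies on events being independent of one another, so whole events can be handed to a pool of workers. The sketch below uses a thread pool only to keep the example short; CPU-bound work in CPython would normally use processes instead. The `hlt_accept` function and the event contents are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def hlt_accept(event):
    """Toy stand-in for running the full HLT on one event."""
    return sum(event["hits"]) > 10


# Hypothetical events; each one is independent of all the others.
events = [{"id": i, "hits": [i, i + 1, i + 2]} for i in range(8)]

# Event-level parallelism: hand whole events to a pool of workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    decisions = list(pool.map(hlt_accept, events))

accepted_ids = [e["id"] for e, keep in zip(events, decisions) if keep]
print("accepted events:", accepted_ids)
```

`pool.map` returns results in input order, so each decision can be matched back to its event without extra bookkeeping.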



### EXAMPLE: CMS HLT

- Approximately 38,000 cores
  - An equal mix of Haswell, Broadwell and Skylake
- Multithreading allows the cores to share non-event data
  - Reduced memory footprint $\rightarrow$ can process more events: ~20% higher performance
- Adding a GPU to every filter-farm node is ruled out by cost and power
  - More likely a dedicated server sub-farm which does heavy tasks on demand
  - FPGA acceleration is also a (possibly better) option
- Boundary between trigger and DAQ is fuzzy; they are closely related
  - At CMS the "High Level Trigger" is part of the DAQ

- At the detector readout, data is fragmented
  - Readout PCs access data from some local detector region
  - Each PC buffers data from multiple events
- Software triggering & storage need all data for one event
- High-throughput network to reorganize data
  - Using standard networking technology as much as possible


Absolute numbers here are out of date!

J. Gutleber, Data Acquisition in High Energy Physics

### REAL-TIME ANALYSIS / SCOUTING

- We have discussed the typical trigger & DAQ paradigm
  - Fast & coarse processing of raw data -> decide what events to keep -> store raw event data
- In CMS we have "scouting" at HLT today, and also at L1T for Phase 2
  - Same concepts exist at LHCb (Turbo Stream) and ATLAS (Trigger-object-level analysis)
- Store objects computed by the trigger (L1T or HLT) for more events for later analysis
  - More events, smaller event content (don't keep raw detector data)
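The trade-off rests on storage bandwidth being the product of accept rate and event size. A minimal sketch with purely hypothetical numbers (none of them are taken from a real experiment):

```python
def bandwidth_mb_per_s(rate_hz, event_size_kb):
    """Storage bandwidth = accept rate x event size."""
    return rate_hz * event_size_kb / 1000.0


# All numbers below are hypothetical, chosen only to show the trade-off.
standard = bandwidth_mb_per_s(rate_hz=1_000, event_size_kb=1_000)  # raw events
scouting = bandwidth_mb_per_s(rate_hz=5_000, event_size_kb=10)     # trigger objects only

print(f"standard stream: {standard:.0f} MB/s")
print(f"scouting stream: {scouting:.0f} MB/s")
```

In this example the scouting stream records five times as many events while using a small fraction of the bandwidth, because only trigger-level objects are kept.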

Raw data





## DATA PARKING

- Based on the fact that the HLT trigger rate was a bit lower than what the DAQ could handle
- Add some new, loose trigger paths for specific analyses
- 'Park' the raw data
  - Don't run full reconstruction on accepted events immediately; store the raw data
  - Process later when no triggers are arriving, e.g. in between runs
- CMS, LHCb, ATLAS all use this



Right orange arrow is scouting.

Talk on 'real time analysis': C. Doglioni





### THE FUTURE: TRIGGERLESS READOUT?

- LHCb started with a hardware trigger
- Then decided they could get rid of that step as L0 trigger was introducing bias
- Back-end electronics and software filter see 40x higher rate





### DAQ MINI-SUMMARY

- DAQ should aim to minimize dead time and keep up with incoming rate
- Many choices when designing DAQ
  - e.g. zero-suppression on or off detector? Simple front-end with high output rate, or complicated front-end with lower output rate?
- Modern experiments are large detectors with many channels
  - DAQ systems are complicated
  - Many strategies for enhancing existing DAQ systems: scouting, parking, etc.
- Brute-force computing power can be the simplest and "cleanest" strategy

- You might well have to design a trigger for some physics channel you are interested in
  - Not as unusual as you might imagine!
  - Some things to remember...

- Keep it as simple as possible
  - Easy to commission
  - Easy to debug
  - Easy to understand

- Be as inclusive as possible
  - One trigger for several similar analyses
  - Your trigger should be able to discover the unexpected as well as the signal you intended it for!

- Make sure your trigger is robust
  - Triggers run tens of millions of times a second, so ANY STRANGE CONDITION WILL OCCUR; make sure you are prepared for it
  - Detectors don't work perfectly EVER! Make sure your trigger is immune to detector problems
  - Beam conditions change: be prepared

- Build in redundancy
  - Make sure your signal can be selected by more than one trigger
  - Helps to understand biases and measure efficiencies
  - Also for safety: if rates are too high or there's some problem, you still get your events

- Finally... Taking your signal events is only part of the game
  - You might well also need background samples
  - You will need to measure the efficiency of your trigger using a redundant trigger path
  - You will need to know if it works! Monitoring!
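Measuring an efficiency with a redundant path amounts to counting how often the trigger under study fires on events selected by a second trigger. The sketch below assumes the reference path is independent of the one being measured, otherwise the result is biased. The event records, the key names and the simple binomial uncertainty are all illustrative choices.

```python
import math


def trigger_efficiency(events, probe, reference):
    """Efficiency of `probe`, measured on events selected by `reference`."""
    n_ref = sum(1 for e in events if e[reference])
    n_both = sum(1 for e in events if e[reference] and e[probe])
    if n_ref == 0:
        raise ValueError("no events passed the reference trigger")
    eff = n_both / n_ref
    # Simple binomial uncertainty; unreliable near 0 or 1 and for small samples.
    err = math.sqrt(eff * (1.0 - eff) / n_ref)
    return eff, err


# Hypothetical trigger bits for six events.
events = [
    {"signal_path": True, "reference_path": True},
    {"signal_path": True, "reference_path": True},
    {"signal_path": False, "reference_path": True},
    {"signal_path": True, "reference_path": True},
    {"signal_path": True, "reference_path": False},
    {"signal_path": False, "reference_path": False},
]

eff, err = trigger_efficiency(events, "signal_path", "reference_path")
print(f"efficiency = {eff:.2f} +/- {err:.2f}")
```

Events that fail the reference path are ignored entirely, which is why the signal must be selectable by more than one trigger for this measurement to be possible.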

- And remember...

### The goal is not to perform the analysis online – it is just to get the events written to tape at a manageable rate

### TRIGGERING: CONCLUSION

- Triggers are not new
  - but they are constantly evolving as the accelerators and detectors do
- The design of how you structure the transfer of data around your system is the most important decision you will make
- Heterogeneous computing farms look likely to feature at HL-LHC
  - but it is a brave new world!


Oh, and be very suspicious if your supervisor plies you with strong coffee and gets you to look for scintillation light


Another shameless advert for my FPGA lecture on Friday!

### THANK YOU

Any questions?

