

### **Trigger & DAQ**



Wesley H. Smith *U. Wisconsin – Madison* TIPP 2011, Chicago June 14, 2011

#### **Outline:**

- Challenges for Trigger & DAQ at the LHC
- Tools: µTCA, FPGAs, Transceivers
- LHC Experiments Trigger & DAQ
- LHC evolution & challenges
- Upgrades for LHC Experiments' Trigger & DAQ



### LHC Trigger & DAQ Challenges





Challenges\*: 1 GHz of Input Interactions

Beam-crossing every 25 ns with ~ 23 interactions produces over 1 MB of data

Archival Storage at about 300 Hz of 1 MB events

\*At L=10<sup>34</sup>, now at 10<sup>33</sup> with 50 ns bunch spacing
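The headline numbers above follow from simple rate arithmetic. A minimal sketch, using only the approximate figures quoted on this slide (the variable names are illustrative, not from any experiment software):

```python
# Rate arithmetic for the design-luminosity figures quoted above.
BUNCH_SPACING_NS = 25            # one beam crossing every 25 ns
INTERACTIONS_PER_CROSSING = 23   # approximate value at L = 1e34
EVENT_SIZE_MB = 1.0              # approximate size of one event
STORAGE_RATE_HZ = 300            # approximate archival rate

crossing_rate_hz = 1e9 / BUNCH_SPACING_NS
interaction_rate_hz = crossing_rate_hz * INTERACTIONS_PER_CROSSING
rejection_factor = crossing_rate_hz / STORAGE_RATE_HZ
storage_bandwidth_mb_s = STORAGE_RATE_HZ * EVENT_SIZE_MB

print(f"crossing rate:     {crossing_rate_hz / 1e6:.0f} MHz")
print(f"interaction rate:  {interaction_rate_hz / 1e9:.2f} GHz")
print(f"online rejection:  1 in {rejection_factor:,.0f} crossings kept")
print(f"storage bandwidth: {storage_bandwidth_mb_s:.0f} MB/s")
```

The interaction rate comes out just under 1 GHz, which is the "1 GHz of input interactions" quoted in the title of the slide.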



- In-time pile-up: particles from the same crossing but from a different pp interaction
- Long detector response/pulse shapes:
  - "Out-of-time" pile-up: left-over signals from interactions in previous crossings
  - Need "bunch-crossing identification"

(Figure: detector pulse shape vs. t in 25 ns units, showing the in-time pulse with pulses from neighbouring crossings superimposed.)

### **Challenges: Time of Flight**



#### c = 30 cm/ns $\rightarrow$ in 25 ns, s = 7.5 m
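The consequence is that particles from more than one crossing can be in flight inside a large detector at the same time. A small sketch of that arithmetic; the 15 m path length is a hypothetical value chosen for illustration, not a number from the slides:

```python
C_CM_PER_NS = 30.0  # speed of light, rounded as on the slide


def flight_distance_m(time_ns):
    """Distance covered at ~c in time_ns, in metres."""
    return C_CM_PER_NS * time_ns / 100.0


def crossings_in_flight(path_m, bunch_spacing_ns=25.0):
    """Bunch crossings that elapse while a particle at ~c covers path_m."""
    return path_m * 100.0 / C_CM_PER_NS / bunch_spacing_ns


print(flight_distance_m(25.0))    # 7.5 (metres per 25 ns crossing)
print(crossings_in_flight(15.0))  # 2.0 (hypothetical 15 m flight path)
```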








# HEP tools for high rate experiments: µTCA



- Advanced Telecommunications Computing Architecture (ATCA)
- µTCA derived from the AMC standard
  - Advanced Mezzanine Card
  - Up to 12 AMC slots
    - Processing modules
  - 1 or 2 MCH slots
    - Controller Modules
- 6 standard 10 Gb/s point-to-point links from each slot to hub slots (more available)
- Redundant power, controls, clocks
- Each AMC can in principle have 20 ports of 10 Gb/s
- Backplane customization is routine & inexpensive

#### Typical MicroTCA Crate with 12 AMC slots



#### Single Module (shown): 75 x 180 mm Double Module: 150 x 180mm







### **FPGAs: Transceivers**



#### Xilinx

#### Challenge:

- Increase device BW
- No increase in total device power
- XCVR gains from scaling: negligible

#### Solution:

- Careful circuit design throughout XCVR
- Increased Gbps / XCVR
- More XCVR / Device
- Low power mode for short channels
- Lanes share a PLL vs PLL per lane

#### Result:

- 60% increase in max device BW
- Device XCVR power unchanged

|                         | GTP  | GTX     | GTH  | GT28 |
|-------------------------|------|---------|------|------|
| Max Rate (Gbps)         | 3.75 | 10.3125 | 13.1 | 28   |
| Relative Power (Per GT) | 0.35x | 0.7x   | 1x   | -    |
| Max GTs per Device      | 4    | 56      | 72   | -    |
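One way to read the quoted 60% bandwidth gain is as the ratio of aggregate transceiver bandwidth (max rate times max transceivers per device) between the GTH and GTX columns. That pairing is an assumption on our part; the slide does not say which generations are being compared. A minimal check using the table values:

```python
# (max line rate in Gbps, max transceivers per device), taken from the table.
transceivers = {
    "GTP": (3.75, 4),
    "GTX": (10.3125, 56),
    "GTH": (13.1, 72),
}


def device_bandwidth_gbps(name):
    """Aggregate one-direction transceiver bandwidth of a fully populated device."""
    rate_gbps, count = transceivers[name]
    return rate_gbps * count


# Assumed comparison: GTH generation relative to GTX generation.
gain = device_bandwidth_gbps("GTH") / device_bandwidth_gbps("GTX") - 1.0

for name in transceivers:
    print(f"{name}: {device_bandwidth_gbps(name):.1f} Gbps")
print(f"GTH over GTX: +{gain:.0%}")
```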



### **Challenges: Firmware**



Storage for an Experiment's Firmware

- Local repositories vs. Global repository
- Large volume of experiment firmware
- **Version Control**


- Variety of methods used (CVS, SVN...)
- **Documentation**
  - No standard method for documenting code or keeping documentation up to date
- **Verification & Testing**
  - HDL & SW driven test-benches for simulation
  - Hardware testing requires test systems emulating the experiment environment
- **Obsolescence of OS, HW & SW environments for compiling FW**
  - Older versions of tools become obsolete or are no longer supported and platforms become obsolete, but they may still be required to program older FPGAs
  - May not be able to port older FW to newer FPGAs
  - Licensing issues
- **Individual Firmware Designers**
  - FW designed by one person, who may be the only one who understands the design
- Experiments typically have a wide variety of devices, tools, platforms
  - Difficult for engineers to collaborate on or assist with each other's designs
  - Complicates M&O
- **Techniques for validating downloaded FW**
  - Test before downloading & check that it was downloaded properly



### ATLAS Three Level Trigger Architecture





- LVL1 decision made with <u>calorimeter</u> data with coarse granularity and <u>muon trigger chamber</u> data.
  - Buffering on detector
- LVL2 uses <u>Region of Interest data</u> (ca. 2%) with full granularity and combines information from all detectors; performs fast rejection.
  - Buffering in ROBs
- EventFilter refines the selection, can perform event reconstruction at full granularity using latest alignment and calibration data.
  - Buffering in EB & EF

# **CMS Trigger & DAQ**



### **Overall Trigger & DAQ Architecture: 2 Levels:**




# LHCb Trigger & DAQ





- Level 0: Hardware
- Both software levels run on commercial PCs

#### Level-1:

- 4.8 kB @ 1.1 MHz
- uses reduced data set: only part of the sub-detectors (mostly Vertex-detector and some tracking) with limited-precision data
- reduces event rate from 1.1 MHz to 40 kHz, by selecting events with displaced secondary vertices

#### High Level Trigger (HLT)

- 38 kB @ 40 kHz
- uses all detector information
- reduces event rate from 40 kHz to 200 Hz for permanent storage
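The event sizes and rates above fix the input bandwidth and rejection factor of each software level. A short sketch of that arithmetic, assuming 1 kB = 1000 bytes (the slide does not specify):

```python
# (name, event size in kB, input rate in Hz, output rate in Hz), from the slide.
stages = [
    ("Level-1", 4.8, 1.1e6, 40e3),
    ("HLT", 38.0, 40e3, 200.0),
]


def input_bandwidth_gb_s(event_kb, rate_hz):
    """Data volume entering a trigger level, in GB/s (assuming 1 kB = 1000 B)."""
    return event_kb * 1e3 * rate_hz / 1e9


def rejection(rate_in_hz, rate_out_hz):
    """Factor by which a trigger level reduces the event rate."""
    return rate_in_hz / rate_out_hz


for name, size_kb, rate_in, rate_out in stages:
    print(f"{name}: {input_bandwidth_gb_s(size_kb, rate_in):.2f} GB/s in, "
          f"rejection x{rejection(rate_in, rate_out):.1f}")
```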


## ALICE Trigger & DAQ



- Different groups of detectors (clusters) read out different events at the same time




### Phases of the upgrades: ~2010-2020



#### Phase 1:

- Goal of extended running in second half of the decade to collect ~100s/fb
- 80% of this luminosity in the last three years of this decade
- About half the luminosity would be delivered at luminosities above the original LHC design luminosity
- Trigger & DAQ systems should be able to operate with a peak luminosity of up to 2 × 10<sup>34</sup>

#### Phase 2: High Lumi LHC

- Continued operation of the LHC beyond a few 100/fb will require substantial modification of detector elements
- The goal is to achieve 3000/fb in phase 2
- Need to be able to integrate ~300/fb-yr
- Will require new tracking detectors for ATLAS & CMS
- Trigger & DAQ systems should be able to operate with a peak luminosity of up to 5 × 10<sup>34</sup>
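The Phase 2 goal and the required yearly integration rate together set the length of the running period. As a one-line check (the helper name is illustrative):

```python
def years_to_target(target_fb, per_year_fb):
    """Years of running needed to integrate target_fb at per_year_fb per year."""
    return target_fb / per_year_fb


print(years_to_target(3000, 300))  # 10.0
```

So the ~300/fb per year figure corresponds to roughly a decade of Phase 2 running to reach 3000/fb.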



# **CMS Upgrade Trigger Strategy**



#### Constraints

- Output rate at 100 kHz
- Input rate increases x2/x10 (Phase 1/Phase 2) over LHC design (10<sup>34</sup>)
  - Same x2 if the crossing frequency is halved, e.g. 25 ns spacing $\rightarrow$ 50 ns at $10^{34}$
- Number of interactions in a crossing (Pileup) goes up by x4/x20
- Thresholds must remain roughly the same, since the physics of interest does
- Example: strategy for Phase 1 Calorimeter Trigger (operating 2016+):
  - Present L1 algorithms inadequate above 10<sup>34</sup> or 10<sup>34</sup> w/ 50 ns spacing
    - Pileup degrades object isolation
  - More sophisticated clustering & isolation to deal with busier events
    - Process with full granularity of calorimeter trigger information
  - Should suffice for x2 reduction in rate, as shown with initial L1 Trigger studies & CMS HLT studies with L2 algorithms
- Potential new handles at L1 needed for x10 (Phase 2: 2020+)
  - Tracking to eliminate fakes, use track isolation.
  - Vertexing to ensure that multiple trigger objects come from same interaction
  - Requires finer position resolution for calorimeter trigger objects for matching (provided by use of full granularity cal. trig. info.)
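The x2, x4 and x20 pileup factors in the constraints can be reproduced with a simple scaling: mean pileup is proportional to luminosity times bunch spacing. The sketch below assumes the ~23 interactions per crossing quoted earlier for 10<sup>34</sup> at 25 ns, and assumes the "x10" input-rate case means 10<sup>35</sup>, since that is the value that reproduces the quoted x20. Both the function name and that reading are our assumptions.

```python
BASE_PILEUP = 23.0       # interactions per crossing at L = 1e34, 25 ns spacing
BASE_LUMI = 1e34
BASE_SPACING_NS = 25.0


def mean_pileup(lumi, spacing_ns):
    """Mean interactions per crossing, scaling linearly with lumi and spacing."""
    return BASE_PILEUP * (lumi / BASE_LUMI) * (spacing_ns / BASE_SPACING_NS)


scenarios = [
    ("LHC design", 1e34, 25.0),
    ("design lumi, 50 ns", 1e34, 50.0),
    ("Phase 1, 50 ns", 2e34, 50.0),
    ("x10 lumi, 25 ns", 1e35, 25.0),
    ("x10 lumi, 50 ns", 1e35, 50.0),
]

for label, lumi, spacing in scenarios:
    mu = mean_pileup(lumi, spacing)
    print(f"{label:20s} pileup ~{mu:5.0f}  (x{mu / BASE_PILEUP:.0f})")
```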







# **Table for 2E34 (v. preliminary!)**

Desired trigger thresholds for single and double object triggers, and corresponding rates (CMS calorimeter trigger)

- Threshold limiting physics is for Higgs studies
  - 30-40 GeV thresholds to trigger on W, Z (assoc. or decay)
- Tau and b-jets play a role, especially for MSSM
- For now, thresholds for double triggers are defined as half of the single object threshold
- Max total L1 rate is 100 kHz from all triggers (e.g. muon)

| Type            | Threshold | Upgrade trigger rate    | Present trigger rate    | 80% eff. point (isoEG) / 75% eff. point (isoTau)     |
|-----------------|-----------|-------------------------|-------------------------|------------------------------------------------------|
| isoEG (single)  | 30 GeV    | 8 kHz                   | 28 kHz                  | 37 GeV                                               |
| isoEG (double)  | 15 GeV    | 2 kHz                   | 12 kHz                  | 20 GeV                                               |
| isoTau (single) | 60 GeV    | 23 kHz                  | 29 kHz                  | 85 GeV                                               |
| isoTau (double) | 30 GeV    | 5 kHz                   | 29 kHz                  | 45 GeV                                               |
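To make the table easier to read against the 100 kHz budget, the rate reduction per trigger type can be computed directly. This is a sketch over the table values only; the plain sum ignores overlaps between triggers, so it is an upper bound rather than a real total rate.

```python
# trigger type -> (upgrade rate in kHz, present rate in kHz), from the table.
rates_khz = {
    "isoEG (single)": (8, 28),
    "isoEG (double)": (2, 12),
    "isoTau (single)": (23, 29),
    "isoTau (double)": (5, 29),
}

L1_BUDGET_KHZ = 100

# Reduction factor: present rate divided by upgrade rate.
reduction = {name: present / upgrade
             for name, (upgrade, present) in rates_khz.items()}

# Simple sums; overlaps between triggers are not accounted for.
upgrade_sum = sum(upgrade for upgrade, _ in rates_khz.values())
present_sum = sum(present for _, present in rates_khz.values())

for name, factor in reduction.items():
    print(f"{name:16s} reduction x{factor:.1f}")
print(f"sum of upgrade rates: {upgrade_sum} kHz of {L1_BUDGET_KHZ} kHz")
print(f"sum of present rates: {present_sum} kHz of {L1_BUDGET_KHZ} kHz")
```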






Fully Pipelined: Compact Calorimeter Trigger

Time Multiplexed Trigger:









### Time Multiplexed Calorimeter Trigger





#### µTCA CMS Calorimeter Trigger Demonstrators





- Processing cards with 160 Gb/s input & 100 Gb/s output using 5 Gb/s optical links
- Four trigger prototype cards integrated in a backplane fabric to demonstrate running & data exchange of calorimeter trigger algorithms




### CMS Muon Trigger Upgrades: Endcap Muon CSCs



#### Improve redundancy

- Add station ME-4/2 covering η=1.1-1.8
- Critical for momentum resolution

#### Upgrade electronics to sustain higher rates

- New Front End boards for station ME-1/1
- Forces upgrade of downstream EM electronics
  - Particularly Trigger & DAQ Mother Boards
- Upgrade Muon Port Card and CSC Track Finder to handle higher stub rate so that all tracks can be processed

#### Extend CSC Efficiency into η=2.1-2.4 region

- Robust operation requires TMB upgrade, unganging strips in ME-1a, new FEBs, upgrade CSCTF+MPC





Phase-1 upgrade lowers the rate and provides some control, but above 30 GeV the rate curve flattens again because of the L1 muon resolution $\rightarrow$ concern for Phase 2

#### Expected Pile-up at High Lumi LHC in ATLAS at 10<sup>35</sup>





- 230 min. bias collisions per 25 ns crossing
- ~ 10000 particles in  $|\eta| \leq 3.2$
- mostly low p<sub>T</sub> tracks
- requires upgrades to detectors


# **Detector Luminosity Effects**



#### $H \rightarrow ZZ \rightarrow \mu\mu ee$, $M_H = 300$ GeV for different luminosities in CMS






# CMS Level-1 Trigger $\rightarrow$ 5 × 10<sup>34</sup>



- Degraded performance of algorithms
  - Electrons: reduced rejection at fixed efficiency from isolation
  - Muons: increased background rates from accidental coincidences
- Larger event size to be read out
  - New Tracker: higher channel count & occupancy  $\rightarrow\,$  large factor
  - Reduces the max level-1 rate for fixed bandwidth readout.

#### **Trigger Rates**

- Try to hold max L1 rate at 100 kHz by increasing readout bandwidth
  - Avoid rebuilding front end electronics/readouts where possible
    - Limits: readout time (< 10 µs) and data size (total now 1 MB)
  - Use buffers for increased latency for processing, not post-L1A
  - May need to increase L1 rate even with all improvements
    - Greater burden on DAQ
- Implies raising E<sub>T</sub> thresholds on electrons, photons, muons, jets and use of multi-object triggers, unless we have new information ⇒ Tracker at L1
  - Needed to compensate for larger interaction rate & degradation in algorithm performance
  - Increase Level-1 Trigger latency 3.2 → 6.0 µsec to accommodate processing
    - New tracker removes 3.2 µsec limit, next limit is ECAL
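Two of the constraints above are simple ratios: the front-end pipeline depth needed to cover the L1 latency, and the maximum L1 rate a fixed readout bandwidth can sustain for a given event size. A minimal sketch, assuming 25 ns bunch spacing; the 100 GB/s figure is just 100 kHz times 1 MB, and the 2 MB event size is a hypothetical value for illustration:

```python
def pipeline_depth(latency_us, bunch_spacing_ns=25.0):
    """Bunch crossings that must be buffered while waiting for the L1 decision."""
    return round(latency_us * 1000.0 / bunch_spacing_ns)


def max_l1_rate_khz(readout_bw_gb_s, event_size_mb):
    """Highest L1 accept rate a fixed readout bandwidth can sustain."""
    rate_hz = readout_bw_gb_s * 1e9 / (event_size_mb * 1e6)
    return rate_hz / 1e3


print(pipeline_depth(3.2))          # present latency
print(pipeline_depth(6.0))          # proposed latency
print(max_l1_rate_khz(100.0, 1.0))  # 100 kHz x 1 MB needs 100 GB/s
print(max_l1_rate_khz(100.0, 2.0))  # hypothetical 2 MB events, same bandwidth
```

The last line illustrates the point made in the list: if the event size doubles and the bandwidth does not, the sustainable L1 rate halves.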








### Combine with L1 $\mu$ trigger as is now done at HLT:

- Attach tracker hits to improve P<sub>T</sub> assignment precision from 15% (standalone muon measurement) to 1.5% with the tracker
  - Improves sign determination & provides vertex constraints
- Find pixel tracks within a cone around the muon track and compute sum P<sub>T</sub> as an isolation criterion
  - Less sensitive to pile-up than calorimetric information if the primary vertex of the hard scattering can be determined (~100 vertices total at SLHC!)
- To do this requires η-φ information on muons finer than the current 0.05 and 2.5°
  - No problem, since both are already available at 0.0125 and 0.015°

# The Track Trigger Problem

 Need to gather information from 10<sup>8</sup> pixels in 200m<sup>2</sup> of silicon at 40 MHz


- Power & bandwidth to send all data off-detector is prohibitive
  - Local filtering necessary
  - Smart pixels needed to locally correlate hit P<sub>t</sub> information
- Studying the use of 3D electronics to provide ability to locally correlate hits between two closely spaced layers







### **3D Interconnection**





No "horizontal" data transfer necessary – lower noise and power

Fine Z information is not necessary on top sensor – long (~1 cm vs ~1-2 mm) strips can be used to minimize via density in interposer



## **Track Trigger Architecture**

- Readout designed to send all hits with P<sub>t</sub> > ~2 GeV to trigger processor
- High throughput: micropipeline architecture
- Readout mixes trigger and event data
- Tracker organized into phi segments
  - Limited FPGA interconnections
  - Robust against loss of single layer hits
  - Boundaries depend on p<sub>t</sub> cuts & tracker geometry









# **Track Trigger Architecture**



- "push" path:
  - L1 tracking trigger data combined with calorimeter & muon trigger data regionally with finer granularity than presently employed.
  - After regional correlation stage, physics objects made from tracking, calorimeter & muon regional trigger data transmitted to Global Trigger.
- "pull" path:
  - L1 calorimeter & muon triggers produce a "Level-0" (L0) "pre-trigger" after the latency of the present L1 trigger, with a request for tracking information. This occurs at ~1 MHz. The request only goes to regions of the tracker where a candidate was found. This reduces the data transmitted from the tracker to the L1 trigger logic by a factor of 40 (40 MHz to 1 MHz), and further by the probability for a given tracker region to contain a candidate, which could be less than 10%.
  - Tracker sends out information for these regions only & this data would be combined in L1 correlation logic, resulting in L1A combining tracking, muon & calorimeter information.
  - Only on-detector tracking trigger logic in specific tracker region would see L0 signal.
- "afterburner" path:
  - L1 track trigger info, along with the rest of the information provided to L1, is used at the very first stage of HLT processing. This provides track information to the HLT algorithms very quickly, without having to unpack & process a large volume of tracker information through CPU-intensive algorithms. It helps limit the need for significant additional processor power in the HLT computer farm.
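The data reduction claimed for the "pull" path is the product of two factors: the L0 pre-trigger rate relative to the crossing rate, and the fraction of tracker regions that are actually requested. A sketch with the numbers quoted above; the 10% region probability is the upper end mentioned in the text, and the function name is illustrative:

```python
def pull_path_reduction(crossing_rate_hz=40e6, l0_rate_hz=1e6, region_prob=0.10):
    """Factor by which tracker-to-L1 data volume drops on the 'pull' path.

    crossing_rate_hz / l0_rate_hz: only L0-accepted crossings are read out.
    region_prob: probability that a given tracker region holds a candidate.
    """
    return (crossing_rate_hz / l0_rate_hz) / region_prob


print(f"rate factor only:          x{40e6 / 1e6:.0f}")
print(f"with 10% region occupancy: x{pull_path_reduction():.0f}")
print(f"with 5% region occupancy:  x{pull_path_reduction(region_prob=0.05):.0f}")
```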





(Figure labels: low pT, offset=2; high pT, offset=0.)

## Various projects being pursued:

- Track trigger
  - Fast Track Finder (FTK), hardware track finder for ATLAS (at L1.5)
  - ROI based track trigger at L1
  - Self seeded track trigger at L1
- Combining trigger objects at L1 & topological analysis
- Full granularity readout of calorimeter
  - requires new electronics
- Changes in muon systems (small wheels), studies of an MDT based trigger & changes in electronics
- Upgrades of HLT farms

Some of the changes are linked to possibilities that open when electronics changes are made (increased granularity, improved resolution & increased latency)





## Phase I: upgrade current L1Calo

- FPGA-based MCM replacement for PreProcessor
- Augment EM/Had and Jet/Energy processors with CMM++ to add topological algorithm capabilities
  - Replacement for present trigger data Common Merger Module

## Phase II: Replace L1Calo with 2-level system

- Full digital readout of LAr, Tile data to Readout Drivers (RODs) in underground counting room (USA15)
- "Level 0": Synchronous, fixed latency, Topological algorithms with calorimeters + muon ROIs
  - Uses trigger towers (0.1 $\times$ 0.1) with finer $\eta \times \phi$ and depth segmentation.
- "Level 1": Asynchronous, longer latency, access to full resolution calorimeter data, Topological algorithms with calo, muon and ID ROIs
  - Improved ID of isolated electrons, hadrons identified by L0
  - Similar performance to present L2

#### **ATLAS Muon Trigger Upgrade**



#### MDT precision can be used for L1 sharpening

- Present ATLAS muon trigger based on RPCs only.
- Use RPC L1 trigger as "seed". MDTs only verify $p_T$ on request from RPC
  - No stand-alone trigger of Monitored Drift Tubes
- Use RPC hits to define a search road for corresponding MDT hits
- Need extra latency of ~2 µs (Phase 2)

#### **Benefits:**

- No additional trigger chambers required in Barrel
- No interference with normal readout

#### Hardware consequences: concept needs

- rebuilding of MDT electronics
- modification of parts of RPC electronics (PADs, Sector Logic).
- **Requires new chips & boards:** 
  - New front end board (mezzanine)
  - **New Chamber Service Module**
  - New architecture of RPC/TowerMaster
  - interface to RPC readout







# For Phase 1:

- Dedicated hardware processor completes GLOBAL track reconstruction by beginning of level-2 processing.
  - Allows very rapid rejection of most background, which dominates the level-1 trigger rate.
  - Frees up level-2 farm to carry out needed sophisticated event selection algorithms.

## Addresses two time-consuming stages in tracking

- Pattern recognition: find track candidates with enough Si hits
  - 10<sup>9</sup> prestored patterns simultaneously see each silicon hit leaving the detector at full speed.
- Track fitting: precise helix parameter & $\chi^2$ determination
  - Equations linear in local hit coordinates give near offline resolution
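The pattern-recognition stage can be illustrated in software. In the hardware, every stored pattern is compared with the incoming hits in parallel; the loop below only mimics the logic. The superstrip width, the tiny pattern bank and the function names are all hypothetical, chosen for illustration.

```python
def to_superstrip(hit_coord, width):
    """Map a full-resolution hit coordinate to a coarse superstrip index."""
    return int(hit_coord // width)


def matched_roads(hits_by_layer, pattern_bank, superstrip_width, min_layers):
    """Return stored patterns (roads) with hits in at least min_layers layers.

    hits_by_layer: one list of hit coordinates per detector layer.
    pattern_bank:  tuples holding one superstrip index per layer.
    """
    fired = [{to_superstrip(h, superstrip_width) for h in layer}
             for layer in hits_by_layer]
    roads = []
    for pattern in pattern_bank:  # done for all patterns at once in hardware
        n_matched = sum(1 for layer, ss in enumerate(pattern) if ss in fired[layer])
        if n_matched >= min_layers:
            roads.append(pattern)
    return roads


bank = [(1, 2, 4, 5), (1, 2, 3, 4), (0, 0, 0, 0)]   # toy pattern bank
hits = [[1.2], [2.7], [4.1], [5.9]]                 # one hit per layer
print(matched_roads(hits, bank, superstrip_width=1.0, min_layers=3))

# A road still fires when one layer has no hit ("enough Si hits").
hits_missing = [[1.2], [2.7], [], [5.9]]
print(matched_roads(hits_missing, bank, superstrip_width=1.0, min_layers=3))
```

Only the hits inside a matched road are passed on to the track fit, which is what keeps the fitting stage fast.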

# **ATLAS FTK Approach**



#### Use hardware to perform the global tracking in two steps: pattern recognition and track fit



- Pattern recognition in coarse resolution (superstrip $\rightarrow$ road)
- Track fit in full resolution (hits in a road): $F(x_1, x_2, x_3, \ldots) \sim a_0 + a_1 \Delta x_1 + a_2 \Delta x_2 + a_3 \Delta x_3 + \ldots = 0$
- Design: FTK completes global tracking in 25 µsec at 3×10<sup>34</sup>.
- Current level-2 takes 25 msec per jet or lepton at 3×10<sup>34</sup>.
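The linear form of the fit is what makes it fast: each constraint is a dot product of stored constants with the hit offsets inside the road, and the squared constraint values sum to a $\chi^2$-like quantity. The sketch below uses a toy collinearity constraint for three equally spaced layers, $x_1 - 2x_2 + x_3 = 0$, rather than real FTK constants, and takes $\Delta x_i$ as the hit position minus a road reference point; both choices are assumptions made for illustration.

```python
def linear_constraint(a0, coeffs, hits, road_center):
    """Evaluate F = a0 + sum_i a_i * dx_i, with dx_i = hit_i - road_center_i."""
    return a0 + sum(a * (x - x0) for a, x, x0 in zip(coeffs, hits, road_center))


def chi2(constraints, hits, road_center):
    """Sum of squared constraint values; ~0 for hits consistent with a track."""
    return sum(linear_constraint(a0, coeffs, hits, road_center) ** 2
               for a0, coeffs in constraints)


# Toy constraint: three equally spaced layers, hits must be collinear.
constraints = [(0.0, (1.0, -2.0, 1.0))]
road_center = (1.0, 2.0, 3.0)

print(chi2(constraints, (1.1, 2.2, 3.3), road_center))  # collinear hits
print(chi2(constraints, (1.1, 2.0, 3.3), road_center))  # middle hit displaced

# Quoted timing: 25 msec (present level-2) versus 25 µsec (FTK design).
print(f"speed-up: x{25e-3 / 25e-6:.0f}")
```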


# ATLAS L1 Track Trigger Design Options for Phase 2



#### Region Of Interest based Track Trigger at L1

- uses ROIs from L1Calo & L1Muon to seed track finding
- has a large impact on the Trigger architecture
  - requires significantly lengthened L1 pipelines and fast access to L1Calo and L1Muon ROI information
  - could also consider seeding this with an early ("Level-0") trigger, or sending a late ("Level-1.5") track trigger
- smaller impact on Silicon readout electronics

#### Self-Seeded Track Trigger at L1

- independent of other trigger information
- has a large impact on Silicon readout electronics
  - requires fast access to Silicon detector data at 40 MHz
- smaller impact on the Trigger architecture







# L0 similar to current L1-Calo & L1-Muon defines regions of interest (RoIs)

- There is no inner detector (tracking) information in the RoI definition
- RoI defines an eta-phi region for strips & pixel information to be extracted
- L1 uses inner detector information from RoIs that were defined in L0
  - Can also do a detailed correlation with outer detector



RoI: Δφ=0.2, Δη=0.2 at Calo; Δz=40 cm at beam line
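Selecting tracker data for an RoI amounts to a window cut in η and φ around the L0 object. A minimal sketch, assuming the quoted Δφ and Δη are full widths centred on the RoI (the slide does not say whether they are full or half widths) and ignoring the Δz extent along the beam line; the function names are illustrative:

```python
import math


def delta_phi(phi1, phi2):
    """Signed azimuthal difference wrapped into [-pi, pi)."""
    return (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi


def in_roi(eta, phi, roi_eta, roi_phi, d_eta=0.2, d_phi=0.2):
    """True if (eta, phi) lies inside an RoI of full width d_eta x d_phi."""
    return (abs(eta - roi_eta) <= d_eta / 2.0
            and abs(delta_phi(phi, roi_phi)) <= d_phi / 2.0)


print(in_roi(0.05, 0.03, roi_eta=0.0, roi_phi=0.0))    # inside the window
print(in_roi(0.50, 0.03, roi_eta=0.0, roi_phi=0.0))    # outside in eta
print(in_roi(0.05, 3.10, roi_eta=0.0, roi_phi=-3.10))  # inside, across the phi wrap
```

The φ wrap-around matters in practice: an RoI centred near ±π must still pick up hits on the other side of the boundary.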


# ATLAS Self-Seeded L1 Track Trigger with Doublet Layers



(Figure labels: Track (hi pT), Track (low pT), Layer B.)

- Moderate pT-dependent discrimination of hits using coincidences in closely spaced double layers
- High pT discrimination using coincidences between several doublet layers
- Has to operate at full BCO frequency (40 MHz)
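The pT discrimination in a doublet works because a low-pT track crosses the layers at a larger angle, so the hits in the two closely spaced sensors are further apart transversely. A geometric sketch for an idealised track from the beam line in a uniform field: at radius r the track makes an angle α with the radial direction, where sin α = r / (2R) and R = pT / (0.3 B) with R in metres, pT in GeV and B in tesla. The 2 T field, the 1 m layer radius and the 2 mm layer separation are illustrative assumptions, and sensor tilt and multiple scattering are ignored.

```python
import math


def bending_radius_m(pt_gev, b_tesla):
    """Radius of curvature in the transverse plane: R = pT / (0.3 B)."""
    return pt_gev / (0.3 * b_tesla)


def doublet_offset_mm(pt_gev, radius_m, separation_mm, b_tesla=2.0):
    """Transverse offset between the two hits of a doublet at a given radius."""
    sin_alpha = radius_m / (2.0 * bending_radius_m(pt_gev, b_tesla))
    if sin_alpha >= 1.0:
        raise ValueError("track curls up before reaching this radius")
    return separation_mm * math.tan(math.asin(sin_alpha))


def passes_pt_cut(measured_offset_mm, pt_cut_gev, radius_m, separation_mm,
                  b_tesla=2.0):
    """Keep a hit pair only if its offset is compatible with pT above the cut."""
    max_offset = doublet_offset_mm(pt_cut_gev, radius_m, separation_mm, b_tesla)
    return abs(measured_offset_mm) <= max_offset


# Hypothetical doublet: 1 m radius, 2 mm between the two sensors.
for pt in (2.0, 5.0, 20.0):
    print(f"pT = {pt:4.1f} GeV -> offset {doublet_offset_mm(pt, 1.0, 2.0):.3f} mm")
```

A cut on the offset is therefore a cut on pT, and it can be applied locally on the module without any data from other layers.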






#### ATLAS Self-Seeded L1 Track Trigger: One possible solution



Split the readout chip and add an embedded fine pitch interconnection



# LHCb Upgrade Trigger



# Execute whole trigger on CPU farm

- Provide ~40 MHz detector readout

- Cannot satisfy present 1 MHz requirement w/o deeply cutting into efficiency for hadronic final states
  - worst state is  $\phi\phi$ , but all hadronic modes are affected
  - Can ameliorate this by reading out detector & then finding vertices
- Keep Low Level Trigger (LLT), similar to current L0, as a fallback if the HLT cannot keep up with the rate, i.e. if computing is not sufficient
- Cut Outer Tracker occupancy >20% to preserve timing
- Timing requirement < 20 ms; vertexing & tracking is < 10 ms, leaving time for HLT2
- HLT1 similar to current, but pixels speed up reconstruction due to lack of ambiguities & eliminate ghosts
- HLT2 also similar but increase to 20 kHz output rate






# **CMS DAQ**





## **High Level Trigger on full events**

Store accepted events @ 300-400 Hz

Detector Front-End Drivers

**Event Building (in two stages)**

- 1 "FED-builder" assembles data from 8 frontends into one super-fragment at 100 kHz
- 8 independent "DAQ slices" assemble superfragments into full events
- 500 Inputs: 100 Gbyte/s EVB
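The event-builder figures above are mutually consistent, as a short calculation shows. The sketch assumes data are spread evenly over the 500 inputs and events evenly over the 8 slices, which the slide does not state explicitly; the variable names are illustrative.

```python
L1_RATE_HZ = 100e3          # super-fragment building rate
TOTAL_BW_BYTES_S = 100e9    # 100 Gbyte/s event builder
N_INPUTS = 500
FRONTENDS_PER_FED_BUILDER = 8
N_SLICES = 8

event_size_bytes = TOTAL_BW_BYTES_S / L1_RATE_HZ
per_input_bw_bytes_s = TOTAL_BW_BYTES_S / N_INPUTS          # even sharing assumed
fragment_bytes = event_size_bytes / N_INPUTS
super_fragment_bytes = fragment_bytes * FRONTENDS_PER_FED_BUILDER
per_slice_rate_hz = L1_RATE_HZ / N_SLICES                   # even sharing assumed

print(f"average event size:     {event_size_bytes / 1e6:.1f} MB")
print(f"bandwidth per input:    {per_input_bw_bytes_s / 1e6:.0f} MB/s")
print(f"average fragment:       {fragment_bytes / 1e3:.0f} kB")
print(f"average super-fragment: {super_fragment_bytes / 1e3:.0f} kB")
print(f"event rate per slice:   {per_slice_rate_hz / 1e3:.1f} kHz")
```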



# **CMS HLT Time Distribution**



#### Prescale set used: 2E32 Hz/cm<sup>2</sup>

#### Sample: MinBias L1-skim 5E32 Hz/cm<sup>2</sup> with 10 Pile-up







## Phase 2 Network bandwidth at least 5-10 times LHC

- Assuming L1 trigger rate same as LHC
- Increased Occupancy
- Decreased channel granularity (esp. tracker)
- **CMS DAQ Component upgrades** 
  - Readout Links: replace existing SLINK (400 MB/s) with 10 Gbit/s
  - Present Front End Detector Builder & Readout Unit Builder replaced with updated network technology & multi-gigabit link network switch
  - Higher Level Trigger CPU Filter Farm estimates:
    - 2010 Farm = 720 Dual Quad Core E5430 16 GB (2.66 GHz)
    - 2011 Farm = add 288 Dual 6-Core X5650 24 GB (2.66 GHz)
      - 1008 nodes, 9216 cores, 18 TB memory @100 kHz: ~90 ms/event
    - 2012 Farm =  $3 \times$  present farm
    - 2016 Farm = 3 × 2012 farm
      - Requires upgrades to network (40 Gbps links now affordable)
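The farm totals and the ~90 ms/event budget quoted above can be reproduced from the node counts. The sketch assumes each core processes one event at a time and that the farm is fully utilised; the 3× extrapolation assumes the time budget scales linearly with farm capacity.

```python
# Node generations listed above: (year, nodes, cores per node, memory per node in GB).
farm = [
    (2010, 720, 2 * 4, 16),   # dual quad-core
    (2011, 288, 2 * 6, 24),   # dual 6-core
]

L1_RATE_HZ = 100e3

nodes = sum(n for _, n, _, _ in farm)
cores = sum(n * c for _, n, c, _ in farm)
memory_gb = sum(n * m for _, n, _, m in farm)

# With every core busy, each event may occupy one core for cores / rate seconds.
budget_ms = cores / L1_RATE_HZ * 1e3

print(f"{nodes} nodes, {cores} cores, {memory_gb / 1024:.0f} TB memory")
print(f"time budget at 100 kHz: {budget_ms:.1f} ms/event")
print(f"with a 3x farm:         {3 * budget_ms:.1f} ms/event")
```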

# Extrapolating PC performance



- Extrapolate performance of dual-processor PCs
- In 2014 could have same HLT performance with 100-200 nodes
- Likely to have 10 GbE onboard




### Being developed for CMS HCAL & some of the Trigger sub-systems

A candidate for a CMS "common platform"

Sends data to central DAQ over multi-Gbps serial link (6 Gbps in prototype)



# **ATLAS Upgrade DAQ**



One project explores the full capabilities of large modern FPGAs for versatile generic DAQ. Its core effort, the Reconfigurable Cluster Element (RCE), is implemented on the ATCA platform.

**First generation** boards are in use on **SLAC LCLS** experiments, LSST DAQ and the PetaCache project. Possible use for the **ATLAS** pixel upgrade is being studied.

Board shown here with 1TB FlashRAM for PetaCache project







- Operating trigger & DAQ systems for high rate experiments poses very significant challenges, as the LHC examples shown illustrate
- Very substantial assets from the commercial world can be brought to bear on these challenges: µTCA, FPGAs, high speed links (transceivers)
- Exploiting these assets enables physics input to drive much more precise selection of events and processing of a much higher volume of data.
  - e.g. a level-1 tracking trigger

There is considerable technical difficulty involved in successfully exploiting these advances in technology and implementing them in running experiments in a controlled and adiabatic manner.