

# ATLAS Views for Off-detector Track Trigger Electronics

#### FTK in Phase-I, speed up for Phase-II

Jinlong Zhang for the ATLAS Collaboration





### **Phase-II Trigger Architecture**



- A trigger architecture with an L1 track trigger (L1TT) proposed in the ATLAS Phase-II upgrade LoI
- The baseline requirements
  - L0 output rate (L0A) at least 500 kHz, L1 output rate (L1A) at least 200 kHz
  - L0 latency ~6 μs, L1 latency ~20 μs

#### **Performance Simulation**

| Object(s) | Trigger   | Estimated rate, no L1Track | Estimated rate, with L1Track |
|-----------|-----------|----------------------------|------------------------------|
| e         | EM20      | 200 kHz                    | 40 kHz                       |
| γ         | EM40      | 20 kHz                     | 10 kHz*                      |
| μ         | MU20      | > 40 kHz                   | 10 kHz                       |
| τ         | TAU50     | 50 kHz                     | 20 kHz                       |
| ee        | 2EM10     | 40 kHz                     | < 1 kHz                      |
| γγ        | 2EM10     | as above                   | ~5 kHz*                      |
| eμ        | EM10_MU6  | 30 kHz                     | < 1 kHz                      |
| μμ        | 2MU10     | 4 kHz                      | < 1 kHz                      |
| ττ        | 2TAU15I   | 40 kHz                     | 2 kHz                        |
| Other     | JET + MET | ~100 kHz                   | ~100 kHz                     |
| Total     |           | ~500 kHz                   | ~200 kHz                     |

The expected Level-1 trigger rates at 7 × 10<sup>34</sup> cm<sup>-2</sup>s<sup>-1</sup>
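As a rough cross-check of the totals row, the per-trigger rates can simply be added. The sketch below is illustrative only: it assumes that bounds such as "> 40" and "< 1" can be replaced by the bound itself, that the γγ "as above" entry adds nothing beyond the ee entry (both use 2EM10), and that overlaps between triggers can be ignored. None of these assumptions is stated on the slide.

```python
# Rates in kHz as (no L1Track, with L1Track), read off the rate table.
# Assumptions: "> 40" and "< 1" are replaced by the bound itself, the
# "as above" entry for gamma-gamma is counted as 0 because it shares 2EM10
# with ee, and overlaps between triggers are ignored.
RATES_KHZ = {
    "e": (200, 40),
    "gamma": (20, 10),
    "mu": (40, 10),
    "tau": (50, 20),
    "ee": (40, 1),
    "gamma gamma": (0, 5),
    "e mu": (30, 1),
    "mu mu": (4, 1),
    "tau tau": (40, 2),
    "other (JET + MET)": (100, 100),
}


def total_rates(rates):
    """Return (sum without L1Track, sum with L1Track) in kHz."""
    no_track = sum(no for no, _ in rates.values())
    with_track = sum(w for _, w in rates.values())
    return no_track, with_track


if __name__ == "__main__":
    no_track, with_track = total_rates(RATES_KHZ)
    print(f"no L1Track: {no_track} kHz, with L1Track: {with_track} kHz")
```

Under these assumptions the sums are 524 kHz and 190 kHz, in line with the quoted totals of ~500 kHz and ~200 kHz.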





03/19/2014

### **Double Buffer Implementation**



- L0 derandomization buffer running at 40 MHz
- L1 buffer running at the L0A rate (500 kHz to 1 MHz)
- Regional Readout Request (R3) at ~10% of L0A (50-100 kHz)
- Full detector readout at the L1A rate (> 200 kHz)

### **Latency Estimation**

|                                       | Latency (μs) | Cumulative Latency (μs) |
|---------------------------------------|--------------|-------------------------|
| L0A formation                         | 3.00         | 3.00                    |
| RoI mapping and transmission to ITK   | 1.25         | 4.25                    |
| R3 readout from ITK                   | 6.00         | 10.25                   |
| Transmission to L1TT                  | 2.00         | 12.25                   |
| Tracking in L1TT                      | 6.00         | 18.25                   |
| L1A formation with L1MU, L1CALO, L1TT | 1.00         | 19.25                   |
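The cumulative column is a running sum of the per-step latencies. A minimal sketch of that arithmetic follows, comparing the end of the chain with the ~20 μs L1 latency quoted in the baseline requirements. The names `STEPS_US` and `L1_BUDGET_US` are illustrative, not from the slides.

```python
from itertools import accumulate

# Per-step latencies in microseconds, taken from the latency table.
STEPS_US = [
    ("L0A formation", 3.0),
    ("RoI mapping and transmission to ITK", 1.25),
    ("R3 readout from ITK", 6.0),
    ("Transmission to L1TT", 2.0),
    ("Tracking in L1TT", 6.0),
    ("L1A formation with L1MU, L1CALO, L1TT", 1.0),
]

# Approximate L1 latency from the baseline requirements (~20 us).
L1_BUDGET_US = 20.0


def cumulative_latency(steps):
    """Running sum of the step latencies, in microseconds."""
    return list(accumulate(latency for _, latency in steps))


if __name__ == "__main__":
    running = cumulative_latency(STEPS_US)
    for (name, latency), total in zip(STEPS_US, running):
        print(f"{name:40s} {latency:5.2f} {total:6.2f}")
    print(f"margin to budget: {L1_BUDGET_US - running[-1]:.2f} us")
```

The chain ends at 19.25 μs, leaving 0.75 μs against a 20 μs budget.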

- No L1TT processing needed for some calorimeter and muon TOBs
- Enough processing units and queues for the peak rate as well as the average rate
- Complex deadtime and rate limits dictated by queues

#### **R3 Readout Scheme**

- Doable within the latency budget for the ITK pixel detector; even reading out the full pixel detector at the L0A rate is feasible
- Challenging for the ITK strip tracker, mainly the endcap
- Different strategies are studied to reduce the traffic
  - Prioritizing the R3 data flow over the L1 data on the HCC
  - Increasing the HCC FIFO depth to absorb fluctuations and to clear the daisy chain quickly
  - Increasing the number of daisy-chain links
  - Increasing the HCC output bandwidth

Figure (labels: Chip × 3, Redundant links)



#### **R3 Latency**





160 Mbps per hybrid + 4 readout links to HCC

- 320 Mbps and 4 links to ensure readout in 5 μs
- Mixed configuration possible

# L1TT System



- Processing ITK clusters and finding tracks, thus refining objects associated with L0 RoIs
- An FTK-like system, but
  - An L1 system requiring shorter timing and higher parallelism
  - Using partial events since it is RoI-based
- The tight latency
  - Input data organized in $\eta$-$\phi$ (RoI)
  - Track fitting likely still needed

## **FTK-Like Implication**

- FTK strategy with the current ATLAS Inner Detector
  - PIX 3, SCT axial 4, one SCT stereo
  - IBL, other SCT stereo layer extrapolation & 12-layer fit
- Phase-II LoI layout
  - Barrel: 4 pixel layers, 3 short strip layers and 2 long strip layers
  - Endcap: 6 pixel discs and 7 strip discs on each side

#### Current ATLAS ID

- PIX: ~80M channels; SCT: ~6M channels

#### Phase-II LoI layout

- Pixel: ~434M channels; strip: ~49M channels

- Larger throughput
- Larger pattern capability & fitting power
- Different architecture (layer combination, etc.)




Figure (panels: z = +168 mm, z = 0)

#### Segmentation

- FTK processing the full event with 64 towers in parallel
- L1TT processing RoI data with towers at most the size of the RoI
  - Smaller to control the bandwidth per unit
  - Larger to increase P<sub>T</sub> cutoff
  - Overlap importance differing
- A typical RoI covering 0.3 × 0.3 in $\eta$-$\phi$
- $P_T$ cutoff critical to physics
  - 0.05 × 0.05 roughly corresponding to a 4 GeV track, full overlap to 2 GeV
  - 0.05 × 0.05 segmenting $\phi$ by ~126 and $\eta$ by ~100 (12600 towers)
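The tower count quoted for the 0.05 × 0.05 segmentation can be reproduced with a few lines of arithmetic. This is a sketch under one assumption not stated on the slide: an η coverage of |η| < 2.5, which is what ~100 cells of width 0.05 would imply. The function name `tower_count` is illustrative.

```python
import math


def tower_count(cell=0.05, eta_max=2.5):
    """Number of cells in phi, in eta, and in total for square cells.

    eta_max = 2.5 is an assumed coverage (inferred from ~100 cells of 0.05),
    not a number quoted in the slides.
    """
    n_phi = math.ceil(2 * math.pi / cell)   # full azimuth, rounded up
    n_eta = round(2 * eta_max / cell)       # symmetric eta range
    return n_phi, n_eta, n_phi * n_eta


def cells_per_roi(roi=0.3, cell=0.05):
    """Cells along one side of a square RoI, and cells in the whole RoI."""
    side = round(roi / cell)
    return side, side * side


if __name__ == "__main__":
    print(tower_count())    # cells in phi, cells in eta, total towers
    print(cells_per_roi())  # cells per RoI side, cells per RoI
```

With these inputs, a typical 0.3 × 0.3 RoI spans 6 × 6 cells of the finer segmentation.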



Figure: ATLAS Simulation (panel: z = -168 mm)

### Bandwidth

- Simulation with a pileup of 200
- 500 kHz L0A rate
- ITK R3 readout only
- Tower the size of an RoI, 0.3 × 0.3
- Packet length of 15 bits per cluster
- For the simulated RoI size, the pattern matching unit receiving 1.4 Gb/s



$$\mathrm{bandwidth}_{\mathrm{tower}} = 50\ \mathrm{kHz} \times \langle N_{\mathrm{clusters}}/\mathrm{tower} \rangle \times 15\ \mathrm{bit/cluster}$$
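The per-tower bandwidth expression is a plain product, so it can also be inverted to see what mean cluster count a given bandwidth corresponds to. The sketch below does both. Treating the quoted 1.4 Gb/s as the output of exactly this formula is an assumption, and the function names are illustrative.

```python
def tower_bandwidth_bps(rate_hz, clusters_per_tower, bits_per_cluster=15):
    """Bandwidth into one tower: request rate x mean clusters x bits per cluster."""
    return rate_hz * clusters_per_tower * bits_per_cluster


def implied_clusters(bandwidth_bps, rate_hz, bits_per_cluster=15):
    """Mean clusters per tower that would yield the given bandwidth."""
    return bandwidth_bps / (rate_hz * bits_per_cluster)


if __name__ == "__main__":
    rate_hz = 50e3  # 50 kHz, the rate used in the expression
    # Assumption: the 1.4 Gb/s figure follows from the same expression.
    n = implied_clusters(1.4e9, rate_hz)
    print(f"implied mean clusters per tower: {n:.0f}")
    print(f"round trip: {tower_bandwidth_bps(rate_hz, n) / 1e9:.2f} Gb/s")
```

Under that assumption, 1.4 Gb/s at 50 kHz and 15 bits per cluster corresponds to roughly 1870 clusters per tower on average.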


# **Timing Budget**

- 1 μs for hit loading into AM chips
  - Dictated by pileup and ITK geometry
  - Parallelism (segmentation) guided by the AM chip speed and the number of clusters
- 2 μs for road production
  - Doable based on the FTK design
- 2 μs for track fitting in DSPs
  - Dictated by the combinatorics of the AM output
  - In parallel with road production
  - Could be balanced by increasing the number of AM chips
- 1 μs for duplicate removal
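The four stage budgets can be summed to compare against the 6 μs allotted to tracking in L1TT in the latency estimate. The sketch below treats the stages as strictly serial, which gives an upper bound. The `overlap_us` parameter is a hypothetical knob for the overlap between road production and track fitting; the slides do not quantify it.

```python
# Stage budgets in microseconds, as listed in the timing budget.
STAGES_US = {
    "hit loading into AM chips": 1.0,
    "road production": 2.0,
    "track fitting in DSPs": 2.0,
    "duplicate removal": 1.0,
}


def serial_total(stages):
    """Total time if every stage waits for the previous one to finish."""
    return sum(stages.values())


def overlapped_total(stages, overlap_us):
    """Total time if fitting overlaps road production by overlap_us.

    overlap_us is hypothetical; it cannot exceed the shorter of the two stages.
    """
    limit = min(stages["road production"], stages["track fitting in DSPs"])
    if not 0.0 <= overlap_us <= limit:
        raise ValueError("overlap must be between 0 and the shorter stage")
    return serial_total(stages) - overlap_us


if __name__ == "__main__":
    print(serial_total(STAGES_US))           # serial upper bound
    print(overlapped_total(STAGES_US, 1.0))  # example with 1 us of overlap
```

The serial sum is 6 μs, matching the tracking entry in the latency table; any overlap between road production and fitting would leave headroom.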

#### **Pattern Size**

- One-step architecture (pattern matching with all ID layers) unaffordable for FTK; likely still challenging for L1TT
  - Pattern matching and extrapolation (current FTK architecture)
  - Multi-step pattern matching
- An architecture like FTK, optimized in pattern size (AM chip capability) and fitting power (FPGA capability) for a luminosity of 3 × 10<sup>34</sup>
  - FTK: ~64 × 16M patterns, ~64 × 16K roads/event, ~64 × 80K fits/event
  - FTK: 8K AMChip6 chips
- Pattern extrapolation approximately as follows; road/fit extrapolation worse but ameliorated by the RoI concept

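$$N_{Pattern} \propto N_{Pileup} \cdot \frac{1}{P_{t}} \cdot N_{Layer} \cdot \frac{N_{Strip}}{d_{Strip}^{2}}$$

| Factor     | $N_{Pileup}$ | $P_{t}$ | $N_{Layer}$ | $N_{Strip}/d_{Strip}^{2}$ |
|------------|--------------|---------|-------------|---------------------------|
| FTK → L1TT | × 2-3        | 1 GeV → | 8 → ?       | × >6                      |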
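Two pieces of arithmetic sit behind these numbers: the FTK pattern bank divided by the per-chip capacity gives the chip count, and the proportionality gives a relative scaling from FTK to L1TT. The sketch below is illustrative. It uses decimal prefixes (K = 1000, M = 10^6), and it holds the $P_t$ and $N_{Layer}$ factors at 1 because the slide leaves them open, so the result is only a partial lower bound.

```python
K = 1_000        # decimal prefixes assumed; binary prefixes give the same "8K"
M = 1_000_000


def ftk_chip_count(towers=64, patterns_per_tower=16 * M, patterns_per_chip=128 * K):
    """AM chips needed to hold the FTK pattern bank (AMChip6: 128 K patterns)."""
    total_patterns = towers * patterns_per_tower
    return total_patterns // patterns_per_chip


def pattern_scale(pileup, inv_pt, layers, strip_density):
    """Relative pattern count from the proportionality; each argument is the
    L1TT/FTK ratio of the corresponding factor."""
    return pileup * inv_pt * layers * strip_density


if __name__ == "__main__":
    print(ftk_chip_count())  # chips for 64 x 16M patterns
    # P_t and layer ratios set to 1 (left open on the slide): lower bound only.
    for pileup in (2, 3):
        print(pileup, pattern_scale(pileup, 1, 1, 6))
```

With only the pileup (× 2-3) and strip-density (× >6) factors applied, the pattern count already grows by more than a factor of 12 to 18 relative to FTK.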



#### Pattern Matching Hardware

|               | Associative Memory | Ternary CAM             |
|---------------|--------------------|-------------------------|
| Hit address   | encoded            | unencoded               |
| Address space | ≤18 bit            | ≤(10 bit)*              |
| Input         | sequential         | parallel                |
| Output        | sequential         | sequential or parallel  |
| Speed         | 100 MHz            | 2.4 billion per second  |
| Memory size   | 128 KB             | ~10 MB                  |
| Pattern       | 128 K (AMChip6)    | ~256 K (NLA12000)       |





- Design achievable with constraints
- More capable devices foreseen from R&D; advanced algorithms to be explored (variable resolution, etc.)

# P<sub>T</sub> Filter

- P<sub>T</sub> filter capability imposing great challenges on detector layout, but
- Required for self-seeded L1TT as well as critical for RoI-based L1TT
  - To reduce hit data throughput (all double layers)
    - A factor of ~10 from clusters to stubs
  - To reduce the number of roads and thus the fit combinatorics (2 or 3 double layers)
    - ~3 orders of magnitude for high $P_{T}$ (e.g., 10 GeV)
  - Off-detector P<sub>T</sub> filtering possible (2 or 3 double layers with modules aligned in $r$-$\phi$)
- The ABCn130 containing a Fast Cluster Finder algorithm
- Studies ongoing on the reduction of fake combinations by correlating clusters on adjacent layers (with and without stereo layers)



#### **Preliminary Results**

Figure: efficiency for 5 GeV muons (panels: no stereo layer; stereo layer)

Figure: rejection on 200 PU min-bias events



### Summary

- An FTK-like architecture is feasible for L1TT
- Latency and throughput pose the most critical challenges
- More concrete specifications require detailed simulation and further system studies