### CRU for FoCal

Ken Oyama

Nagasaki Institute of Applied Science

CRU use case for TPC

Possibility for FoCal

Mar. 8, 2019, International Workshop on Forward Physics and Forward Calorimeter Upgrade in ALICE

## TPC Upgrade

One of the most important and challenging upgrades in ALICE

- A 4-GEM amplification system replaces the traditional wire amplification system
- Eliminates the "dead time" due to ion absorption
  - 500 $\mu$s to **zero** $\rightarrow$ event rate 2 kHz to **no limit**
  - 530k channels, ADC data sampled every 200 ns come out





## TPC Upgrade (cont.)

■ LHC will provide a Pb+Pb event rate above 50 kHz after the upgrade (20 µs average event interval)

- TPC drift time (100 μs)
  - large pile-up
  - average 5
- Continuous (triggerless) data taking
- 3.5 TB/s data rate
  - large data reduction is required
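
The interval and pile-up figures above follow directly from the event rate and the drift time. A minimal sketch of that arithmetic, using only the numbers quoted on the slide (the variable names are illustrative):

```python
# Values quoted on the slide.
event_rate_hz = 50e3     # Pb+Pb event rate after the upgrade
drift_time_s = 100e-6    # TPC drift time

# Average spacing between consecutive events.
mean_interval_s = 1.0 / event_rate_hz
# Average number of events overlapping within one drift window.
mean_pileup = event_rate_hz * drift_time_s

print(f"mean interval: {mean_interval_s * 1e6:.0f} us")
print(f"mean pile-up:  {mean_pileup:.0f}")
```

Because the drift time is five times the average event spacing, there is never a quiet gap in which to close a trigger window, which motivates the continuous readout.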





## ALICE readout system after LS2

#### On-detector electronics

- controlled via GBT, sends data via GBT
- front-end electronics needs only GBT duplex fiber interface and power & cooling services

#### Common Readout Unit (CRU)

- common design for all new detectors incl. FoCal
- max. 48 duplex GBT connections
- placed in a PC server (FLP), communicate with CPUs via PCI express bus
- trigger and machine clock distribution is also via GBT
  - CTP sends trigger and fast control to CRU
  - then CRU forwards it to front-end
- detector control is also via GBT
  - DCS system will configure & acquire status from frontend via CRU and GBT



## TPC front-end readout

#### ■ FEC (Front-end Card)

- 5 SAMPAs (32 x 5 = 160 channels)
- 10 bit ADC, 5 MHz operation (8 Gbps)
- SAMPA DSPs not used for TPC (full raw data readout)
- 1 GBTrx: timing and clock reception through CRU
- 2 GBTtx: raw data sending out (4 + 4 Gbps) with GBT wide-bus mode
- 1 GBT-SCA for slow control (SAMPA configuration), GBT configuration, SAMPA power, power measurements
- total FEC: 3276
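
The 8 Gbps per FEC can be cross-checked from the channel count, ADC width and sampling rate. The sketch below uses only the slide's numbers; the constant names are made up for illustration.

```python
SAMPAS_PER_FEC = 5
CHANNELS_PER_SAMPA = 32
ADC_BITS = 10
SAMPLING_HZ = 5_000_000
N_FEC = 3276

channels_per_fec = SAMPAS_PER_FEC * CHANNELS_PER_SAMPA      # 160
fec_rate_bps = channels_per_fec * ADC_BITS * SAMPLING_HZ    # raw bits per second
total_channels = N_FEC * channels_per_fec

print(f"channels per FEC: {channels_per_fec}")
print(f"raw rate per FEC: {fec_rate_bps / 1e9:.1f} Gbps")
print(f"total channels:   {total_channels}")
```

The raw rate per card is split over the two GBTtx links at 4 Gbps each, which is why the wide-bus mode is needed: 4 Gbps does not fit in the 3.2 Gbps payload of the error-corrected mode.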








■ ALICE + LHCb joint project, commonly used in all ALICE detectors except for detectors with special setup

- 48 GBT duplex links → 3.2 Gbps x 48 = 154 Gbps (4.48 Gbps x 48 = 215 Gbps without forward error correction)
  - most ALICE detectors use up to 24 links (except for TRD: 36)
- large Intel/Altera Arria 10 FPGA  $\rightarrow$  data processing O(10) times faster than CPUs (depends on processing)
- Interface to CPU (in the same chassis) via commercial PCI Express Gen3 x16  $\rightarrow$  128 Gbps
  - sustainable data rate ~ 90 Gbps
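
A short sketch of the bandwidth budget above may help show why processing inside the CRU matters. It assumes the per-link payloads quoted on the slide and the PCI Express Gen3 raw signalling rate of 8 GT/s per lane; the names are illustrative.

```python
N_LINKS = 48
GBT_FEC_GBPS = 3.2        # payload per link with forward error correction
GBT_WIDE_GBPS = 4.48      # payload per link in wide-bus mode
PCIE_GEN3_GTPS = 8        # raw signalling rate per lane
PCIE_LANES = 16
PCIE_SUSTAINED_GBPS = 90  # sustainable rate quoted on the slide

in_fec = N_LINKS * GBT_FEC_GBPS         # about 154 Gbps
in_wide = N_LINKS * GBT_WIDE_GBPS       # about 215 Gbps
pcie_raw = PCIE_GEN3_GTPS * PCIE_LANES  # 128 Gbps

# Minimum reduction needed inside the FPGA if all 48 links were saturated.
min_reduction_fec = in_fec / PCIE_SUSTAINED_GBPS
min_reduction_wide = in_wide / PCIE_SUSTAINED_GBPS

print(f"input: {in_fec:.1f} / {in_wide:.2f} Gbps, PCIe raw: {pcie_raw} Gbps")
print(f"required reduction: {min_reduction_fec:.2f}x / {min_reduction_wide:.2f}x")
```

With fully loaded links the optical input exceeds what the PCI Express interface can sustain, so some data reduction has to happen in the FPGA before the data reach the CPU.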



### CRU internal logic development


■ Central CRU team supports all peripheral logic [Grenoble]

Detector CRU teams develop detector specific USER-Logic [TPC: Frankfurt, Heidelberg, Nagasaki-IAS]



### TPC User Logic


#### ■ raw data processing

channel sorting / pedestal subtraction / common mode rejection / clustering / data formatting

#### DCS: forwarding DCS control command & data

 Power / SAMPA & GBT configurations / CRU FPGA setup parameters



## Channel sorting

■ in case of TPC, FEC readout unit is perpendicular to pad direction

- clustering to be performed in pad direction
- large routing matrix needed
- common firmware  $\rightarrow$  needs to be configurable after firmware download

use memory inside FPGA
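
The idea of a routing matrix held in memory can be illustrated in software as a lookup table that is loaded at run time. This is only a sketch: the random permutation stands in for the real pad mapping, which is not given here, and both function names are hypothetical.

```python
import numpy as np

def build_routing_table(n_channels, seed=0):
    """Hypothetical mapping: input (FEC-order) index -> output (pad-order) index."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n_channels)

def sort_channels(samples, routing_table):
    """Place each input sample at its pad-ordered position."""
    sorted_samples = np.empty_like(samples)
    sorted_samples[routing_table] = samples
    return sorted_samples

table = build_routing_table(8)
fec_order = np.arange(100, 108)   # one time sample from 8 channels
pad_order = sort_channels(fec_order, table)
print(pad_order)
```

Because the mapping is data rather than wiring, the same firmware can serve differently cabled regions by loading a different table after download.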



Test implementation (Sebastian Klewin, Dec. 2017) done

- 49% ALM (211k/427k)


Sebastian Klewin, https://indico.cern.ch/event/653116/


## Common mode rejection

- TPC GEM produces large common mode noise (cross talk via capacitive coupling)
- Adaptive filter calculates the average value and subtracts it from all ADC values sample-by-sample (every 200 ns)

$$O_j = I_j - I_{CM}, \qquad I_{CM} = \frac{\sum_i I_i}{N_{cont}}$$

- However, large "true" signals bias the common-mode value in high-occupancy events
- **Solution 1:** reject channels carrying signal (threshold, rising and falling edge)
  - always biases $I_{CM}$, with a multiplicity-dependent bias
- **Solution 2:** calculate the median value by building a histogram in the FPGA at a sustained 5 MHz
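
The bias of the plain average, and why a median resists it, can be seen in a toy example. The sketch below is a software illustration only, not the firmware: it assumes 32 channels with a common offset of 10 ADC counts, four of which also carry a large true signal, and it obtains the median from a histogram of ADC values instead of a sort.

```python
import numpy as np

ADC_RANGE = 1024  # 10-bit ADC

def common_mode_mean(samples):
    """Plain average over all channels of one time sample."""
    return float(np.mean(samples))

def common_mode_median(samples):
    """Median obtained from a histogram of ADC values, avoiding a full sort."""
    hist = np.bincount(samples, minlength=ADC_RANGE)
    cumulative = np.cumsum(hist)
    half = (len(samples) + 1) // 2
    # First ADC value whose cumulative count reaches half of the channels.
    return int(np.searchsorted(cumulative, half))

adc = np.full(32, 10, dtype=np.int64)  # common-mode offset on every channel
adc[:4] += 200                         # four channels with a true signal

cm_mean = common_mode_mean(adc)        # pulled up by the signal channels
cm_median = common_mode_median(adc)    # unaffected by them
corrected = adc - cm_median
print(cm_mean, cm_median)
```

In this toy case the average overestimates the common mode by 25 counts, while the median recovers it exactly. The price is the histogram and cumulative scan, in line with the higher logic usage noted for the median.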



### Common mode rejection (cont.)



[peak rejection by Y. Takeuchi]

[median by Y. Matsuyama]

Two solutions are under evaluation for different aspects

- precision and bias (physics) ... median is better
- logic usage ... median uses more logic (under shaping)

## Clustering

small modules continuously scan to find local maxima

- run along the pad direction and the time direction
- 8x8 window in the pad-time plane
- overlapping to avoid edge effect
- if it finds peak, forward 5x5 pad time area data to cluster formatter
- calculate cluster parameters
- format data and inject into readout FIFO
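
The scan described above can be sketched in software as a search for local maxima followed by extraction of the surrounding 5x5 area. This is a simplified illustration under stated assumptions: a single 8x8 window, a hypothetical threshold, and a strict 3x3 peak criterion; the actual firmware logic may differ.

```python
import numpy as np

def find_clusters(plane, threshold):
    """Return (pad, time, 5x5 window) for each strict local maximum above threshold."""
    padded = np.pad(plane, 2, mode="constant")  # zero border so 5x5 windows always fit
    clusters = []
    n_pad, n_time = plane.shape
    for p in range(n_pad):
        for t in range(n_time):
            value = plane[p, t]
            if value < threshold:
                continue
            # 3x3 neighbourhood centred on (p, t) in padded coordinates.
            neigh = padded[p + 1:p + 4, t + 1:t + 4]
            if value == neigh.max() and np.count_nonzero(neigh == value) == 1:
                clusters.append((p, t, padded[p:p + 5, t:t + 5].copy()))
    return clusters

plane = np.zeros((8, 8))
plane[3, 4] = 50   # peak
plane[3, 5] = 20   # neighbour in time
plane[2, 4] = 10   # neighbour in pad
clusters = find_clusters(plane, threshold=30)
pad, time, window = clusters[0]
print(pad, time, window.sum())
```

The zero padding used here is only a software convenience; in the scheme above, the overlap between neighbouring 8x8 windows provides the border samples needed to avoid edge effects.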

further discussion later




Sebastian Klewin, https://indico.cern.ch/event/653116/

### Other filters

#### Pedestal subtraction filter

- TPC decided NOT to subtract the pedestal on the SAMPA but to do it on the CRU
- subtracting the pedestal  $\rightarrow$  negative values are chopped (unless we introduce a sign flag  $\rightarrow$  data increase)
- with common mode, this problem becomes significant
- pedestal value can be represented more finely (fixed-point number with half- and quarter-LSB bits)
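
A small sketch shows both points: a pedestal stored with two fractional bits (half and quarter LSB), and the bias introduced when negative results are chopped. The pedestal of 50.25 counts and the four noise samples are made-up illustration values.

```python
FRACTION_BITS = 2  # half- and quarter-LSB bits

def to_fixed(pedestal):
    """Encode a pedestal in units of 1/4 LSB."""
    return round(pedestal * (1 << FRACTION_BITS))

def subtract_pedestal(adc, pedestal_fixed, clip=True):
    """Subtract in fixed point; optionally chop negative results to zero."""
    diff = (adc << FRACTION_BITS) - pedestal_fixed
    if clip and diff < 0:
        diff = 0
    return diff / (1 << FRACTION_BITS)

ped = to_fixed(50.25)           # 201 quarter-LSBs
noise = [49, 51, 50, 51]        # samples fluctuating around the pedestal

signed = [subtract_pedestal(a, ped, clip=False) for a in noise]
chopped = [subtract_pedestal(a, ped, clip=True) for a in noise]

mean_signed = sum(signed) / len(signed)      # averages to zero
mean_chopped = sum(chopped) / len(chopped)   # positive bias from chopping
print(mean_signed, mean_chopped)
```

Chopping removes the negative half of the noise distribution, so any later average, such as a common-mode estimate, is shifted upward.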



## FoCal PAD readout case

- assuming 64 (or 72?) PAD channels per tower
- tower cross section 2x2 cm<sup>2</sup>, 16 (or 18) layers
- readout (example) by a (modified-)SAMPA
  - larger channel density is ideal
  - two CSA (low & high gain)
    - →128 or 144 ADC, 10 MHz, 12 bits
    - data rate: 144 x 10M x 12 = 17.3 Gbps/tower if we continuously read out

- timing information?
  - $\rightarrow$  additional circuit or higher sampling + fit?
  - $\rightarrow$ higher sampling multiplies the data rate
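
The continuous-readout rate per tower is a simple product, sketched below for both channel-count options. The function name is illustrative; the inputs are the slide's numbers.

```python
def tower_rate_gbps(n_adc, sampling_hz, bits):
    """Raw data rate of one tower under continuous readout, in Gbps."""
    return n_adc * sampling_hz * bits / 1e9

rate_144 = tower_rate_gbps(144, 10e6, 12)   # 72 pads x 2 gains
rate_128 = tower_rate_gbps(128, 10e6, 12)   # 64 pads x 2 gains
rate_144_double = tower_rate_gbps(144, 20e6, 12)  # doubling the sampling rate

print(f"{rate_128:.2f} / {rate_144:.2f} Gbps per tower")
print(f"{rate_144_double:.2f} Gbps per tower at 20 MHz")
```

The rate scales linearly with the sampling frequency, so obtaining timing from finer sampling costs bandwidth in direct proportion.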



## Data processing & selection on FEC

■ it is obvious that we need data selection and processing on FEC (factor 10)

- 17.3 Gbps/tower to (preferably) 0.8 Gbps/tower
  - four towers fit in one GBT link (3.2 Gbps)
  - total 625 links, 25 GBT/CRU  $\rightarrow$  25 CRUs ... reasonable
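
The link budget above can be reproduced with integer arithmetic. The tower count of 2500 is not stated explicitly; it is inferred here from 625 links at four towers per link. Note that the reduction implied by 17.3 Gbps to 0.8 Gbps comes out near 22, somewhat above the rough "factor 10".

```python
import math

RAW_MBPS_PER_TOWER = 17280     # continuous readout, 144 ADCs
TARGET_MBPS_PER_TOWER = 800
GBT_PAYLOAD_MBPS = 3200
N_LINKS = 625
LINKS_PER_CRU = 25

towers_per_link = GBT_PAYLOAD_MBPS // TARGET_MBPS_PER_TOWER   # 4
n_towers = N_LINKS * towers_per_link                          # inferred total
n_crus = math.ceil(N_LINKS / LINKS_PER_CRU)
reduction = RAW_MBPS_PER_TOWER / TARGET_MBPS_PER_TOWER

print(f"{towers_per_link} towers/link, {n_towers} towers, {n_crus} CRUs")
print(f"required on-FEC reduction: {reduction:.1f}x")
```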

#### possible methods

- triggering (read all with L0)
- zero suppression
  - needs simulation, surely efficient for pp
- high/low auto selection ...  $x1/2 + \delta$
- Huffman encoding (lossless; SAMPA has)?
  - TPC decided not to use it (may lose data at high mult.)
- ? no need to see other tower's data on single FEC?



#### assuming pp L0 rate at 1 MHz

- 144 ADCs, 1 sample, 12 bits  $\rightarrow$  1.7 kbits
  - $\rightarrow$  1.7 Gbps/tower
    - factor 3 missing
    - timing information adds more
      - multisample  $\rightarrow$  4 to 8 times more
- L1, L2 not preferred as they create dead time
  - to be discussed with CTP, if "interleaving" is foreseen
  - most probably the answer is no, because it mixes "two" triggering schemes (new & old)
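
The triggered-readout estimate can be checked in the same way. This sketch assumes the 0.8 Gbps/tower target from the previous slide as the reference; the exact ratio to that target is about 2.2, which the slide rounds up to "factor 3".

```python
N_ADC = 144
BITS = 12
L0_RATE_HZ = 1_000_000
TARGET_MBPS = 800

bits_per_event = N_ADC * BITS                    # one sample per ADC
rate_mbps = bits_per_event * L0_RATE_HZ / 1e6    # per tower
excess = rate_mbps / TARGET_MBPS                 # shortfall relative to the target

# Reading several samples per trigger for timing scales the rate linearly.
multisample_gbps = {n: rate_mbps * n / 1000 for n in (4, 8)}

print(f"{bits_per_event} bits/event, {rate_mbps / 1000:.2f} Gbps/tower")
print(f"excess over target: {excess:.2f}x")
print(multisample_gbps)
```

With 8 samples per trigger the rate approaches that of continuous readout, so triggering alone does not replace zero suppression or other selection on the FEC.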

[Figure labels: trigger from CTP; to/from CRU (raw data, DCS)]


## CRU processing in FoCal

- Possibly needed processing in FoCal CRU
  - mapping/sorting
  - pedestal subtraction
  - gain, linearity correction
  - cross talk filtering
  - anything else before clustering?
  - (pre-)clustering
    - finding local maxima
    - pack associated tower information
  - encoding / formatting
- Data compression factor to be estimated by simulation
  - input to CRU: 115 Gbps
  - output to CPU (PCI Express): 128 Gbps
    - not possible to use full bandwidth
    - below 20-30 Gbps is moderate (40 Gbps Ethernet)  $\rightarrow$  factor 5-6 compression is moderate



1 CRU (direct) 25 GBT (10x10 towers)

## Clustering?

- For clustering, we need to eliminate non-fiducial area due to CRU boundary by sharing data between CRUs
  - can be done via GBT (slow)
  - or use SERDES of Arria10 at higher speed (up to 12.5 Gbps)
    - 8 Gbps x 4 + 0.8 Gbps x 4 = 8 LVDS or optical cords among CRUs
    - counter direction is also used for other direction sharing
  - new development
- This discussion will be completely re-adjusted for the final detector arrangement
  - requirement for data exchange between CRUs may stay



(This assumes a shower radius of no more than 2 cm.) If we need one more tower, the number of GBT links needed for one CRU becomes 49, which exceeds the 48-link maximum.
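
One way to read these numbers is as a "halo" of neighbouring towers around the 10x10 block handled by one CRU. The sketch below follows that interpretation, which is an assumption: a one-tower halo gives four edges of ten towers plus four corner towers to exchange, and a two-tower halo fed directly by GBT would need 14x14 towers per CRU. The function names are hypothetical.

```python
import math

CRU_SIDE = 10          # a CRU reads a 10x10 block of towers
TOWERS_PER_LINK = 4
TOWER_MBPS = 800
MAX_LINKS = 48

def links_with_halo(halo):
    """GBT links needed if the halo towers were also read directly by the CRU."""
    side = CRU_SIDE + 2 * halo
    return math.ceil(side * side / TOWERS_PER_LINK)

def halo_exchange_mbps(halo=1):
    """Bandwidth to import a halo from neighbouring CRUs: four edges plus four corners."""
    edges = 4 * CRU_SIDE * halo * TOWER_MBPS
    corners = 4 * halo * halo * TOWER_MBPS
    return edges + corners

for h in (0, 1, 2):
    n = links_with_halo(h)
    print(f"halo {h}: {n} links, fits in one CRU: {n <= MAX_LINKS}")
print(f"one-tower halo exchange: {halo_exchange_mbps(1) / 1000:.1f} Gbps")
```

Under this reading, a two-tower halo no longer fits in a single CRU, which is why sharing data between CRUs over faster serial links is considered.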

## Clustering on CRU FPGA (TPC case)




- corresponds to x-y plane (without time direction) in FoCal
- "division" is done on CPU [see S. Klewin's PhD thesis coming soon]



### Misc. considerations

- If we do processing with FPGA on detector
  - present SAMPA may work?
  - automatic gain selection on FPGA?
  - radiation tolerance?
- where to put?
  - mechanical constraints
  - signal integrity constraints
- triggered readout or trigger-less continuous readout?
  - if with trigger, we need direct trigger feeding from CTP to FEC
  - are the present ALICE L0 trigger contributors enough for FoCal physics [both pp and PbPb]?
- do we provide triggers to other detectors [both pp and PbPb]?
  - if yes, then maybe a fast formation of the trigger signal on or in the vicinity of the detector has to be developed
    - CRU is too late for L0
    - communication among FECs needed

