



# **TELL40 VELO time ordering**

<u>Pablo Vázquez</u>, Jan Buytaert, Karol Hennessy, Marco Gersabeck, Pablo Rodríguez



### Time ordering



- Packet(34b)=ASIC(4b)+SuperpixelAddress(13b)+BCID(9b)+Hitpattern(8b)
- Data packets arrive at the TELL40 out of time order
- Time ordering = grouping of packets by their 9-bit BCID value
- Grouping is done in 2 steps:
  - Router based on 4 bits of the BCID (MSB or LSB, see later): the goal of this talk
  - Memory storage → remaining 5 bits of the BCID
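The packet layout and the two-step split of the BCID can be sketched in a few lines of Python. This is only an illustration: the field order inside the 34-bit word (ASIC in the most significant bits, hit pattern in the least significant bits) is assumed from the order in which the fields are listed above, and the helper names `pack`, `unpack` and `split_bcid` are hypothetical.

```python
# Assumed layout, MSB first: ASIC(4) | SuperpixelAddress(13) | BCID(9) | Hitpattern(8)
ASIC_BITS, ADDR_BITS, BCID_BITS, HIT_BITS = 4, 13, 9, 8
ROUTER_BITS = 4                          # bits used by the router
MEMORY_BITS = BCID_BITS - ROUTER_BITS    # remaining 5 bits used by the memory stage


def pack(asic, addr, bcid, hits):
    """Build a 34-bit packet word from its four fields."""
    word = asic
    word = (word << ADDR_BITS) | addr
    word = (word << BCID_BITS) | bcid
    word = (word << HIT_BITS) | hits
    return word


def unpack(word):
    """Split a 34-bit packet word back into (asic, addr, bcid, hits)."""
    hits = word & ((1 << HIT_BITS) - 1)
    word >>= HIT_BITS
    bcid = word & ((1 << BCID_BITS) - 1)
    word >>= BCID_BITS
    addr = word & ((1 << ADDR_BITS) - 1)
    word >>= ADDR_BITS
    asic = word & ((1 << ASIC_BITS) - 1)
    return asic, addr, bcid, hits


def split_bcid(bcid, use_msb=True):
    """Return (router output port, memory slot) for a 9-bit BCID."""
    if use_msb:
        # Router uses the 4 MSBs, memory uses the 5 LSBs.
        return bcid >> MEMORY_BITS, bcid & ((1 << MEMORY_BITS) - 1)
    # Router uses the 4 LSBs, memory uses the 5 MSBs.
    return bcid & ((1 << ROUTER_BITS) - 1), bcid >> ROUTER_BITS


word = pack(asic=3, addr=5000, bcid=359, hits=0xA5)
print(unpack(word), split_bcid(359, use_msb=True), split_bcid(359, use_msb=False))
```

Either choice of router bits identifies the BCID uniquely once the router port and the memory slot are combined; only the grouping of packets per output differs.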



"Standard LHCb"

"VELO"



## 4-bit BCID VELO packet router



- Non-blocking scheme using internal memory (FIFOs)
- Different numbers of input links considered:
  - 16 @ 160 MHz (320 MHz) for a half (full) module, as described in the TDR
  - 10 @ 320 MHz for a full module, when reducing the number of optical links
- 2 architectures under study:
  - "Cascade" of 1-bit router elements: each element routes packets based on 1 bit
  - "Crossbar": a single stage with N x 2<sup>M</sup> ports routes packets based on all M bits
- Use of Altera library megafunctions: lpm\_scfifo, lpm\_compare...
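As a purely behavioural sketch (not the firmware, and ignoring FIFOs and timing), the two architectures can be compared in Python: a cascade of 1-bit decisions and a single-stage decode of all bits must deliver a packet to the same output port. The function names are hypothetical, and the assumption that the first cascade stage looks at the most significant key bit is made only for this illustration.

```python
def cascade_route(key, n_bits=4):
    """Route through n_bits stages of 1-bit router elements; return the output port."""
    port = 0
    for stage in range(n_bits):
        # Assumption: stage 0 inspects the most significant bit of the routing key.
        bit = (key >> (n_bits - 1 - stage)) & 1
        # Each 1-bit element selects one of its two outputs.
        port = (port << 1) | bit
    return port


def crossbar_route(key, n_bits=4):
    """Single stage: all routing bits are decoded at once."""
    return key & ((1 << n_bits) - 1)


def route_all(keys, route=cascade_route, n_bits=4):
    """Group routing keys into 2**n_bits output lists, preserving arrival order."""
    outputs = [[] for _ in range(1 << n_bits)]
    for key in keys:
        outputs[route(key, n_bits)].append(key)
    return outputs


outputs = route_all([3, 15, 3, 0])
print(outputs[3], outputs[15], outputs[0])
```

The two models differ only in where buffering and comparison logic would sit in hardware, which is what the resource comparison addresses.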



A 2x2 router element uses 4 FIFOs, while a 1x1 router element uses none







- Several configurations compiled for the actual device: Altera Stratix V 5SGXEA7N3F45C2
- The router can run @ 320 MHz using 1-5% of the resources, with a cross-sectional bandwidth 1.6-2.6 times higher than required (2 G packets/s peak for a full module)

| In x Out links                          | 16x16 (TDR)  | 10x16 (Optimized) |
|-----------------------------------------|--------------|-------------------|
| FIFO depth (words)                      | 512          | 512               |
| Logic utilization (in ALMs)             | 3,582 (2%)   | 2,722 (1%)        |
| Total registers                         | 3,839        | 4,305             |
| Total block memory bits                 | 58,920 (<1%) | 1,274,940 (2%)    |
| M20K blocks                             | 128 (5%)     | 84 (3%)           |
| Fmax at 85 °C (MHz)                     | 340          | 361               |
| Bandwidth: links x 320 MHz (G packet/s) | 5.12         | 3.2               |
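The bandwidth row and the quoted 1.6-2.6 headroom follow from simple arithmetic, reproduced here as a short Python check. The constants are the values quoted on this slide; the function name `headroom` is hypothetical.

```python
CLOCK_HZ = 320e6        # router clock, one packet per link per cycle
REQUIRED_PEAK = 2e9     # required peak rate for a full module, packets/s


def headroom(n_links):
    """Ratio of cross-sectional bandwidth to the required peak packet rate."""
    return n_links * CLOCK_HZ / REQUIRED_PEAK


for name, n_links in (("16x16 (TDR)", 16), ("10x16 (Optimized)", 10)):
    bandwidth_g = n_links * CLOCK_HZ / 1e9
    print(f"{name}: {bandwidth_g:.2f} G packet/s, headroom x{headroom(n_links):.2f}")
```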



### Simulation



- Generate simulated input data using the known distribution of the event size and latency (see next slides)
- Aim: determine the dependence of packet loss on FIFO depth and routing strategy (MSB vs LSB bits)
  - Target: data packet loss << 0.1%?

- Every 34-bit FIFO is allocated in one M20K memory block, so the optimal configuration is 512 x 40 bit. The depth could be extended
- We will start simulating the 10x16 @320 MHz case
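The kind of study planned here can be illustrated with a toy Python model of a single output queue of finite depth. It is not the simulation described in this talk: the arrival model (10 independent inputs, each sending a packet to this output with a fixed probability per clock cycle), the load, and the one-read-per-cycle drain are all assumptions chosen for illustration, and the real study would use the event size and latency distributions instead.

```python
import random


def simulate_loss(depth, n_cycles=20000, n_inputs=10, p_per_input=0.09, seed=1):
    """Toy model: fraction of packets lost by one output queue of the given depth."""
    rng = random.Random(seed)
    occupancy = arrived = lost = 0
    for _ in range(n_cycles):
        # Hypothetical arrival model: each input independently sends a packet
        # to this output with probability p_per_input in every cycle.
        n_in = sum(rng.random() < p_per_input for _ in range(n_inputs))
        for _ in range(n_in):
            arrived += 1
            if occupancy < depth:
                occupancy += 1   # packet stored
            else:
                lost += 1        # queue full: packet dropped
        if occupancy > 0:
            occupancy -= 1       # one packet read out per clock cycle
    return lost / arrived if arrived else 0.0


for depth in (1, 8, 64, 512):
    print(depth, simulate_loss(depth))
```

Because the same seed gives the same arrival sequence for every depth, the loss fraction in this toy model can only decrease or stay equal as the depth grows, which makes the depth scan directly comparable.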



### Event size distribution



The 20 inputs of a full module are below 40% of the peak packet rate capacity





## Simulation latency



Packets arrive at the TELL40 with variable latency:

- Routing with the MSB bits generates data rate peaks 10 times higher than routing with the LSB bits
  - This increases the packet loss probability (but maybe still acceptable)
  - It is nicer for the MEP: consecutive events are stored in the same memory
  - It may be better suited for data processing (idle time between peaks)
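Why MSB routing concentrates the traffic can be seen from a small Python sketch: packets from consecutive bunch crossings share their upper BCID bits, so they all go to the same router output, whereas their lower bits spread them over all outputs. The function name and the example BCID range are hypothetical, and the sketch does not attempt to reproduce the factor of 10 quoted above.

```python
def output_port(bcid, use_msb, router_bits=4, bcid_bits=9):
    """Router output selected by the 4 MSBs or the 4 LSBs of a 9-bit BCID."""
    if use_msb:
        return bcid >> (bcid_bits - router_bits)
    return bcid & ((1 << router_bits) - 1)


# 32 consecutive bunch crossings, aligned on a 32-BCID boundary.
consecutive = list(range(64, 96))

msb_ports = {output_port(b, use_msb=True) for b in consecutive}
lsb_ports = {output_port(b, use_msb=False) for b in consecutive}

print("MSB routing uses outputs:", sorted(msb_ports))
print("LSB routing uses outputs:", sorted(lsb_ports))
```

With MSB routing, one output receives the whole block and the other 15 are idle, which matches both the higher peaks and the fact that consecutive events end up in the same memory.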











- The TELL40 VELO router block has been investigated
- Compilation of designs based on 1-bit router elements shows that the router can easily handle the required bandwidth with 1-5% of the FPGA resources
- Simulations with self-generated data will show the packet loss as a function of FIFO depth and MSB/LSB routing strategy





#### Backup



10x16 router 1 stage



