

## CMS upgrade – DTTF inputs Asynchronous Links

J. Ero, C. Foudas, N. Loukas, N. Manthos, I. Papadopoulos, S. Sotiropoulos



#### Outline



- Drift Tube Track Finder overview
- Trigger input needs
- Optical links
  - Synchronous to the LHC clock
  - Self-synchronous to the LHC clock
  - Asynchronous to the LHC clock
- Asynchronous Protocol
- Results
- Future plans



#### Drift Tube Track Finder overview



- The CMS TDR for the L1 trigger upgrade (1 Aug. 2013) decided that the DTTF will be upgraded as follows:
  - 12 VME crates will be replaced by 3 uTCA crates
  - The old trigger cards will be replaced by new ones hosting state-of-the-art Xilinx FPGAs (Imperial MP7)
  - The input bandwidth will be increased by a factor of 6, from the 1.6 Gb/s of the GOL links to 9.6 Gb/s
  - In the first stage of the DTTF upgrade, until the end of 2015, the same algorithms will run in the new FPGA framework
  - In the second stage, RPC hits will be included in the track finding algorithms





#### Trigger input needs



- Drift Tube trigger data will run on 9.6 Gb/s links (192 bits @ 40 MHz).
- Hence two DT links are enough for one sector (384 bits).
- Hence 10 DT links will be used for one full wedge.
- The new sector collector (TwinMux) will fan out data from neighbor wedges.
- Hence every DTTF processor will receive data from 3 wedges (30 DT links at 9.6 Gb/s).
- After 2015, RPC data will also be used, carried by 23 links at 9.6 Gb/s.
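The link counts in the list above follow from simple arithmetic. The sketch below only re-derives them from the quoted figures; the variable names are illustrative.

```python
# Link budget for one DTTF processor, using only the figures quoted above.
BITS_PER_LINK_PER_BX = 192   # payload bits per link per 25 ns bunch crossing
BITS_PER_SECTOR = 384        # trigger data of one sector per bunch crossing
LINKS_PER_WEDGE = 10         # quoted for one full wedge
WEDGES_PER_PROCESSOR = 3     # data received from 3 wedges (TwinMux fan-out)

# Ceiling division: number of links needed to carry one sector.
links_per_sector = -(-BITS_PER_SECTOR // BITS_PER_LINK_PER_BX)

# Number of sectors implied by 10 links per wedge.
sectors_per_wedge = LINKS_PER_WEDGE // links_per_sector

dt_links_per_processor = WEDGES_PER_PROCESSOR * LINKS_PER_WEDGE

print(links_per_sector, sectors_per_wedge, dt_links_per_processor)
```

With the numbers above this gives 2 links per sector, 5 sectors per wedge and 30 DT links per processor.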

| Data                                     | Width (bits) | Comment                        |
|------------------------------------------|--------------|--------------------------------|
| φ<sub>r</sub> up track                   | 12           | signed, 1-compl.               |
| φ<sub>b</sub> up track                   | 10           | signed, 1-compl.               |
| Quality up track                         | 3            | see table 3.4                  |
| 1<sup>st</sup>/2<sup>nd</sup> up track   |              |                                |
| φ<sub>r</sub> down track                 |              | signed, 1-compl.               |
| φ<sub>b</sub> down track                 |              | signed, 1-compl.               |
| Quality down track                       |              | see table 3.4                  |
| 1<sup>st</sup>/2<sup>nd</sup> down track |              |                                |
| θ triggers                               | 8            |                                |
| θ quality                                | 8            | H/L for each trigger bit       |
| Bunch Crossing 0                         | 3            | From up, down and θ            |
| Bunch Crossing count                     | 2            | 2 LSbits of the bunch counter  |
| Parity φ data                            | 2            | Up and down                    |
| CCB info                                 | 4            | Minicrate Control Board status |
| Trigger output                           | 1            | Chamber autotrigger            |
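Summing the field widths gives the size of one data word. The sketch below is a rough check only: the widths of the down-track fields and of the two 1st/2nd flags are not given in the table, so it assumes that the down track mirrors the up track and that each flag is a single bit.

```python
# Field widths in bits. Entries marked "assumed" are NOT given in the table.
fields = {
    "phi_r up track": 12,
    "phi_b up track": 10,
    "quality up track": 3,
    "1st/2nd up track": 1,       # assumed: single flag bit
    "phi_r down track": 12,      # assumed: mirrors the up track
    "phi_b down track": 10,      # assumed: mirrors the up track
    "quality down track": 3,     # assumed: mirrors the up track
    "1st/2nd down track": 1,     # assumed: single flag bit
    "theta triggers": 8,
    "theta quality": 8,
    "bunch crossing 0": 3,
    "bunch crossing count": 2,
    "parity phi data": 2,
    "CCB info": 4,
    "trigger output": 1,
}

total_bits = sum(fields.values())
print(total_bits)
```

Under these assumptions the word is 80 bits wide.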







#### Optical links



#### Some of the most suitable rates in CERN experiments

- Synchronous: all rates fit the 40 MHz clock
  - 1.6 Gb/s (GOL), 3.2 Gb/s (Spartan6-Virtex5), 4.8 Gb/s (Virtex6), 6.4 Gb/s (all 7series), 9.6 Gb/s (only Virtex7 with GTH), 11.2 Gb/s (7series with speed grade 3)
- Self-synchronous: clock domains that do not fit the 40 MHz clock
  - 3.125 Gb/s (Spartan6-Virtex5), 5 Gb/s (Virtex6), 8 Gb/s and 10 Gb/s (all 7series FPGAs), 12 Gb/s (7series with speed grade 3)
- Asynchronous (channel rate carrying a lower payload rate)
  - In the past, rates of ~2 Gb/s have been used in CMS
  - 10 Gb/s with a payload of 9.6 Gb/s (all 7series FPGAs): done!
  - 12 Gb/s with a payload of 11.2 Gb/s (7series with speed grade 3)



#### Loopback tests







#### Loopback tests for Kintex-7 (6.4 Gb/s, 64-bit width at 80 MHz)



64-bit bus @ 80 MHz. Channel latency: 20 clock cycles of 12.5 ns (250 ns).
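The quoted latency can be cross-checked and expressed in bunch crossings. This is a small sketch; the 25 ns bunch-crossing period is the nominal value for a 40 MHz clock.

```python
BUS_CLOCK_MHZ = 80     # parallel bus clock of the loopback test
LATENCY_CYCLES = 20    # measured channel latency in bus clock cycles
BX_NS = 25.0           # nominal bunch-crossing period at 40 MHz

period_ns = 1000 / BUS_CLOCK_MHZ           # one bus clock cycle in ns
latency_ns = LATENCY_CYCLES * period_ns    # total channel latency in ns
latency_bx = latency_ns / BX_NS            # same latency in bunch crossings

print(period_ns, latency_ns, latency_bx)
```

So the 250 ns channel latency corresponds to 10 bunch crossings.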





#### Synchronous links









- Implementation of 10 Gb/s links for Kintex-7 and Virtex-7 FPGAs
- Tx, Rx buffers bypassed
- One common clock source
- 32bit bus at 250 MHz
- <u>2 Days running</u>, no data error

- Synchronous links at 6.4 Gb/s were used last year to test the DTTF algorithms
- 32-bit bus at 160 MHz
- 128-bit bus at 40 MHz
- The PHTF needs 110 bits
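The bus widths and clocks quoted on this slide relate to the line rates by a fixed factor. The sketch below assumes 8b/10b line coding (10 line bits per 8 payload bits); the slides do not state the encoding, but this assumption reproduces the quoted line rates.

```python
def payload_mbps(bus_bits, clock_mhz):
    """Payload rate of a parallel bus in Mb/s."""
    return bus_bits * clock_mhz

def line_rate_mbps_8b10b(payload):
    """Line rate in Mb/s, ASSUMING 8b/10b coding (not stated in the slides)."""
    return payload * 10 // 8

# (bus width in bits, bus clock in MHz) -> quoted line rate in Mb/s
configs = {
    (32, 250): 10_000,   # 10 Gb/s links
    (32, 160): 6_400,    # 6.4 Gb/s links
    (128, 40): 6_400,    # same 6.4 Gb/s links seen at the 40 MHz clock
}

for (bits, mhz), quoted in configs.items():
    computed = line_rate_mbps_8b10b(payload_mbps(bits, mhz))
    print(bits, mhz, computed, computed == quoted)

PHTF_BITS = 110
fits_in_128 = PHTF_BITS <= 128   # the 110 PHTF bits fit in one 128-bit word
```

Each 6.4 Gb/s link therefore delivers one 128-bit word per bunch crossing, which is enough for the 110 bits the PHTF needs.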



#### Self-synchronous links



- We cannot just use one oscillator on each card, because the oscillators won't have exactly the same frequency (best case ±50 ppm).
- The LHC clock is not always stable enough to drive transceivers running at 10 Gb/s.
- To avoid losing the links we need to drive the QPLLs of the GTX with clocks of very good jitter performance (about 1 ps RMS).
- There is a need for a common clock source.
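To see why two free-running oscillators are not enough, it helps to put numbers on the ±50 ppm tolerance. The sketch below is a worst-case estimate, assuming one oscillator sits at +50 ppm and the other at -50 ppm, for the 10 Gb/s links with a 32-bit bus at 250 MHz.

```python
LINE_RATE_BPS = 10_000_000_000   # 10 Gb/s serial line rate
WORD_CLOCK_HZ = 250_000_000      # 32-bit bus clock of the 10 Gb/s links
TOLERANCE_PPM = 50               # best-case oscillator tolerance

# Worst case: one oscillator at +50 ppm, the other at -50 ppm.
worst_case_ppm = 2 * TOLERANCE_PPM

# How fast the two sides drift apart.
bit_slips_per_s = LINE_RATE_BPS * worst_case_ppm // 1_000_000
word_slips_per_s = WORD_CLOCK_HZ * worst_case_ppm // 1_000_000

# Time until the two sides disagree by one full 32-bit word.
us_between_word_slips = 1_000_000 / word_slips_per_s

print(bit_slips_per_s, word_slips_per_s, us_between_word_slips)
```

In this worst case the two cards drift apart by one word every 40 µs, so without a common clock (or a scheme that absorbs the difference) data would be lost or duplicated continuously.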



- In self-synchronous communication a local oscillator drives the transmitter of the first board.
- The receiver of the second board recovers the clock from the serial data.
- The recovered clock has to be cleaned before it can drive the QPLLs.



#### Self-synchronous links



- The receiver starts from a local oscillator and, when it detects a CDR lock, switches to the recovered clock.
- The clock from the CDR goes to a jitter cleaner module. The Si5324 is fully configurable through a VHDL package.
- A bash script generates the code with the register values to be written to the jitter cleaner device.
- An I2C interface does the rest.
- The recovered and cleaned clock can drive 12 receivers without errors.
- As a result the receiver FPGA is synchronous to the transmitter.



```vhdl
-- This package declares the register values to be written to the Si5324 jitter cleaner.
-- This file has been AUTOGENERATED by the package builder.sh (on Linux) - do not hand edit.
-- First use the DSPLLsim tool to enter the preferred clock configuration.
-- Then save the Register Map File (txt) and copy it to this directory.
-- Run the script. Done. Now you have the package. Just import it into your ISE project.
-- Nikitas.LOUKAS@cern.ch, November 2013
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;
use ieee.numeric_std.all;

package register_pkg is
   constant fsm_cycles : integer;
   function reg_data(signal j : integer range 0 to (fsm_cycles-1)) return std_logic_vector;
   function reg_addr(signal j : integer range 0 to (fsm_cycles-1)) return std_logic_vector;
end package register_pkg;

package body register_pkg is

   constant fsm_cycles : integer := 43;

   -- Returns the data byte to be written in FSM cycle j.
   function reg_data(signal j : integer range 0 to (fsm_cycles-1)) return std_logic_vector is
      variable byte : std_logic_vector(7 downto 0);
   begin
      case j is
         when 0 => byte := X"14";
         when 1 => byte := X"E4";
         -- ...
```
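The generator step described above (register map file in, VHDL `case` entries out) can be sketched as follows. This is not the authors' bash script: it is a Python stand-in, and the register map line format (`address, value` with a trailing `h`, e.g. `0, 14h`) is an assumption about the DSPLLsim text export. The function names are hypothetical.

```python
import re

def parse_register_map(text):
    """Return (address, value) pairs from lines such as '  0, 14h'.

    The line format is an ASSUMPTION about the DSPLLsim register map export;
    header and comment lines that do not match are skipped.
    """
    pairs = []
    for line in text.splitlines():
        m = re.fullmatch(r"\s*(\d+)\s*,\s*([0-9A-Fa-f]{2})h\s*", line)
        if m:
            pairs.append((int(m.group(1)), int(m.group(2), 16)))
    return pairs

def vhdl_case_lines(values):
    """Format one VHDL case alternative per FSM cycle."""
    return [f'when {i} => byte := X"{v:02X}";' for i, v in enumerate(values)]

# Two-line sample in the assumed format.
sample = """#REGISTER_MAP
  0, 14h
  1, E4h
"""

pairs = parse_register_map(sample)
data_lines = vhdl_case_lines([value for _, value in pairs])
addr_lines = vhdl_case_lines([addr for addr, _ in pairs])
print("\n".join(data_lines))
```

The data values feed the body of `reg_data` and the addresses the body of `reg_addr`; the number of parsed pairs would become `fsm_cycles`.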



#### Asynchronous links



- In asynchronous communication the GTX and the fiber run at a different speed from the rest of the FPGA logic.
- For instance, the GTX runs at 10 Gb/s while the FPGA is processing at 9.6 Gb/s.
- As in the self-synchronous case, the receiver uses the CDR to get synchronized with the transmitter.
- Every FPGA uses additional logic to bridge the different clock domains.



RX and TX elastic FIFOs are used; they implement a padding method.
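A toy model of the padding method is sketched below for the 9.6 Gb/s payload on a 10 Gb/s link, i.e. words written at 240 MHz and read at 250 MHz. The policy used here (send a PAD word whenever the FIFO is empty, drop PAD words at the receiver) is an assumption for illustration; the real firmware may use fill thresholds or a fixed schedule. Time is counted in ticks of 1/6 ns so that both clock periods are integers.

```python
from collections import deque

WRITE_PERIOD = 25  # 240 MHz payload clock, in ticks of 1/6 ns
READ_PERIOD = 24   # 250 MHz link clock, in ticks of 1/6 ns
PAD = "PAD"        # stand-in for the idle (K-character) word

def tx_elastic_fifo(n_ticks):
    """Return the word stream put on the link during n_ticks."""
    fifo = deque()
    link_words = []
    next_word = 0
    for t in range(n_ticks):
        if t % WRITE_PERIOD == 0:   # payload side writes one data word
            fifo.append(next_word)
            next_word += 1
        if t % READ_PERIOD == 0:    # link side must send a word in every slot
            link_words.append(fifo.popleft() if fifo else PAD)
    return link_words

def rx_strip_padding(link_words):
    """Receiver side: drop PAD words and keep the data in order."""
    return [w for w in link_words if w != PAD]

frame = tx_elastic_fifo(600)        # 600 ticks = 100 ns
print(frame.count(PAD), len(frame))
```

In this model every 100 ns window on the link carries 24 data words and one PAD word, and the receiver recovers the data words in their original order.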





#### 9.6Gb/s Asynchronous Protocol







#### Simulation results





Simulation waveform (Sim Time: 1,000,000 ps)



#### Real world – The setup





- Both cards send and receive one optical link at 9.6 Gb/s
- They use the on-board oscillators to drive the GTX, and a common 40 MHz source (representing the LHC clock)
- The design uses the on-board jitter cleaner, which "cleans" the recovered clock
- IPbus has been implemented to spy on the received data
- The oscilloscope shows the received commas


#### Testing the Async- links








#### How the logic works; why to choose asynchronous links



All the complexity is on the TX side

| Word (9.6 Gb/s → 10 Gb/s) | 240 MHz: time (ns) | 250 MHz: time (ns) |
|---------------------------|--------------------|--------------------|
| 1                         | 4.2                | 4.0                |
| 2                         | 8.3                | 8.0                |
| 3                         | 12.5               | 12.0               |
| 4                         | 16.7               | 16.0               |
| 5                         | 20.8               | 20.0               |
| 6                         | 25.0               | 24.0               |
| 7                         | 29.2               | 28.0               |
| 8                         | 33.3               | 32.0               |
| 9                         | 37.5               | 36.0               |
| 10                        | 41.7               | 40.0               |
| 11                        | 45.8               | 44.0               |
| 12                        | 50.0               | 48.0               |
| 13                        | 54.2               | 52.0               |
| 14                        | 58.3               | 56.0               |
| 15                        | 62.5               | 60.0               |
| 16                        | 66.7               | 64.0               |
| 17                        | 70.8               | 68.0               |
| 18                        | 75.0               | 72.0               |
| 19                        | 79.2               | 76.0               |
| 20                        | 83.3               | 80.0               |
| 21                        | 87.5               | 84.0               |
| 22                        | 91.7               | 88.0               |
| 23                        | 95.8               | 92.0               |
| 24                        | 100.0              | 96.0               |
| 25                        | PAD                | 100.0              |
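The position of the PAD slot follows from the ratio of the two word clocks. The sketch below reduces that ratio to lowest terms; the function name is illustrative, and the same calculation applies to the 280 MHz / 300 MHz pair of the 11.2 Gb/s over 12 Gb/s case.

```python
from fractions import Fraction

def padding_pattern(payload_mhz, link_mhz):
    """Return (data words, pad words, frame length in ns) of the shortest
    repeating frame for a payload word clock carried on a faster link clock."""
    ratio = Fraction(payload_mhz, link_mhz)   # data words per link slot
    data_words = ratio.numerator
    frame_slots = ratio.denominator
    frame_ns = Fraction(frame_slots * 1000, link_mhz)
    return data_words, frame_slots - data_words, frame_ns

print(padding_pattern(240, 250))   # 9.6 Gb/s payload on a 10 Gb/s link
print(padding_pattern(280, 300))   # 11.2 Gb/s payload on a 12 Gb/s link
```

For 240 MHz on 250 MHz the frame is 24 data words plus 1 PAD every 100 ns, matching the table; for 280 MHz on 300 MHz it is 14 data words plus 1 PAD every 50 ns.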

- Asynchronous links have the advantage that they keep running even if the LHC clock collapses completely
- This avoids GTX startup delays after TTC problems
- The idle characters are used for padding and contain K characters
- The disadvantage is that the elastic FIFOs need additional resources
- Latency has to be checked!

Asynchronous example for 7series with speed grade 3

| Word (11.2 Gb/s → 12 Gb/s) | 280 MHz: time (ns) | 300 MHz: time (ns) |
|----------------------------|--------------------|--------------------|
| 1                          | 3.6                | 3.3                |
| 2                          | 7.1                | 6.7                |
| 3                          | 10.7               | 10.0               |
| 4                          | 14.3               | 13.3               |
| 5                          | 17.9               | 16.7               |
| 6                          | 21.4               | 20.0               |
| 7                          | 25.0               | 23.3               |
| 8                          | 28.6               | 26.7               |
| 9                          | 32.1               | 30.0               |
| 10                         | 35.7               | 33.3               |
| 11                         | 39.3               | 36.7               |
| 12                         | 42.9               | 40.0               |
| 13                         | 46.4               | 43.3               |
| 14                         | 50.0               | 46.7               |
| 15                         | PAD                | 50.0               |



#### Future plans







## The end

I thank Greg Iles for the help given during the past year.