

# LHC BLM System Readiness: Summary of FW & SW Changes

Machine Protection Panel

Mathieu Saccani (SY-BI-BL) on behalf of the BLM team 10/06/2022



#### **BLM Checklist**




## **BLM LHC recent issues**

- 1. VME power supply failure on the (unused) 12 V rail in pt1
  - → Equipment replaced
- 2. Temperature issue (water cooling problem) in pt2 triggering optical link error interlocks
  - → The alarms were not enabled; all are now active
  - → CV has turned the water flow up to maximum
  - → Multiple cards have been exchanged to better resist higher temperatures
  - → The temperature thresholds need to be lowered by 5 °C (30 → ~25 °C)
- 3. Weak optical links could trigger an interlock if both redundant links fail at the same time
  - → Replaced 11 BLETC at the surface and a few BLECF in the tunnel (preventive maintenance)
- 4. Sanity checks issue blocking OP before injection
  - → Workaround: always play the whole sequence, not just a subset
  - → Issue in the VME slave core: needs a BLECS FW upgrade (scheduled for YETS)



# **Blindable channels (inhibit at injection)**

- Now present in <u>all crates</u>
- Disabled by default
- Acts only on maskable channels
- <u>Programmable timer</u> per crate, started from the injection warning (from BST, plus a delay to coincide with injection)
- Inhibits only the interlock output to the BIC (all running sums remain active, and all dump requests are logged)
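The gating described in the list above can be sketched as a small truth function. This is a minimal illustration, not the firmware implementation: the names `CrateBlindConfig`, `delay_us`, `blind_us` and `interlock_to_bic` are hypothetical, and modelling the timer as a window of `blind_us` starting `delay_us` after the injection warning is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CrateBlindConfig:
    """Per-crate settings (hypothetical names, not the real register map)."""
    enabled: bool = False   # the feature is disabled by default
    delay_us: float = 0.0   # assumed delay from the BST injection warning to injection
    blind_us: float = 0.0   # assumed length of the blind window


def interlock_to_bic(dump_request: bool, maskable: bool, blindable: bool,
                     t_since_warning_us: float, cfg: CrateBlindConfig) -> bool:
    """Return True if the dump request is forwarded to the BIC.

    Only the BIC output is gated here; running sums and logging of dump
    requests are outside this function and are never inhibited.
    """
    in_window = cfg.delay_us <= t_since_warning_us < cfg.delay_us + cfg.blind_us
    inhibited = cfg.enabled and maskable and blindable and in_window
    return dump_request and not inhibited
```

With the default configuration the function forwards every dump request, which matches the "disabled by default" behaviour; a non-maskable channel is never inhibited whatever the timer state.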

### Goals:

- 1. <u>Test the feature</u> with 12 bunches next week and <u>measure</u> the blind time needed
- 2. Select a first <u>set of channels</u> to blind (+adjust monitor factor)



Injection Interlock Inhibit FW Implementation



### **BLM beam test principle**

Two tests performed in parallel on the 18/05/2022 by OP:

- Test 1: Interlock request functionality of the BLM crates
  - Procedure written by BL and played by OP
  - Aim to trigger as many BLM crates as possible
  - 1 collimator per beam closed initially, then opened using the threading sequence
  - Injection of a pilot bunch (test at injection)
- Test 2: Interlock request system latency
  - Latency must be less than 3 LHC turns (89 μs each)
  - Post-analysis performed by BI-BL from NXCALS (automated with a Python script)
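The latency criterion amounts to a limit of 3 × 89 μs = 267 μs per crate. The following is a minimal sketch of such a check, not the BI-BL analysis script: it does not query NXCALS, and the function names, crate labels and timestamps are hypothetical.

```python
LHC_TURN_US = 89.0      # one LHC turn, as quoted above
MAX_LATENCY_TURNS = 3   # requirement: latency below 3 turns


def latency_in_turns(t_loss_us: float, t_request_us: float) -> float:
    """Latency between the loss and the interlock request, in LHC turns."""
    return (t_request_us - t_loss_us) / LHC_TURN_US


def check_crates(events: dict[str, tuple[float, float]]) -> dict[str, bool]:
    """Map each crate to True if its latency meets the requirement.

    `events` maps a crate label to (loss timestamp, request timestamp) in µs.
    """
    return {
        crate: latency_in_turns(t_loss, t_req) < MAX_LATENCY_TURNS
        for crate, (t_loss, t_req) in events.items()
    }


# Hypothetical timestamps, for illustration only.
example = {"crate_A": (0.0, 178.0), "crate_B": (10.0, 300.0)}
result = check_crates(example)
```

In this made-up example `crate_A` is at exactly 2 turns and passes, while `crate_B` is above 3 turns and fails.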



Possible to perform both tests in parallel

Latency calculated for each triggered crate

Procedure in EDMS



### **BLM beam test result**



Example: Triggering of B1 Dump in IP5 at TCTPV

- Selection of collimator orientation arbitrary
- Same collimator type per point for B1 and B2
- Most dumps from BLM central crate, maskable channels (OK)
- BLM latency below 3 LHC turns OK





# FW & SW changes summary

### FW

On the 4 optical links reception chains on the surface processing board:

- 1. Add input delay constraints for all data lines
- 2. Improve the clock domain crossing mechanism

All the rest of the HDL code remains the same (as v1.1.7).

This new firmware v1.2.0 is deployed and tested with beam.

### SW

To avoid losing CTIM events (missing XPOC and PM data) because of high CPU activity:

- 1. CTRP IRQ priority increased
- 2. FESA RT thread priorities rescaled
- 3. In the future, the CPU upgrade would give more margin (profiling & statistics under development)





### More details regarding FW & SW changes





MPP - LHC BLM System Readiness

Annex

# **FW Changes**

### **Optical link reception improvement**




### **LHC BLM Architecture**

- 2 redundant optical links from the tunnel to the surface electronics





## **Optical links Architecture**

- FW update in the BLETC (Threshold comparator) FPGA
- Only in the RCC (Receive, Check and Compare) block





# **Optical Link Data Reception**

- Redundant link
- Protected by CRC32
- Frame ID counter

| Word(s) | Content                                                              |
|---------|----------------------------------------------------------------------|
| 1       | CID (card identity number)                                           |
| 2       | STATUS 1                                                             |
| 3       | STATUS 2                                                             |
| 4-13    | Count 1 / ADC 1 to Count 8 / ADC 8 (packed across word boundaries)   |
| 14      | FID (frame identity number)                                          |
| 15      | DAC1, DAC2                                                           |
| 16      | DAC3, DAC4                                                           |
| 17      | DAC5, DAC6                                                           |
| 18      | DAC7, DAC8                                                           |
| 19-20   | CRC                                                                  |

40 µs frame (20 × 16-bit words)
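How a receiver might pick between the two redundant links using the CRC and the frame ID can be sketched as follows. This is an illustration under stated assumptions, not the RCC block logic: `zlib.crc32` stands in for the link's CRC32 (the actual polynomial and bit ordering are not given here), the FID is assumed to be the 14th 16-bit word of the frame, and all function names are hypothetical.

```python
import struct
import zlib

WORDS_PER_FRAME = 20   # 20 x 16-bit words per frame
FID_WORD = 13          # 0-based index of the FID word (assumption)


def make_frame(payload_words: list[int]) -> bytes:
    """Build a frame from 18 payload words plus a 32-bit CRC (two words)."""
    body = struct.pack(">18H", *payload_words)
    return body + struct.pack(">I", zlib.crc32(body))


def crc_ok(frame: bytes) -> bool:
    """Check the frame length and the CRC stored in the last two words."""
    if len(frame) != 2 * WORDS_PER_FRAME:
        return False
    (crc,) = struct.unpack(">I", frame[-4:])
    return zlib.crc32(frame[:-4]) == crc


def frame_id(frame: bytes) -> int:
    """Extract the frame ID counter."""
    return struct.unpack_from(">H", frame, 2 * FID_WORD)[0]


def select_frame(link_a: bytes, link_b: bytes, expected_fid: int):
    """Return the first frame passing CRC and FID checks, or None if both fail."""
    for frame in (link_a, link_b):
        if crc_ok(frame) and frame_id(frame) == expected_fid:
            return frame
    return None
```

Returning `None` corresponds to the case where both redundant links fail at the same time, which is the situation that would lead to an interlock.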

#### TLK1501 deserializer

- 40MHz clock
- 2 control bits: StartOfFrame (SOF) + RestOfFrame (ROF)
- 16 data bits



Simulation example

- Each of the four TLK1501 generates its own 40 MHz clock
- The TLK1501-FPGA lines are not skew compensated (neither on the carrier nor on the mezzanine v4.0)



# **Optical Link Data Reception**

- Most of the installed mezzanines are v4.0
- Skew compensation introduced from mezzanine v5.0



Lines length for link 1 CFC-B



Mezzanine v4.0 layout

#### Skew now partially compensated in the FPGA (input delay)

Solution compatible with v5.0:

<pre>
set_instance_assignment -name stratix_decrease_input_delay_to_internal_cells -to PIM_IO[11] on
set_instance_assignment -name decrease_input_delay_to_input_register -to PIM_IO[17] off
set_instance_assignment -name decrease_input_delay_to_input_register -to PIM_IO[18] off
</pre>

Timing analysis constraints added per individual line (set_input_delay). Link 1B:

<pre>
# min delay: Th (from TCLK docs) + data trace delay - clock delay
set_input_delay -clock_fall -clock {virt_cfcB_lk1_clk_40} -min [expr $TLK_TCO_min + $cfcB_lk1_ctrl_0_delay - $cfcB_lk1_clk_delay] [get_ports {PIM_IO[1]}]
set_input_delay -clock_fall -clock {virt_cfcB_lk1_clk_40} -min [expr $TLK_TCO_min + $cfcB_lk1_ctrl_1_delay - $cfcB_lk1_clk_delay] [get_ports {PIM_IO[2]}]
set_input_delay -clock_fall -clock {virt_cfcB_lk1_clk_40} -min [expr $TLK_TCO_min + $cfcB_lk1_data_0_delay - $cfcB_lk1_clk_delay] [get_ports {PIM_IO[3]}]
set_input_delay -clock_fall -clock {virt_cfcB_lk1_clk_40} -min [expr $TLK_TCO_min + $cfcB_lk1_data_1_delay - $cfcB_lk1_clk_delay] [get_ports {PIM_IO[4]}]
</pre>
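The `-min` expression used in the constraints (TCO min + data trace delay - clock trace delay) can be checked by hand for each line. The sketch below only reproduces that arithmetic; every numeric value is a hypothetical placeholder, not a measured trace delay or a datasheet figure.

```python
def min_input_delay_ns(tco_min_ns: float, data_trace_ns: float,
                       clk_trace_ns: float) -> float:
    """Same arithmetic as the -min expression: TCO_min + data delay - clock delay."""
    return tco_min_ns + data_trace_ns - clk_trace_ns


def residual_skew_ns(delays_ns: dict[str, float]) -> float:
    """Spread between the slowest and fastest line of one link."""
    return max(delays_ns.values()) - min(delays_ns.values())


# Hypothetical values in ns, for illustration only.
TLK_TCO_MIN_NS = 1.0
CLK_TRACE_NS = 0.55
TRACES_NS = {"ctrl_0": 0.52, "ctrl_1": 0.61, "data_0": 0.48, "data_1": 0.70}

DELAYS_NS = {
    name: min_input_delay_ns(TLK_TCO_MIN_NS, trace, CLK_TRACE_NS)
    for name, trace in TRACES_NS.items()
}
SKEW_NS = residual_skew_ns(DELAYS_NS)
```

Giving the timing analyser one delay per line, rather than a single worst-case value for the whole bus, is what lets it account for the line-to-line skew of the uncompensated mezzanine.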

## **Data reception in HDL**

- One clock domain per link for the frame detection and CRC check
- One BRAM per link for clock domain crossing
- A common error check and selection per redundant link pair
- Main modification: use a synchronous SOF
  - Avoid asynchronous reset of counters in the main 40MHz domain
  - Reduces the number of resynchronization registers (anti-metastability)
  - Tested in simulation, in the lab for several days, then deployed in LHC on 17/05/2022





Annex

# **SW Changes**

### FESA + OS IRQ management




# **Issue Description**

### Issue:

Millisecond events (PM and XPOC) are occasionally missing in random BLM crates (<u>JIRA-TIMING-4011</u>). Other systems (SY-EPC-CCS) appear to face a similar problem (<u>JIRA-TIMING-4027</u>).

### Root cause:

- The CTR loses some CTIM events (separated by 125 µs) when CPU activity is high, even if the CTR event queue is not full.
- Reproduced by BE-CEM & CSS
- No notification of interrupt loss in the CTR driver (gateware problem?)
- The issue appeared after the migration to FESA3, where priorities are defined on a range [0-100] and no longer by category/offset (LOW|NORMAL|HIGH / -2|-1|0|+1|+2)

Special thanks to Marine Gourber-Pace, Michel Arruat, Frederic William Hoguin & Stephane Deghaye



# **Workaround 1: Priorities**

#### Increase the priority of the CTRP IRQ kernel thread

- JIRA-BIBML-2373
- This solution was implemented on the 01/06/2022
- It seems to solve the problem for now
- Still needs to be confirmed by N. Magnin (SY-ABT-BTC) with his XPOC logs

### **Implemented solution:**

- 1. CTRP IRQ priority increased to 88 (instead of 87)
- 2. FESA RT thread priorities rescaled to the range [0:25] (instead of [0:70]); the FESA precedence is unchanged (same behavior)
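Why rescaling keeps the precedence can be seen with a monotonic mapping. The sketch below assumes a simple linear map from [0:70] to [0:25]; the actual mapping applied to the FESA threads is not stated here, and the function names and sample priorities are hypothetical.

```python
CTRP_IRQ_PRIO = 88   # new CTRP IRQ thread priority, as listed above


def rescale_priority(prio: int, old_max: int = 70, new_max: int = 25) -> int:
    """Linearly map a priority from [0, old_max] to [0, new_max] (assumed mapping)."""
    if not 0 <= prio <= old_max:
        raise ValueError(f"priority {prio} outside [0, {old_max}]")
    return round(prio * new_max / old_max)


def precedence_preserved(old: list[int], new: list[int]) -> bool:
    """True if no pair of threads swaps order after rescaling."""
    return all(
        new[i] <= new[j]
        for i in range(len(old))
        for j in range(len(old))
        if old[i] < old[j]
    )


# Hypothetical original priorities, for illustration only.
old_prios = [70, 28, 14, 3, 0]
new_prios = [rescale_priority(p) for p in old_prios]
```

A linear map with rounding never inverts the order of two threads, although two close priorities can collapse onto the same value; the rescaled threads also all sit well below the CTRP IRQ thread.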

#### CTR IRQ priority increased

`ps -eLo comm,rtprio | grep irq`

| Thread (comm)   | rtprio           |
|-----------------|------------------|
| irq/9-acpi      | 87               |
| irq/23-ehci_hcd | 87               |
| irq/23-uhci_hcd | 87               |
| irq/8-rtc0      | 87               |
| irq/28-eth0     | 87               |
| irq/19-i801_smb | 87               |
| irq/14-ata_piix | 87               |
| irq/15-ata_piix | 87               |
| irq/16-serial   | 87               |
| irq/17-vme_brid | 87               |
| irq/16-ctrp.02: | <del>87</del> 88 |

#### FESA RT threads rescaled

`ps -eLo comm,tid,rtprio | grep BLMLHC_DU_M`

| Thread (comm) | TID  | rtprio           |
|---------------|------|------------------|
| BLMLHC_DU_M   | 2990 | <del>70</del> 25 |
| BLMLHC_DU_M   | 2993 | <del>70</del> 25 |
| BLMLHC_DU_M   | 2994 | 1                |
| BLMLHC_DU_M   | 3005 | <del>70</del> 25 |
| BLMLHC_DU_M   | 3006 | <del>70</del> 25 |
| BLMLHC_DU_M   | 3015 | 5                |
| BLMLHC_DU_M   | 3016 | 1                |
| BLMLHC_DU_M   | 3017 | 7                |
| BLMLHC_DU_M   | 3018 | 10               |
| BLMLHC_DU_M   | 3019 | 9                |
| BLMLHC_DU_M   | 3020 | 6                |
| BLMLHC_DU_M   | 3021 | 8                |
| BLMLHC_DU_M   | 3022 | 5                |
| BLMLHC_DU_M   | 3023 | 8                |
| BLMLHC_DU_M   | 3024 | 11               |
| BLMLHC_DU_M   | 3032 | 10               |
| BLMLHC_DU_M   | 3033 | 1                |
| BLMLHC_DU_M   | 3034 | 6                |

#### Special thanks to Stephen Jackson


# Workaround 2: CPU upgrade

#### Move the LHC BLM systems to MEN A25

#### The CPU upgrade would drastically reduce the CPU activity

- 4 cores instead of 2
- More RAM
- Faster MBLT VME data throughput
- → Continuous profiling on operational A20s is being developed and will be compared to the A25 in the lab.

#### The upgrade of BLM CPUs could be done during the next YETS

- The new CPU behaviour will be first fully characterised in the lab
- The exchange can be done quite quickly and easily
- Will give more margin in processing time
- Will ease the future system maintenance and upgrades
- 30 CPUs (27 BLM FECs in LHC + 2 lab crates + 1 spare for piquet)







# Thank you for your attention! Questions?



