We are trying to solve the loss of the GBT clock on the FEE, which comes from the CRU or the LTU, when the machine decides to switch clock.
Currently, after a clock switch the connection to the FEE is lost for a short period of time. The side effects are:
- The FEE loses the clock and takes time to recover.
- Some GBT chips do not recover the clock automatically, and the FEE requires a power cycle. (A possible solution for this is to use the RESET signal in the GBT-FPGA implementation, as suggested by the GBT expert in the electronics group.)
We asked an LHCb colleague what their strategy is; this was the feedback:
In our LHCb architecture, we have GBTs everywhere as FE master clock links connected to our timing distribution system via VTRxs, and then data GBTs connected to VTTx. We observed these behaviors:
1- the GBTs lose lock and about 10% of them do not recover (even if the watchdog is enabled).
2- about 20% of the GBTs that are fully fused go into a deadlock FSM state that cannot be recovered unless a power cycle is issued.
3- even if the GBT recovers the lock, the data GBTs lose lock, and on average about 1% of them end up in a similar deadlock FSM state that cannot be recovered unless a power cycle is issued.
4- sometimes the loss of lock causes the data GBTs to send garbage through the link all the way to the readout cards, and then the GBT core in the FPGA on the readout cards cannot recover the stream. This is a firmware issue and there is little to do, apart from reconfiguring the transceivers at the receiving end of the readout cards.
Apart from always keeping the clock in EXTERNAL, if I really need to switch clock or to remove the clock completely, we have currently found two mitigation strategies:
1- disable all light from the transmitting lasers to avoid corrupting the data stream at the GBT. When light returns, the GBT locks happily again and no configuration is lost. The goal here is simply to avoid sending garbage down the link that may corrupt the content of the configuration registers.
2- recalibrate all the transceivers after a reset of the FE electronics. This is to recover the receiving electronics and make sure the receiving part is well aligned.
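As an illustration only, the two mitigations above could be sequenced around a clock switch as in the sketch below. Combining both in a single procedure is our assumption, since the feedback does not say whether they are used together. The class `RecordingLink`, its methods, and `guarded_clock_switch` are hypothetical names; they do not correspond to any real LHCb or ALICE control software, and the stub only records the order of the steps.

```python
from typing import Callable, List


class RecordingLink:
    """Hypothetical stand-in for a front-end link; it only records calls."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.lasers_on = True

    def set_lasers(self, enabled: bool) -> None:
        # Mitigation 1: switch the transmitting lasers on or off.
        self.lasers_on = enabled
        self.log.append("lasers_on" if enabled else "lasers_off")

    def reset_front_end(self) -> None:
        self.log.append("reset_fe")

    def recalibrate_transceivers(self) -> None:
        # Mitigation 2: realign the receiving side after the FE reset.
        self.log.append("recalibrate")


def guarded_clock_switch(link: RecordingLink,
                         switch_clock: Callable[[], None]) -> None:
    """Run a clock switch with the lasers off, then reset and recalibrate.

    Assumed ordering, not a documented procedure.
    """
    link.set_lasers(False)       # no light, so no garbage reaches the GBT
    try:
        switch_clock()
    finally:
        link.set_lasers(True)    # always restore light, even on failure
    link.reset_front_end()
    link.recalibrate_transceivers()


if __name__ == "__main__":
    demo = RecordingLink()
    guarded_clock_switch(demo, lambda: demo.log.append("clock_switch"))
    print(demo.log)
```

The `try`/`finally` is a design choice of the sketch: the lasers are re-enabled even if the clock switch itself raises, so the links are not left dark.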
The goal in ALICE is a bit different, as we don't want to lose the clock in the FEE at all. We can't always keep the clock in BEAM, as this is not stable outside the STABLE BEAM period.
There is the possibility of using hitless switching, but this should be implemented at the clock source, meaning before or after the RF2TTC board, as described in the CTP document.
(Technical details to be added)
The trigger team will prepare some tests in the lab to verify the proposed solution.
We also need to understand at which stages of the beam preparation the holdover feature must be enabled or disabled.