
CLOCK SWITCHING

Europe/Zurich
Videoconference
CLOCK SWITCHING
Zoom Meeting ID
62759956285
Host
Filippo Costa

ALICE CLOCK SWITCH (BEAM/LOCAL)

We are trying to solve the issue of the FEE losing the GBT clock, which comes from the CRU or the LTU, when the machine decides to switch clock.

The current situation is that after a CLOCK SWITCH the connection to the FEE is lost for a short period of time. The side effects are:

- The FEE loses the clock and takes time to recover.

- Some GBT chips don't recover the clock automatically, and the FEE requires a power cycle. (A possible solution for this is to use the RESET signal in the GBT-FPGA implementation, as suggested by the GBT expert in the electronics group.)

We asked our LHCb colleagues what their strategy is. This was the feedback:

In our LHCb architecture, we have GBTs everywhere as FE master clock links connected to our timing distribution system via VTRxs, and then data GBTs connected to VTTx. We observed these behaviors:

1- the GBT loses lock and about 10% of them do not recover (even if the watchdog is enabled).

2- about 20% of the GBTs that are fully fused go into a deadlock FSM state that cannot be recovered unless a power cycle is issued.

3- even if the GBT recovers the lock, the data GBTs lose lock, and on average about 1% of them end up in a similar deadlock FSM state that cannot be recovered unless a power cycle is issued.

4- sometimes the loss of lock provokes the data GBTs to send garbage through the link all the way to the readout cards, and then the GBT core in the FPGA on the readout cards cannot recover the stream. This is a firmware issue and there is little to do, apart from reconfiguring the transceivers at the receiving end of the readout cards.

Apart from keeping the clock always in EXTERNAL, if I really need to switch clock or to remove the clock completely, we have currently found two mitigation strategies:

1- disable all light from the transmitting lasers to avoid corrupting the data stream at the GBT. When light returns, the GBT locks happily again and no configuration is lost. The goal here is simply to avoid sending garbage down the link that may corrupt the content of the configuration registers.

2- recalibrate all the transceivers after a reset of the FE electronics. This is to recover the receiving electronics and make sure the receiving part is well aligned.
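The ordering implied by mitigation 1 can be sketched as below. This is only an illustration: the function name and all four callbacks are hypothetical stand-ins, not an existing LHCb or ALICE API, and the real hardware access is not modelled.

```python
from typing import Callable


def switch_clock_with_lasers_off(
    disable_lasers: Callable[[], None],
    switch_clock: Callable[[], None],
    enable_lasers: Callable[[], None],
    all_links_locked: Callable[[], bool],
) -> bool:
    """Run a clock switch while the transmitting lasers are dark (mitigation 1).

    Returns True if every link reports lock afterwards. A False result means
    the fallback (mitigation 2: FE reset, then transceiver recalibration)
    would be needed.
    """
    disable_lasers()          # no light, so no garbage reaches the GBTs
    try:
        switch_clock()        # the clock source changes while the links are dark
    finally:
        enable_lasers()       # always restore light, even if the switch fails
    return all_links_locked()


# Stand-in callbacks that only record the order of operations.
calls = []
ok = switch_clock_with_lasers_off(
    lambda: calls.append("lasers_off"),
    lambda: calls.append("clock_switch"),
    lambda: calls.append("lasers_on"),
    lambda: True,
)
```

Passing the hardware actions in as callbacks keeps the sequence itself testable without any hardware attached.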

The goal in ALICE is a bit different, as we don't want to lose the clock in the FEE at all. We can't always keep the clock in BEAM, as it is not stable outside the STABLE BEAM period.

There is the possibility of using hitless switching, but this would have to be implemented at the clock source, meaning before or after the RF2TTC board, as described in the CTP document.

(Technical details to be added)

The trigger team will prepare some tests in the lab to verify the proposed solution.

We also need to understand at which stages of the beam preparation we need to enable/disable the holdover feature.

    • 14:00 14:20
      CTP CRU clock distribution 20m
      Speakers: Filippo Costa (CERN), Marian Krivda (University of Birmingham (GB)), Olivier Bourrion (Centre National de la Recherche Scientifique (FR))

      CTP

       

      From yesterday's experience we see that we cannot rely on the BEAM1 clock outside of BEAMS.

      The problems we have, as I understand it, come from the moment when ALICE switches between the LHC clock (BEAM1) and LOCAL (the generator on the RF2TTC board). They are not seen in TTCPON monitoring, but the GBT connections (either directly from the LTU or CRU-FE) are sensitive.

      I would like to clarify the situation and make a proposal for a possible improvement from the CTP side.

      1. The clock switch LOCAL -> BC1 -> LOCAL is made by the RF2TTC board inside the TTCmi VME crate/interface
      2. CTP/LTUs don't see any PLL unlock (CTP/LTU boards have the PLL Si5345-D)
      3. LTU GBT links (we use only downstream, i.e. from the LTU to the detector FEE) don't see PLL unlock
      4. CTP GBT links (we use them for CTP readout, bidirectional communication with the CRU): not tested yet, as the CTP readout is still under development
      5. CRU GBT links: some of them see PLL unlock (I guess their loop bandwidth is smaller than the 1.09 kHz we set for the Si5345-D)

         

      My proposal is to test the “holdover” functionality of the CTP/LTU PLL Si5345-D, i.e. instead of the clock switch on the RF2TTC board we can enable “holdover” for all PLLs on the CTP/LTUs.

      In this way we freeze the last frequency at the output of the Si5345-D.

      To disable “holdover”, we would need to make sure that the current “BC1” frequency is very close to the holdover frequency, i.e. within the loop bandwidth of the GBT PLL.
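The condition for leaving holdover can be written as a simple frequency comparison. The sketch below is purely illustrative: the function names, the frequencies and the 100 Hz limit are placeholders chosen for the example, not measured values or GBT specifications, and no Si5345-D register access is modelled.

```python
def frequency_offset_ppm(f_input_hz: float, f_holdover_hz: float) -> float:
    """Offset of the incoming clock relative to the frozen holdover frequency."""
    return (f_input_hz - f_holdover_hz) / f_holdover_hz * 1e6


def safe_to_leave_holdover(
    f_input_hz: float, f_holdover_hz: float, max_offset_hz: float
) -> bool:
    """True if the incoming clock is within max_offset_hz of the holdover frequency."""
    return abs(f_input_hz - f_holdover_hz) <= max_offset_hz


# Placeholder numbers (assumptions for illustration only).
F_HOLDOVER_HZ = 40_078_900.0   # frequency frozen when holdover was enabled
MAX_OFFSET_HZ = 100.0          # stand-in for the limit set by the GBT PLL

close = safe_to_leave_holdover(40_078_950.0, F_HOLDOVER_HZ, MAX_OFFSET_HZ)  # 50 Hz away
far = safe_to_leave_holdover(40_079_400.0, F_HOLDOVER_HZ, MAX_OFFSET_HZ)    # 500 Hz away
```

With these placeholder values the first case would allow holdover to be disabled and the second would not. The real limit has to come from the GBT PLL characteristics.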

       

      The CTP team has never tested enabling/disabling “holdover” on the Si5345-D, so we would appreciate help from Si5345-D experts.

    • 14:20 14:40