600 contactors controls board reliability study

Europe/Zurich
CERN

    • 16:00 17:00
      Progress meeting #1 1h

      Results of the first iteration of the failure rate prediction, discussion of critical paths and most concerning aspects for reliability

      Speaker: Milosz Robert Blaszkiewicz (CERN)

      Present: M. Blaszkiewicz, D. Carrillo, S. Georgakakis, L. Felsberger.

      The presentation started by covering the sources of failure data used in the modelling process. S mentioned the idea of setting up a failure database to keep track of component failures, where measurements could be taken on a recurring basis in the future.

      The next point covered the base assumptions made in the models. The operating temperature is hard to pin down: in some cases it can be as low as 20 °C, in others 45 °C or more, depending on heating from nearby components. D expressed surprise that temperature is an important factor for reliability and not only for the ageing of components.
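      To give a feel for why the 20 °C versus 45 °C assumption matters, the sketch below applies a generic Arrhenius temperature model. This is an illustration only: the activation energy is an assumed placeholder, and the prediction standard used in the study may apply a different temperature dependence per component type.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(t_ref_c, t_op_c, activation_energy_ev):
    """Ratio of the failure rate at t_op_c to the rate at t_ref_c (Arrhenius model)."""
    t_ref_k = t_ref_c + 273.15
    t_op_k = t_op_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref_k - 1.0 / t_op_k)
    return math.exp(exponent)


# ASSUMPTION: 0.7 eV is a placeholder; the real value depends on the
# component and on the failure mechanism being modelled.
ASSUMED_EA_EV = 0.7

factor = arrhenius_acceleration(20.0, 45.0, ASSUMED_EA_EV)
print(f"Failure rate at 45 C relative to 20 C: x{factor:.1f}")
```

      With this placeholder activation energy, the two temperature assumptions differ by close to an order of magnitude in predicted failure rate, which is why the choice deserves attention.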

      D also highlighted the impact of stops, such as Technical Stops, on capacitors: the lack of powering leads to their deterioration.

      The next important aspect is radiation. The models estimate 25 Gy/year. S suggested seeking synergies with other teams dealing with the subject. He specifically listed the impact on the energizing/de-energizing of transistors, reduced current, etc. L proposed checking the critical path components specifically from this angle. D further stressed that degradation may take 1-2 years to develop, but shorts of the power supply can happen from the first day. A contact person from the section dealing with reliability under radiation was suggested.

      The deadline for the study became more relaxed: S suggested that changes can still be made until the CHARM tests in April.

      L proposed a general plan: 
      1. Standard FMECA pipeline
      2. Confirming critical path
      3. Digging into CP components in terms of radiation

      In parallel: top-level model.

      S assured that the board will be tested on a test bench, just as the Universal Control crate is. The 1st level of testing will concern electrical loads, the 2nd level more functional testing (e.g., delay times).

      The operating voltage is -5 V to 5 V for the FET and 5 V for the relay; the other line is 24 V. There is a change compared to the drawing in S’s presentation: the mains are integrated into the boards, so they are also redundant.

      End-effects:
      - Missed opening
      - Spurious opening
      - Maintenance (may be non-existent in this case)
      - No effects

      Lifetime aspects may not need to be modelled, as components were chosen with them in mind. Also, the choice of components usually favoured ones used in EPC with proven operational value.

    • 16:00 17:20
      Progress meeting 1h 20m

      Discussions of parameter change failure modes, failure rate prediction assumptions and duration of switch openings. We also attempted to comprehensively characterise connections between individual cards in EEUC up to the CCB ones.

      Present

      M. Blaszkiewicz, L. Felsberger, S. Georgakakis, D. Westermann

      Agenda

      The meeting covered several aspects encountered while preparing the FMECA.

      Connections between CCB and EEUC (mainly the controls and driver cards).

      • There will be 3 contactor boards, each corresponding to one switch (A, B or Z).
        • Each will receive signals from both driver cards of the EE system (as DRV_1 and DRV_2).
      • Each driver card has many channels, but only 2 will be used - in this case only the ones with a transistor.
      • Path crossings may hide blind failures of individual paths:
        • Spyros stated that if a failure comes from the PS of the EE’s driver, it will be seen.
        • Also, depending on the configuration of the outgoing connections of the driver card, it is possible that signal T1 of CH1 would go to the CCB, while T2 of CH1 would go to the interlock of another board, ensuring that more faults would be visible.
        • The blind failures will also be seen in the feedback of the CCB, at the level of an individual CCB channel.
      • Spyros will try to make a higher-level diagram of how things are connected between boards.

      Parameter change failure modes of components

        • Capacitors are immune to up to 80-90% gain changes
        • Degradation of resistors "does not count". There is no environment for this to happen - and other things will fail earlier anyway.
          • Even a 20% change wouldn’t affect the outcome (but would affect speed).
        • Transistors are more of a problem, but we need the boundaries of a given parameter change, for instance the point at which a component stops operating.
        • It was also stated that parameter change will occur only under radiation.
        • Solution for the models: we will try 2 modes - parameter change treated as no effect and parameter change treated as the worst-case scenario.
          • Spyros suggests using an equation to set the parameter change in more detail.
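      The two-mode approach above can be sketched as follows. This is a minimal illustration, not the actual study model: the component names, FIT values and the choice of "Missed opening" as the worst-case end effect are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    component: str    # hypothetical reference designator
    mode: str         # e.g. "short", "open", "parameter change"
    fit: float        # failure rate in FIT (failures per 1e9 hours)
    end_effect: str   # end effect assigned in the FMECA


def total_fit_by_effect(modes, parameter_change_effect):
    """Sum FIT per end effect, overriding the effect of parameter-change modes."""
    totals = {}
    for m in modes:
        effect = parameter_change_effect if m.mode == "parameter change" else m.end_effect
        totals[effect] = totals.get(effect, 0.0) + m.fit
    return totals


# ASSUMPTION: placeholder data, not taken from the study.
modes = [
    FailureMode("Q1", "short", 2.0, "Missed opening"),
    FailureMode("Q1", "parameter change", 1.0, "No effects"),
    FailureMode("R1", "open", 0.5, "Spurious opening"),
    FailureMode("R1", "parameter change", 0.5, "No effects"),
]

no_effect_case = total_fit_by_effect(modes, "No effects")
# ASSUMPTION: "Missed opening" is taken as the worst-case end effect.
worst_case = total_fit_by_effect(modes, "Missed opening")

print("Parameter change as no effect:", no_effect_case)
print("Parameter change as worst case:", worst_case)
```

      The gap between the two totals bounds how much the parameter-change assumption can move the result, which indicates whether a more detailed equation is worth the effort.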

      Failure rate prediction assumptions

      • D3, D5, D17 - can those be considered as 4 diodes? Should we add some factor for one component (single points of failure)?
        • Yes, can be understood as 4 diodes.
      • Power transistors - differentiation from other types of transistors (the 217Plus standard uses operating/rated voltage, but it is possible to define a custom temperature rise).
        • The temperature rise lasts only for a short time. Also, there is only one power transistor in the design.
        • To be treated like other transistors.

      Typical opening time of a switch 

        • Previously the opening time of the switch was defined partly by the electromechanical circuit breakers, which were not supposed to be switched too often. This is no longer the case with the new circuit breakers.
        • However, it is difficult to determine how long they will stay open in the future. The cycling rate may be higher than previously.
        • Mirko Pojer might know what is planned, as it might differ from previous experience.

      Fail-safe mechanisms and redundancy

      Spyros asked whether, for fail-safe elements (e.g., a fail-safe relay), redundancy of parts such as power supplies would increase reliability. The discussion concluded that it would not: such redundancy can increase only availability.
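      The reasoning can be illustrated with a toy model, under the assumptions that supply failures are independent, that loss of all supplies drives the fail-safe element to its safe (open) state, and that "reliability" here refers to the protection function (no missed opening). All probabilities are hypothetical placeholders.

```python
def spurious_opening_probability(p_supply_fail, n_supplies):
    """Fail-safe element trips spuriously only if every supply is lost
    (independent supply failures assumed)."""
    return p_supply_fail ** n_supplies


def missed_opening_probability(p_element_stuck):
    """Supply count does not enter: losing power already forces the safe
    state, so only the element failing to open matters in this toy model."""
    return p_element_stuck


# ASSUMPTION: placeholder probabilities over some mission time.
P_SUPPLY_FAIL = 1e-2
P_ELEMENT_STUCK = 1e-4

for n in (1, 2):
    spurious = spurious_opening_probability(P_SUPPLY_FAIL, n)
    missed = missed_opening_probability(P_ELEMENT_STUCK)
    print(f"{n} supply/supplies: spurious={spurious:.1e}, missed={missed:.1e}")
```

      Adding a second supply lowers the spurious-opening probability (availability) while the missed-opening probability stays the same.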

      217Plus questionnaire

      The results of the questionnaire exercise led to a reduction of 200 FIT. We still have to determine the infant mortality and environmental factors. The environmental factor is not yet established due to a lack of knowledge about the vibration factor.
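      For scale, the sketch below converts a failure rate in FIT (failures per 10^9 device-hours) into expected failures per year. It assumes continuous operation (8760 h/year) and a constant failure rate; the `units` argument is a hypothetical fleet size, not a number from the study.

```python
HOURS_PER_YEAR = 8760.0  # ASSUMPTION: continuous operation all year


def fit_to_failures_per_year(fit, units=1):
    """Expected failures per year for `units` identical boards at `fit` FIT each."""
    return fit * 1e-9 * HOURS_PER_YEAR * units


per_board = fit_to_failures_per_year(200.0)
print(f"200 FIT corresponds to {per_board:.3e} failures per board-year")
```

      Multiplying by the number of installed boards gives the expected fleet-wide effect of such a reduction.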

      Next steps & follow up

      1. Check with Mirko for how long the switches will remain open.
      2. Create 2 prediction results: one treating parameter change as having no effect and one with worst-case assumptions.
      3. Spyros: preparing a high-level diagram of connections between EEUC and CCB.
    • 14:00 15:00
      Progress meeting 1h

      Discussion of the tentative results of the EEUC and CCB studies. Combined model, revision of assumptions, presentation at the TSLM

      Speaker: Milosz Robert Blaszkiewicz (CERN)