Follow-up meeting of Availability Modelling Workshop II

30-5-039 (CERN)



Second follow-up meeting of the Availability Modelling Workshop, held at CERN on the 7th July.

Participants: A. Apollonio (chair), A. Fernandez, S. Hurst, M. Jonker, A. Niemi, B. Pucio, O. Rey Orozco, R. Schmidt, V. Schramm and J. Uythoven (chair).

Excused: M. Blumenschein, P. Van Trappen and W. Vigano.


Linac4 Failure Catalogue and Availability model (O. Rey Orozco)
Based on previous work by A. Apollonio, the Linac4 Failure Catalogue is being updated with the help of system experts. O. Rey Orozco showed the latest update.

J. Uythoven suggested circulating the Failure Catalogue once all systems are updated, so that people can share opinions and give feedback. M. Jonker asked whether there is a contact person per system. O. Rey Orozco answered that contact persons are assigned to some systems but not to all. A. Apollonio mentioned that feedback from the RF system experts is currently awaited.

O. Rey Orozco explained the assumptions made for the Linac4 availability model. The model follows the structure of the Failure Catalogue, no operational phases are considered, the “lifetime” is set to one year of operation and only corrective maintenance is assumed. R. Schmidt considers the term “lifetime” misleading when it refers to simulation time.

Within the model, one reliability block models a failure mode of a group of identical components. This approach avoids defining a large number of identical components, but it has a small impact on the simulation results. To quantify this, O. Rey Orozco presented the example of the 14 identical modulators in Linac4. The difference in results between the two models can be explained by the fact that simultaneous failures are only taken into account when the 14 modulators are modelled as distinct systems. An analytical proof is to be followed up.
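The effect of grouping can be illustrated with a minimal sketch. It is not the analytical proof mentioned above and not the Isograph or ELMAS model. It assumes exponential failure and repair times, independent modulators that remain exposed to failure while another one is in repair, and a grouped block that cannot fail again while it is being repaired. The MTTF and MTTR values and all function names are hypothetical and chosen only for illustration.

```python
# Hypothetical numbers for illustration only; not taken from the Failure Catalogue.
N_MODULATORS = 14
MTTF_HOURS = 50_000.0  # assumed mean time to failure of one modulator
MTTR_HOURS = 8.0       # assumed mean time to repair of one modulator


def availability_grouped(n, mttf, mttr):
    """Steady-state availability of ONE block standing for n identical components.

    The block fails with rate n / mttf and, while in repair, cannot fail again,
    so overlapping (simultaneous) failures are not represented.
    """
    return 1.0 / (1.0 + n * mttr / mttf)


def availability_distinct(n, mttf, mttr):
    """Steady-state availability of n independent blocks in series.

    Each block has availability mttf / (mttf + mttr); the system is up only
    when all n are up, so overlapping failures are included.
    """
    return (mttf / (mttf + mttr)) ** n


grouped = availability_grouped(N_MODULATORS, MTTF_HOURS, MTTR_HOURS)
distinct = availability_distinct(N_MODULATORS, MTTF_HOURS, MTTR_HOURS)

print(f"grouped block   : {grouped:.8f}")
print(f"distinct blocks : {distinct:.8f}")
print(f"difference      : {grouped - distinct:.2e}")
```

Under these assumptions the grouped block is slightly optimistic, and the gap is of second order in MTTR/MTTF, which would be consistent with the small impact reported for the simulations.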

The example raised questions about the failure modes of the modulators. R. Schmidt mentioned that modulators are important parts of accelerators and that it would be a good idea to better understand their operation, repair and replacement policies. A. Apollonio commented that only the powering part has been considered and that, in general, all the RF equipment will be updated as soon as feedback is received from the RF contact person. J. Uythoven highlighted that the modulator failure modes, including the real powering part, and the repair strategies should be followed up together with experts.

O. Rey Orozco presented some pictures of the availability models built in Isograph and ELMAS. R. Schmidt expressed concern that the failure of a single ion pump would lead to a Linac4 failure. In his opinion, there might be some kind of redundancy or tolerance in the ion pumps. This will be followed up. The Water Cooling local failure models are not considered at the moment.
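To show why the ion pump question matters for the model, the following minimal sketch compares a pure series arrangement (any pump failure stops the machine) with a hypothetical scheme that tolerates one failed pump. The number of pumps, the pump availability and the tolerance of one failure are all assumptions made for illustration. The actual redundancy, if any, still has to be confirmed with the vacuum experts.

```python
from math import comb


def availability_k_out_of_n(k, n, unit_availability):
    """Probability that at least k of n independent, identical units are up."""
    a = unit_availability
    return sum(comb(n, i) * a**i * (1.0 - a) ** (n - i) for i in range(k, n + 1))


# Hypothetical values, not taken from the Failure Catalogue.
N_PUMPS = 10
PUMP_AVAILABILITY = 0.999

# Series model: all pumps must work (n-out-of-n).
series = availability_k_out_of_n(N_PUMPS, N_PUMPS, PUMP_AVAILABILITY)

# Assumed tolerant model: one failed pump is acceptable ((n-1)-out-of-n).
tolerant = availability_k_out_of_n(N_PUMPS - 1, N_PUMPS, PUMP_AVAILABILITY)

print(f"series (no tolerance)   : {series:.6f}")
print(f"one failed pump allowed : {tolerant:.6f}")
```

Even a tolerance of a single failed pump removes most of the unavailability of the series arrangement in this example, so the choice between the two structures can visibly change the vacuum system's contribution to downtime.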

O. Rey Orozco showed the Excel file which contains the Linac4 Failure Catalogue together with the Isograph and ELMAS input files/tables. The input files are updated automatically whenever the failure and repair times in the Failure Catalogue are changed. The structure/block diagram of the system is not yet updated within the input files. O. Rey Orozco explained that she has learned how to define the system structure in input text files for ELMAS, but not yet for Isograph. Once both input file formats are understood, it will be easier to define a common input format. A. Apollonio also suggested finding out how planned maintenance is modelled in ELMAS.

The preliminary results obtained from the Isograph and ELMAS models were presented. The input data is still being updated and some inputs were only realistic guesses used to run the first simulations; with these inputs, the Linac4 availability is estimated to be 80%, which is rather low. The results obtained from the two simulation packages are almost the same, apart from the simulation time, which is much longer in Isograph (around 25 minutes) than in ELMAS (around 3 minutes). J. Uythoven wondered why the simulation times differ so much. O. Rey Orozco commented that the similarity in the results is a good starting point for understanding the functionalities and simulation engine of each software package. A. Apollonio proposed adding a small complication to the model, the ion pump redundancy scheme, to see whether the results are still the same. M. Jonker commented that this is in fact the approach being taken: starting from a simple example and gradually making it more complex.

Together with the results, the major contributors to the Linac4 downtime were presented. The RF cavities and the Vacuum system contribute most to the system downtime. J. Uythoven noted that some of the failure modes considered for the RF cavities are downtime/optimization times rather than failures, which leads to wrong results. Failure modes and optimization times must be differentiated. R. Schmidt commented that even if the results are not realistic at all, they are a good basis for discussion with system experts. A. Apollonio highlighted that the RF powering is modelled as an independent reliability block in series with the RF cavities. R. Schmidt considered it important to add Interlock Systems and Beam Instrumentation to the model. J. Uythoven agreed. A. Apollonio commented that they were in the previous version of the Linac4 availability model but not in the Failure Catalogue. O. Rey Orozco explained that the models are built following the Failure Catalogue structure, and therefore Beam Interlocks and Beam Instrumentation are not modelled. These systems will be added to the Failure Catalogue and the models.

Next steps include the completion of the Linac4 Failure Catalogue and the iterative update of the availability models. A. Apollonio agreed that the Failure Catalogue data needs to be reviewed. J. Uythoven pointed out that the most important thing is to keep the models up to date, understand which components have the most impact on the system and keep in contact with the experts. Linac4 will also be modelled in AvailSim3.0 as soon as the first version is available, and that model will be compared with the two models in Isograph and ELMAS. Based on the knowledge gained from the different software packages, a common input format for models will be proposed. A. Apollonio commented that the Fault Tree for the Linac4 Accelerator Fault Tracker will be defined in line with the Failure Catalogue. M. Jonker agreed that it is important to track all the failure modes present in the Failure Catalogue. J. Uythoven proposed following up on the Failure Catalogue and availability models in one month. A. Apollonio suggested following up before the end of September, as the MYRRHA workshop takes place in October.

ACTION: Follow up on the Linac4 Failure Catalogue and availability models before the end of September.


First ideas for Linac4 Reliability Run (A. Apollonio)
The Linac4 reliability run will start in spring 2017 and will last 6 months. Everybody agreed that the reliability run should be operated from the CCC. A. Apollonio explained that ideally the machine should run steadily at nominal parameters without any change, but in practice this will be very difficult. Therefore, it should be decided which reference parameters to keep, or whether to plan for systematic changes. The most important parameters are beam current and emittances. The second approach will considerably affect reliability estimates. A. Apollonio commented that maintenance strategies must be checked with experts. If maintenance has to be considered, run periods of 1 month together with mini technical stops of about 1 week would be a possibility. Other points to be considered are the strategy of replacement versus repair for faults, the tracking of spare parts and the identification of the root cause of each failure. J. Uythoven asked whether the aim is to track the failure modes. A. Apollonio considers the reliability run a great opportunity to do so. J. Uythoven agreed, provided that the people on board are aware of the importance of tracking the root cause of failures. A. Apollonio explained that the root cause should be identified as long as the responsible person pushes the experts. R. Schmidt mentioned that the Linac4 reliability run workshop would be a great opportunity to motivate the experts.

A team of mixed expertise is required to manage the reliability run (ABP, RF, MPE, MPP and OP). A. Apollonio explained that discussions are ongoing to define the team. The aim is to have the team ready by December, with a plan for having the AFT ready for the reliability run. For this purpose, a responsible person needs to be appointed to ensure consistent tracking, the fault tree must be defined (in line with the Linac4 Failure Catalogue) and a reasonable granularity must be discussed with the equipment experts.
