7th Evian Workshop

Hotel Ermitage


Evian-les-Bains
Mike LAMONT
Description

The 7th Evian Workshop will be held on 13-15 December 2016 in the Hotel Ermitage in EVIAN (74), France.

Attendance is by invitation only.


The principal aims of the workshop are to:

  • review 2016 performance, availability and operational efficiency, and identify possible areas of improvement;
  • examine beam related issues and establish a strategy for the rest of Run 2;
  • perform a critical review of system performance;
  • review control system performance and examine possible future developments;
  • develop an outline of the operational scenario for 2017 and 2018 for input to Chamonix 2017.


Chair: Mike LAMONT
Deputy Chairs: Malika MEDDAHI, Brennan GODDARD
Editors of the Proceedings: Brennan GODDARD, Sylvia DUBOURG
Informatics & infrastructure support: Hervé MARTINET
Workshop Secretary: Sylvia DUBOURG
    • 08:00 10:00
      BUS - from CERN to the Hotel Ermitage 2h
    • 10:15 10:45
      Welcome Coffee 30m
    • 10:45 11:00
      Setting the scene 15m
      Speaker: Mike Lamont
    • 11:00 12:40
      SESSION 1 : OPERATIONS


      Conveners: Chiara Bracco, Matteo Solfaroli Camillocci
      • 11:00
        Operation of a 6 BCHF collider: do we fit the expectation? 15m

        The way a complex machine is operated has a direct impact on its production efficiency. In the case of the Large Hadron Collider, which required an enormous effort to design and build, it is of the utmost importance to ensure adequate operational quality.
        The exceptional results obtained in 2016 prove that all LHC systems and all teams, including operations, have reached an excellent level of maturity.
        This presentation will review the present status of operations, highlighting areas where further improvements could be investigated.

        Speaker: Enrico Bravin
      • 11:20
        Injection 15m

        Losses at injection will be separated into the two main causes: transverse loss showers from the transfer line collimators, and longitudinal loss showers due to satellite bunches that sit on the kicker field rise and are thus improperly kicked into the machine. The dependence of these losses on the different beam types, transfer line stability and injector performance will be reviewed. The status of, and potential improvements to, the injection quality diagnostics will be presented, and new values for the SPS and LHC injection kicker rise times will be suggested.

        Speaker: Wolfgang Bartmann
      • 11:40
        Turnaround and precycle: analysis and improvements 15m

        This talk will present turnaround-time data from the previous run, give some insight into their distribution and try to identify the different bottlenecks. The impact of the turnaround time on the optimal fill length will be shown, and the different factors contributing to the turnaround itself will be discussed. The final goal is to identify areas of improvement and make concrete proposals based on the data presented.

        Speaker: Kajetan Fuchsberger
      • 12:00
        Cycle with beam: analysis and improvements 15m

        The 2016 proton beam cycles will be analysed, and proposals for improvements will be made based on the results. Further suggestions for reducing the beam cycle time, for example modifying the combined ramp and squeeze, are proposed and their effects quantified. The objective is to present a synthesis of quantified potential improvements as a basis for further discussion.

        Speaker: David Nisbet
      • 12:20
        Machine reproducibility and evolution of key parameters 15m

        This presentation will review the stability of the main operational parameters: orbit, tune, coupling and chromaticity. The analysis will be based on the LSA settings, measured parameters and real-time trims. The focus will be on the ramp and on high-energy reproducibility, since certain parameters like chromaticity and coupling are more difficult to assess and correct there on a daily basis. The reproducibility of the machine in collision will be analysed in detail, in particular the beam offsets at the IPs, since the ever decreasing beam sizes at the IPs make beam steering at the IP more and more delicate.

        Speaker: Jorg Wenninger
    • 12:40 14:00
      LUNCH 1h 20m
    • 14:00 15:40
      SESSION 2 : AVAILABILITY
      Conveners: Benjamin Todd, Laurette Ponce

      Availability - A. Apollonio

      The scope is the 2016 proton run; the data were prepared by the AWG and the fault review experts using the AFT.

      This presentation is a summary of three individual reports that were written for:

      • Restart - TS1
      • TS1 - TS2
      • TS2 - TS3

      Combining all of these:

      • 782 faults were recorded and analysed
      • 65 parent / child relationships were identified
      • two new categories were added: access management and ventilation doors.

      213 days were considered, 153 days of which were dedicated to physics and special physics.

      • Restart - TS1: 45% downtime, 30% stable beams, 22% operations, 2% pre-cycle
      • TS1 - TS2: 20% downtime, 58% stable beams, 21% operations, 1% pre-cycle
      • TS2 - TS3: 16% downtime, 54% stable beams, 29% operations, 1% pre-cycle
      • Overall: 26% downtime, 49% stable beams, 23% operations, 2% pre-cycle

      Availability ranged from a minimum of 30% to a high of around 90%, which was sustained over several weeks.  The best weeks achieved around 3 fb-1.
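
      As a rough cross-check of the ~3 fb-1 weekly figure, the integrated luminosity per week can be estimated from the peak luminosity and the fraction of time spent in stable beams.  The sketch below is purely illustrative: the assumed peak luminosity, average-to-peak ratio and stable-beams fraction are assumptions made for the example, not workshop numbers.

        # Illustrative back-of-envelope estimate of the weekly integrated luminosity.
        # Assumed inputs (not from the workshop): peak luminosity, average-to-peak
        # ratio and stable-beams fraction; only the ~3 fb-1 figure comes from the text.

        SECONDS_PER_WEEK = 7 * 24 * 3600

        peak_lumi_cm2s = 1.3e34        # assumed peak luminosity [cm^-2 s^-1]
        avg_over_peak = 0.6            # assumed average-to-peak ratio (luminosity decay)
        stable_beams_fraction = 0.55   # assumed fraction of the week in stable beams

        integrated_cm2 = peak_lumi_cm2s * avg_over_peak * stable_beams_fraction * SECONDS_PER_WEEK
        integrated_fb = integrated_cm2 / 1e39      # 1 fb^-1 = 1e39 cm^-2

        print(f"~{integrated_fb:.1f} fb^-1 per week")   # ~2.6 fb^-1, same ballpark as the ~3 fb-1 above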

      Of the 175 + 4 = 179 fills reaching stable beams, 47% reached end of fill, 48% were aborted and 5% were aborted due to suspected radiation effects.  The main categories of premature aborts were UFOs and FMCMs.

      Short stable-beam durations are due to the intensity ramp-up.  Before MD1 and the switch to BCMS, fills were kept for as long as possible; the underlying fill-length trade-off is sketched after the list below.

      • Restart - TS1: 6.9h EOF, 8.0h aborted
      • TS1 - TS2: 16.2h EOF, 7.7h aborted
      • TS2 - TS3: 11.5h EOF, 7.8h aborted
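
      The end-of-fill versus aborted-fill durations above feed the fill-length optimisation discussed in Session 1: for a given turnaround time there is a fill length that maximises the integrated luminosity per unit wall-clock time.  A minimal numerical sketch follows; the exponential decay model, the luminosity lifetime and the turnaround time are illustrative assumptions, not measured values.

        # Illustrative only: optimal fill length assuming an exponential luminosity
        # decay L(t) = L0 * exp(-t/tau).  The lifetime and turnaround time are assumed.
        import math

        tau = 15.0         # assumed effective luminosity lifetime [h]
        turnaround = 5.0   # assumed average turnaround time [h]

        def avg_lumi(fill_length):
            """Integrated luminosity per unit wall-clock time, in units of L0*h per h."""
            integrated = tau * (1.0 - math.exp(-fill_length / tau))
            return integrated / (fill_length + turnaround)

        # Scan fill lengths and keep the one maximising the average luminosity.
        best = max((avg_lumi(t), t) for t in [0.1 * i for i in range(1, 500)])
        print(f"optimal fill length ~ {best[1]:.1f} h")   # about 11 h for these assumed numbers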

      In the period of physics production there were 779 faults, with 77 pre-cycles due to faults.  

      • Integrated Fault Duration: 1620h
      • Machine Downtime: the fault time corrected for the parallelism of overlapping faults (see the sketch below)
      • Root Cause: downtime re-assigned according to root-cause dependencies.
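
      A minimal sketch of what the parallelism correction means: machine downtime counts wall-clock time with at least one fault present, obtained by merging overlapping fault intervals, and is therefore smaller than the integrated (summed) fault duration.  The intervals below are invented for illustration; this is not the AFT implementation.

        # Illustrative only: merge overlapping fault intervals so that parallel faults
        # are not double-counted.  Times are hours from an arbitrary origin.

        def machine_downtime(faults):
            """Wall-clock downtime in hours: length of the union of (start, end) intervals."""
            merged = []
            for start, end in sorted(faults):
                if merged and start <= merged[-1][1]:          # overlaps the previous interval
                    merged[-1][1] = max(merged[-1][1], end)    # extend it
                else:
                    merged.append([start, end])
            return sum(end - start for start, end in merged)

        faults = [(0.0, 5.0), (3.0, 6.0), (10.0, 12.0)]         # two overlapping faults + one isolated
        integrated = sum(end - start for start, end in faults)  # 5 + 3 + 2 = 10 h
        print(integrated, machine_downtime(faults))             # 10.0 h integrated vs 8.0 h of downtime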

      The top 5 contributors are:

      • Injector Complex: 25.4%
      • Technical Services: 22.6%
      • Cryogenics: 7.3%
      • Power Converters: 6.1%
      • Magnet Circuits: 5.6%

      The period was dominated by high-impact faults.

      Big improvements versus 2015:

      • QPS: almost invisible to OP
      • Radiation effects on electronics: significantly fewer events than predicted
      • Cryogenic system: the impact of e-cloud is under control, and recurring sources of faults have been solved.

      Conclusions:

      • Several weeks at 90% availability and ~3 fb-1, with very reproducible operating conditions
      • Unavailability: typically long, isolated issues
      • 2017: should be similar, unless we move out of the regime in which we are now.

      J. Wenninger - what is in the operation section of the pie-chart? Can it be separated?

      A. Apollonio - we can quantify it, but not automatically.

      L. Ponce - the column for the operation mode can be extracted, but we need an automated means to correlate it.

      M. Lamont - the stability of the operational conditions appears to influence the stability of the LHC availability.  Does keeping the operational conditions stable mean that systems will keep (or have kept) the same availability?

      A. Apollonio - we will see next year.  The comparison of 2015 to 2016 is difficult, as changes such as BCMS and the bunch spacing have altered the operational conditions of the machine.

      L. Ponce - 2015 was dominated by cryogenic recovery and stability; 2016 has not had the same issues.  Another factor to consider is the sources of aborted fills which are immediately "repaired" - for example a fault which leads to a beam abort but requires no repair, only a re-fill of the machine.

      S. Redaelli - this year was one of the years in which the most operational days were lost to long faults.  Once this is corrected for, what is the characteristic of the fault data?

      A. Apollonio - 2016 began with poor availability due to isolated, long-duration faults; since then, "random" faults appear to have been the driving factor.

      S. Redaelli - what about R2E? Why are there so few failures?

      S. Danzeca - The TCL settings are one of the main contributors to R2E failures.

      G. Rakness - how come at the end of the year there is high availability and yet not much physics produced?

      L. Ponce - at the end of the year several machine exploitation activities, for example MDs, meant the machine was not producing physics.  It was noted that the three consecutive days around 24 October saw the highest luminosity delivery of the year.

       

      Technical Services - J. Nielsen

      The five systems monitored by the TIOC are:

      • Cooling and Ventilation
      • Electricity
      • Safety Systems
      • Access System
      • IT network

      These categories are distributed across several elements of the AFT tree.

      Events are currently classified by group; in the future this could be done by mapping systems and equipment instead, matching the approach of the AFT.  This would help classify events more clearly.

      For example, some "systems" are in fact groups of systems; the classification could be improved, and the AFT could show groups of systems.  To achieve this, the definition of a "system" should be refined.

      The TIOC meets every Wednesday to analyse the events that have occurred during the week and to make recommendations to mitigate root causes.  The TIOC also coordinates the larger technical interventions.

      If an accelerator is stopped, a major event is created.  The data for such an event are entered once and are not subsequently synchronised; this could be improved.  Major events are presented in the weekly TIOC meeting: the machine or service operator fills in the first part of the information, and the user and/or group then adds further details.

      The fault information for 2016 shows the major contributing groups:

      • EN-EL = 40% - largely due to the weasel
      • EN-CV = 32%
      • BE-ICS = 17% (note that this does not include cryogenics)

      Breakdown by fault count (with duration):

      • Controls and instrumentation = 12% (15%)
      • Equipment = 31% (36%)
      • Electrical perturbations = 46% (45%)

      Controls and Instrumentation:

      Mostly PLC failures.

      Equipment Faults:

      Usually due to common-mode power supply faults, for example a failure which trips the power supply to several elements (selectivity tripping at a higher level); other events are due to equipment not suited to its use (old installations being re-tasked), general equipment failure, or calibration problems.

      Downtime is higher than in 2015, but excluding the weasel it is lower (-30%).

      Electrical Perturbations:

      • 2015: 3 hours of downtime, around 15 faults.
      • 2016: 23 hours of downtime, around 45 faults.  

      A general report from the mains supply services shows that 2016 had 19% fewer thunderstorms than a typical year.

      Conclusions

      • The TIOC is effective, and the follow-up has been good.  Several things are being worked on and followed up.
      • The next goals are to exploit the AFT in a better way, and to align and synchronise the information.

      M. Lamont - the weasel showed that there were some spares issues.

      J. Nielsen - there were spares, but not in good condition.

      D. Nisbet - how do we close the loop with equipment groups? How can we see improvements?

      J. Nielsen - next year we hope the fault duration assigned to the technical services will be lower; there have been faults with long-lasting effects.  The follow-up is done by the equipment groups and users.  An event is only closed in the TIOC once it has been mitigated, or when it is clear that it will not be mitigated.

      L. Ponce - the TIOC is doing much more follow up than the machines do for the AFT.

       

      Injector Complex - V. Kain and B. Mikulec

      The injectors were the number one cause of LHC downtime in 2016, although it should be taken into account that there are four injectors upstream of the LHC; if this contribution were split per machine, each "injector" bar would be correspondingly shorter.

      It is not easy to determine which accelerator is the source of LHC downtime; extending the AFT to the injectors is being discussed to assist in this work.

      138 faults were attributed to the injectors, with 15 days of downtime.  This analysis was very time consuming, as the link from the LHC to the injector logbooks is not automatic.

      LINAC2 - 6h 20m as seen by LHC

      3 faults, notably replacement of an ignitron

      Booster - 11h 45m as seen by LHC 

      Several faults, mainly electro-valves; the longest individual fault was 4 hours.

      PS - 9 days 10 hours as seen by LHC

      Power converters (MPS and POPS), vacuum and radio frequency.  Power converters account for over 6 days of this, vacuum for over 1 day, and RF for over 15 hours.

      SPS - 4 days 19 hours as seen by LHC

      Power converters (no real systematics) over 1 day 8 hours, targets and dumps 23 hours, radio frequency over 18 hours.  Many systematic issues affected beam quality but, like degraded modes, these are not counted as actual faults.

      The above is only what the LHC sees, since it needs beam only during filling; contrasting with the overall performance, considering each machine as a continuous operation (a small cross-check of the implied scheduled periods is sketched after the per-machine breakdown below):

      LINAC2 - 97.3% uptime, 166h downtime.

      • Source is 44.1% - new source being tested in EYETS
      • RF system is 34.6% - analysis is ongoing
      • External is 14.7% - power glitches and cooling water

      Booster - 93.9% uptime, 384h downtime

      • LINAC2 is 33.6%
      • RF system is 17.5% - was in a degraded mode, but incorrectly actioned
      • Beam transfer is 15.8% - septa and electro-valve issues - to be replaced next year
      • Power converters 14.3% - random faults

      PS - 88% uptime, 727h downtime

      • Power Converters is 38.3% - POPS capacitors will be replaced
      • Injectors is 26.6%
      • RF is 10.7%
      • Beam Transfer is 6.9%

      Availability per user varies from 79% to 94%.

      SPS - 74.8% uptime, 1366h downtime

      • Injectors & targets
      • Failures look random
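
      As a small cross-check of the uptime figures quoted above, the downtime hours and uptime fractions imply a scheduled operation period for each machine.  The sketch below uses only the numbers quoted in this summary; the derived periods are inferred, not stated in the talk, and differ slightly per machine, consistent with different scheduled run lengths.

        # Cross-check: scheduled period implied by the quoted uptime and downtime figures.
        # Inputs are the numbers quoted above; the derived periods are inferred, not stated.

        machines = {              # name: (uptime fraction, downtime in hours)
            "LINAC2":  (0.973, 166.0),
            "Booster": (0.939, 384.0),
            "PS":      (0.880, 727.0),
            "SPS":     (0.748, 1366.0),
        }

        for name, (uptime, downtime_h) in machines.items():
            scheduled_h = downtime_h / (1.0 - uptime)    # downtime = (1 - uptime) * scheduled
            print(f"{name}: implied scheduled period ~ {scheduled_h / 24:.0f} days")

        # Prints roughly 256, 262, 252 and 226 days respectively.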

      Issues with fault tracking in the injectors:

      • Not everything is captured in the injectors; perhaps automated tools can be added.
      • The concept of a destination and user is tricky to add.
      • The SPS cannot distinguish between no beam request and a request that failed due to a fault.
      • Faults are currently attributed to a timing user, but should rather be attributed to an LSA context.
      • The root fault cause is not always correctly identified.
      • How to account for degraded modes?

      Injector AFT

      • It will address some of these issues
      • Categories are being organised
      • LSA contexts will be used, so statistics per context or group of contexts can be produced (a toy illustration follows this list)
      • A context-dependent e-logbook interface will be provided
      • Warnings will be separated from faults
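
      As a toy illustration of the foreseen per-context statistics (all context names and durations below are invented, not taken from any AFT data), fault durations could be aggregated by LSA context rather than by timing user:

        # Toy illustration only: invented fault records aggregated by hypothetical
        # LSA context names, mimicking the statistics foreseen for the injector AFT.
        from collections import defaultdict

        fault_records = [                  # (hypothetical LSA context, duration in hours)
            ("CONTEXT_LHC_FILLING", 1.5),
            ("CONTEXT_LHC_FILLING", 0.5),
            ("CONTEXT_FIXED_TARGET", 2.0),
        ]

        downtime_by_context = defaultdict(float)
        for context, hours in fault_records:
            downtime_by_context[context] += hours

        for context, hours in sorted(downtime_by_context.items()):
            print(f"{context}: {hours:.1f} h")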

      Injector downtime appears to be dominated by a few longer, uncorrelated breakdowns.

      J. Jowett - considering only the proton run hides some issues that were observed during the p-Pb run, although other injectors are used for the Pb injection.

      M. Lamont - how come the LHC was not adversely affected by the poor LINAC availability?

      B. Mikulec - the LHC never asked for beam, and therefore no fault was logged.  

      M. Lamont - are the breakdowns really uncorrelated? They could be correlated with maintenance activities needing some improvement.

      B. Goddard - note that sometimes the maintenance has led to lower availability (e.g. water valves).

      L. Ponce - how to keep track of degraded modes? This was also abandoned for the LHC; it was too difficult to track in the AFT, and so was not done.

      R. Steerenberg - having this degraded-mode information for the whole period would make things clearer; at the moment the reality is obscured by the incomplete capture of degraded modes.

      L. Ponce - agrees; in addition, injector "downtime" can be flagged in the AFT, for example as "prevents injection".  This was added following an MKI problem but was not used in 2016.  The 35h fill, for example, was kept so long to avoid injector issues.

       

      Cryogenics - K. Brodzinski

      There are four cryogenic islands and eight cryogenic plants (A = low load, B = high load).

      • During Run 1 two cryogenic plants could be stopped.
      • In 2015 all plants were activated to compensate for the electron-cloud heat load; there was still some operational margin.
      • In 2016 a new configuration was used, switching off one cold compressor unit and moving capacity between the A and B systems.  This can be safely done as the LHC is running below optimum values.
      • During LS2 some valves will be replaced to allow even further sharing of the load between the A and B systems.

      Cold boxes were tuned, achieving 175W per half-cell capacity in the worst-performing sectors (around Point 8).  In sector 2-3 the beam screen heat load cooling capacity can reach 195W.  The general limit is 160W.

      In 2016 the availability was 94.4%.  Excluding users and supply, it reaches 98.6%.

      • User = quenches, supply = mains.

      2015 - total downtime 273 hours

      2016 - total downtime 79 hours

      This improvement comes from four effects:

      1. Feed forward logic for beam screen heating
      2. points 2 & 8 optimisation
      3. point 8 cold box repairs
      4. DFB level adjustment

      Overall, around 60% of the downtime was due to PLC failures; this has been a known issue for some time.

      • YETS 2015/16: an anti-crash program was added, but it still has some issues.
      • EYETS 2016/17: a further upgrade will be applied by BE-ICS to 50% of the equipment.

      4.5K - 1 x human factors, 2 x PLC

      1.8K - 1 x mechanical failure, 1 x AMB CC

      Helium Losses

      • 2010 - 40 tons, 29 operational 
      • 2016 - 17 tons, 9 operational

      Beam Screen Heat Load

      • On average 120W per half cell; the general limit is 160W.

      2017 plans:

      • In the EYETS, update 50% of the PLCs to attempt to deal with the code-crashing issues
      • Same operational scenario as 2016
      • The limit is still 160W per half cell
      • Inner triplet cooling will be OK provided the load is <250W per inner triplet
      • In 2016, 200W per inner triplet was seen at 6.5 TeV and 1.5e34 cm-2s-1 peak luminosity.  The maximum possible is 1.7e34.

      J. Wenninger - is the triplet limit 2.0e34 or 1.7e34?

      K. Brodzinski - the limit is really 1.7e34.  From the tests carried out, a baseline of 300W heat load on the triplet was expected, but once the re-calibration correction factor was added, the actual load to be managed was only 240-250W.  There is still room for improvement: 1.75e34 is known to be achievable, while reaching 2.0e34 would require tuning.

       

      Sources of Premature Beam Aborts - I. Romera / M. Zerlauth

      86 fills were aborted; a Pareto analysis of these shows three large contributors:

      Technical Services x 27:

      • 23 x Electrical network perturbations (22 FMCM, 1 QPS XL5, XR5)
        • 12 FMCMs are installed in the LHC, designed to interlock on current change.
        • A 250mA change at RD1 changes the orbit by 1.5 sigma.
        • 9 of the FMCM events were global, big enough to affect other parts of the complex.
        • FMCMs on the 18kV network observe more glitches, being closer to the 400kV line.
        • Four SATURN supplies will replace the four converters on the 18kV network.
      • 3 x water pumps and flows
      • 1 x water infiltration / cooling and ventilation

      Power converters x 15

      • 6 x SEU candidate events
      • 4 x internal / external converter failures
      • 2 x communications issues
      • 2 x orbit dipole corrector issues
      • 1 x interlock interface, which was not solved in 2015

      Beam Losses / Unidentified Falling Objects (UFOs) x 14

      • 6 x IRs
      • 3 x sector 1 - 2 since threshold changes in August
      • No magnet quenches due to UFOs since July 2016, but the statistics are low

      Remaining

      The remaining causes have small counts; interesting cases are:

      • Collimation - LVDT measurement - x 3 
      • QPS - I_DCCT current measurement, likely screen grounding issue - x 3
      • Training Quenches - MQ.22L8 - x 2
      • Cryogenics - Cryogenics Maintain lost - x 2

      There is no obvious correlation.

      Conclusions

      Everything looks random - we are at the bottom of the bathtub curve!

      G. Arduini - for the power converters which have a possible radiation effect, five out of six are in point 5 RRs.

      M. Zerlauth - this is the case, these are planned to be changed, first by replacing the controller (FGClite) and then the converter power part.

      S. Danzeca - the events in RR53/57 happen when the TCL settings are "closed"; when the TCLs are open there are no events.

      A. Lechner - concerning the UFOs in the IRs, the thresholds have already been increased for the year.

      B. Goddard - putting the last presentations together, it is remarkable that there was only one dump from the dump kickers and the dilution systems.  This is thanks to the reliability run, which has proven clearly beneficial.

      • 14:00
        Overview: LHC availability and outlook 15m

        The LHC exhibited unprecedented availability during the 2016 proton run, producing more than 40 fb-1 of integrated luminosity, significantly above the original target of 25 fb-1. This was achieved while running steadily with a peak luminosity above the design target of 1e34 cm-2s-1. Individual system performance and increased experience with the machine were fundamental to achieving these goals, following the consolidations and improvements deployed during Long Shutdown 1 and the Year-End Technical Stop in 2015. In this presentation the 2016 LHC availability statistics for the proton run are presented and discussed, with a focus on the top contributors to downtime.

        Speaker: Andrea Apollonio
      • 14:20
        Technical Services: Unavailability Root Causes, Strategy and Limitations 15m

        • Split the TI events leading to unavailability into the relevant systems / sub-systems
        • Break down the downtime with respect to each, analysing it as the TIOC sees fit.
        • Identifying the measures that are put in place with respect to these.
        • Identifying the measures & mitigations that could be foreseen for 2017
        • Making observations versus 2015

        Speaker: Jesper Nielsen
      • 14:40
        Injectors: Unavailability By Machine, Root Causes, Strategy and Limitations 15m

        • Split the injector events leading to unavailability into the relevant systems / sub-systems
        • Break down the downtime with respect to each, analysing it per injector machine.
        • Identifying the measures that are put in place with respect to these.
        • Identifying the measures & mitigations that could be foreseen for 2017
        • Making observations versus 2015

        Speaker: Bettina Mikulec
      • 15:00
        Cryogenics: Unavailability Root Causes, Strategy and Limitations 15m

        • Split the cryogenic events leading to unavailability into the relevant systems / sub-systems
        • Break down the downtime with respect to each.
        • Identifying the measures that are put in place.
        • Identifying the measures & mitigations that could be foreseen for 2017
        • Making observations versus 2015

        Speaker: Krzysztof Brodzinski
      • 15:20
        Recurring Sources of Premature Dumps 15m

        • Analysing the beam aborts from 2016
        • Break down the beam aborts into root causes.
        • Expand each root cause and make observations vis-a-vis their impact on operations.
        • Identifying the measures & mitigations that could be foreseen for such abort causes in 2017
        • Making observations versus 2015.

        Principally:
        FMCM, UFO, Power Converter Trips, Single Event Effects

        Speakers: Ivan Romera Ramirez, Markus Zerlauth (CERN)
    • 15:40 16:10
      Coffee Break 30m
    • 16:10 17:50
      SESSION 3: PERFORMANCE 1
      Conveners: Alessio Mereghetti, Giovanni Iadarola
      • 16:10
        Optics control in 2016 15m

        Main points to be addressed:
        • Achieved beta-beating and DQmin level
        • IP beta and waist uncertainties
        • Linear optics and coupling reproducibility and evolution (Run 1 vs Run 2)
        • Status of the automatic coupling correction (DOROS, ADT-AC dipole)
        • Possible efficiency improvements:
        o How can we speed up the process?
        o How reusable are optics corrections from one year to the next?
        o What could be performed by OP without expert support?
        • Needs for 2017 optics commissioning

        Speaker: Tobias Hakan Bjorn Persson
      • 16:30
        Non-linear corrections 15m

        Main points to be addressed:
        • Status of non-linear corrections in the LHC
        • Present and future needs for IR non-linear corrections (LHC, HL-LHC)
        o Do we need them in the LHC at β* ≈ 30 cm?
        • Methods and MD results
        • IR non-linear corrections for operation in 2017:
        o Are methods and tools ready?
        o What are the requirements in terms of commissioning time?

        Speaker: Ewen Hamish Maclean
      • 16:50
        Experience with the ATS optics 15m

        Main points to be addressed:
        • MD results (aperture, optics correction, chromatic properties, β* reach, collimation, etc.)
        • Is it ready for 2017 physics?
        o Any limitation left (phase advance MKD-TCT, CT-PPS/AFP normalised dispersion)?
        • Improvements or expected problems with other operation modes:
        o High and intermediate β* runs (forward physics in 2017 and beyond)
        o Ion operation (IP2 squeeze)
        • Future MD plans (flat, LR beam-beam compensation with octupoles?)

        Speakers: Rogelio Tomas Garcia (CERN), Stephane Fartoukh
      • 17:10
        Collimation: experience and performance 15m

        Main points to be addressed:
        • Overview on 2016 performance (and comparison vs 2015)
        o Reminder of settings throughout the cycle
        o Cleaning (Run1 vs Run 2)
        o Aperture measurements
        o Effects of MKD-TCT change of phase advance in 2016
        • MDs (potential improvements for next year):
        o tighter collimation settings → hierarchy limit
        • Brief recap on ion experience
        • (MDs for further future: crystals, active halo control)

        Speaker: Daniele Mirarchi
      • 17:30
        Analysis of beam losses 15m

        Main points to be addressed:
        • Different methods for lifetime estimation
        • Losses in the different phases of the cycle
        • Plane decomposition
        • Comparison against Run 1

        Speaker: Stefano Redaelli
    • 19:30 21:30
      DINNER 2h