## **Controls and Machine Protection**

Joint International Accelerator School

on

**Beam Loss and Accelerator Protection** 

Enzo Carrone Newport Beach, CA – Nov. 10<sup>th</sup>, 2014









## **Off to a good start**

#### "Tous les paramètres sont normaux et la trajectoire est normale"

The software generated a number too big for the system to handle: the computer shut down and passed control to its redundant twin which, **being identical to the first**, came to the same conclusion and shut down a few milliseconds later.

The rocket, now without guidance, changed direction to compensate for an imagined error and collapsed in its own turbulence.



- **1**. Software reused from the Ariane 4 series, which has special requirements and lower horizontal velocities.
- 2. An error while converting a 64 bit floating point number to a 16 bit integer caused an overflow: a custom floating point format for which the processor could have generated an exception error.
- **3**. Some conversions (in Ada code) on the computers are protected from bad conversions, but this one was disabled.

- 4. The primary inertial sub-computer and its backup **both** shut down because of this, and the primary started a memory dump.
- 5. The main computer looked at the data dump and interpreted it as flight data. The nozzles swiveled to their extreme position to try to "right" the rocket, causing it to break apart. When the boosters broke off, the whole launcher automatically self-destructed.

#### **Controls are complex systems**



Subsystems

#### **Controls Architecture**



#### **Controls Architecture**



#### **Controls Architecture**



### What can go wrong?

SLAC

#### Anything.

- **1.** System fails unsafe;
- 2. System fails when it shouldn't have failed;
- **3**. You are trying to make changes to a program, only to discover you were working on the wrong version;
- **4.** Wrong data received from sensors (but interpreted as true);
- 5. The system was changed and cannot be brought back to a previous state;
- 6. The systems needs to be upgraded/changed, but there is not enough documentation to do it;
- 7. System compromised by a malicious piece of code, gone unnoticed for a long time.

## How to mitigate some risks (today's agenda)

Not all risks are created equal.

- 1. Redundancy;
- 2. Lifecycle management;
- **3.** System architecture;
- **4.** Configuration control;
- **5.** Quality Assurance;
- 6. Standards;
- 7. Tests;
- 8. Documentation;
- 9. Cyber Safety.

si aq

Sometimes it is two capacitors on a circuit-board in case one fails; other times it is the duplication of a whole system, such as in the US missile programs of the 1950s, where the Air Force built redundancy into each and every component of the entire missile.

It allows to design for and quantitatively demonstrate reliability.

Risk of common-mode failures b/w diverse software is hard to eliminate: any shared specification can lead to common-mode failures. Space Shuttle's extensive redundancy makes its workings "mysterious and unpredictable even to its designers".

Redundant elements also require further "managerial" systems to determine, indicate, and/or mediate failures (e.g. four engines on aircrafts).

Apollo 11 Capsule: "Whilst the primary control is automatic, for vehicle operation, man has been added to the system as a redundant component who can assume a number of functions at his discretion dependent upon his diagnosis of the state of the system. Thus, manual control is secondary" –i.e., Neil Armstrong was THE redundant element!

#### **Some constraints**

SLAC



#### Challenges:

- Diagnostics
- Maintenance
- Test Equipment
- Spare parts procurement
- Knowledge management
- Training

## 2. Lifecycle Management (Systems Engineering)

The V-Model



Modern high performance instruments

- High speed, high resolution ADCs on Front End take in transducer data
- High speed DACs on Back End drive actuators
- FPGA between ADCs and DACs digitally process signals and implement measurement/control algorithm

A tool for FPGA algorithm development:

MatLab Simulink +Xilinx System Generator (Altera DSP Builder) implements Simulink algorithms with vendor IP blocks.

(VHDL Libraries à la UNICOS)

FPGAs act as digital HW, but should be treated as SW

#### **3. Architectures**



#### **MPS**

-SLAC



Beam Containment System (BCS): keeps beam within prescribed channels and limits, preventing generation of excessive level of radiation in occupied areas.

It also protects the integrity of safety-related beam line components.

BCS performs this function by monitoring beam power and beam loss, and shutting beams down if a limit is exceeded.

#### BCS: A hybrid architecture – Cont'd



Remote I/O drop acts as a data hub:

- Slow sensors directly connect to PLC
- Custom chassis for fast sensors
- Chassis provide "fault" information to local Digital Summary chassis
- Global Digital Summary chassis for fast shut off of output devices
- PLC provides diagnostics and configuration control

**PPS** 



## **Evolution of Safety Systems at SLAC**



#### **Traditional Architecture**



### **Modified Traditional Architecture**

SLAC



#### 2<sup>nd</sup> Generation (CCR Upgrade)



SLAC

3<sup>rd</sup> Generation (BCS Upgrade)



## **Option: A Single-PLC System**

#### Identical functionality based on a different system architectures



# A single CPU operating in a **1001D** structure is certified by TUV for SIL 3.

Four layers of protection:

- **1. Failsafe input module** (monitored signal wire; local protection with lockout & reset; discrepancy analysis and time out for faulty inputs; communication watchdog);
- **2. Safe communication network** (fault detection, fault reaction, recovery);
- **3.** Safety rated controller (redundant evaluation of inputs);
- **4. Failsafe output module** (tests itself with internal test pulses; safe state failure mode for communication failure; no reliance on PLC for errors handling).

<u>s ac</u>

#### Coexistence of standard and safety program in a controller

- Separation of standard and failsafe program;
- Changes in the standard program do not affect the integrity of the safety program;
- One engineering platform for both applications;
- Protection of the Safety program from unauthorized access.



#### Field Hardware (PLC inputs) is still redundant.

## **Logic Solver Internal Redundancy**

SLAC

#### **Time Redundancy/Diversity instead of Structural redundancy**



- Two copies executed in parallel: one using 16-bit word mode, the second using single-bit binary instruction; the compiler changes one into another (diversity);
- When  $D \neq C$ , the safety program is executed. Twice (redundancy);
- Bool and Word Operations processed in different parts of the CPU;
- Two independent hardware timers.

A DOE National Laboratory, 2007:

There was a failure of a PLC;

The PLC was repaired using the incorrect version of software.

SLA0



SLAC

There were two simultaneous core dumps.

The recovered system reveals problems; the status reported on EPICS was not all correct.

**Development Software Locations:** 

Not in CVS A:\Floppy Disks C:\Desk Top Drive D:\CD Burner F:\Flash Drives V:\Group Drive Z:\Employee Drive





The recovered system required certification, but the certification document was not consistent with the previously executed version.

A hardcopy was not available, so a document was printed from Microsoft Word.

Track Changes function in Microsoft Word created two versions.

An investigation ensued.

### **Software Control**

| Requirements                                                 | Personnel Protection                                                                                                                          | Machine Protection                                                                       |
|--------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------|
| Use of Configuration Versioning<br>System (CVS) for Software | YES                                                                                                                                           | YES                                                                                      |
| Manage check-in/out of CVS with<br>procedures                | YES                                                                                                                                           | YES                                                                                      |
| Track & Check Checksum                                       | Yes; Additional "Safety Signature"<br>available for Safety-Rated PLCs                                                                         |                                                                                          |
| Software Download is password<br>protected                   | YES                                                                                                                                           | YES                                                                                      |
| Download over network?                                       | No; not allowed. Local ProfiBus connection only allowed.                                                                                      | YES                                                                                      |
| Download to wrong CPU across<br>network?                     | No; isolated networks and different CPU<br>names/IP addresses even if on same<br>network                                                      | No; isolated networks and different CPU<br>names/IP addresses even if on same<br>network |
| Protection against wrong safety<br>program load              | Hardware configuration is loaded; safety<br>modules have hardware DIP switches.<br>Hardware configuration error causes fail-<br>safe shutdown | NO                                                                                       |

## **Software Control – Cont'd**

| Requirements                                                                                              | Personnel Protection    | Machine Protection |
|-----------------------------------------------------------------------------------------------------------|-------------------------|--------------------|
| Physical Isolation from Controls Network                                                                  | NO                      | NO                 |
| Possible accidental download of Safety-<br>Critical program from controls network                         | NO; local download only | YES                |
| Possible intentional sabotage of Safety-<br>Critical program from controls network                        | NO; local download only | YES                |
| Possible accidental download of Access<br>Control (non-safety critical) progam from<br>controls network   | YES                     | YES                |
| Possible intentional sabotage of Access<br>Control (non-safety critical) program from<br>controls network | YES                     | YES                |
| Possible accidental changes to EPICS HMI panels from controls network                                     | YES                     | YES                |
| Possible intentional change to EPICS HMI panel from controls network                                      | YES                     | YES                |

When do you patch/update your Server/Industrial PC/PLC OS/firmware?

- Is it a critical fix?
- Can it wait?
- Will the code/IP still work?
- Database to track the fw update?
- Point of contact for vendors?
- Maintenance contract?
- When committed to the new version, all systems have to be changed.
- Once done the entire system has to be re-tested.

## **5. Quality Assurance**





#### **Minimum Recommended Reviewer Complement**

| Minor Modification, familiar methods | 1 external reviewer                                                                                   |
|--------------------------------------|-------------------------------------------------------------------------------------------------------|
| Minor Modification, new methods      | 2 external reviewers                                                                                  |
| Medium Change, familiar methods      | 2 external reviewers                                                                                  |
| Medium Change, new methods           | 2 or more external reviewers,<br>1 external to Control Dept                                           |
| Large Change, familiar methods       | 3 external reviewers,<br>1 external to Controls Dept                                                  |
| Large Change, new methods            | 3 or more external reviewers,<br>1 or more external to Controls Dept,<br>1 external to the laboratory |

SLAC

• Minor Modifications: adding or moving an emergency off button, BSOIC, or Ion Chamber, equivalent device substitutions such as upgraded annunciator panels, or minor logic changes that improve performance but are not changes in the logic specification;

• Medium Changes: redesigns of stopper, BTM, BSOIC, PIC Chassis, or power supply interface chassis, or minor changes in PPS logic specification;

• Large Changes: new PPS zones, new BCS regions, complete PPS rebuilds or significant logic modification.

**Fixed Programming Language (FPL):** ASIC, embedded Code... - only operating parameters may be changed. E.g., smart sensor.

**Limited Variability Language (LVL):** Program runs in restricted memory space (restricted by hardware). IEC 61131-3. E.g., PLC.

**Full Variability Language (FVL):** C, JAVA, etc. Allowed to write any value in any memory space.

SLAC

VHSIC Hardware Description Language (VHDL) –where VHSIC: Very High Speed Integrated Circuit.

Elegantly represents parallel processing. Modular language, similar in visual style to Ada. Features: inclusion of time/clock processing and the facilities for describing low-level circuits.

It can be simulated, rather than executed, and these simulations may give different results to actual implementations if VHDL is poorly constructed. This means that **simulation cannot**, in general, be considered equivalent to software testing.

## IEC 61508, Part 2 and 7

- Use of structured and modular design
- Restricted use of asynchronous constructs
- Design for testability
- Restrictive use of ambiguous constructs
- Transparent and easy to use code
- Defensive code and range checking (o pick up faults or anomalies and respond in a pre-determined way)
- Use of comments and annotations
- Limits on module sizes and number of ports to increase readability
- Avoidance of multi-dimensional arrays, goto type commands
- Avoidance of redundant logic and feedback loops
- Avoidance of latches, asynchronous reset

## 6. Standards (Hardware)





#### **Computer Automated Measurement And Control:**

- Standard bus <u>and</u> modular crate electronics standard for DAQ and Controls, defined in 1972;
- Solved the low channel density problem of NIM;
- Up to 24 modules in a crate, interfaced to PC;
- Not hot-swappable because of backplane design;



SLAO

- Dataway manages: Module power, address bus, controls bus and data bus;
- 24-bit communication b/w controller and selected module.

## VME

#### Versa Module Europa

- Standard backplane bus, defined in 1981;
- Architecture not scalable for high speed (single ended parallel bus not for Gbps);
- EMI shielding not specified;
- Developed for Motorola 68000 line of CPUs (the bus is equivalent to the pin of 68000 run out onto a backplane);
- Faster bus (from 16 to 64 bit), up to 40 MHz (VME64).



## They can get quite busy...





- First modular computer architecture with completely serial multi-Gbps backplane
- Serial ports are bidirectional pairs in star or mesh topology
- Intelligent Platform Manager Interface IPMI for cooling and thermal management, control and monitor

Modules:

- Cooling Units (CU);
- Power Modules (PM);
- Advanced Mezzanine Card (AMC) for electronics, CPU, hard drives;
- Rear Transition Module (RTM);
- μTCA Central Hub (MCH)





## Reliability

SLAC

Built-in hot-swap.

Designed for high-reliability:

- Intelligent Platform Manager Interface IPMI for cooling and thermal management, control and monitor;
- Built-in crate and component status monitoring and remote management/diagnostics;
- Independent monitoring channel within the crate.

#### AKA "Pizza Box".

Guidelines for NAD design have to be provided on a case-by-case basis, i.e., *there is not a standard to design a NAD-based system*.

Every board is unique, as it is the creation of the design engineer.

A NAD-based board can be equipped with plenty of cards from many vendors.





System and Integrity checks built into the lifecycle.

Tests have to be reviewed and documented.

System and integrity check should be performed regularly.

Test procedures can have zero-error tolerance (cold- and hot-checkout, test bench, test validation, test execution).

Executed Test procedures have to go under configuration management.

Test procedures should go through a closeout process.

## 9. Documentation

-SLAC



## **10. Cyber Safety**



## **Evolution of regulatory landscape: EO**

#### **Obama's Cybersecurity Executive Order (Feb 2013):**

- Agencies to share information with companies
- NIST to develop a framework with industries
- Review CS regulation
- Voluntary compliance

Deadline: Feb 2014

Leadership: NIST

## **Evolution of regulatory landscape**

**Obama's Cybersecurity Executive Order (Feb 2013).** 

# Framework for Improving Critical Infrastructure Cybersecurity (NIST, Feb 2014): a system of regulations and the means used to enforce them.

- 1. Core functions (activities and references);
- 2. Implementation tiers (guidance);
- 3. Framework profile (how to integrate CS functions within a CS plan).

SI AG

## Framework's core

#### SLAC

| Functions | Categories | Subcategories | Informative References |
|-----------|------------|---------------|------------------------|
| IDENTIFY  |            |               |                        |
| PROTECT   |            |               |                        |
| DETECT    |            |               |                        |
| RESPOND   |            |               |                        |
| RECOVER   |            |               |                        |

Asset Management," "Access Control," "Detection Processes".

"External information systems are catalogued," "Data-at-rest is protected," "Notifications from detection systems are investigated".

SLAC

NIST Special Publication (SP) 800-53 (Computer Security Guide) – Rev. 4 published in April 2013

#### Information Security Program:

- Risk assessment
- Policies and procedures
- Subordinate plans
- Training
- Periodic testing
- ATS
- Incident response
- Continuity of Operations

SLAC

DOE 0205.1B Cyber Security Program:

- Integrated, Enterprise-Wide
- Risk Managed Approach (vs. Controls Compliance)
- Configuration Management
- Consistent with NIST
- Line Management Accountability
- Integrated into GOCO model
- Protection of DOE information and information systems
- Mission focused
- Governance

DOE oversight is conducted through **Contractor Assurance Systems**: this provides the *Authorization Function*.

**Risk Management Approach** (RMA): Framing; Assessing; Responding; Monitoring.

What do you know, you are not alone anymore!

SI AC

### **IEC 17799:** Information Technology – **Code of practice for Information Security Management** :

High level, broad in scope, conceptual in nature and a basis to develop your own security standard and security management practices.

# **ISA-TR99: Integrating Electronic Security into the Manufacturing and Control System Environment:**

Guidance to user and manufacturers, analyzing technologies and determining applicability to securing Manufacturing and controls.



### IEC 15408 (3.1) – Information Security Management Systems (ISMS):

Framework to specify security **functional and assurance requirements** through the use of Protection Profiles. **Vendors** can implement security attributes, **testing labs** can evaluate the products.

## IEC 27001:2005 - Common Criteria for Information Technology Security Evaluation (aka CC):

System to bring information security under **explicit management control**:

- Policies and governance; Asset management; HR security;
- Access control; Incident management; Business continuity; etc.



## Key to success is to engage in a proactive, collaborative effort between management, controls engineers and IT Department. NIST 800-53 is king.

#### Along came NIST 800-82.

- Many times a CS team (an "enclave") exists already.
- "Ah, we're not sure we can share such information with you"...
- They might even tell you that it is impossible to gain access.
- You are the bridge between 800-53 and 800-82.
- You will have to provide the expertise to implement it.

## NIST SP 800-82

• Latest release: Rev.2, May 2014

• "*Defense-in-depth*" strategy: layering security mechanism so that impact to one mechanism as a result of failure is minimized.

• Includes:

- ICS policies based on DHS Threat Level;
- Implementing multi-layer network topology;
- Provide logical separation b/w corporate and ICS networks;
- Use DMZ (i.e., no direct communication b/w ICS and corporate);
- Fault-tolerant design;
- Redundancy for critical components;
- Privilege management;
- Encryption.

### NIST 800-82 Do we speak the same language?



| Requirement              | ICS                                                                                   | IT                                                                  |
|--------------------------|---------------------------------------------------------------------------------------|---------------------------------------------------------------------|
|                          |                                                                                       |                                                                     |
| Performance              | Time critical, deterministic.                                                         | High throughput. Reliable.<br>Jitters ok.                           |
| Availability             | Continuous processes. No<br>start/stop. Planned outages,<br>no rebooting. Redundancy. | Tolerable.                                                          |
| Risk Mgmt                | Human safety paramount.<br>Compliance, losses,<br>damages. Fault intolerant.          | Data integrity paramount.<br>Priority: CI(A). Recover<br>by reboot. |
| Architecture<br>Security | Protect edge systems (PLC, DCS).                                                      | Protect assets and info (often centralized).                        |

# NIST 800-82 - Do we speak the same language? – Cont'd

| Requirement               | ICS                                                                        | IT                                   |
|---------------------------|----------------------------------------------------------------------------|--------------------------------------|
|                           |                                                                            |                                      |
| Physical<br>Interaction   | Yes, certification.                                                        | Coming up<br>(internet of<br>things) |
| Time-Critical<br>Response | Yes (e.g. HMI password<br>shouldn't compromise<br>emergency actions)       | No                                   |
| System<br>Operations      | Control Engineers are not IT.<br>Legacy, proprietary systems<br>(support?) | More tools<br>available.             |
| Resources                 | Limitations on CPU, third part security solutions                          | Plenty                               |

# NIST 800-82 - Do we speak the same language? – Cont'd

| Requirement | ICS | IT |
|-------------|-----|----|
|-------------|-----|----|

| Communications          | Often proprietary                                                                   | Plenty                   |  |
|-------------------------|-------------------------------------------------------------------------------------|--------------------------|--|
| Change Mgmt             | Requires tests by vendors and<br>re-certifications. Old OSs no<br>longer supported. | Same                     |  |
| Managed Suppt           | Often single vendor.                                                                | Often same<br>challenge. |  |
| Component<br>Lifetime   | 15-20 years                                                                         | 4 years                  |  |
| Access to<br>Components | Remote, hazardous locations                                                         | Easy                     |  |

For Laboratories, risk tolerance for "generic" ICS is different than for Personnel Protection or Medical Technology (protect lives, information, asset, etc.).

Somebody will have to identify boundaries and interfaces.

In highly regulated environments, once a standard is chosen and committed, the organization is auditable against it –it's a big deal.

You might not always have the luxury of a Cyber Security team (**also knowledgeable about ICS**) at your service.

- **1. Black box attack**: feeding all possible input combinations to the FPGA chip and registering the corresponding output states. Provides the potential to reverse-engineering the FPGA electronic design integrated into the chip.
- 2. **Read-back attack**: Reading the chip configuration via the JTAG interface used in most FPGAs for debugging and maintenance. Recently, FPGA vendors have improved the protection measures against unauthorized access to chip configuration.
- **3. Cloning attack**: in SRAM FPGA chips, a configuration file is stored in a non-volatile memory external to the FPGA chip. This may allow the retrieval of bit-streams while loading the configuration in the FPGA, and later to clone the stolen FPGA electronic design. The protection against this threat is encrypting bit-streams during their transmission from a non-volatile memory to the FPGA chip. Measures have been already implemented in most modern FPGAs to prevent this possibility.

The FPGA technology has certain **beneficial properties** for assuring cyber security:

- 1. Does not rely on a complex OS and therefore does not have dormant, unused functionalities that can be attacked. The FPGA chip just works deterministically through the calculations that it was programmed for in the application development process.
- 2. No known viruses and malware for VHDL code;
- **3**. FPGA-based devices have a simple and structured design: their V&V processes will more likely detect the presence of potential malicious designs;
- 4. Physical access to the FPGA chips is strictly controlled: code is located in a flash memory (on a separate chip) without offering any physical access for modification while in on-line operating mode.
- **5.** Programming and re-programming can only be done through a specific interface.

## An example: LCLS



## **Photon Beam Parameters**

| Photon Beam<br>Parameters   | Symbol                   | hard<br>x-rays | soft<br>x-rays | unit               |
|-----------------------------|--------------------------|----------------|----------------|--------------------|
| Fundamental wavelength      | $\lambda_r$              | 6.2-1.3        | 43.7-6.2       | Å                  |
| Photon energy               | ħω                       | 2000-<br>9600  | 285-<br>2000   | eV                 |
| Final linac $e^-$ energy    | $\gamma mc^2$            | 6.7-14.7       | 2.5-6.7        | GeV                |
| FEL 3-D gain length         | $L_G$                    | 3.3            | 1.5            | m                  |
| Photons per pulse           | $N_{\gamma}$             | 2              | 20             | 10 <sup>12</sup>   |
| Peak brightness             | $B_{pk}$                 | 20             | 0.3            | 10 <sup>32 §</sup> |
| Average brightness (120 Hz) | $\langle B \rangle$      | 160            | 8              | 10 <sup>20 §</sup> |
| SASE bandwidth (fwhm)       | $\Delta \omega / \omega$ | ~0.2-0.5       | ~0.2-1.0       | %                  |
| Final pulse duration (fwhm) | $\Delta 	au_{f}$         | 50-250         | 70-400         | fs                 |

## **Electron Beam Parameters**

| <b>Electron Beam Parameters</b>       |                                |         |         |    |
|---------------------------------------|--------------------------------|---------|---------|----|
| Bunch charge                          | Q                              | 0.15    | 0.25    | nC |
| Init. bunch length (rms)              | $\sigma_{z0}$                  | 0.65    | 0.65    | mm |
| Final bunch length (rms)              | $\sigma_{zf}$                  | 7       | 20      | μm |
| Final peak current                    | Ipk                            | 3.0     | 1.0     | kA |
| Proj. emittance (injector)            | YEx,y                          | 0.4-0.6 | 0.4-0.6 | μm |
| Slice emittance (injector)            | $\gamma \varepsilon^{s}_{x,y}$ | 0.4     | 0.4     | μm |
| Proj. emittance (undulator)           | $\gamma \varepsilon^{U}_{x,y}$ | 0.5-1.6 | 0.5-1.6 | μm |
| Single bunch rep. rate                | F                              | 120     | 120     | Hz |
| UV laser energy on cath.              | <i>u</i> <sub>l</sub>          | 25      | 25      | μJ |
| UV laser diam. on cath.               | 2 <i>R</i>                     | 1.2     | 1.2     | mm |
| e <sup>-</sup> energy stability (rms) | $\Delta E/E$                   | 0.04    | 0.07    | %  |
| $e^{-}x,y$ stability (rms)            | $x/\sigma_x$                   | 15, 10  | 25, 20  | %  |
| $e^{-}$ timing stability (rms)        | $\Delta t$                     | 50      | ?       | fs |
| Peak current stab. (rms)              | $\Delta I/I$                   | 10      | 6       | %  |
| Charge stability (rms)                | $\Delta Q/Q$                   | 2.5     | 2.5     | %  |
| FEL pulse energy stability            | $\Delta N/N$                   | <10     | <10     | %  |

Joint International Accelerator School on Beam Loss and Accelerator Protection / Nov 5-13, 2014

## **Schematic View**



Joint International Accelerator School on Beam Loss and Accelerator Protection / Nov 5-13, 2014



- Turn off or limit rate of electron beam when faults are detected to prevent damage to sensitive machine components;
- Protect Undulator permanent magnets from electron beam;
- Shut off beam (detect and mitigate) within one pulse at 120 Hz (i.e., 8.33 ms for LCLS-I, 90 µs for LCLS-II).

- Protection of laser heater system from injector laser;
- Protection of photon section from FEL X-rays;
- Allow fault conditions to set different maximum rates for each mitigation device;
- Automatic beam rate recovery (after a fault is corrected, beam rate raised to before fault value);
- Securely bypass faults.

# **MPS Inputs**

- Inputs from obstructions
  - Vacuum valves
  - Screens (OTRs, YAGs)
  - Beam stoppers
- And beam loss monitors
  - Toroids
  - Protection Ion Chambers (PICs)
  - Beam Loss Monitors (BLMs)

They generate the primary MPS beam loss signals

- Watchdog
- Currently monitoring ~2100 inputs

Joint International Accelerator School on Beam Loss and Accelerator Protection / Nov 5-13, 2014





- Waterflow Status
- Magnet Power Supply Status
- Temperatures
- In-beam Diagnostics Status
- Beam Position
- Beam Charge
- RF System Status
- Beam Containment Status
- Beam Loss Monitors

SL AO

- Laser Heater Mechanical Shutter
- Photocathode Laser Mechanical Shutter
- Gun Trigger Permit (was Pockels cells)
- Pre-Undulator Fast Kicker (BYKIK)

SLA

A star network consisting of two entities: Link Processor and Link-Nodes (interconnected over private GigbE network).

The MPS determines the maximum allowed beam rate by processing device fault input signals (from **Link Nodes**, input multiplexers) with a rate limiting algorithm (executed on **the Link Processor**).

## **Components**

### SLAC

#### Link Processor:

- Runs MPS algorithmMakes decisions based on sensor states
- Interfaces to timing system



#### Link-Node:

- Sensor signal collection point
- Drives mitigation devices
- Integrates sensor subsystems



## **Overview**

SLAC



## **Architecture**



Joint International Accelerator School on Beam Loss and Accelerator Protection / Nov 5-13, 2014

# **Full Architecture**



# **Link Node Functional Block Diagram**

-SLAC



SLAC

"Pizza Box" (NAD) with configurable board arrangement for configuration and control of loss monitor devices.

It contains:

- MPS "Engine" in Virtex-4 FPGA
  MPS Digital I/O
  Embedded Coldfire CPU
  Industry Pack (IP) bus interface
  GigE Interface

- USB Interface (dev & maintenance)
- It is configured in different "flavors": Standard (MPS Digital I/O Only) BLM (Undulator Beam Loss Monitor) PIC (Beam Loss Ion Chamber) BYKIK (Fast Kicker Magnet)
- Interfaces to on Board FPGAs •
- Configuration, Control of MPS loss monitor devices •

# Link Node - A NAD



Joint International Accelerator School on Beam Loss and Accelerator Protection / Nov 5-13, 2014

## Link Node – Cont'd





- Gathers device status from Link Nodes.;
- Determines maximum allowed beam rate at each mitigation device;
- Broadcasts "Permit" message to Link Nodes;
- Handles fault bypassing, latching;
- Logs machine state changes with MPS history servers;
- Can load/unload MPS logic at run-time.

# **MPS Link Processor – Cont'd**

- Motorola MVME 6100
  - VME64 single board computer
  - 1.267 GHz MPC7457 PowerPC
  - 2 GB DDR266 RAM
  - Two Gigabit Ethernet interfaces
  - Serial port for IOC console
- Micro-Research Finland PMC-EVR-200

SL AO

## **Link Processor Modules**









- 2. BYKIK;
- 3. BLM;

### 4. PIC.

SLAC



### Temperature Vacuum Profile Monitors Wire Scanners Photon System

Up to 96 Inputs
8 Outputs

8 Outputs



- Digital I/O Only
- Lin to 06 Innuts



1. Standard



Input

Board



## 2. BYKIK



# **3. BLM**

-SLAC



# 3. BLM – Cont'd



SLAC

Loss monitors with different types of sensitivity are deployed in different locations. The signal is gated to coincide with the beam arrival time and compared to a programmable threshold and will indicate a fault if exceeded.

Two programmable threshold settings are required:

- 1. Exceeding the first threshold can allow the beam rate to be lowered by the MPS;
- 2. Exceeding the higher threshold will cause the beam to be shut off and require the MPS to be manually reset.





# 4. PIC – Cont'd



# **OUTPUTS – Mitigation Devices**

- 1. BYKIK Pulsed Magnet
- 2. Injector Mechanical Shutter
- 3. Laser Heater Shutter
- 4. Gun LLRF PAC

SLAO

# 1. BYKIK



- AKA Single Bunch Beam Dumper (SBBD), can kick beam into the dump pulse-by-pulse;
- Positioned half way down machine;
- Can limit beam rate to any rate from 0 to 120 Hz while the Linac is kept at a constant rate;
- Allows the MPS to protect the Undulator region while beam tuning and configuring is performed upstream.



If the amplitude does not fall within a specified range at trigger time, the BYKIK has failed and the MPS uses the Gun RF and injector mechanical shutter to continue rate limiting at the correct rate.

The MPS continues to trigger the BYKIK after a BYKIK fault. If the BYKIK begins to operate correctly, the MPS returns to using the BYKIK as the mitigation device.

SLAC

Pockels Cell (old): Beam rate changes would change laser profile, due to thermal effects. Replaced by:

- Injector Mechanical Shutter
  - Slower mechanical device;
  - It inhibits the Injector UV laser before it hits the gun's cathode;
  - Not a pulsed device: either full rate light or o Hz;
  - Control verified with optical position sensors.

### It inhibits the IR Laser Heater Laser Light.

The Laser Heater Optical Transition Radiation (OTR) screens, which are used to align the IR laser and electron beams, can be damaged by normal operational intensities of the IR laser and require an attenuator to be inserted when observing the IR laser on the OTR screen.

-SLAC

The RF Gun PAC is triggered at 120 Hz: MPS will inhibit it to abort the beam, triggering at standby time and allowing the RF system to stay in thermal equilibrium.

MPS will use the RF Gun to limit beam rate from 120 Hz to 10 Hz.

During a O Hz MPS rate limit both the Gun LLRF PAC and the injector mechanical shutter will be used to inhibit the electron beam.

- EPICS communication uses MVME 6100's GbE interface;
- MPS/Link Node communication uses second GbE interface and a custom real-time protocol stack (RTS).
- RTS provides deterministic communication.
- UDP/IP protocols allow familiar development, easy testing, standard hardware and software tools.

- Arcturus uC5282's 100BASE-TX interface used for EPICS communication
  - Configuration of Link Node at startup
  - Device input thresholds and debounce times
- High-speed GbE MPS communication using FPGA's RocketIO/fiber SFP transceiver

# **Communication Methods**

- Link Processor sends Synchronization and Permit messages at 360 Hz;
- Status messages returned from Link Nodes upon receipt of sync messages;
- Link Processor faults all Link Node inputs of Link Nodes that do not provide response within 8.3 ms;
- Link Nodes stop beam at mitigation devices after 3 ms without a permit message.

The Link Processor logs all fault and status messages to an MPS History server application, which stores the messages in an Oracle database in real-time.

MPS messages are stored separately from the normal logging system so that no messages are lost; they are also forwarded to the normal message logging system so that they can be correlated to other logged events.

An MPS history viewer is available to the operators via the MPS GUI.

# **History Server**

SLAC

- Events logged:
  - Device state changed;
  - Beam rate changed;
  - Destination changed.
- History Servers notify Link Processor the their existence;
- Link Processor connects to the first available History Server;
- Link Processor automatically connects to next server available server if current server fails to receive message.

# **History GUI**

|     | Live Interactive \                           |                                                           |                   |
|-----|----------------------------------------------|-----------------------------------------------------------|-------------------|
| ge  | From 2009/07/01 16:21<br>To 2009/07/27 00:26 | Search Values (not full-text)                             | Text filte        |
|     |                                              |                                                           |                   |
| - 1 | Time<br>07/22 09:03:35                       | Message LN_COM RESPONSE_LN10 changed from 1 to 0          |                   |
| 1   | 07/22 09:03:35                               | BYKIK_30HZ_ABRT_TRIG MISSED changed from 0 to 1           |                   |
|     | 07/22 09:03:35                               | Beam destination changed from INVALID to LASER MITIGATION |                   |
| l   | 07/22 09:03:35                               | Beam destination changed from LASER MITIGATION to INVALID |                   |
| l   | 07/22 09:03:35                               | BYKIK_30HZ_STBY_TRIG MISSED changed from 0 to 1           |                   |
| I   | 07/22 09:03:35                               | Beam destination changed from INVAUD to LASER MITIGATION  |                   |
| l   | 07/22 09:03:34                               | BYKIK, 30HZ_STBY_TRIG MISSED changed from 0 to 1          |                   |
|     | 07/22 09:03:34                               | BYKIK_30HZ_STBY_TRIG MISSED changed from 0 to 1           |                   |
|     | 07/22 09:03:34                               | BYKIK_30HZ_STBY_TRIG MISSED changed from 0 to 1           |                   |
|     | 07/22 09:03:34                               | BYKIK_30HZ_ABRT_TRIG MISSED changed from 1 to 0           |                   |
|     | 07/22 09:03:34                               | BYKIK_30HZ_STBY_TRIG MISSED changed from 0 to 1           | Filtered          |
|     | 07/22 09:03:34                               | LN_COM RESPONSE_LN10 changed from 0 to 1                  |                   |
| l   | 07/22 09:03:34                               | BEND_IN20_751_OFF STATE changed from 0 to 1               | history<br>messag |
| I   | 07/22 09:03:34                               | VVPG_0670 STATE changed from 0 to 1                       | inecery           |
| I   | 07/22 09:03:34                               | DUMP_LI21_FLSW_OPEN STATE changed from 0 to 1             | messag            |
| 1   | 07/22 09:03:33                               | BYKJK_30HZ_ABRT_TRIG MISSED changed from 1 to 0           |                   |
| l   | 07/22 09:03:33                               | LN_COM RESPONSE_LN35 changed from 0 to 1                  |                   |
| I   | 07/22 09:03:33                               | Beam destination changed from LASER MITIGATION to INVALID |                   |
| l   | 07/22 09:03:33                               | BYKIK_30HZ_STBY_TRIG MISSED changed from 0 to 1           |                   |
| l   | 07/22 09:03:33                               | BYKIK_30HZ_STBY_TRIG MISSED changed from 1 to 0           |                   |
| 1   | 07/22 09:03:33                               | BYKIK_30HZ_ABRT_TRIG MISSED changed from 0 to 1           |                   |
|     | 07/22 09:03:32                               | BYKIK_30HZ_STBY_TRIG MISSED changed from 1 to 0           | _                 |

- MPS inputs configured using "MPS Database Editor.app";
- MPS logic created with "MPS Logic.app";
- Both applications store data as SQLite3 files;
- Data is portable and easy to access;
- Python modules export data in a variety of formats.

#### **Configuration Editor**

#### 00 MPS Configuration 2009-04-22.mps Q All Type Name PV Z Position Device Area Link Nodes Link Node Channel <sup>‡</sup> BLM\_UND1\_3321\_LOS... BLM:UND1:3321 0 UND1 Link Node Cards Link Node Channel ‡ FAST\_VLV\_OPEN\_LI28 VVFS:LI28:1 0 LI28 Link Node Channels Link Node Channel & BLM\_UND1\_1421\_LOS... BLM:UND1:1421 0 UND1 External PVs Link Node Channel ‡ FLT\_20\_6\_TRP KLYS:LI20:61 0 IN20 Devices 2.00 Undulator Link Node Channel \$ HVPS\_UND1\_2621\_HIGH\_HVPS:UND1:2621 Faults Link Node Channel # BLM\_UND1\_3021\_PED\_H BLM:UND1:3021 0 UND1 Racks Link Node Channel # BLM\_UND1\_3320\_LOS... BLM:UND1:3320 0 UND1 PV Components UND1 Link Node Channel # BLM\_UND1\_1821\_LOS... BLM:UND1:1821 0 Link Node Channel ‡ BLM\_2220\_B30L BLM:UND1:2221 1.00 Undulator External EPICS F... ‡ LION404 MCC MPS:MCC0:LION404 0 Link Node Channel <sup>‡</sup> BLM\_UND1\_221\_PED\_H\_\_BLM:UND1:221 UND1 0 Link Node Channel ‡ SLOW\_VLV\_OPEN\_LI30 0 LI30 VVPG:LI30:1 × Link Node Channel ‡ BYKIK THRESHOLD U... KICK:LTU0:320 3.00 LTU + -Device Area: UND1 4 Name: BLM\_UND1\_3021\_LOSS\_L PV Device BLM \$ Position X: 0.00 PV Area: UND1 \$ Position Y: 0.00 PV Position: 3021 Position Z: 0.00 Link N...hannel: 37 (B921-2225) BLM C... 🛟 Link Node Channel 📫 Type:

# **Configuration Editor**

#### Configuration file exported to .db, .edl, .stt, .h, .tex, and .csv using **Python module**

- .CSV -
  - LN and LP boot time configuration
- .db
  - LP EPICS records (five records per input)

#### mpsEpicsFaultInputs.stt

- LP state notation used at compile time - monitors EPICS fault inputs

#### MPSFaultNumbers.h -

-LP and MPS Logic compile time configuration

- .edl
  - EDM displays for users

#### - .tex, .pdf

- LaTeX documentation converted to PDF

#### - mpsdb.sqlite<sub>3</sub>

- Copy of configuration file for MPS GUI

#### - MPSDatabase.sql, pvlist.txt

- Dumped configuration file

- List of records in .db files







### **Logic Editor**

| 00                               | <u> 2</u> | Mps   | Logic.m  | pl          |              |        |   |       |   |          |
|----------------------------------|-----------|-------|----------|-------------|--------------|--------|---|-------|---|----------|
| Truth Tables                     |           |       |          |             |              |        | _ |       | _ |          |
| LION733 Threshold                |           |       |          |             |              |        |   |       |   |          |
| LION752 Threshold                |           |       |          |             |              |        |   |       |   |          |
| LION753 Threshold                |           |       |          |             |              |        |   |       |   |          |
| LION765 Threshold                | States:   | D     | Numb 🔺   |             | PC           | MS     |   | BYKIK |   | LHS      |
| LION766 Threshold                |           |       | 0        | Moving      |              |        |   |       |   | lgnore ; |
| Laser Heater Attenuator Is In    |           |       | 1        | Out         |              |        |   |       |   | Ignore ; |
| Laser Heater Attenuator Position |           |       | 2        | In          |              |        |   |       |   | Ignore 🕯 |
| Laser Heater Photodiode Shutter  |           |       | 3        | Broken      | 0 Hz         | ‡ 0 Hz | ÷ | 0 Hz  | ÷ | lgnore ; |
| Magnet In Tolerance: BYD1        |           |       |          |             |              |        |   |       |   |          |
| Magnet In Tolerance: QE31        |           |       |          |             |              |        |   |       |   |          |
| Magnet In Tolerance: QUM1        |           |       |          |             |              |        |   |       |   |          |
| Magnet In Tolerance: QUM2        |           |       |          |             |              |        |   |       |   |          |
| Magnet In Tolerance: QUM3        |           |       |          |             |              |        |   |       |   |          |
| Magnet In Tolerance: QUM4        |           |       |          |             |              |        |   |       | + | -        |
| Magnet Power: BYD1               |           | State | Informat | ion         |              |        |   |       |   |          |
| Magnet Power: QE31               |           |       |          |             |              |        |   |       |   |          |
| Magnet Power: QUM1               |           |       |          | Default     | State        |        |   |       |   |          |
| Magnet Power: QUM2               |           |       |          | •           |              |        |   |       |   |          |
| Magnet Power: QUM3               |           | Nu    | mber:    | 0           | Name:        | Moving | 3 |       |   |          |
| Magnet Power: QUM4               |           | So    | lution:  |             |              |        |   |       |   |          |
| Mechanical Shutter Position      |           |       |          |             |              |        |   |       |   |          |
| OTR BL237 Position               |           |       |          | Pockels     | s Cell Rate: | 0 Hz   |   |       |   | •        |
| OTR BL237 Position and TDUND     |           |       |          |             |              | 0 Hz   |   |       |   | •        |
| OTR BL291 Position               |           |       | Me       | chanical Sh | utter Rate:  | UHZ    |   |       |   |          |
| OTR BL291 Position and TDUND     |           |       |          | E           | YKIK Rate:   | 0 Hz   |   |       |   | •        |
| OTR BL541 Position               |           |       | 1.000    | r Heater Ch | utter Rate:  | laner  |   |       |   | •        |
| OTR BL541 Position and TDUND 🖡   |           |       | Lase     | r Heater Sh | utter kate:  | Ignore |   | _     | _ | •        |
| + -                              |           |       |          |             |              |        |   |       |   |          |
|                                  | Constant. |       |          |             |              |        |   |       |   |          |
| Truth Table                      | Condition | s   D | estinati | ons   Con   | tacts Dev    | vices  |   |       |   |          |



#### Logic file exported to .h, .tex, .txt, .pdf and .csv using Python module

- Algorithm.h
  - Loadable and unloaded from LP at run time Built against a specific MPS Input configuration
  - Logic Table Definitions
  - Can contain C specific logic for more complex logic
- \*.pdf, \*.txt
  - Logic documentation



#### **MPS Global Panel**



Joint International Accelerator School on Beam Loss and Accelerator Protection / Nov 5-13, 2014

- Java-based

Truth Table

Flowswitch BSY RAD LCW Slit 10/30

Mechanical Shutter

0

N/

120.0

Ηz

А

12

State

BYKIK

120.0

Mech Shutter

Gun RF Permit

0 Hz

MPS Rate Limits

Requested Beam Rates (slow estimates, not the same as SCP!)

Actual Beam Rates (slow estimates, not the same as SCP!)

Min Rate

Some logic is being masked because A-Line B1/B2 Is Not On AND A-Line Kicker Is Not Enabled, A-Line SL10 Is Closed, BX01/BX02 Is On, BXG Is Off, M3H Mirror Is Out of Beam, Stopper HFP-MPA-02 Is
Summary State History

Currently showing states 1 of 1

10/17 21:00:57

Current State

Ok

- Main user lfc in control room

Main Fault UI

Gun RF Permit

120 Hz

120.0

120.0

Name

Bypassed Faults

Exp Date

12/09 11:38:00

Summary / Faults / Logic / Ignore Logic / History

SLAC

Current

Current rate

limiting truth

History slider

rates

tables

Active

bypasses

Laser Heater

120 Hz

N/A

120.0

Heater Shutter

Show Live

-> Log Book

Bypass..

Flowswitch BSY RAD LCW Slit 10/30 🔻

BYKIK

#### **User Interface - Logic**

| blanca                                                                               | 0.44              | X MpsGui Beta [Built 2009          |              | March Chuller  | BADAIN | United During  |
|--------------------------------------------------------------------------------------|-------------------|------------------------------------|--------------|----------------|--------|----------------|
| Name                                                                                 | State             | Min Rate                           | Pockels Cell | Mech Shutter   | BYKJK  | Heater Shutter |
| MPS Beam Permit: Mechanical Shutter                                                  | Not Permitted     | 0 Hz                               |              | 0 Hz           |        |                |
| TDUND Position                                                                       | in                | 10 Hz                              |              |                | 10 Hz  |                |
| BLM/PIC Watchdog: Upstream of TDUND                                                  | OK                | 120 Hz                             | 120 Hz       | 120 Hz         | 120 Hz |                |
| Bypass Recover                                                                       | OK                | 120 Hz                             | 120 Hz       | 120 Hz         | 120 Hz | 120 Hz         |
| Laser Heater Photodiode Shutter Position                                             | In                | 120 Hz                             |              |                |        | 120 Hz         |
| MPS Beam Permit: BYKJK                                                               | Permitted         | 120 Hz                             |              |                | 120 Hz |                |
| MPS Beam Permit: Heater Shutter                                                      | Permitted         | 120 Hz                             |              |                |        | 120 Hz         |
| MPS Beam Permit: Pockels Cell                                                        | Permitted         | 120 Hz                             | 120 Hz       |                |        |                |
| OTR BL237 Position                                                                   | Out               | 120 Hz                             | 120 Hz       | 120 Hz         | 120 Hz |                |
| OTR BL291 Position                                                                   | Out               | 120 Hz                             | 120 Hz       | 120 Hz         | 120 Hz |                |
| OTR BL541 Position                                                                   | Out               | 120 Hz                             | 120 Hz       | 120 Hz         | 120 Hz |                |
| OTR BL571 Position                                                                   | Out               | 120 Hz                             | 120 Hz       | 120 Hz         | 120 Hz |                |
| OTR BL621 Position                                                                   | Out               | 120 Hz                             | 120 Hz       | 120 Hz         | 120 Hz |                |
|                                                                                      |                   | 07/26 00:21:21<br>Selection Detail |              |                |        |                |
| Name TDUND Position<br>Current State TDUND Position<br>Solution                      |                   |                                    |              |                |        |                |
| Current State TDUND Position<br>Solution                                             | Min Pate          | Packels Call                       | Mash Shuttar | PVVIV          |        | lastar Chuttar |
| Current State TDUND Position                                                         | Min Rate          | Pockels Cell                       | Mech Shutter | ВҮКІК<br>10 Нz | н      | eater Shutter  |
| Current State TDUND Position Solution Truth Table In Out                             | 10 Hz             |                                    |              | 10 Hz          | н      |                |
| Current State TDUND Position<br>Solution<br>Truth Table State<br>In<br>Out<br>Broken | 10 Hz<br><br>0 Hz | <br><br>0 Hz                       | <br>0 Hz     | 10 Hz          | н      |                |
| Current State TDUND Position Solution Truth Table In Out                             | 10 Hz             |                                    |              | 10 Hz          | н      |                |

All truth tables and states

SLAC

#### History slider

# Selected truth table details

#### **User Interface - Inputs**



Device faults can be bypassed via an EPICS display by selecting a fault, choosing its bypass state, and supplying a bypass duration.

For example, an operator can choose to bypass a flow switch for one day by selecting the flow switch input, selecting its OK state, and giving a bypass duration of 24 hours.

All bypasses are logged and automatically timed by the MPS system. The operator is alerted when the bypass time is reached, forcing the operator to re-evaluate bypasses.

# Just for fun...

**Power Supply Failure** 

• Commercial module (internal flaw)

Ion Chamber noise pickup

- New electroncs 100x more sensitive than old
- Fixed by shielding

Radiated EMI from Link-Node

• Getting into Cell Phone repeater in Ctrl Room Bldg

# Pockels Cell rate limit caused laser beam profile changes $\rightarrow$ affected emittance



Timing system is very flexible, and easily allows trigger timing to be moved away from beam time.

BLMs and BYKIK rely on beam time triggers to detect beam loss and kick beam.

Lots of misfiring.

# **Timing verification**

- Solution 1:
  - EVR trigger is injected to QADC analog input via directional coupler;
  - If EVR is not seen by QADC, faults LN virtual input.
- Solution 2:
  - Use BPMs and toroids to detect beam time;
  - Compare BPM and toroid beam time with timing system's beam time triggers;
  - Output OK status to Link Node digital input while trigger timing is correct.

## **Timing Verification Chassis**

-SLAC



#### **Coming soon: LCLS-II**



Similar to LCLS-I, except:

• LCLS-II beam loss must not interfere with LCLS-I MPS (and vice-versa);

SLAO

- LCLS-II MPS must be capable of independently rate-limiting the beam in either of its two undulator beam lines;
- LCLS-II MPS is a stand-alone system separate from LCLS-I MPS, such that outages in one system do not effect the other;
- There will be two TDUND stoppers in LCLS-II. MPS must suppress faults in the two undulator beam lines according to which TDUND stopper is in;
- MPS must monitor an additional pulsed magnet in LCLS-II that switches beam between two undulator beam lines.





Special thanks to

#### **Matt Boyes**

Of SLAC, for his invaluable contribution and the stimulating conversations.





## Thank you!

# I hope you enjoyed it!