# Experience from using SRAM based FPGAs in the ALICE TPC Detector and Future Plans

Johan Alme (johan.alme@hib.no), for the ALICE TPC Collaboration

FPGA workshop, CERN, 21.03.2014

### The ALICE detector

### ALICE TPC sub-detector

- A gas-filled barrel with an electric field and a magnetic field.
- Particles from the collision react with the gas, releasing electrons.
- Electrons drift in the fields and hit a **Multi Wire Proportional Chamber** at the end-plates.
- ~600 000 charge-sensitive pads are mounted on the end-plates of the barrel.
- Electric pulses are handled by complex readout electronics located directly behind the pad plane.
- Close to the *LHC* beam interaction point.
- Radiation is a major problem!

Low multiplicity event from Run 1

High multiplicity event is completely crowded

Run 2 will have even *higher* multiplicity

### **ALICE TPC**

### ALICE TPC Readout Electronics in Numbers

- 1000 samples/event (10 bit)
- 4356 \* 128 channels
- 700 MByte/event
- 200 Hz / 1 kHz event rate
- 142-710 GByte/s (raw)
- Data compression: 5-20 GByte/s (~x30)

In total: 5220 FPGAs & 35064 ASICs

- 216 Readout Control Units
  - Read data from the Front End Cards
  - Send data over fiber for analysis and permanent storage
  - 2 SRAM based FPGAs & 2 flash based FPGAs per system
- Front End Cards
  - Amplify/shape the signal
  - Analog to digital conversion
  - Digital signal processing
  - Buffer data

### RCU main FPGA

- The RCU main FPGA sits in the data path
- Data readout is handled by the Readout Node
  - 92% of the CLBs used
  - 75% of the BRAM blocks used (the remaining 25% cannot be used because of the Active Partial Reconfiguration)
  - Result: TMR or any other mitigation technique cannot be applied

# Reconfiguration Network (I)

- Consists of:
  - A radiation tolerant **flash memory**, a radiation tolerant **flash based FPGA** and **the DCS board** (an embedded PC running Linux).
- Corrects SEUs in the configuration memory of the Xilinx Virtex-II Pro VP7
- Why it works:
  - Active Partial Reconfiguration
- How it works:
  - The RCU support FPGA reads one frame at a time from the flash memory and from the Xilinx configuration memory.
  - The frames are compared bit by bit.
  - If a difference is found, the faulty frame is overwritten.
- Keyword: Flexibility

# Reconfiguration Network (II)

- When an SEU is detected, it is automatically counted
- This has given us valuable information on the radiation environment
- The *Fault Injection Technique* has been used in a *lab environment* to predict the functional failure rate under different run conditions.

### SEU measurement results from Run 1

- Figures show integrated luminosity vs. SEU count for four different run periods
- All RCUs are included
- A clear linear dependence can be seen.

# Cumulative SEU for 2011 pp √s=7 TeV

### Radial and Sectorial Distribution

- Higher SEU count nearer to the interaction point.
- Data from 2011 pp √s=7 TeV

# What is Fault Injection?

- In the context of FPGA design:
  - Fault injection means injecting bitflips into the configuration memory of the FPGA.
- Purpose: simulation of radiation related effects.

### Pros

- Low cost
- Simple
- Great tool for improving radiation tolerance during the development phase

### Cons

- The sensitivity of the technology itself cannot be measured
- A systematic test covering all possible bit locations takes time
- Not all elements in the FPGA can be tested.

# Purpose of the ALICE TPC RCU Fault Injection Test

- Estimate the radiation sensitivity of the RCU main FPGA design
- Estimate an expected rate of functional failures in the RCU main FPGA as a function of integrated luminosity
- Two categories of functional failures are recognized:
  - Reliability faults: system crashes leading to a stop in the data taking for the complete ALICE detector.
  - Performance faults: errors in the data stream from the RCU experiencing an SEU.
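To make the two mechanisms above concrete, here is a minimal Python sketch of a toy configuration memory. A fault is injected by flipping one bit, and a scrubbing pass modelled on the Reconfiguration Network compares each frame against a golden copy, counts the differing bits and overwrites the faulty frame. The frame size, the number of frames and all function names are assumptions chosen for illustration; this is not the RCU firmware, and real Virtex-II Pro frames are far larger.

```python
import random

# Hypothetical toy dimensions, chosen only to keep the example small.
FRAME_BITS = 32
N_FRAMES = 8


def make_golden_frames(seed=1):
    """Build a reproducible 'golden' configuration (stands in for the flash copy)."""
    rng = random.Random(seed)
    return [rng.getrandbits(FRAME_BITS) for _ in range(N_FRAMES)]


def inject_bitflip(config, frame, bit):
    """Fault injection: flip a single bit of one configuration frame in place."""
    config[frame] ^= 1 << bit


def scrub(config, golden):
    """Compare frame by frame; overwrite faulty frames.

    Returns the number of flipped bits found, i.e. the SEU count
    for this pass in the toy model.
    """
    seu_count = 0
    for index, (live, reference) in enumerate(zip(config, golden)):
        difference = live ^ reference  # bit-by-bit comparison
        if difference:
            seu_count += bin(difference).count("1")
            config[index] = reference  # rewrite only the faulty frame
    return seu_count


if __name__ == "__main__":
    golden = make_golden_frames()
    config = list(golden)              # the 'live' configuration memory
    inject_bitflip(config, frame=3, bit=5)
    print("SEUs found:", scrub(config, golden))
    print("configuration restored:", config == golden)
```

In the system described above, the golden copy lives in the radiation tolerant flash memory and the comparison runs in the RCU support FPGA; the value returned by `scrub` plays the role of the SEU counter.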
### Fault Injection Results (I)

| Type of Error  | Total # Faults | Fault/SEU [%] | SEUPI [SEUs/fault] |
|----------------|----------------|---------------|--------------------|
| All            | 10341          | 5.02          | ~19.9              |
| Reliability    | 2210           | 1.07          | ~93.5              |
| Performance    | 8131           | 3.94          | ~25.4              |
| Loss of data   | 2499           | 1.21          | ~82.6              |
| Data corrupted | 5632           | 2.73          | ~36.6              |

- Number of bitflips injected: 206151
  - Coverage: 6.5%
- Xilinx conservative estimate: SEUPI = 10 SEUs/fault
  - Result is in the expected range
- SEUPI<sub>Reliability faults</sub> = ~93.5 SEUs/fault
- Most functional faults are not critical for the operation of the ALICE detector!

Plot shows bitflips leading to observable performance faults.

# Fault Injection Results (II)

- Equal distribution for each fault type.
- 53 SEUs give\*:
  - More than 90% risk of any functional failure
  - More than 15% risk of a reliability fault

\*Run 2010 (09. Aug 2011): integrated luminosity 107.816 nb<sup>-1</sup> → number of SEUs = 107.816 nb<sup>-1</sup> \* 0.49 SEUs/nb<sup>-1</sup> = 52.8 SEUs

# Conclusion – SEU analysis Run 1

- 2011 Pb-Pb run:
  - Peak luminosity: 300-400 Hz/b (300-400 x 10<sup>24</sup> cm<sup>-2</sup>s<sup>-1</sup>)
  - ~5 SEUs/h for all 216 FPGAs
- Run 2 scenario\*:
  - Peak luminosity: 1-4 x 10<sup>27</sup> cm<sup>-2</sup>s<sup>-1</sup>
  - 8-30 kHz interaction rate
  - ~45 SEUs/h for all 216 FPGAs
  - ~0.5 reliability faults/h
- Run 3 scenario\*:
  - Assuming an interaction rate of 50 kHz and a 30% multiplicity increase
  - ~100 SEUs/h for all 216 FPGAs
  - ~1 reliability fault/h
- Clearly an upgrade is needed
  - It does not rule out SRAM based FPGAs
  - BUT we would need a larger device with room for mitigation techniques

\*Assuming no upgrades to the electronics

# ALICE TPC Run 2 Upgrades

We make a «simple» upgrade!

- Splits a «slow» parallel bus:
  - Doubles the speed!
- Upgrades the RCU → RCU2
- New «state of the art» System on Chip FPGA: Microsemi SmartFusion2
  - Faster, bigger, better in radiation!
  - First flash based FPGA with SERDES

### RCU2 — The ALICE TPC readout electronics consolidation for Run2

J Alme et al 2013 JINST 8 C12032, doi:10.1088/1748-0221/8/12/C12032

Topical Workshop on Electronics for Particle Physics 2013 (TWEPP-13)

J Alme, T Alt, L Bratrud, P Christiansen, F Costa, E David, T Gunji, T Kiss, R Langøy, J Lien, C Lippmann, A Oskarsson, A Ur Rehman, K Røed, D Röhrich, A Tarantola, C Torgersen, I Nikolai Torsvik, K Ullaland, A Velure, S Yang, C Zhao, H …

- Bergen University College, P.O. Box 7030, NO-5020 Bergen, Norway
- University of Oslo, P.O. Box 1048, Blindern, NO-0316 Oslo, Norway
- University of Bergen, P.O. Box 7800, NO-5020 Bergen, Norway
- GSI Helmholtzzentrum für Schwerionenforschung, Planckstr. 1, D-64291 Darmstadt, Germany
- CERN, CH-1211, Genève 23, Switzerland
- Vestfold University College, Postboks 2243, NO-3103 Tønsberg, Norway
- COMSATS Institute of Information Technology, Park Road, Chak Shahzad, Islamabad, Pakistan
- Cerntech, Petzvál J. u. 44, H-1119 Budapest, Hungary
- University of Lund, Box 117, 221 00 Lund, Sweden
- Goethe University Frankfurt, Senckenberganlage 31, 60325 Frankfurt am Main, Germany
- University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-0033, Japan

### Microsemi SmartFusion2 – The Perfect Choice?

- The Microsemi SmartFusion2 is a brand new device.
- Only engineering samples had been released when we started to use it.
- A few surprises were encountered with this new device.
- Especially given our positive experience with Actel/Microsemi devices

# SmartFusion2 Family

#### Reliability

- Single Event Upset (SEU) Immune
  - Zero FIT FPGA Configuration Cells
- Junction Temperature: 125°C – Military Temperature, 100°C – Industrial Temperature, 85°C – Commercial Temperature
- Single Error Correct Double Error Detect (SECDED) Protection on the Following:
  - Ethernet Buffers
  - CAN Message Buffers
  - Cortex-M3 Embedded Scratch Pad Memory (eSRAMs)
  - USB Buffers
  - PCIe Buffer
  - DDR Memory Controllers with Optional SECDED Modes
- Buffers Implemented with SEU Resistant Latches on the Following:
  - DDR Bridges (MSS, MDDR, FDDR)
  - Instruction Cache
  - MMUART FIFOs
  - SPI FIFOs
- NVM Integrity Check at Power-Up and On-Demand
- No External Configuration Memory Required; Instant-On, Retains Configuration When Powered Off

#### Security

- Design Security Features (Available on all Devices)
  - Intellectual Property (IP) Protection Through Unique Security Features and Use Models New to the PLD Industry
  - Encrypted User Key and Bitstream Loading, Enabling Programming in Less-Trusted Locations
  - Supply-Chain Assurance Device Certificate
  - Enhanced Anti-Tamper Features
  - Zeroization

#### Low Power

- Low Static and Dynamic Power
- Flash\*Freeze Mode for Fabric
- Up to 50% lower total power than competing SoC devices.
#### High-Performance FPGA

- Efficient 4-Input LUTs with Carry Chains for High-Performance and Low Power
- Up to 236 Blocks of Dual-Port 18 Kbit SRAM (Large SRAM) with 400 MHz Synchronous Performance (512 x 36, 512 x 32, 1 kbit x 18, 1 kbit x 16, 2 kbit x 9, 2 kbit x 8, 4 kbit x 4, 8 kbit x 2, or 16 kbit x 1)
- Up to 240 Blocks of Three-Port 1 Kbit SRAM with 2 Read Ports and 1 Write Port (micro SRAM)
- High-Performance DSP Signal Processing
  - Up to 240 Fast Mathblocks with 18 x 18 Signed Multiplication, 17 x 17 Unsigned Multiplication and 44-Bit Accumulator

### Monitoring of Radiation Levels

- On the present RCU, the Reconfiguration Network acts as a radiation monitor
- This is an interesting feature to keep for the RCU2:
  - Additional SRAM memory and a Microsemi ProASIC3 250 added to the RCU2
    - Not enough user I/Os on the SmartFusion2 for this feature
  - Low-risk design, already done and proven\*
  - Cypress SRAM, same as used for the latest LHC RadMon devices
    - Extensively characterized in various beams (n, p, mixed) and compared/benchmarked to FLUKA MC simulations by the CERN EN/STI group

\* Arild Velure, "Design, implementation and testing of SRAM based neutron detectors", Master Thesis, 2011

# Irradiation test plans for RCU2 SmartFusion2 (I)

- Oslo Cyclotron (~30 MeV protons), April
  - Preparation of the high energy test at TSL
  - Single Event Upset rate in SRAM and at register level
  - Single Event Transients
    - This has been reported to be a potential problem in flash based devices with increasing clock frequency\*
  - Single Event Latchup
    - We have been informed that Microsemi has observed non-destructive SELs at a low rate
  - PLL stability

### Irradiation test plans for RCU2 SmartFusion2 (II)

- TSL Uppsala (180 MeV protons), May
  - Full system test exercising all interfaces
    - Ethernet
    - SERDES
    - etc.
  - Redo the Oslo tests at the higher energy.

TSL Uppsala apartment in 2005

# Conclusion – Run 2 Upgrades

- We are in an exciting period.
- The outcome of the forthcoming irradiation campaigns is very important
- Earlier experiences with Actel/Microsemi devices are very good in our radiation environment
  - We hope that the SmartFusion2 will live up to our expectations

# ALICE TPC - Run3 Upgrades

- In Run2 only parts of the electronics are upgraded
- In Run3 EVERYTHING will be changed
  - Only the empty TPC barrel is left
  - MWPC → GEM
- Run3 demands faster and more radiation tolerant electronics
  - High energy hadron flux: ~3.4 kHz/cm²
  - Dose < 2.1 krad
  - 1 MeV neutron equivalent fluence: 3.4 x 10<sup>11</sup> cm<sup>-2</sup>
- The dose and the 1 MeV neutron equivalent fluence are not at worrying levels

ALICE TPC Technical Design Report: https://cds.cern.ch/record/1622286/

### Run3 – Planned Readout Electronics

- Conservative approach: no FPGAs in the radiation zone
- SAMPA: new radiation tolerant (in our environment) ASIC!
- GBTx & Versatile Link: radiation tolerant ASIC & high speed optical links\*
- Common Readout Unit: first component with an FPGA, NOT in the radiation zone!

\* https://espace.cern.ch/GBT-Project/default.aspx

### Thank you!

- K. Røed, J. Alme, D. Fehlker et al., "First measurement of single event upsets in the readout control FPGA of the ALICE TPC detector", Journal of Instrumentation, vol. 6, no. 12, p. C12022, 2011
- J. Alme, D. Fehlker, C. Lippmann et al., "Radiation tolerance studies using fault injection on the Readout Control FPGA design of the ALICE TPC detector", Journal of Instrumentation, vol. 8, p. C01053, 2013
- K. Røed, J. Alme et al., "Single event upsets in the readout control FPGA of the ALICE TPC detector during the first LHC running period", poster presentation, TWEPP 2013
- J. Alme et al., "RCU2 — The ALICE TPC readout electronics consolidation for Run2", Journal of Instrumentation, vol. 8, p. C12032, 2013