# Mezzanine Cards for the EMU CSC System Upgrade at the CMS

## M. Matveev, P. Padley

Rice University, Houston, TX 77005, USA matveev@rice.edu

#### Abstract

In this paper we discuss two ideas related to the design and application of mezzanine cards in the Endcap Muon (EMU) Cathode Strip Chamber (CSC) electronic system of the CMS experiment at CERN. The first is a proposal to upgrade the FPGA-based mezzanines using the most advanced Xilinx FPGA family, Virtex-5. The second concerns the design of a simple and compact mezzanine card with a commercial serializer/deserializer (SERDES) device and an industry-standard pluggable optical or copper transceiver module. Such a card could be a basic element of a general-purpose gigabit data transmission link.

### I. INTRODUCTION

The CSC detector comprises 468 six-layer multi-wire proportional chambers arranged in four stations in the Endcap regions of CMS [1]. Wires run azimuthally and define the track's radial coordinate. Strips are milled on the cathode panels and run lengthwise (radially) at constant azimuthal width. The goal of the CSC system is to provide muon identification, triggering and momentum measurement.

The CSC system has 218K cathode and 183K anode channels [1]. There are almost 15,000 electronic boards with approximately 5,000 Xilinx FPGAs [2] in the entire CSC system. More than 1,000 FPGAs are mounted on small mezzanine cards that have been produced and installed on five types of host boards. The host boards reside directly on the chambers, in 9U crates on the periphery of the return yoke and in the Track Finder crate in the underground counting room. The mezzanine approach allows us to design, develop and upgrade the FPGA-based processing logic independently while preserving the host-board interface. The present electronic system is based on the mature Virtex-E and Virtex-2 technologies. The newest and most advanced Xilinx FPGA family, Virtex-5, offers several advantages over previous generations. We have targeted two of our existing FPGA designs, the Muon Port Card (MPC) and the Muon Sorter (MS), to the Virtex-5 XC5VLX family.

The mezzanine approach can also be applied to data transmission links. The main parts of a typical serial digital link are the serializer (SER) and an optical or copper transmitter at the transmitting end, and an optical or copper receiver and the deserializer (DES) at the receiving end. In many cases the SER and DES functions are combined in a single SERDES device. Optical modules are typically transceivers; among the industry standards for 1 Gbps to 4 Gbps, the most popular is the Small Form-factor Pluggable (SFP) standard [3]. Besides optical modules, passive and active copper SFP modules are also available. The idea of combining a SERDES device and a pluggable transceiver on a mezzanine card is not new, but existing implementations usually require relatively large board space. Since both the Texas Instruments TLK family of SERDES devices [4] and the SFP standard have significant potential for future projects, including the CMS upgrade, we have decided to build a simple, small and inexpensive mezzanine card using these components. This card is described in detail in Section IV.

### II. CSC ELECTRONIC SYSTEM

The CSC electronic system consists of: (1) on-chamber anode and cathode front-end boards (AFEB and CFEB); (2) Trigger and DAQ boards in sixty 9U crates on the periphery of the return yoke of CMS; and (3) one Track Finder (TF) and four Front-End Driver (FED) 9U crates located in the underground counting room (Fig.1).



Figure 1: EMU CSC Electronic System

The Level 1 CSC Trigger Electronics provides four trigger candidates to the CMS Muon Trigger with a latency of 80 bunch crossings (2 µs).

There are three types of electronic boards mounted on each chamber: Anode Front End Boards (AFEB), Cathode Front End Boards (CFEB), and one Anode Local Charged Track (ALCT) board. The AFEBs (12 to 42 per chamber, depending on chamber size) amplify and discriminate the anode signals. The CFEBs (4 or 5 per chamber) amplify, shape and digitise the strip charge signals. The anode patterns provide more precise timing information than the cathode signals, and also provide the coarse radial position and angle of the passing particle for the trigger chain. The FPGA-based processing unit in the ALCT searches for patterns of hits in the six planes that are consistent with muon tracks originating from the interaction point. A pattern is considered valid if hits from at least four planes are present in it.
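The validity criterion can be illustrated with a minimal Python sketch. It assumes that the hits in each of the six layers have already been masked by one pattern envelope; the function names and the per-layer hit counts are hypothetical. In the ALCT this logic is implemented in the FPGA and evaluated in parallel for every key wire group, not sequentially as here.

```python
from typing import Sequence

N_LAYERS = 6    # six-layer chambers
MIN_LAYERS = 4  # a pattern needs hits in at least four planes


def layers_hit(hits_in_pattern: Sequence[int]) -> int:
    """Count layers with at least one hit inside the (hypothetical) pattern mask.

    hits_in_pattern[i] is the number of hits found in layer i that fall
    inside the pattern envelope for one key wire group.
    """
    if len(hits_in_pattern) != N_LAYERS:
        raise ValueError("expected one entry per chamber layer")
    return sum(1 for n in hits_in_pattern if n > 0)


def is_valid_pattern(hits_in_pattern: Sequence[int]) -> bool:
    """Apply the 'at least four of six planes' validity rule."""
    return layers_hit(hits_in_pattern) >= MIN_LAYERS
```

For example, `is_valid_pattern([1, 1, 0, 1, 1, 0])` accepts a pattern with hits in four layers, while `[1, 0, 0, 1, 1, 0]` (three layers) is rejected.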

Two valid anode patterns, or ALCTs, are sent to the Trigger Motherboard (TMB). Based on comparator half-strip hits sent from the CFEBs, the TMB searches for two cathode patterns (CLCTs) with hits in at least four planes, and then matches these two CLCT patterns with the two ALCT patterns to form correlated two-dimensional LCTs.

Up to nine TMBs, paired with Data Acquisition Motherboards (DMB), one Clock and Control Board (CCB) and one MPC reside in each peripheral crate mounted along the outer rim of the endcap iron disks. Every bunch crossing, the MPC receives up to 18 LCTs from nine TMBs, sorts them and sends the three best track stubs via optical links to the Sector Processor (SP) residing in the TF crate in the underground counting room.

The TF comprises 12 SP boards, the MS and a CCB. Each SP receives 15 data streams with trigger primitives from five MPCs and performs track reconstruction. Its decision, three selected tracks, is sent to the MS over a custom backplane. The MS sorts the 36 incoming tracks, selects the four best and transmits them over copper links to the Global Muon Trigger receiver in the Global Trigger crate.
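Both sorting stages reduce to selecting the n best candidates by quality. The Python sketch below models this in software, assuming a single integer quality word per candidate; the `Stub` class and its fields are hypothetical. The MPC and MS perform the equivalent selection in FPGA logic within a fixed number of clock cycles, not with a heap.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Stub:
    quality: int  # larger value = better candidate (hypothetical encoding)
    source: int   # index of the board that sent the candidate


def select_best(candidates: Iterable[Stub], n: int) -> List[Stub]:
    """Return the n highest-quality candidates, best first."""
    return heapq.nlargest(n, candidates, key=lambda s: s.quality)


if __name__ == "__main__":
    # MPC stage: up to 18 LCTs (two per TMB, nine TMBs) -> best three
    lcts = [Stub(quality=q, source=q // 2) for q in range(18)]
    print([s.quality for s in select_best(lcts, 3)])  # prints [17, 16, 15]

    # MS stage: 36 tracks (three per SP, twelve SPs) -> best four
    tracks = [Stub(quality=(7 * i) % 36, source=i // 3) for i in range(36)]
    print([s.quality for s in select_best(tracks, 4)])  # prints [35, 34, 33, 32]
```

The same function covers both stages; only the number of inputs (18 or 36) and outputs (3 or 4) differs.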

The four Front End Driver (FED) crates include 36 Detector Dependent Unit (DDU) boards and 4 Data Concentrator Cards (DCC). They assemble the data from all 468 DMBs for transfer to the main CMS DAQ system as well as to a local DAQ farm for real-time monitoring.

### III. UPGRADE OF THE FPGA MEZZANINE CARDS

The design and construction of the CSC Trigger electronic system was a collaborative project lasting approximately 10 years, from 1997 to 2006, with participants from several US universities, Fermilab, CERN and PNPI (St. Petersburg, Russia). Given the stringent latency requirements and elaborate track-reconstruction algorithms, it was decided from the very beginning to build a flexible Trigger and DAQ system based on programmable logic devices. The Xilinx Virtex family of FPGAs was chosen as the most advanced in the industry (Table 1). It was also proposed to place the FPGAs on relatively small mezzanine boards to allow independent development and future upgrades of the FPGA-based processing logic.

Table 1: Evolution of the Xilinx Family of Virtex FPGA

| Xilinx Family | Virtex | Virtex-E | Virtex-2 | Virtex-4 | Virtex-5 |
|---------------|--------|----------|----------|----------|----------|
| Year          | 1999   | 2000     | 2001     | 2004     | 2006     |
| Technology    | 220 nm | 180 nm   | 130 nm   | 90 nm    | 65 nm    |
| Core power    | 2.5V   | 1.8V     | 1.5V     | 1.2V     | 1.0V     |

The two main requirements for the FPGA and its mezzanine card are (1) the amount of logic resources (configurable blocks, memory) needed for a given functionality and (2) the required number of input/output (i/o) pins. It was estimated that fewer than 500 i/o pins suffice for the ALCT, TMB and MPC boards, while the SP and MS need very similar and significantly larger numbers, close to 750. So, in 2001-2002, when the mezzanine idea was adopted, it was decided to build two custom mezzanine cards: one (108 × 104 mm) for the ALCT, TMB and MPC, and another (140 × 80 mm), with more i/o, for both the SP and the MS. The same family of high-density 4-row Samtec 100-, 140-, 160- and 200-pin connectors was chosen for both mezzanines, with sockets installed on the motherboard and shrouded pins on the mezzanine.

For the first mezzanine, the pin-compatible Xilinx XCV600E/1000E FPGAs were selected. While the design of all motherboards continued to evolve (the host boards typically underwent two or three revisions), it became clear that even the XCV1000E FPGA did not have enough logic resources for the TMB functionality. A second, mechanically compatible version of the mezzanine based on the XC2V4000-5FF1152 FPGA was therefore built specifically for the production TMB2005 board. Because the Virtex-2 FPGA requires lower supply voltages, however, this mezzanine is electrically incompatible with the initial version. Both versions were designed and built at UCLA and PNPI.

Another mezzanine, for the TF boards, was designed at the University of Florida (Gainesville) and PNPI. It is based on the same XC2V4000-5FF1152 FPGA but has six connectors to provide more i/o connections to the host boards. All three boards are shown in Fig. 2. In addition to the FPGA (which resides on the top side of the TF mezzanine and on the bottom side of the other two boards), each board carries three, four or five XC18V04 PROMs.

| SP05, MS2005     | TMB2005          | MPC2004, ALCT                |
|------------------|------------------|------------------------------|
| XC2V4000-5FF1152 | XC2V4000-5FF1152 | XCV600E/1000E-FG680 (-7, -8) |

Figure 2: FPGA Mezzanines for the CSC Trigger Boards

The FPGA configuration mode is set to SelectMAP, which provides an 8-bit parallel path between the FPGA and its PROMs and the fastest reconfiguration time (25 ms for the XCV600E, 40 ms for the XCV1000E and 100 ms for the XC2V4000).

All the mezzanines have four large mounting holes. The ALCT/MPC/TMB mezzanines are 63-mil (1.6 mm) thick printed circuit boards with an additional thick metal plane on the bottom side for rigidity. The TF mezzanine is 93 mil (2.4 mm) thick.

#### A. Advantages and Limitations of Virtex-5

Virtex-5 [2], the most recent addition to the Xilinx Virtex family, has several advantages over the previous Virtex-2 and Virtex-4 generations, including better performance due to the advanced 65 nm technology; a more flexible basic slice containing four 6-input LUTs and four flip-flops (previously a slice had two 4-input LUTs and two flip-flops); better clock routing; more embedded memory; the ability to detect Single Event Upsets (SEUs) and correct single-bit errors; and, potentially, a shorter configuration time from the PROM.
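The single-error-correct, double-error-detect (SECDED) principle behind this feature can be illustrated with a toy extended Hamming(8,4) code. This is a generic sketch only: the Virtex-5 frame ECC is a dedicated hardware block operating on whole configuration frames, and its actual code is defined in the Xilinx documentation, not here.

```python
from typing import List, Optional, Tuple


def encode(nibble: int) -> List[int]:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indices 1..7 are the classic
    Hamming(7,4) positions, with parity bits at 1, 2 and 4.
    """
    c = [0] * 8
    c[3], c[5], c[6], c[7] = [(nibble >> i) & 1 for i in range(4)]
    c[1] = c[3] ^ c[5] ^ c[7]
    c[2] = c[3] ^ c[6] ^ c[7]
    c[4] = c[5] ^ c[6] ^ c[7]
    c[0] = sum(c[1:]) % 2  # makes the total parity of the word even
    return c


def decode(word: List[int]) -> Tuple[Optional[int], str]:
    """Return (data, status); status is 'ok', 'corrected' or 'double'."""
    c = list(word)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of the positions holding a 1
    parity_error = sum(c) % 2 == 1   # odd overall parity
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        c[syndrome] ^= 1             # single error; syndrome 0 -> overall bit
        status = "corrected"
    else:
        return None, "double"        # even parity but non-zero syndrome
    data = c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)
    return data, status
```

A single flipped bit anywhere in the word is located by the syndrome and corrected; two flipped bits leave the overall parity even with a non-zero syndrome and are reported rather than miscorrected.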

Virtex-5 comprises four sub-families: the general-purpose LX, the serial-connectivity-oriented LXT, the signal-processing-oriented SXT and the embedded-applications-oriented FXT. Of these four sub-families, the general-purpose LX is the most suitable for our applications. Among the disadvantages of Virtex-5 compared with Virtex-2 are fewer package options and a smaller (on average) number of i/o pins (Table 2). For example, the XC2V4000 FPGA is available in two packages, FF1152 and FF1517, with 824 and 912 i/o pins, respectively. For a comparable Virtex-5 FPGA, the XC5VLX110, two large packages, FF1153 and FF1760, are available as well, but the maximum number of i/o is only 800 for both. This limitation is not critical for the ALCT, TMB and MPC boards, but could be important for the SP and MS.

Table 2: Selected package options for Virtex-2 and Virtex-5

| Devices                     | 27 × 27 mm<br>FG676/FF676 | 35 × 35 mm<br>FF1152/FF1153 |
|-----------------------------|---------------------------|-----------------------------|
| Virtex-2 XC2V1500/2000/3000 | 392/456/484               |                             |
| Virtex-5 XC5VLX30/50/85/110 | 400/440/440/440           |                             |
| Virtex-2 XC2V4000/6000/8000 |                           | 824/824/824                 |
| Virtex-5 XC5VLX50/85/110    |                           | 560/560/800                 |

#### B. FPGA Choice for the New Mezzanine

Based on the requirements listed above, the new TF mezzanine needs about 800 available i/o pins. The seven members of the Virtex-5 LX family are offered in four packages [2], of which either the FF1153 or the FF1760 meets our requirement. The FF1760 package would be best because the three largest devices (XC5VLX110, XC5VLX220, XC5VLX330) are pin-compatible in it, but its board layout is more challenging. The XC5VLX110 in the FF1153 package is the second option.

To estimate resource usage, the most recent Muon Sorter design was targeted to the XC5VLX110 FPGA and compiled with the Xilinx ISE 9.1 development system. Table 3 compares the results with the XC2V4000 FPGA. Resource usage is roughly 30 to 50% lower for the XC5VLX110, while performance is about the same for the slowest (-1) speed grade and ~48% higher for the middle (-2) speed grade. The XC5VLX110 FPGA therefore appears to be the optimal solution, unless a significant increase in design functionality and resource usage is expected.

Table 3: Results of simulation, Muon Sorter

| Target FPGA                            | XC2V4000-5FF1152  | XC5VLX110-FF1153       |
|----------------------------------------|-------------------|------------------------|
| Number of occupied slices              | 13,558/23,040=58% | 6,734/17,280=38%       |
| Number of slice flip-flops used        | 11,605/46,080=25% | 12,066/69,120=17%      |
| Number of slice LUTs used              | 17,226/46,080=37% | 15,365/69,120=22%      |
| Number of Block RAM used               | 108/120=90%       | 54/128=42%             |
| Total equivalent gate count for design | 7,331,787         | 7,330,278              |
| Maximum design performance, MHz        | 42.13             | 62.51 (speed grade -2)<br>43.51 (speed grade -1) |
| Version of Xilinx ISE software         | 6.2.03i           | 9.1.03i                |
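The percentages in Table 3 can be reproduced from the raw counts; they appear to be truncated rather than rounded (e.g., 6,734/17,280 = 38.97% is listed as 38%). The short Python sketch below, whose names are ours, recomputes the utilisation figures, the relative reduction in each resource and the change in maximum clock frequency.

```python
# Raw counts from Table 3: (used, available) for XC2V4000 and XC5VLX110
MS_RESOURCES = {
    "slices":     ((13_558, 23_040), (6_734, 17_280)),
    "flip-flops": ((11_605, 46_080), (12_066, 69_120)),
    "LUTs":       ((17_226, 46_080), (15_365, 69_120)),
    "block RAMs": ((108, 120), (54, 128)),
}


def util_pct(used: int, available: int) -> int:
    """Utilisation in percent, truncated to an integer."""
    return used * 100 // available


def relative_change(old: float, new: float) -> float:
    """Fractional change from old to new (negative means a reduction)."""
    return (new - old) / old


if __name__ == "__main__":
    for name, (v2, v5) in MS_RESOURCES.items():
        p2, p5 = util_pct(*v2), util_pct(*v5)
        print(f"{name:11s} {p2:3d}% -> {p5:3d}%  ({relative_change(p2, p5):+.0%})")
    # Maximum clock frequency, MHz: XC2V4000 vs XC5VLX110 speed grades -1 and -2
    print(f"grade -1: {relative_change(42.13, 43.51):+.1%}")
    print(f"grade -2: {relative_change(42.13, 62.51):+.1%}")
```

The relative reductions come out between about 32% (flip-flops) and 53% (block RAM), and the -2 device is about 48% faster, in line with the discussion above.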

For the ALCT, MPC and TMB, about 500 i/o pins are required, so the FF1153 package is the most suitable. The three smallest pin-compatible devices in this package, the XC5VLX50, XC5VLX85 and XC5VLX110, provide 560, 560 and 800 i/o pins, respectively.
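The package choice amounts to filtering device/package combinations by the required i/o count. A minimal Python sketch, using only the Virtex-5 i/o numbers quoted in Table 2 and in the text above (other LX packages and devices are deliberately omitted; the names are ours):

```python
from typing import Dict, Set, Tuple

# User i/o counts for Virtex-5 LX device/package pairs quoted in this paper
V5_LX_IO: Dict[Tuple[str, str], int] = {
    ("XC5VLX30", "FF676"): 400,
    ("XC5VLX50", "FF676"): 440,
    ("XC5VLX85", "FF676"): 440,
    ("XC5VLX110", "FF676"): 440,
    ("XC5VLX50", "FF1153"): 560,
    ("XC5VLX85", "FF1153"): 560,
    ("XC5VLX110", "FF1153"): 800,
    ("XC5VLX110", "FF1760"): 800,
}


def candidates(min_io: int) -> Set[Tuple[str, str]]:
    """Device/package pairs that offer at least min_io user i/o pins."""
    return {key for key, io in V5_LX_IO.items() if io >= min_io}
```

With a requirement of 500 pins no FF676 option survives, which is why the FF1153 package is preferred for the ALCT, MPC and TMB; with 800 pins only the XC5VLX110 (FF1153 or FF1760) remains, matching the TF mezzanine discussion.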

For resource evaluation, the Muon Port Card design was targeted to the XC5VLX50 FPGA and compiled with the Xilinx ISE 9.1 development system. Table 4 compares the results with the XCV600E-8FG680C FPGA used on the present MPC mezzanine. The relative resource usage is lower for the XC5VLX50, while the performance is ~47% higher for the slowest (-1) speed grade and ~66% higher for the -2 speed grade. So, the XC5VLX50 or XC5VLX110 FPGA seems to be the optimal solution, unless a significant increase in design functionality and resource usage is expected.

Table 4: Results of simulation, Muon Port Card

| Target FPGA                            | XCV600E-8FG680   | XC5VLX50-FF1153                                  |
|----------------------------------------|------------------|--------------------------------------------------|
| Number of occupied slices              | 5,407/6,912=78%  | 3,664/7,200=50%                                  |
| Number of slice flip-flops used        | 5,439/13,824=39% | 8,123/28,800=28%                                 |
| Number of slice LUTs used              | 6,128/13,824=44% | 7,438/28,800=25%                                 |
| Number of Block RAM used               | 42/72=58%        | 21/48=43%                                        |
| Total equivalent gate count for design | 788,044          | 2,888,553                                        |
| Maximum design performance, MHz        | 43.83            | 72.81 (speed grade -2)<br>64.68 (speed grade -1) |
| Version of Xilinx ISE software         | 6.2.03i          | 9.1.03i                                          |
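The speed figures quoted for the MPC follow directly from the Fmax values in Table 4: 64.68/43.83 ≈ 1.476 and 72.81/43.83 ≈ 1.661, i.e. gains of about 47.6% and 66.1%. A small Python helper (names ours) that also converts each Fmax into a minimum clock period:

```python
FMAX_MHZ = {
    "XCV600E-8": 43.83,
    "XC5VLX50-1": 64.68,
    "XC5VLX50-2": 72.81,
}


def speed_gain(new_mhz: float, old_mhz: float) -> float:
    """Fractional increase of the maximum clock frequency."""
    return new_mhz / old_mhz - 1.0


def min_period_ns(f_mhz: float) -> float:
    """Shortest clock period the routed design meets, in nanoseconds."""
    return 1e3 / f_mhz


if __name__ == "__main__":
    base = FMAX_MHZ["XCV600E-8"]
    for part, f in FMAX_MHZ.items():
        print(f"{part:11s} {f:6.2f} MHz  {min_period_ns(f):5.2f} ns  "
              f"{speed_gain(f, base):+.1%}")
```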

The configuration bitstreams of the XC5VLX50, XC5VLX85 and XC5VLX110 devices contain 12.6M, 21.8M and 29.1M bits, respectively, so a single XCF32P Flash PROM suffices for any of these devices. The XCF PROMs download twice as fast as the XC18V PROMs. Using the SelectMAP configuration option with an 8-bit parallel path at 33 MHz, the configuration time will be 54, 80 and 100 ms, respectively, for these devices. The 16- and 32-bit configuration options reduce the configuration time by a factor of two or four, but then two or four PROMs are needed. For comparison, the configuration time of the XC2V4000 FPGA from four XC18V04 PROMs is 100 ms.
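To first order, the SelectMAP load time is the bitstream length divided by the bus width times the configuration clock frequency, t ≈ N / (w · f). The Python sketch below (function name ours) evaluates this idealised formula, which ignores PROM start-up, header words and any clock limits of the PROM. It gives about 48, 83 and 110 ms for the three devices at 8 bits and 33 MHz, the same order as the figures quoted above, and shows why doubling the bus width halves the time.

```python
BITSTREAM_BITS = {"XC5VLX50": 12.6e6, "XC5VLX85": 21.8e6, "XC5VLX110": 29.1e6}


def config_time_ms(n_bits: float, bus_width: int, f_cclk_mhz: float) -> float:
    """Idealised SelectMAP load time: one bus word per CCLK cycle, no overhead."""
    return n_bits / (bus_width * f_cclk_mhz * 1e6) * 1e3


if __name__ == "__main__":
    for dev, bits in BITSTREAM_BITS.items():
        times = [config_time_ms(bits, w, 33.0) for w in (8, 16, 32)]
        print(dev, " ".join(f"x{w}: {t:5.1f} ms" for w, t in zip((8, 16, 32), times)))
```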

### IV. MEZZANINE GIGABIT LINK

Low-cost, low-power TLK SERDES devices [4] from Texas Instruments have proven reliable in many applications, including existing LHC subsystems. Seven pin-compatible devices serialize and deserialize 16- or 18-bit parallel data words at 25 MHz to 156.25 MHz with either the industry-standard 8B/10B or start/stop encoding, provide a current- or voltage-mode serial interface, and (except for the start/stop devices) include an embedded PRBS generator (Table 5).
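The 2^7-1 pseudo-random bit sequence offered by the 8B/10B devices is the kind of pattern produced by a seven-stage linear-feedback shift register. The Python sketch below assumes the widely used PRBS7 polynomial x^7 + x^6 + 1; the TLK data sheets define the exact generator, seed and bit order. The sequence repeats every 127 bits and contains 64 ones and 63 zeros per period.

```python
from typing import List


def prbs7(n: int, seed: int = 0x7F) -> List[int]:
    """Return n bits from a 7-bit Fibonacci LFSR with taps 7 and 6 (x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("an all-zero LFSR state never changes")
    bits = []
    for _ in range(n):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of stages 7 and 6
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F       # shift, keep 7 bits
    return bits
```

Because every non-zero 7-bit window appears exactly once per period, a receiver that locks onto the same generator can count bit errors on a link without any host-supplied test data.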

Table 5: Texas Instruments TLK Family of SERDES Devices

| Part<br>Number | Parallel<br>Bus, bit | Serial<br>Interface | Bit Rate,<br>Gbps | Reference Clock<br>Frequency, MHz | Encoding<br>Method | Embedded<br>PRBS generator |
|----------------|----------------------|---------------------|-------------------|-----------------------------------|--------------------|----------------------------|
| TLK1501        | 16                   | CML*                | 0.6-1.5           | 30-75                             | 8B/10B             | 2<sup>7</sup>-1            |
| TLK2501        | 16                   | CML*                | 1.5-2.5           | 75-125                            | 8B/10B             | 2<sup>7</sup>-1            |
| TLK3101        | 16                   | VML*                | 2.5-3.125         | 125-156.25                        | 8B/10B             | 2<sup>7</sup>-1            |
| TLK1521        | 18                   | VML*                | 0.5-1.3           | 25-65                             | Start/Stop         | None                       |
| TLK2521        | 18                   | VML*                | 1.0-2.5           | 50-125                            | Start/Stop         | None                       |
| TLK2701        | 16                   | CML*                | 1.6-2.7           | 80-135                            | 8B/10B             | 2<sup>7</sup>-1            |
| TLK2711        | 16                   | VML*                | 1.6-2.7           | 80-135                            | 8B/10B             | 2<sup>7</sup>-1            |

\* CML - Current Mode Logic, VML - Voltage Mode Logic
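Table 5 implies that every device sends 20 line bits per reference-clock cycle: 16 data bits become 20 after 8B/10B encoding, and the 18-bit start/stop devices add two framing bits (this reading is inferred from the ratio of the bit-rate and clock columns). The Python sketch below (class and function names are ours) encodes that relation and checks it against the table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tlk:
    name: str
    data_bits: int                  # parallel word width
    fref_mhz: Tuple[float, float]   # reference clock range from Table 5
    line_bits: int = 20             # bits per word on the serial line


TLK_FAMILY = [
    Tlk("TLK1501", 16, (30, 75)),
    Tlk("TLK2501", 16, (75, 125)),
    Tlk("TLK3101", 16, (125, 156.25)),
    Tlk("TLK1521", 18, (25, 65)),
    Tlk("TLK2521", 18, (50, 125)),
    Tlk("TLK2701", 16, (80, 135)),
    Tlk("TLK2711", 16, (80, 135)),
]


def serial_gbps(dev: Tlk, fref_mhz: float) -> float:
    """Line rate including encoding or framing overhead."""
    return dev.line_bits * fref_mhz / 1e3


def payload_gbps(dev: Tlk, fref_mhz: float) -> float:
    """User data rate delivered on the parallel bus."""
    return dev.data_bits * fref_mhz / 1e3


if __name__ == "__main__":
    for dev in TLK_FAMILY:
        lo, hi = dev.fref_mhz
        print(f"{dev.name}: {serial_gbps(dev, lo):.3g}-{serial_gbps(dev, hi):.4g} Gbps line, "
              f"{payload_gbps(dev, hi):.4g} Gbps payload at {hi} MHz")
```

The payload rate is 80% of the line rate for the 8B/10B devices and 90% for the start/stop devices; for example, a TLK3101 at 156.25 MHz carries 2.5 Gbps of user data on a 3.125 Gbps line.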

The SERDES card [5] design is optimised for minimal width. Its dimensions are 103 mm in length, 23 mm in width and 13.7 mm in height (Fig. 3). We have chosen the Samtec MOLC-120-31-S-Q 80-pin 4-row high-density (1.27 mm pitch) through-hole connector for connection to the host board.

The host board requires the FOLC-120-01-S-Q mating socket. The TLK device, SFP connector and cage are all assembled on one side of the mezzanine card, which faces the host board when plugged in. There is a small (~1.5 mm) clearance between the SFP cage and the host board.





Figure 3: SERDES Mezzanine Board (top and bottom views)

All seven TLK devices are pin-compatible, but there are minor differences in their serial and control/status interfaces. While the TLK1501, TLK2501 and TLK2701 have current-mode (CML) output drivers, the others use voltage-mode logic (VML). VML drivers do not need pull-up or pull-down resistors, so the two corresponding resistors on the mezzanine card can be omitted for VML devices. The TLK3101 also has embedded biasing and termination circuits for the serial receiver, so even fewer external components are required. More technical details are available in [6]. The TLK transceiver is always AC-coupled to the SFP module; the series capacitors on the receive and transmit data paths are embedded in the SFP optical or copper module.

The host-board interface provides the following signals: 16-bit transmit and 16-bit receive data paths for the TLK device; the reference clock for the transmitter and the recovered clock from the receiver; six control and two status lines for the TLK; and five control and two status lines for the SFP module. Note that all the TLK devices require a very stable reference clock with jitter below 40 ps. The host board also supplies the +3.3 V and +2.5 V voltages. Alternatively, the +2.5 V can be generated on the mezzanine from +3.3 V by an on-board voltage regulator; this option is selected by a jumper.
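The signal list above can be tallied against the 80-pin connector. A minimal Python sketch (names ours) counting the listed logic signals; the 18-bit option assumes both data paths widen to 18 bits for the TLK1521/2521, and treating the remaining pins as power and ground is an assumption to be checked against the pinout in [5].

```python
CONNECTOR_PINS = 80


def host_signal_count(data_width: int = 16) -> int:
    """Number of logic signals on the host-board connector (per the text)."""
    signals = {
        "tx_data": data_width,
        "rx_data": data_width,
        "reference_clock": 1,
        "recovered_clock": 1,
        "tlk_control": 6,
        "tlk_status": 2,
        "sfp_control": 5,
        "sfp_status": 2,
    }
    return sum(signals.values())


if __name__ == "__main__":
    for width in (16, 18):
        used = host_signal_count(width)
        print(f"{width}-bit: {used} signals, {CONNECTOR_PINS - used} pins left")
```

With 16-bit paths the listed signals occupy 49 of the 80 pins (53 with 18-bit paths), leaving the rest for supply and return connections.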

Typical power dissipation at the maximum data rate is 250 mW, 362 mW and 450 mW for the TLK1501, TLK2501 and TLK3101 devices, respectively. The power dissipation of SFP optical modules varies from vendor to vendor, but is usually about 500 mW typical and 750 mW maximum.
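These figures give a rough per-card and per-carrier power budget. The Python sketch below (names ours) adds the TLK and SFP dissipation; it ignores the losses of the optional on-board +2.5 V regulator and any host-board overhead, and uses the placement densities mentioned in the conclusion (8 cards on a 6U board, 14 on a 9U board).

```python
TLK_MW = {"TLK1501": 250, "TLK2501": 362, "TLK3101": 450}  # at maximal data rate
SFP_MW = {"typ": 500, "max": 750}                           # typical vendor values


def mezzanine_mw(tlk: str, sfp: str = "max") -> int:
    """Dissipation of one mezzanine: SERDES plus SFP module."""
    return TLK_MW[tlk] + SFP_MW[sfp]


def carrier_w(n_cards: int, tlk: str, sfp: str = "max") -> float:
    """Total mezzanine dissipation on one carrier board, in watts."""
    return n_cards * mezzanine_mw(tlk, sfp) / 1e3


if __name__ == "__main__":
    for tlk in TLK_MW:
        print(f"{tlk}: {mezzanine_mw(tlk)} mW per card, "
              f"{carrier_w(8, tlk):.1f} W on 6U, {carrier_w(14, tlk):.1f} W on 9U")
```

In the worst case considered here (TLK3101 with a maximum-power SFP), one card dissipates 1.2 W and a fully populated 9U board about 16.8 W.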

### V. CONCLUSION

The mezzanine approach has proved to be a valuable solution in large electronic systems, such as the EMU CSC system at CMS, that require extensive maintenance and upgrades. It can be applied to FPGAs, data transmission links and other parts that require flexibility and modification.

The results of simulation of two of our designs targeted to Virtex-5 FPGAs show a performance increase of roughly 50 to 65% for the mid-speed-grade (-2) devices, while more logic resources remain available for additional functionality. The ability of the Virtex-5 FPGA family to self-correct single-event errors and report double errors using the Internal Configuration Access Port (ICAP) is essential for the future SLHC upgrade.

We have designed and built a small mezzanine card that houses one of the Texas Instruments pin-compatible TLK-series gigabit transceivers, a pluggable SFP optical or copper module, and a high-density 80-pin connector that provides the parallel 16- or 18-bit transmitter and receiver interfaces and all the required control and status signals to the host board. Any of the seven pin-compatible TLK devices, with parallel data rates from 25 MHz to 156.25 MHz, can be used. The low height (below 14 mm) allows the mezzanine to be placed on most standard carrier boards. For example, up to 8 mezzanines can be placed on a 6U VME or CompactPCI card, and up to 14 on a 9U VME board. Sample boards with TLK1501/2501/3101 devices are available for evaluation.

### VI. REFERENCES

- [1] B.G. Bylsma et al., "The Cathode Strip Chamber Data Acquisition System for CMS," TWEPP-07 Proceedings, CERN 2007-007, 6 November 2007, pp. 195-198.
- [2] http://www.xilinx.com
- [3] http://www.schelto.com/SFP/index.html
- [4] http://focus.ti.com/lit/ml/sszt009c/sszt009c.pdf
- [5] Mezzanine SERDES Board Specification. http://bonner-ntserver.rice.edu/cms/SERDES/SERDES spec.pdf
- [6] Interfacing Between LVPECL, VML, CML, and LVDS Levels. TI Application Report SLLA120. December 2002. Available at http://focus.ti.com/lit/an/slla120/slla120.pdf