# Development of a Waveform Sampling ASIC With Femtosecond Timing for a Low Occupancy Vertex Detector

Peter Orel, Gary S. Varner Department of Physics and Astronomy, University of Hawai'i at Manoa, Honolulu, Hawaii, USA email: porel@hawaii.edu

## INTRODUCTION


Increasing luminosities of particle colliders result in ever higher hit rates in the innermost vertex detectors [1], thus increasing their occupancies. The TVD [2] sensor architecture relies on an asynchronous digital pixel matrix along with the transmission line and a readout



The pixel position is encoded in the time of flight of voltage pulses on a micro-strip line. The arrival times of the pulses are measured by a waveform sampling ASIC called the RFpix, the architecture of which is based on differential switched capacitor (SCA) arrays



## SAMPLING CELL



The above figure shows the RFpix sampling cell (SC) schematic. The SC structure follows a differential configuration with the two SC switches denoted as P and N respectively. Both switches are driven by a local differential clock driver, which ensures fast rise times (31ps) and prevents clock skew.

The RFpix switch track-mode resistance variance is $14.3 \cdot 10^{3}$. In addition, the lowest tracking bandwidth is approximately 4.21GHz, leading to a significant improvement of the signal response. Simulations show that the input capacitance of the SCC in hold mode is 28fF, while the total input capacitance seen at the input of the SCC in track mode is 50fF. The extrapolated SCC



By observing the differential output, the sampling error becomes almost symmetric. By considering the original signal amplitude versus the reconstructed sampled signal amplitude, the amplitude dependent offset voltage turns into a virtual gain of the SC.
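The effect described above can be illustrated with a minimal sketch, assuming a first-order error model that is not taken from the RFpix design: each single-ended switch adds a constant pedestal `E0` plus an amplitude-dependent term `K * v` to the sampled voltage. The values of `E0`, `K`, and the common-mode voltage `v_cm` are hypothetical.

```python
# Illustrative first-order model (an assumption, not the simulated RFpix behavior):
# each single-ended switch adds an error e(v) = E0 + K * v to the sampled voltage.
E0 = 2.0e-3   # hypothetical constant pedestal error [V]
K = 0.01      # hypothetical amplitude-dependent error coefficient [V/V]


def sample_single_ended(v):
    """Return the held voltage of one switch, including the modeled error."""
    return v + E0 + K * v


def sample_differential(v_diff, v_cm=0.6):
    """Sample a differential signal on the P and N switches and return P - N."""
    v_p = v_cm + v_diff / 2.0
    v_n = v_cm - v_diff / 2.0
    return sample_single_ended(v_p) - sample_single_ended(v_n)


if __name__ == "__main__":
    for v in (-0.2, -0.1, 0.1, 0.2):
        out = sample_differential(v)
        # The ratio out / v is the same for every amplitude: a pure gain.
        print(f"in = {v:+.3f} V  out = {out:+.6f} V  gain = {out / v:.4f}")
```

In this model the pedestal `E0` and the common-mode contribution cancel in the difference, and the remaining error scales with the input, so the differential output is the input multiplied by a fixed gain of `1 + K`.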



## DELAY LOCKED LOOP

The best compromise between added jitter and power consumption has been achieved with a two-level (4 x 16) topology. At the top of the first-level delay line (L1DL) the input clock is divided by 4 using two fully differential D flip-flops (DDFFC). The resulting divided clock drives another DDFFC and a fully differential logic AND gate. This circuit decreases the duty cycle to 25%. This signal is then fed into the L1DL, which is composed of four delay elements, each with a delay of 800ps, for an overall delay of 3.2ns. The input and output taps of each L1DL delay element are used to drive a second-level delay line (L2DL) that contains sixteen delay elements. Each L2 delay element (a starved inverter) has a delay of 50ps, so the overall delay of the L2DL equals the delay of a single L1DL element. Four of these L2DLs cover the entire delay range of 3.2ns. Each of the L2DL delay lines has its own feedback loop
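The delay arithmetic of the two-level topology can be checked with a short sketch. The delay values are those quoted above; the tap indexing (L2DL number `l1` starting at `l1 * 800ps`) and all names are assumptions made for illustration only.

```python
# Nominal strobe timing of a two-level (4 x 16) delay line.
# Delay values come from the text; the indexing scheme is a hypothetical illustration.
L1_STAGES = 4          # first-level delay elements
L2_STAGES = 16         # second-level delay elements per L2DL
L1_DELAY_PS = 800      # delay of one L1DL element [ps]
L2_DELAY_PS = 50       # delay of one L2DL element [ps]


def strobe_times_ps():
    """Return the nominal strobe time of every tap, in picoseconds."""
    times = []
    for l1 in range(L1_STAGES):
        for l2 in range(L2_STAGES):
            times.append(l1 * L1_DELAY_PS + l2 * L2_DELAY_PS)
    return times


taps = strobe_times_ps()
window_ps = len(taps) * L2_DELAY_PS          # total span covered by all taps
sampling_rate_gsps = 1000.0 / L2_DELAY_PS    # 1 / 50ps expressed in GS/s

if __name__ == "__main__":
    print(f"{len(taps)} taps, window = {window_ps} ps, rate = {sampling_rate_gsps} GS/s")
```

Because one L2DL spans exactly one L1DL element (16 x 50ps = 800ps), the 64 taps are uniformly spaced across the 3.2ns window, which corresponds to 20GS/s.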



Figure: RFpix Functional Block Diagram (labels: Timing & Control, Trigger, CLK from FPGA, Control Logic, Buffers, Sample & Hold Circuit at 20GS/s, Analog Storage and Digitization, Parallel Transfer, Slope ADC, LVDS Driver at 128MS/s, Data Transfer to FPGA, Block A, Block B)

## ANALOG TO DIGITAL CONVERSION

The slope ADC is designed to be a distributed circuit. The comparator is integrated in the storage array, with every storage cell having its own comparator. The ramp used to trigger the comparators is sourced from the ramp generator, which is composed of an input/output rail-to-rail operational amplifier (OPAMP) configured as an integrator. That is, a DAC is used to create a voltage difference between the OPAMP inputs. This voltage difference is then integrated in time, creating the ramp. The speed and slope polarity depend on the magnitude and polarity of the voltage difference. This way the ramp is highly adjustable. In our case ramps from 0.5µs to 2µs are generated. Every channel has its own ramp

The digitizing logic triggers the ramp generation at the same time as it triggers a 12-bit Gray code counter. The counter outputs are fed into 64 latches (one for each storage cell in the window). These latches are triggered by the comparators, thus latching the counter value.
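The conversion principle can be sketched in a few lines, assuming an ideal linear ramp and an ideal comparator. The ramp limits `v_start` and `v_stop` and all function names are hypothetical; only the 12-bit width, the 64 latches per window, and the ramp durations come from the text.

```python
# Behavioral sketch of a slope (ramp-compare) ADC with a Gray code counter.
N_BITS = 12                 # counter width, from the text
CELLS_PER_WINDOW = 64       # latches per window, from the text


def binary_to_gray(n):
    """Encode a binary count as a reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """Decode a reflected Gray code back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def convert(v_cell, v_start=0.0, v_stop=1.0):
    """Return the Gray code latched when an ideal linear ramp crosses v_cell.

    v_start and v_stop are hypothetical ramp limits, not RFpix values.
    """
    full_scale = (1 << N_BITS) - 1
    frac = (v_cell - v_start) / (v_stop - v_start)
    count = min(max(int(frac * full_scale), 0), full_scale)
    return binary_to_gray(count)


def conversion_rate_msps(ramp_time_us):
    """Samples digitized per microsecond (MS/s) when all cells share one ramp."""
    return CELLS_PER_WINDOW / ramp_time_us


if __name__ == "__main__":
    code = convert(0.37)
    print(f"latched Gray code = {code:012b}, decoded count = {gray_to_binary(code)}")
    print(f"rate at 0.5 us ramp = {conversion_rate_msps(0.5)} MS/s")
```

A Gray code counter changes only one bit per increment, so a latch that fires during a counter transition captures a value at most one count off rather than an arbitrary mixture of old and new bits. With the shortest ramp of 0.5µs, converting 64 cells in parallel gives 128MS/s, which appears consistent with the per-channel rate quoted in the specifications.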

The OPAMP has complementary NMOS and PMOS differential pairs to achieve rail-to-rail input operation, with a circuit for transconductance linearization. The input stage is followed by a folded cascode stage. Finally, the output stage is a basic CMOS inverter driven in class AB



## RFpix BASELINE SPECIFICATIONS

| Parameter | Value |
| --- | --- |
| Sampling speed | 20GS/s |
| Analog bandwidth | 3GHz |
| Input referred noise | $\leq 0.5\,\text{mV}_{\text{RMS}}$ |
| Added jitter per channel | ≈ 40fs |
| Number of bits | 12 |
| ENOB | 10 |
| Number of channels | 126 |
| Buffer depth | 32 |
| Power consumption per channel | TBD |
| LVDS data rate | ≥128Mbits/s/ch |
| Technology node | TSMC 130nm |

Each channel has two unity gain buffer amplifiers that decouple the capacitive loading of the SCA arrays from the inputs, thus providing the necessary analog bandwidth of 3GHz in a wire bonded package. The differential configuration of the SCA helps with crosstalk mitigation and noise coupling. At the same time, it turns the amplitude dependent voltage error caused by charge injection into a virtual gain of the sampling cell.

The strobe signals are generated by a two-level delay locked loop (TLDLL), which ensures a worst case added jitter of 41fs. Two adjacent channels share one TLDLL, which has several advantages: it avoids loading on the strobe lines and thus provides fast strobe rise times (approx. 30ps), it lowers the power consumption, and it makes interleaving possible. With the TLDLL tap delay of 50ps, the 20GS/s sampling speed is achieved.

Every SCA block has trigger logic that issues a transfer cycle, which latches the SCA block for 3.2ns (64 cells x 50ps) upon signal detection. Within this time, the SCA cell values are transferred in parallel to an analog storage array. Each storage array has a depth of 32 storage cells. Each storage cell has an integrated comparator that is part of a parallel slope ADC, which runs at 128MS/s/channel thanks to its parallel configuration. The digitized data is transmitted off-chip by serializer logic through a dedicated LVDS driver per channel. With a 12-bit ADC, buffer depth of 5µs, and the system trigger of 30kHz, the overall data throughput is around 2Mbits/s/channel. The average power consumption without the input buffers is estimated at 25mW per

## ANALOG STORAGE ARRAY

The storage array is composed of 64 x 32 storage cells. Every sampling cell is linked to 32 storage cells; that is, the analog buffer depth is 32. Each storage cell has an input switch, a storage capacitor, and a comparator. The buffer depth has been determined by simulating the TVD response in the Belle II environment. To cover all the events completely, the buffer depth should be 36 cells. With a depth of 32 cells only a small fraction of the events is lost, but the address logic is greatly simplified.
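The text does not state why a depth of 32 simplifies the address logic. One plausible reason, assumed here purely for illustration, is that a power-of-two depth lets a write pointer wrap by simple counter overflow, whereas a depth of 36 needs an extra address bit and an explicit compare-and-reset. All names below are hypothetical.

```python
# Sketch of write-pointer addressing for a circular analog buffer.
# The depth of 32 and the alternative of 36 come from the text; the addressing
# scheme itself is an assumption, not the documented RFpix logic.
DEPTH = 32                  # analog buffer depth
ADDR_MASK = DEPTH - 1       # 0b11111, valid only because DEPTH is a power of two
CELLS_PER_WINDOW = 64       # sampling cells per window


def next_address_pow2(addr):
    """Advance the pointer; the wrap is free because a 5-bit counter overflows."""
    return (addr + 1) & ADDR_MASK


def next_address_generic(addr, depth):
    """Advance the pointer for an arbitrary depth using a compare-and-reset."""
    return 0 if addr + 1 == depth else addr + 1


def address_bits(depth):
    """Number of address bits needed to index `depth` storage cells."""
    return (depth - 1).bit_length()


if __name__ == "__main__":
    print(f"storage cells per array: {CELLS_PER_WINDOW * DEPTH}")
    print(f"address bits for depth 32: {address_bits(32)}, for depth 36: {address_bits(36)}")
```

Under this assumption, a depth of 32 uses every code of a 5-bit address, while a depth of 36 would require 6 bits with 28 unused codes.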





## REFERENCES

[1] T. Abe et al., "Belle II Technical Design Report," ArXiv e-prints (2010), arXiv:1011.0352 [physics.ins-det].

[2] P. Orel et al., "Exploratory study of a novel low occupancy vertex detector architecture based on high precision timing for high luminosity particle colliders," Nucl. Instrum. Meth. A857, 31-41 (2017).