



# Energy-Efficient Dual-Port 32-word 8-bit ERSFQ Register File

Alex Kirichenko, Max Miller, Igor Vernik, Oleg Mukhanov

HYPRES, United States of America

Lucian Albu, and Gerald Gibson

IBM, United States of America

# C3 Program

### Cryogenic Computing Complexity Program

- called C3 (not CCCP!)
- 5 years

### To demonstrate fully functional cryogenic computer

- 64-bit processors
- Cryogenic RAM
- Operating at > 2 GHz
- Energy < 1 nJ/FLOP

### 64-bit CPU Block Diagram



### **64-bit CPU Parameters**

|                                    | # of JJs | Energy/op | Latency | Delay          |
|------------------------------------|----------|-----------|---------|----------------|
| ALU                                | 32,000   | 3.8 fJ    | 1       | 35 ps          |
| Register File                      | 160,000  | 10.2 fJ   | 2       | 100 ps         |
| Instruction Memory                 | 72,000   | 5.4 fJ    | 3       |                |
| Buffer, decoders, memory interface | 17,000   | 1.3 fJ    |         |                |
| Bit Shifter                        | 4,200    | 3.6 fJ    | 4       | 20 ps - 400 ps |
| Total                              | 281,000  | ~20-24 fJ |         |                |

Clock – 10 GHz

### **Design of an 8-bit ERSFQ CPU**





## 8-bit CPU on a 5x5-mm chip



#### 8-bit CPU comprises

- 8-bit ALU
- 31x8-bit register file
- 12x21-bit instruction memory
- External serial i/o interfaces to the register file and the instruction memory
- 28,000 JJs (including bias JJs)

# 8-bit 32-word Register File

Block diagram labels: address input block, register file i/o block, 8x32 register file matrix.



**Three 5-bit decoders** 

# **Current recycling in the Register File**



- Modular design:
  - All 32 registers have exactly the same bias current value
  - Each clock cycle, only three registers are active
- Current recycling:
  - Factor of 32 dc bias current reduction
  - Factor of ~10 power dissipation reduction
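The factor of 32 follows from serial biasing: identical modules fed in series share one supply current instead of each drawing its own. A minimal sketch of that arithmetic is given here; the per-register bias current `i_reg_ma` is a hypothetical placeholder, not a value from the slides.

```python
def supply_current(n_modules, i_module, serial):
    """Total dc current the supply must deliver to n identical modules."""
    # Parallel biasing: each module draws its own current from the supply.
    # Serial biasing (current recycling): one current passes through all modules.
    return i_module if serial else n_modules * i_module


n_registers = 32   # from the slide: 32 registers with identical bias current
i_reg_ma = 100.0   # hypothetical per-register bias current in mA (assumption)

parallel_ma = supply_current(n_registers, i_reg_ma, serial=False)
recycled_ma = supply_current(n_registers, i_reg_ma, serial=True)
reduction = parallel_ma / recycled_ma

print(f"parallel: {parallel_ma} mA, recycled: {recycled_ma} mA, factor: {reduction}")
```

The sketch covers only the supply current. The ~10x power figure depends on details that the slides do not give, so it is not reproduced here.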

### **Dual-port NDRO cell (ND<sup>2</sup>)**

ND<sup>2</sup> cell schematics

ND<sup>2</sup> low-speed test



Layout in 8-layer MIT-LL process

Size: 12 µm × 26 µm





dc bias current margins: ± 15.8%

### **Inductive AND element**



dc bias current margins: ± 18.0%



### **RO-SQUID based Current Driver**



Relaxation-oscillation (RO) SQUID:

- Switching energy: $E_s = 0.5 \cdot L \cdot I_b^2$ (~ 20 aJ/bit)
- Total inductance: $L$ ~ 200 pH
- $\tau_s \approx L \cdot I_b / V_c + L / R_s$ (~ 60 ps)
- Serially biased (current recycling)
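As a rough consistency check, the quoted switching energy and inductance imply a bias current through $I_b = \sqrt{2 E_s / L}$. The sketch below does this arithmetic with the approximate slide values (20 aJ, 200 pH); the resulting current is an estimate derived here, not a figure stated by the authors.

```python
import math

E_S = 20e-18       # switching energy per bit in joules (~20 aJ, from the slide)
L_TOTAL = 200e-12  # total inductance in henries (~200 pH, from the slide)


def bias_current(energy_j, inductance_h):
    """Invert E_s = 0.5 * L * I_b**2 to get I_b in amperes."""
    return math.sqrt(2.0 * energy_j / inductance_h)


i_b = bias_current(E_S, L_TOTAL)
print(f"I_b ~ {i_b * 1e3:.2f} mA")
```

With these inputs the estimate comes out at roughly 0.45 mA.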



# **Current-steering driver**



 $I_b$  operational margins: ± 11 %



### **RO SQUID-based Merger**



- Very compact design
- Large fan-out
- Bias current is recyclable
- Inputs and the output are decoupled (allows bias current recycling between the merged circuits)
- Switching energy for a 32-to-1 merger is ~ 20 aJ
- Speed is limited by L/R (unlike a binary-tree merger)
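For comparison, a binary-tree merger built from two-input cells needs N - 1 cells arranged in ceil(log2 N) stages, so its delay grows with the number of stages. The sketch below is generic counting for such a tree, not data from the authors' circuits.

```python
def binary_tree_merger(n_inputs):
    """Return (stages, cells) for a tree of two-input mergers."""
    if n_inputs < 1:
        raise ValueError("need at least one input")
    # ceil(log2(n)) computed with integer arithmetic
    stages = (n_inputs - 1).bit_length()
    # every two-input cell removes one signal, so n - 1 cells reach one output
    cells = n_inputs - 1
    return stages, cells


stages, cells = binary_tree_merger(32)
print(f"32-to-1 binary tree: {stages} stages, {cells} two-input cells")
```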

# Simple Single-Column Design

#### dual-port register file cell



- ND<sup>2</sup> dual-port NDRO cell
- Only two half-select ports (w0/w1)
- Single bit slice: 32 cells
- Three 5-bit decoders
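A 5-bit address selects one of 2^5 = 32 rows. The behavioral sketch below models three such decoders driving one column. It assumes that two decoders serve the read ports and one serves the write port; the slides state dual-port read-out and three decoders but do not spell out this assignment. All function and key names are hypothetical.

```python
N_WORDS = 32    # rows in the register file
ADDR_BITS = 5   # 2**5 == 32


def decode(address):
    """One-hot row select for a 5-bit address."""
    if not 0 <= address < N_WORDS:
        raise ValueError("address out of range")
    return [1 if row == address else 0 for row in range(N_WORDS)]


def select_lines(read_a, read_b, write):
    """Row selects produced by the three decoders (assumed port assignment)."""
    return {
        "read_a": decode(read_a),
        "read_b": decode(read_b),
        "write": decode(write),
    }


lines = select_lines(read_a=3, read_b=17, write=30)
active_rows = {row for sel in lines.values() for row, bit in enumerate(sel) if bit}
print(sorted(active_rows))
```

At most three rows are selected in a cycle, which matches the statement that only three registers are active per clock cycle.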

#### dual-port register file slice



### **Synchronizing Register File with ALU**

#### Common line select scheme:

- FIFO buffers are used for synchronization
- **Switching energy:** $E_s = L \cdot I_b^2$ (~ 150 aJ)
- $\tau_s \approx (L \cdot I_b / V_c + L / R_s) \cdot \Lambda$
- All line drivers are serially biased

#### Pipelined select scheme:

- Directly synchronized to ALU
- **Switching energy:** $E_s = L \cdot I_b^2$ (~ 150 aJ)
- $\tau_s \approx (L \cdot I_b / V_c + L / R_s)$
- All line drivers are serially biased





# **Decoder Block Diagram**

Address input



# **Decoder Cell Array**





# 8-bit 32-word Register File



8x32 register file matrix

# **4-bit Decoder Low-Speed Test**

Figure: low-speed test waveforms of the 4-bit decoder. Traces, top to bottom: select; address inputs A3, A2, A1, A0 with their complements; decoder outputs out14 down to out0. Horizontal axis: time.

DC bias current margins: ±8%

# **4-bit Decoder High-Speed Test**

- 13 GHz
- Low-speed dc/SFQ converter
- Fast clock (13 GHz)
- Low-speed address pattern



# **Register File write block test**



# A single bit slice of 32-word Register File



### **Summary**

#### We have designed a dual-port Register File

- Size: 32 words × 8 bits (future target: 64 words × 64 bits)
- Dual-port read-out
- Wave-pipelined
- Access time 100 ps (target speed 10 GHz)
- Based on dual-port NDRO RSFF
- Energy per 1-bit read operation ~ 50 aJ
- Energy per 1-bit write operation ~ 60 aJ
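The per-bit energies can be scaled to whole-word operations. The sketch below assumes that one register-file operation consists of two reads and one write (dual-port read-out plus write-back) and that the per-bit energies do not change with word width; both are assumptions made for illustration.

```python
E_READ_AJ = 50   # energy per 1-bit read in aJ (from the summary)
E_WRITE_AJ = 60  # energy per 1-bit write in aJ (from the summary)


def access_energy_aj(word_bits, reads=2, writes=1):
    """Energy in aJ for `reads` word reads plus `writes` word writes."""
    return word_bits * (reads * E_READ_AJ + writes * E_WRITE_AJ)


e8_aj = access_energy_aj(8)     # current 8-bit design
e64_aj = access_energy_aj(64)   # future 64-bit target

print(f"8-bit:  {e8_aj} aJ")
print(f"64-bit: {e64_aj} aJ = {e64_aj / 1000} fJ")
```

Under these assumptions the 64-bit case gives 10.24 fJ, which is close to the 10.2 fJ register-file entry in the 64-bit CPU parameter table.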

#### All components of the Register file were experimentally demonstrated

- Dual-port NDRO RSFF (±10%)
- Half-select dc/SFQ cell (±18%)
- Current drivers (±11%)
- 2-bit Register File block (±6%)
- Register file write/read block (±5%)

#### A single bit slice of the 32-word register file has been tested

- **Yield ~70%**
- First port operates better than the second port (work in progress)
- The whole 8-bit 32-word Register File is being fabricated