# Advanced FPGA design

## Andrea Borga andrea.borga@nikhef.nl



A. Borga Electronics Technology Department

ISOTDAQ 2015 – Rio de Janeiro

## Outline

- First part: theory
  - ... from the previous lesson
  - Considerations on Hardware Description
  - Gateware workflow
  - Takeaway thoughts
- Second part: practice
  - Eye diagrams
  - Pseudo Random Bit Sequences (PRBS)
  - FPGA serializers and deserializers

February 4, 2015

## ... from the previous lesson




### FPGAs: Field Programmable Gate Arrays

- Array (matrix)-like structure made of:
  - Look-Up-Tables (LUTs) to implement combinatorial logic
  - Flip-Flops (FFs) to implement sequential logic
  - Routing network to interconnect the logic resources
  - I/O logic to communicate with the outside world
  - Clock management: Phase-Locked Loops (PLLs), Digital Clock Managers (DCMs)
  - Hard macros: Digital Signal Processing (DSP) cells, SRAMs, PCIe, Gigabit Transceivers, etc.
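To make the LUT idea concrete: a LUT is a small memory whose address lines are the logic inputs, so a k-input LUT holds 2^k configuration bits and can implement any Boolean function of k inputs. The Python model below is a sketch under that assumption; the `Lut` class is hypothetical, and the 4-input size is chosen only for brevity (real LUT sizes depend on the device family).

```python
class Lut:
    """Model of a k-input look-up table: 2**k configuration bits addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # "Configuration": evaluate the target function once for every input
        # combination and store each result bit, much like loading a bitstream.
        self.init = 0
        for addr in range(2 ** k):
            inputs = [(addr >> i) & 1 for i in range(k)]
            if func(*inputs):
                self.init |= 1 << addr

    def __call__(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        # At run time the inputs only form an address into the stored truth table.
        addr = sum(bit << i for i, bit in enumerate(inputs))
        return (self.init >> addr) & 1


# Example: (a AND b) XOR (c OR d) fits in a single 4-input LUT.
lut = Lut(4, lambda a, b, c, d: (a & b) ^ (c | d))
print(hex(lut.init))   # the 16-bit truth table
print(lut(1, 1, 0, 0))
```

No gates are "built" here: changing the function only changes the stored bits, which is exactly what makes the fabric reconfigurable.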





Configurable Logic Block (CLB)

### Example: Xilinx Virtex-7 development board



## Digital (Gateware) Design is NOT programming



#### Programming

- Code is written and translated into instructions
- Instructions are executed **sequentially** by the CPU(s)
- Parallelism is achieved by running instructions on multiple threads/cores
- Processing structures and instruction sets are **fixed by the architecture of the system**

VS.

#### **Digital (Gateware) Design**

- **No fixed** architecture: the system is built according to the task
- Building is done by **describing/defining** system elements and their relations
- Intrinsically parallel; sequential behaviour is achieved with Finite-State Machines (FSMs) and registers
- Description is done with schematics or a hardware description language (HDL)
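One way to see the difference: in software, statements execute one after another, whereas all registers of a synchronous design sample their inputs on the same clock edge. The Python sketch below (a hypothetical model, not a simulator API) contrasts a sequential swap attempt with a clocked one in which every next value is computed from the current state before any register is updated, which is how signal assignments inside a clocked VHDL process behave.

```python
def software_swap_attempt(a, b):
    # Sequential execution: the second statement already sees the new value of a.
    a = b
    b = a
    return a, b


def clock_edge(state):
    """One rising clock edge for two registers, in the spirit of the VHDL process body
        a <= b;
        b <= a;
    All next values are computed from the *current* state, then every register
    takes its new value at the same time.
    """
    next_state = {"a": state["b"], "b": state["a"]}
    return next_state


print(software_swap_attempt(1, 2))     # both end up equal
print(clock_edge({"a": 1, "b": 2}))    # values are swapped
```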




## Hardware **Description** Language (HDL)

- As the name suggests, it is a language used to describe hardware: so use it to do exactly that!
- Let's discuss the simple example of a wait statement
- In C (Unix, #include <unistd.h>)

```c
sleep(5); // sleep 5 seconds
```

- In VHDL the following is **not** synthesizable, but you can use it in test benches

```vhdl
wait for 5 sec; -- handy for TB clocks
```

- This is (one) way to do it in synthesizable VHDL

```vhdl
-- Assumes s_count and delay_ld_value are unsigned (ieee.numeric_std) and
-- s_delay_done is std_logic. Only the clock and the asynchronous reset
-- belong in the sensitivity list of a clocked process.
simple_delay_counter : process (delay_rst, delay_clk)
begin
  if delay_rst = '1' then
    s_count      <= delay_ld_value;
    s_delay_done <= '0';
  elsif rising_edge(delay_clk) then
    if delay_ena = '1' then
      if delay_ld = '1' then
        s_count <= delay_ld_value;   -- (re)load the delay
      else
        s_count <= s_count - 1;      -- count down
      end if;
    end if;
    if s_count = 0 then              -- registered flag: set one cycle after the count reaches 0
      s_delay_done <= '1';
    else
      s_delay_done <= '0';
    end if;
  end if;
end process;
```
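Because `s_delay_done` is itself a register, it reflects the counter value of the previous clock cycle. The cycle-level Python model below mimics the process above with `delay_ena` held at '1' and `delay_ld` at '0'; the 8-bit width of `s_count` is an assumption. It also estimates how many counter bits a 5-second delay would need with a hypothetical 100 MHz clock.

```python
import math


def simulate_delay_counter(load_value, width, n_cycles):
    """Cycle-level model of simple_delay_counter (delay_ena = '1', delay_ld = '0').

    Returns s_delay_done after each of the first n_cycles rising edges.
    """
    mask = (1 << width) - 1
    s_count, s_delay_done = load_value, 0      # state right after delay_rst
    done_trace = []
    for _ in range(n_cycles):
        # Both registers update on the same edge, using the *old* s_count.
        next_count = (s_count - 1) & mask      # unsigned wrap-around below zero
        next_done = 1 if s_count == 0 else 0
        s_count, s_delay_done = next_count, next_done
        done_trace.append(s_delay_done)
    return done_trace


def counter_bits(delay_s, f_clk_hz):
    """Bits needed to hold delay_s * f_clk_hz clock cycles."""
    cycles = round(delay_s * f_clk_hz)
    return max(1, math.ceil(math.log2(cycles + 1)))


trace = simulate_delay_counter(load_value=5, width=8, n_cycles=10)
print(trace)                    # done pulses on the 6th edge (index 5)
print(counter_bits(5, 100e6))   # counter width for 5 s at an assumed 100 MHz
```

In this model a load value of N produces a one-cycle pulse N + 1 edges after reset and, without a reload, the counter wraps and pulses again every 2^width cycles: details that are easy to miss when reading the HDL as if it were software.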










### Gateware design workflow... à la carte!



1) Implementation flow: what turns a line of code into a blinking LED?

2) Verification flow: why is the statement above not (always) true?

3) Design constraining: how to enforce your rules of the game



## Implementation flow




#### Implementation flow: synthesis

#### **Register Transfer Level (RTL)**

- A design abstraction that models a synchronous digital circuit in terms of the flow of digital signals (data) between registers and the logical operations performed on those signals. (http://en.wikipedia.org/wiki/Register-transfer\_level)



### Implementation flow: synthesis

#### **Synthesis**

- translates the schematic or HDL code into elementary logic functions
- defines the connections between these elementary functions
- uses Boolean algebra (the kind of minimization you would do by hand with Karnaugh maps) to optimize logic functions
- generates a device-independent netlist
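Logic optimization may change the implementation, never the function. As a small illustration (a hypothetical expression, not taken from any real tool run), the consensus term `b·c` in `a·b + a'·c + b·c` is redundant, just as a Karnaugh map would show; an exhaustive truth-table comparison in Python confirms that both forms are equivalent.

```python
from itertools import product


def original(a, b, c):
    return (a and b) or ((not a) and c) or (b and c)


def minimized(a, b, c):
    # Consensus theorem: a·b + a'·c + b·c = a·b + a'·c
    return (a and b) or ((not a) and c)


def equivalent(f, g, n_inputs):
    """Exhaustively compare two Boolean functions over all 2**n_inputs input combinations."""
    return all(bool(f(*bits)) == bool(g(*bits))
               for bits in product((False, True), repeat=n_inputs))


print(equivalent(original, minimized, 3))
```

Brute force only scales to a handful of inputs; it is meant to show the principle, not how synthesis tools check equivalence internally.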








## Implementation flow: mapping and routing

#### Translate / Mapping

- translates the device-independent netlist into technology-specific elements
- checks the content of black boxes (e.g. IP cores)
- checks whether the design fits in the target device
- maps these elements onto the FPGA logic cells

#### Place and Route (P&R)

- places the basic elements on the logic cell grid
- routes the signals between the logic cells
- can be "guided" by constraints:
  - location constraints
  - timing constraints



Interconnect Block (Switch Box)



Xilinx Vivado 2014.4 design flow

#### Implementation flow: routing the counter




Perfect example of a badly constrained design!



## Verification flow






#### Verification flow: simulation

- Verification of a design with an HDL simulator
  - Industry standard → Mentor Graphics ModelSim (or QuestaSim)
  - Try out some free alternatives → Icarus Verilog (http://iverilog.icarus.com/), GHDL (http://ghdl.free.fr/)
- Event-based simulation recreates the parallel nature of digital designs
  - Each simulation time step is subdivided into zero-duration delta cycles (delta delays)
  - In each delta cycle, the processes sensitive to an event (e.g. a clock rising edge) are evaluated
  - The new signal values are then applied, possibly triggering further delta cycles, until no events remain
- Different levels of simulation:
  - **behavioral:** fastest, simulates only the behavior of the design
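The delta-cycle mechanism can be sketched in a few lines of Python. The model below is a simplification (every process is re-evaluated in every delta cycle instead of only the sensitive ones, and the two-inverter netlist is hypothetical), but it shows how a change on `a` ripples through `b` and `c` in successive delta cycles while simulation time stands still.

```python
def settle(cur, processes):
    """Run delta cycles until no signal changes; return how many deltas had events.

    cur:       dict signal name -> current value (updated in place)
    processes: list of (target_signal, function(cur) -> new value)
    """
    deltas = 0
    while True:
        # Evaluate phase: every process reads only the current values.
        scheduled = {target: func(cur) for target, func in processes}
        # Update phase: all scheduled changes are committed at once.
        changed = {sig: val for sig, val in scheduled.items() if cur[sig] != val}
        if not changed:
            return deltas
        cur.update(changed)
        deltas += 1


processes = [
    ("b", lambda s: not s["a"]),   # b <= not a;
    ("c", lambda s: not s["b"]),   # c <= not b;
]

signals = {"a": False, "b": True, "c": False}   # consistent initial state
signals["a"] = True                             # an event on a at time t
print(settle(signals, processes), signals)
```

Both updates happen at the same simulation time t; only the delta counter advances, which is how zero-delay assignments still resolve in a well-defined order.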







#### Verification flow: simulation

- **functional:** fast, uses realistic functional models of the target technology; the least used by HDL designers... why?
  - Mostly because these days you can (almost) trust your tools (a bit) more
  - What happens if you use the following VHDL statement?

```vhdl
some_signal <= 'X'; -- unknown (misused to connect to anything); some_signal is a placeholder name
```

- **post-translate and post-map simulation models:** similar to the above, but with information (about the actual primitives) from the translate and map steps
- **timing:** slow, most accurate; uses the placed and routed design plus SDF (Standard Delay Format) delay information
  - in the past it was used to detect router errors in placing designs... when routers were not so smart and FPGAs were not so fast!
  - what if the propagation delays of the bits of our counter were not equal?
  - or longer than the clock period?






## Verification flow: debugging

- Your design is up... but is it also running?
- Most FPGA vendors provide internal logic analyzer cores
  - ISE ChipScope, Vivado Set Up Debug (Xilinx)
  - SignalTap (Altera)
- They can be embedded into the design and controlled via JTAG
- They also allow signals to be injected
- It is at times extremely useful to spy inside the FPGA... but this doesn't replace an oscilloscope, as signal integrity issues can be on the PCB
- Remember... it's hardware!

Internal logic analyzer capture (signals: OOB_State, RX_Status, txdata, rxdata, RX_CHAR_IS_K, Align, Sync, TXComStart, TXComType, RX_Electric_IDLE, RX_Byte_Realign, RX_Byte_is_ali..., SOF, EOF, SPEED)

## Design constraining

- Remember: you are describing your hardware!
- Constraining is becoming so important that it is turning into a (not yet standardized) language of its own:
  - .qsf: Quartus II Settings File (Altera)
  - .sdc: Synopsys Design Constraints (de facto standard)
  - .ucf: User Constraints File → .xdc: Xilinx Design Constraints (Xilinx)
- Two types of constraints:

#### Location constraints

- geographical position and pin related

#### Timing constraints

- clock and timing related



### Design constraining: location

- FPGAs usually provide a large number of I/O pins for communication with the outside world
- Large variety of I/O standards supported: 3.3V CMOS, 2.5V LVDS, SSTL, ...
- I/O pins can be assigned more or less freely

#### BUT

- I/O cells are grouped in I/O banks → all cells in an I/O bank must use either the same standard or compatible ones (with the same voltage level); e.g. 3.3V CMOS is not compatible with 2.5V LVDS
- LVDS signals always come in dedicated pairs
- **Clock** signals should use dedicated clock input pins → routed internally over a dedicated clock network
- High-Speed serial interfaces (PCIe, Gigabit-Transceivers) or hard macros might need dedicated pins as well
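The bank rule lends itself to a quick sanity check on a preliminary pin list. The Python sketch below uses a hypothetical pin table (pin names and bank numbers are made up) and a simplified, assumed mapping from I/O standard to bank supply voltage (VCCO); the device's I/O documentation remains the reference for the real compatibility rules.

```python
from collections import defaultdict

# Simplified, assumed mapping from I/O standard to the VCCO it requires.
VCCO_FOR_STANDARD = {
    "LVCMOS33": 3.3,
    "LVCMOS25": 2.5,
    "LVDS_25": 2.5,
    "SSTL15": 1.5,
}


def bank_conflicts(pins):
    """pins: list of (pin_name, bank, io_standard).

    Returns {bank: set of required VCCO values} for every bank that mixes voltages.
    """
    vcco_per_bank = defaultdict(set)
    for _name, bank, standard in pins:
        vcco_per_bank[bank].add(VCCO_FOR_STANDARD[standard])
    return {bank: volts for bank, volts in vcco_per_bank.items() if len(volts) > 1}


# Hypothetical preliminary pin assignment.
pins = [
    ("led0", 14, "LVCMOS33"),
    ("uart_tx", 14, "LVCMOS33"),
    ("adc_d0_p", 14, "LVDS_25"),   # 2.5 V standard placed in a 3.3 V bank
    ("ddr_dq0", 33, "SSTL15"),
]
print(bank_conflicts(pins))
```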

#### **Good Practice**

- Try to place pins belonging to one design module close to each other → **avoid routing** across the chip
- PCB Designers:
  - Check your I/O assignment with a **preliminary** design with only I/O pins instantiated
  - Check for **SSN** (Simultaneous Switching Noise)
  - Use **back-annotation** of I/O pins to optimise fan-out and routing of signals




### Design constraining: timing

#### Timing constraints

- clock period
- setup and hold times
- path delays: highlight critical connections
- false paths: tell the tools to ignore timing on some connections
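Clock period, setup and hold times combine into a simple timing budget for a register-to-register path: the data must arrive before the capturing edge minus the setup time, and must not change before the hold time has elapsed. The Python sketch below computes setup and hold slack from hypothetical delay values; all numbers are assumptions, and real timing analyzers also account for clock uncertainty, jitter and on-chip variation.

```python
def setup_slack(period_ns, t_clk_to_q, t_data_path, t_setup, skew_ns=0.0):
    """Setup slack of a register-to-register path (positive = timing met).

    skew_ns = capture clock arrival - launch clock arrival.
    """
    data_arrival = t_clk_to_q + t_data_path
    data_required = period_ns + skew_ns - t_setup
    return data_required - data_arrival


def hold_slack(t_clk_to_q, t_data_path, t_hold, skew_ns=0.0):
    """Hold slack: new data must not reach the capturing flip-flop within its hold window."""
    return (t_clk_to_q + t_data_path) - (t_hold + skew_ns)


# Hypothetical numbers for a 10 ns (100 MHz) clock, all in ns.
print(round(setup_slack(10.0, t_clk_to_q=0.5, t_data_path=8.9, t_setup=0.1), 3))
print(round(setup_slack(10.0, t_clk_to_q=0.5, t_data_path=9.7, t_setup=0.1), 3))
print(round(hold_slack(t_clk_to_q=0.5, t_data_path=0.2, t_hold=0.1), 3))
```

A negative setup slack is exactly the "propagation delay longer than the clock period" case: the tools can only detect it if the clock period has been constrained.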



- See for example: Xilinx Vivado Design Suite User Guide: Using Constraints (UG903)
- Remember the story of the propagation delays in the timing simulation (and the placement of the flip-flops)?
- If you do a good job constraining... you can spare yourself the timing simulation!

```tcl
set_property LOC IBUFDS_GTE2_X1Y11 [get_cells pcie0/u1/refclk_buff]

# Timing Constraints
create_clock -period 10.000 -name sys_clk -waveform {0.000 5.000} [get_ports sys_clk_p]

create_generated_clock -name clk_125mhz_x0y1 [get_pins pcie0/u1/pipe_clock0/mmcm0/CLKOUT0]
create_generated_clock -name clk_250mhz_x0y1 [get_pins pcie0/u1/pipe_clock0/mmcm0/CLKOUT1]

create_generated_clock -name clk_125mhz_mux_x0y1 -source [get_pins pcie0/u1/pipe_clock0/g0.pcl ...
create_generated_clock -name clk_250mhz_mux_x0y1 -source [get_pins pcie0/u1/pipe_clock0/g0.pcl ...
set_clock_groups -name pcieclkmux -physically_exclusive -group clk_125mhz_mux_x0y1 -group clk ...
set_false_path -to [get_pins pcie0/u1/pipe_clock0/g0.pclk_i1/S0]
set_false_path -to [get_pins pcie0/u1/pipe_clock0/g0.pclk_i1/S1]

set_false_path -from [get_clocks I] -to [get_clocks clk40_clk_wiz_0]
set_false_path -from [get_clocks clk40_clk_wiz_0] -to [get_clocks I]
set_false_path -from [get_clocks n_10_mmcm0] -to [get_clocks clk40_clk_wiz_0]
set_false_path -from [get_clocks clk40_clk_wiz_0] -to [get_clocks n_10_mmcm0]
set_false_path -from [get_clocks clk40_clk_wiz_0] -to [get_clocks clk160_clk_wiz_0]
```


## Takeaway thoughts

Sooner or later you're gonna realize, just like I did...

There's a difference between knowing the path and walking the path.

- Morpheus -




## Takeaway: don't ignore reports!

- Learn to carefully review reports
- The reason why your design is not functioning as intended... can be right in front of your eyes!
- Especially check timing and... don't run designs that haven't met timing!








## Takeaway: scripting for Gateware designs

- Design tools can be scripted: Tool Command Language (Tcl)
- Parameters/options can be passed via the command line (makefiles, shell scripts)
- You have **much more control and reproducibility** over your procedures (in a GUI it is easy to forget to tick a box, and sooner or later you will...)
- Allows for complete automation → design servers and nightly builds

Actions in the Vivado GUI (here, adding a source file) are echoed as Tcl commands in the Tcl Console:

```tcl
add_files -norecurse /data/Projects/FELIX/FELIX_CERN/firmware/sources/pcie/pcie_init_7vx.v
update_compile_order -fileset sources_1
```

Xilinx Vivado 2014.4

- Simulators can be controlled with Tcl and even used to create test benches (slower but extremely flexible)

Mentor Graphics QuestaSim v10.2

### Takeaway: more tips

- Describe your hardware: **think hard**...ware!
- RT...M! Seriously... you HAVE to, especially with FPGAs (family overview, DC and switching characteristics, clocking resources, transceiver guides, packaging and pinout)
- Consider your FPGA full at 70% or you'll get nice surprises from your router...
- Digital designs are analog in essence (especially with ever higher clock frequencies)
- Share share share...
- Celebrate your achievements!





## FPGAs... so what? Practical example

## ISO / OSI model: You are here...

- International Organization for Standardization / Open Systems Interconnection: if you are talking about engineering, you can't give a talk without it!
- It is a conceptual model that characterizes and standardizes the internal functions of a communication system by partitioning it into abstraction layers
- A layer serves the layer above it and is served by the layer below it



http://en.wikipedia.org/wiki/OSI\_model


#### System Architecture: You are here...

| Layer type   | Data unit | Layer        |
|--------------|-----------|--------------|
|              | Segment   | 4. Transport |
| Media layers | Packet    | 3. Network   |
| Media layers | Frame     | 2. Data Link |
| Media layers | Bit       | 1. Physical  |

- Very popular, since many applications have demanding, fast (serial) I/O requirements

Gigabit-Transceiver X

## Eye Diagrams

- An eye diagram (eye pattern) is the first measure of the quality of a transmission channel: how good are my "ones" and "zeroes"?



- Essential information on transmission quality can be obtained from these diagrams: amplitude (voltage) stability, timing stability (jitter), etc.
- It is all about the probability of sampling the signal correctly
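One common way to turn an eye into a probability is the Q-factor: the distance between the mean '1' and '0' levels divided by the sum of their noise standard deviations. Assuming Gaussian noise on both levels and a decision threshold placed where both error contributions are equal, the bit error ratio is BER = ½·erfc(Q/√2). The Python sketch below applies this textbook relation to hypothetical level and noise values; real eyes (with deterministic jitter, ISI, etc.) deviate from the Gaussian assumption.

```python
import math


def q_factor(mu1, mu0, sigma1, sigma0):
    """Eye Q-factor: separation of the '1' and '0' levels over their combined noise."""
    return (mu1 - mu0) / (sigma1 + sigma0)


def ber_from_q(q):
    """Bit error ratio for Gaussian noise on both levels and a balanced decision threshold."""
    return 0.5 * math.erfc(q / math.sqrt(2))


# Hypothetical eye: levels at +/-0.40 V with 50 mV RMS noise on each.
q = q_factor(mu1=0.40, mu0=-0.40, sigma1=0.05, sigma0=0.05)
print(q, ber_from_q(q))
```

The steep dependence is the point: a slightly more closed eye (lower Q) raises the error probability by orders of magnitude.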

Instrument: Agilent 86100C DCA-J


## Pseudo Random Bit Sequence (PRBS)

- A PRBS is a deterministic sequence of bits that only looks random: it is not truly random, but it can be used wherever a good approximation of random values is required → test vectors, white noise
- They are often implemented using Linear Feedback Shift Registers (LFSR)
- The arrangement of the feedback taps of an LFSR is expressed as a polynomial mod 2, e.g.:
  - PRBS7: $x^7 + x^6 + 1$
- Maximum sequence length (period) of an n-bit LFSR: $2^n - 1$ bits (127 for PRBS7)
- It starts from a "seed" value; the only forbidden state is all-zeros (the LFSR would never leave it)
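A 7-bit Fibonacci LFSR for $x^7 + x^6 + 1$ fits in a few lines. The Python sketch below (function names are hypothetical) shifts the register left, feeds back the XOR of stages 7 and 6, and checks the two properties listed above: a maximal period of 2^7 − 1 = 127 for every non-zero seed, and the lock-up of the all-zero state.

```python
MASK7 = 0x7F


def lfsr7_step(state):
    """One shift of a 7-bit Fibonacci LFSR for x^7 + x^6 + 1; returns (new_state, output_bit)."""
    newbit = ((state >> 6) ^ (state >> 5)) & 1   # XOR of the taps at stages 7 and 6
    return ((state << 1) | newbit) & MASK7, newbit


def prbs7_bits(n, seed=0x7F):
    """First n PRBS7 bits. The seed must be non-zero: all-zeros maps to itself (lock-up)."""
    if not 0 < seed <= MASK7:
        raise ValueError("seed must be a non-zero 7-bit value")
    state, bits = seed, []
    for _ in range(n):
        state, bit = lfsr7_step(state)
        bits.append(bit)
    return bits


def lfsr7_period(seed=0x7F):
    """Number of steps until the LFSR returns to its seed state."""
    state, steps = lfsr7_step(seed)[0], 1
    while state != seed:
        state, _ = lfsr7_step(state)
        steps += 1
    return steps


print(lfsr7_period())   # 127 = 2**7 - 1
print(lfsr7_step(0))    # the all-zero state never changes
print(prbs7_bits(16))
```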



#### Xilinx Virtex-7 serializers/deserializers



- Good news: the PRBS is a built-in function of most modern transceivers!
- The next step is to write your own code and drive a link!
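Before writing your own link logic, it helps to see what a PRBS checker does on the receiving side. The Python sketch below is a hypothetical self-synchronizing PRBS7 checker: since every PRBS7 bit equals the XOR of the bits received 7 and 6 positions earlier, it needs no seed and simply predicts each bit from the received history. The small generator is repeated here so the example stands alone.

```python
def prbs7_bits(n, seed=0x7F):
    """PRBS7 (x^7 + x^6 + 1) from a 7-bit Fibonacci LFSR; seed must be non-zero."""
    state, bits = seed, []
    for _ in range(n):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | bit) & 0x7F
        bits.append(bit)
    return bits


def prbs7_check(rx_bits):
    """Count mismatches between each received bit and rx[n-7] XOR rx[n-6]."""
    errors = 0
    for n in range(7, len(rx_bits)):
        predicted = rx_bits[n - 7] ^ rx_bits[n - 6]
        if rx_bits[n] != predicted:
            errors += 1
    return errors


tx = prbs7_bits(1000)
rx = list(tx)
rx[500] ^= 1           # inject a single bit error on the "link"
print(prbs7_check(tx), prbs7_check(rx))
```

With this scheme one flipped bit is counted three times (once on its own and once at each tap it later passes through); whether a given hardware checker reports it like this depends on its design, so check the transceiver guide before interpreting error counters.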

www.xilinx.com/support/.../user.../ug476\_7Series\_Transceivers.pdf




## Pit bulls!

- We may face (very) difficult problems...



#### Never let go!

- It may take a while... but victory will be yours!
- Thank you very very much to: Torsten Alt (FIAS) and Peter Jansweijer (Nikhef)
