

### Increase Development Efficiency and Quality using System-on-Modules with SOC Technology

3 October 2023 - Dirk van den Heuvel



### We are TOPIC.

- ▲ Real Embedded company
- ▲ Founded in 1996
  - ▲ Since 2023 proud member of the T&S Group, France



- ▲ Based in the Netherlands, Europe
- ▲ 5 Business Lines:
  - Consultancy: The Netherlands
  - Turn-key Projects: Europe and North America
  - Farm-out Projects: Europe and North America
  - Embedded Product Development and Sales: Worldwide
  - Healthcare Solutions: Worldwide










▲ Miami System-on-Modules portfolio

▲ New member of the family: Miami Versal

▲ Some example projects

▲ Precision Timing, White Rabbit & TOPIC



# Miami System-on-Modules portfolio

### Embedded technology evolution.




### Embedded development evolution.







### Miami SOM product family at a glance.



- Miami ZYNQ
  - AMD SOC technology: Zynq 7000 (7012S, 7015, 7030)
  - Technology node: 28 nm
  - Processors: Single or dual core ARM Cortex A9
  - Logic density: 55k-125k cells
  - Connectors: 2x 120 pins
  - Gigabit transceivers: 4x GTH (PL)
  - DDR-SDR memory: 1GB 32b DDR3 (PS)
  - Introduction date: 2014





- Miami Plus MPSoC
  - ▲ AMD SOC technology: Zynq UltraScale+ (ZU6, ZU9, ZU15)
  - ▲ Technology node: 16 nm
  - ▲ Processors: Quad core ARM Cortex A53, dual core ARM Cortex R5, ARM Mali-400 GPU
  - ▲ Logic density: 469k-747k cells
  - ▲ Connectors: 2x 120 pins, 1x 180 pins
  - ▲ Gigabit transceivers: 3x GTP (PS), 16x GTH (PL)
  - ▲ DDR-SDR memory: 2GB 72b DDR4 (PS)
  - ▲ Introduction date: 2020

- Miami Plus ZYNQ
  - ▲ AMD SOC technology: Zynq 7000 (7035, 7045, 7100)
  - ▲ Technology node: 28 nm
  - ▲ Processors: Dual core ARM Cortex A9
  - ▲ Logic density: 275k-444k cells
  - ▲ Connectors: 2x 120 pins, 1x 180 pins
  - ▲ Gigabit transceivers: 16x GTH (PL)
  - ▲ DDR-SDR memory: 1GB 32b DDR3 (PS), 1GB 32b DDR3 (PL)
  - ▲ Introduction date: 2016



- Miami Plus Versal
  - ▲ AMD SOC technology: Prime Versal ACAP (VE2602, VE2802, VM2302, VM2902)
  - ▲ Technology node: 7 nm
  - ▲ Processors: Dual core ARM Cortex A72, dual core ARM Cortex R5F, AI Engine-ML tiles
  - ▲ Logic density: 820k-2233k cells
  - ▲ Connectors: High-speed Samtec mezzanine
  - ▲ Gigabit transceivers: 24x GTYP (PL)
  - ▲ DDR-SDR memory: 8GB 72b DDR4 (PS)
  - ▲ Introduction date: Under development




### Development trends.

▲ Signal bandwidths >10Gbps are more the rule than the exception

- ▲ Bundles of Gigabit transceivers
- Number of independent clocks on the board
- ▲ BGA package pitch smaller and smaller
  - ▲ 1.0mm → 0.8mm → 0.35mm (InFO packages)
- ▲ Supply voltages lower and lower
  - ▲ Currents going up → 25A-100A per supply rail of 0.7V-0.9V
  - ▲ Static power consumption increasing → Logic partitioning, use of power regions

▲ Safety and security impact

- ▲ Logic isolation
- Functional isolation

### Board design consequences.

- ▲ Wider and faster memory
- ▲ High-speed serialized interfaces
- ▲ Larger supply currents, lower voltages, more supply rails
- ▲ More complex board peripherals
- ▲ Higher demands on your PCB technology
  - ▲ High Density Interconnect (HDI)
  - Stackup and materials

→ Board design has become a critical design factor
 → Circuit design AND (!) FPGA design can have a significant influence on the board design quality


### Power distribution consequences.



- ~150W max. total power consumption
  - 0.85V/120A core supply
  - Liquid cooling solution
  - Must support full device utilization
- 45W limited power consumption
  - 0.85V/24A core supply
  - Passive cooling capabilities
  - Limited device utilization
- 5W peak power consumption
  - Battery powered
  - Passive cooling capabilities
  - Optimized device utilization
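
The core supply figures above translate directly into rail power and into losses in the board's power distribution. The sketch below works through that arithmetic in Python. The plane resistance of 0.5 mΩ is a purely hypothetical value chosen for illustration; it does not come from the slides, and real values depend on stack-up and layout.

```python
# Rail power and distribution loss for the core supplies named on the slide.
# ASSUMPTION: the 0.5 milliohm distribution resistance is hypothetical.

def rail_power_w(voltage_v: float, current_a: float) -> float:
    """Power delivered by one supply rail: P = V * I."""
    return voltage_v * current_a

def ir_drop_v(current_a: float, resistance_ohm: float) -> float:
    """Voltage lost across the distribution path: V = I * R."""
    return current_a * resistance_ohm

def copper_loss_w(current_a: float, resistance_ohm: float) -> float:
    """Heat dissipated in the distribution path: P = I^2 * R."""
    return current_a ** 2 * resistance_ohm

HYPOTHETICAL_PLANE_RESISTANCE_OHM = 0.0005

if __name__ == "__main__":
    for label, current_a in (("liquid cooled", 120.0), ("passively cooled", 24.0)):
        power = rail_power_w(0.85, current_a)
        drop = ir_drop_v(current_a, HYPOTHETICAL_PLANE_RESISTANCE_OHM)
        loss = copper_loss_w(current_a, HYPOTHETICAL_PLANE_RESISTANCE_OHM)
        print(f"{label}: core rail {power:.1f} W, "
              f"IR drop {drop * 1000:.0f} mV, copper loss {loss:.2f} W")
```

Under that assumed resistance, the 120A rail would lose about 60 mV of its 850 mV in the board alone, which illustrates why current and voltage distribution become a design topic of their own.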

### Current and voltage distribution.



[Figure: voltage and current distribution plots]

### Thermal enhancements.




- Logic core supply: 0.85V, 24A
- Thermal vias
- Edge plating
- Mounting hole conductance
- Covering heatsink





### What makes a SOM a SOM?

Software support (e.g. Linux distribution maintenance, software ecosystem)

Reference designs (carrier board, FPGA, processors, ...)

Development support

Functional safety

Reliability



Integrated functionality

Environment qualification (shock, vibration, climate, EMC)

Life-cycle management (obsolescence, longevity)

Heat dissipation distribution

Signal integrity

Tooling and engineering support

Certifiability





# Versal Adaptive SOC SOM

# Identified applications.

### Embedded AI solutions

▲ Super-smart cameras with video pre-processing and AI on-the-edge


- ▲ AI engines used for beam-forming/correlation functionality
  - ▲ Ultrasound, Radar, Lidar, SDR
- Photonic interface coupling
- Autonomous Mobile Robotics
- ▲ High-performance computing
  - ▲ Loads of programmable gates
  - Edge processing and offloading
  - ▲ 4K/8K video processing, AVoIP


### Product details Miami Versal


| Miami type                     | Miami Versal          | Miami Versal          | Miami Versal          | Miami Versal          |
|--------------------------------|-----------------------|-----------------------|-----------------------|-----------------------|
| Order number                   | miav-ve26-1-7-4-2     | miav-ve28-1-7-4-2     | miav-vm23-1-7-4-2     | miav-vm29-1-7-4-2     |
| FPGA                           |                       |                       |                       |                       |
| Device                         | XCVE2602-2MSIVFVH1760 | XCVE2802-2MSIVFVH1760 | XCVM2302-2MSIVFVF1760 | XCVM2902-2MSIVFVF1760 |
| Technology                     | Versal®               | Versal®               | Versal®               | Versal®               |
| Logic cells                    | 820K                  | 1139K                 | 1575K                 | 2233K                 |
| Flip Flops                     |                       |                       |                       |                       |
| Block RAM                      | 16.7Mbit              | 21.1Mbit              | 49Mbit                | 70Mbit                |
| UltraRAM                       | 63.0Mbit              | 74.3Mbit              | 127Mbit               | 181Mbit               |
| DSP slices                     | 984                   | 1312                  | 1904                  | 2672                  |
| GTx (PL controlled)            | 24x (32 Gbit/s each)  | 24x (32 Gbit/s each)  | 24x (56 Gbit/s each)  | 24x (56 Gbit/s each)  |
| Processor System               |                       |                       |                       |                       |
| Application Processor (cores)  | ARM Cortex-A72 (dual) | ARM Cortex-A72 (dual) | ARM Cortex-A72 (dual) | ARM Cortex-A72 (dual) |
| CPU Performance                | 2x 1.5GHz             | 2x 1.5GHz             | 2x 1.5GHz             | 2x 1.5GHz             |
| Co-Processor                   | 2x ARM NEON™          | 2x ARM NEON™          | 2x ARM NEON™          | 2x ARM NEON™          |
| Real-Time Processor (cores)    | ARM Cortex R5F (dual) | ARM Cortex R5F (dual) | ARM Cortex R5F (dual) | ARM Cortex R5F (dual) |
| AI Engine-ML Tiles             | 152                   | 304                   | 0                     | 0                     |
| Network-on-Chip M/S ports      | 21                    | 21                    | 30                    | 42                    |
| Graphics Processor             | -                     | -                     | -                     | -                     |
| GTx (PS controlled)            | -                     | -                     | -                     | -                     |
| Memory                         |                       |                       |                       |                       |
| Cache (application processor)  | L1: 32KB I / D per core, L2: 1MB, on chip memory 256 KByte | | | |
| Cache (real-time processor)    | L1: 32KB I / D per core, tightly coupled memory 128 KByte per core | | | |
| Cache (GPU)                    | -                     |                       |                       |                       |
| SDRAM (PS/PL controlled)       | 2, 4 or 8 GByte DDR4 with/without ECC (assembly option 32, 64 or 72 bits wide) | | | |
| SDRAM (PL only controlled)     |                       |                       |                       |                       |
| NOR                            | Quad-speed SPI (128 MByte, 256 MByte) | | | |
| NAND                           | 0, 8, 16, 32 or 64 GByte pseudo-SLC or MLC | | | |
| EEPROM                         | 32 Kbit I2C EEPROM storage | | | |
| User programmable/configurable interfaces on SoM connector | | | | |
| PS connected I/O               | PS connected 1.8V GPIO, multiplexed peripherals (MIO) | | | |
| PL connected HR I/O            |                       |                       |                       |                       |
| PL connected HP I/O            | HP and HD GPIO, 100 Ohm impedance controlled and length matched within quads | | | |

| Dedicated interfaces on SoM connector |                                                                                       |  |  |
|---------------------------------------|---------------------------------------------------------------------------------------|--|--|
| Network                               | 10/100/1000Mbps Ethernet, (PHY included), IEEE 1588 and SyncE support                 |  |  |
| USB                                   | 2x USB 3.0, including on-board ULPI media                                             |  |  |
| CAN                                   | UART, I2C, SPI, I2S, CAN (user configurable/selectable)                               |  |  |
| Gigabit transceivers                  | e.g. FPD link, SDI, TFT, HDMI (PL), DisplayPort (PS)                                  |  |  |
| PCI-Express (end-point/root-complex)  | yes, GEN4 yes, GEN5                                                                   |  |  |
| GTx (PS controlled)                   |                                                                                       |  |  |
| GTx (PL controlled)                   | 16x (PCIe, 100Gb/40Gb Ethernet, USB 3.0, CoaXPress, HDMI, DisplayPort)                |  |  |
| Miscellaneous                         | GPIOs, SD/SDIO 2.0/MMC 3.31 compliant controllers                                     |  |  |
| JTAG                                  | PL and PS JTAG chain for shared debugging                                             |  |  |
| Debug                                 | Debug UART, console, PS JTAG, PL JTAG, 4 pins                                         |  |  |
| Supply                                |                                                                                       |  |  |
| Power supply input                    | 9.0-16.0 Vdc via carrier board connector, 50 W maximum, on-board voltage regulation   |  |  |
| Logic I/O supply output               | Selectable I/O standards and voltages for I/O banks                                   |  |  |
| Software support                      |                                                                                       |  |  |
| Bootloader / BSP                      | U-Boot                                                                                |  |  |
| Boot resources                        | JTAG, QSPI-NOR, eMMC, SD-Card, USB                                                    |  |  |
| Operating System                      | TOPIC managed/maintained PetaLinux distribution                                       |  |  |
| FPGA reference design                 | Vivado BSP and module configuration                                                   |  |  |
| Carrier board (order number)          | Florida Versal (flo_versal)                                                           |  |  |
| Mechanical and environmental          |                                                                                       |  |  |
| Dimensions                            | 100mm x 75mm                                                                          |  |  |
| Connectors                            | Samtec high performance mezzanine carrier board connectors                            |  |  |
| Temperature                           | Industrial grade                                                                      |  |  |
| Temperature and humidity              | IEC 60068-2-1 (Cold), IEC 60068-2-2 (Dry heat), IEC 60068-2-78 (Damp heat)            |  |  |
| EMC/EMI                               | EN 55032, IEC 61132, EN 61326, IEC 55024                                              |  |  |
| Shock and vibration                   | MIL-STD-202G (method 204D), MIL-STD-202G (method 213B)                                |  |  |

### PRELIMINARY

### When is a Versal SOM the right SOM?

#### What is the connector strategy?

Join a standard in the SOM world?

Is there a standard around facilitating the required performance?

What interfaces are required to build a proper system with this kind of performance?

For what applications is this needed?

What are typical use-cases?

Native White Rabbit support



Maximum power consumption: 75W? 100W?

Maximum load step (up & down): 25%? 50%? 100%?

Required transceiver rates: is ~30Gbps enough or is >50Gbps useful?

Memory bandwidth should match communication bandwidth?

What if you have 4 banks of 4 transceivers each running at 50Gbps per transceiver?
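
The bandwidth question above can be made concrete with a back-of-the-envelope comparison. The sketch below uses raw line rates and ignores encoding and protocol overhead. The DDR4-3200 transfer rate is an assumption made for illustration (the slides only state a 72-bit DDR4 interface), and the 72-bit bus is treated as 64 data bits plus 8 ECC bits.

```python
# Raw transceiver bandwidth versus raw DDR4 bandwidth.
# ASSUMPTION: DDR4-3200 (3200 MT/s) is a hypothetical speed grade.

def transceiver_bandwidth_gbps(banks: int, lanes_per_bank: int,
                               gbps_per_lane: float) -> float:
    """Aggregate one-direction line rate of all transceivers, in Gbit/s."""
    return banks * lanes_per_bank * gbps_per_lane

def ddr_bandwidth_gbps(data_bits: int, mega_transfers_per_s: float) -> float:
    """Peak memory bandwidth in Gbit/s: bus width times transfer rate."""
    return data_bits * mega_transfers_per_s / 1000.0

serial_gbps = transceiver_bandwidth_gbps(banks=4, lanes_per_bank=4, gbps_per_lane=50)
memory_gbps = ddr_bandwidth_gbps(data_bits=64, mega_transfers_per_s=3200)

if __name__ == "__main__":
    print(f"transceivers: {serial_gbps:.0f} Gbit/s per direction")
    print(f"DDR4 (assumed 3200 MT/s): {memory_gbps:.1f} Gbit/s peak")
    print(f"ratio: {serial_gbps / memory_gbps:.2f}x")
```

Under these assumptions a single memory interface cannot buffer all transceiver traffic, so the data would have to be reduced or processed in the fabric before it reaches memory.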

What are the environmental conditions?

Is it worthwhile considering an open hardware (OHW) route?

### Your thoughts are appreciated.

- ▲ Are you considering an AMD Versal as a SOM?
- ▲ Do you have particular requirements for your application?
- ▲ Can you/are you willing to share this with us?
- ▲ Contact us:
  - E-mail me at dirk.van.den.heuvel@topic.nl
  - Leave a message at https://topicembedded.com/products/system-on-modules/miami-versal

- ▲ Think along with us by responding to the questionnaire we will share.
- ▲ No strings attached. Just looking for application context.
- ▲ However, when a suitable SOM materializes, we will make it commercially attractive for you.



# Miami SOM design-in examples


# Video processing & multiplexing.

### ▲ Application field

- ▲ Real-time, low-latency video projection of cockpit video streams
- ▲ Box-2-Box video time synchronization

### ▲ Functionality

- ▲ Support for multiple video sources and sinks up to 4Kp60 video resolution
  - ▲ 4x HDMI inputs + 4x HDMI outputs + 4x high-speed SFP+ communication links
- ▲ Programmable video processing pipeline
  - ▲ Picture-in-picture, overlay creation, color space conversion
  - ▲ Video stream synchronization (GenLock), scaling, cropping, moving, etc.
- ▲ Ethernet based system control interface for remote control and updates

### Platform

- ▲ Miami Zynq Plus System-on-Module (SOM) + dedicated carrier board
- ▲ Custom box/enclosure design including EMC, safety and CB-Scheme certification
- ▲ Embedded Linux with Video-for-Linux (V4L2) for the video pipeline
- ▲ 3rd party IP block integration in FPGA pipeline (HDMI, 10G Ethernet)









### Delirium monitor.

▲ Safe and accurate delirium monitoring in routine hospital care

- ▲ Acute brain failure → long-term cognitive impairment (dementia)
- ▲ Replaces labor intensive patient questionnaires
- ▲ Significant improvement in qualitative delirium measurement
- ▲ Brain activity measurement using disposable EEG electrode patch
- ▲ Algorithmic detection of delirium by EEG signal processing
  - ▲ Algorithm development and validation based on Matlab models
  - Research model processes 30 seconds of recordings in 20 minutes on an 8-core Intel i7 machine
  - ▲ Target is a battery-operated device that processes 30 seconds of recordings in at most 30 seconds
  - ▲ Algorithm implementation uses both FPGA fabric and dual-core Cortex A9 CPU
  - Manual translation from batch-oriented Matlab model into streaming C++ model
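
The gap between the research model and the real-time target follows from the two runtimes given above. A minimal sketch of that arithmetic (the function name is illustrative only):

```python
# Speed-up needed to go from the research model to real-time processing.

def required_speedup(current_runtime_s: float, target_runtime_s: float) -> float:
    """Factor by which processing must get faster to meet the target."""
    return current_runtime_s / target_runtime_s

RECORDING_LENGTH_S = 30           # length of one EEG recording window
RESEARCH_RUNTIME_S = 20 * 60      # 20 minutes on the desktop machine
TARGET_RUNTIME_S = 30             # must keep up with the incoming data

speedup = required_speedup(RESEARCH_RUNTIME_S, TARGET_RUNTIME_S)

if __name__ == "__main__":
    print(f"research model runs at {RESEARCH_RUNTIME_S / RECORDING_LENGTH_S:.0f}x "
          f"slower than real time")
    print(f"required speed-up: at least {speedup:.0f}x")
```

So the embedded implementation has to be at least 40 times faster than the desktop research model while running from a battery.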





### Power-aware architecture design.

### ▲ Approach:

- ▲ CPU centric application using FPGA based accelerators
- ▲ Profiler to determine critical processes → the wavelet transform (WT) accounts for >80% of the load
- ▲ WT applied 5x and inverse WT (iWT) 1x, implemented in double precision floating point
- ▲ High-level synthesis applied for WT implementation as accelerator
- ▲ Datatype casting effects analyzed in Matlab environment
- ▲ Observations:
  - ▲ Double precision floating point implementation @ 200MHz data path speed
    - ▲ Execution time: too long, battery lifetime: way too short, enclosure gets hot
  - ▲ Transformation to single precision floating point @ 200MHz data path speed
    - ▲ Execution time: ok, battery lifetime: still too short, algorithm quality still fine
  - Transformation to fixed-point implementation
    - ▲ Data path speed reduced to 100MHz
    - ▲ Execution time: ok, battery lifetime: ok, algorithm quality still fine after a tweak
  - ▲ Application on CPU hardly touched
    - ▲ Running the application on FPGA fabric would possibly consume more power
  - ▲ Power performance improved by factor of ~10
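
The datatype analysis described above was done in Matlab; the sketch below shows the same kind of experiment in plain Python. It is not the project's algorithm: the slides do not name the wavelet, so a single-level Haar transform is used as a stand-in, the test signal is synthetic, and the fixed-point word lengths are hypothetical. Only the inputs and outputs are rounded to the fixed-point grid; the arithmetic in between still runs in double precision.

```python
import math

def haar_step(samples):
    """One level of a Haar wavelet transform: (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = list(zip(samples[0::2], samples[1::2]))
    approx = [(a + b) * s for a, b in pairs]
    detail = [(a - b) * s for a, b in pairs]
    return approx, detail

def quantize(value, frac_bits):
    """Round a value to a fixed-point grid with frac_bits fractional bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale

def haar_step_fixed(samples, frac_bits):
    """Haar step with inputs and outputs rounded to the fixed-point grid."""
    quantized = [quantize(x, frac_bits) for x in samples]
    approx, detail = haar_step(quantized)
    return ([quantize(x, frac_bits) for x in approx],
            [quantize(x, frac_bits) for x in detail])

def worst_case_error(samples, frac_bits):
    """Largest deviation of the fixed-point result from the float reference."""
    ref_a, ref_d = haar_step(samples)
    fix_a, fix_d = haar_step_fixed(samples, frac_bits)
    return max(abs(r - f) for r, f in zip(ref_a + ref_d, fix_a + fix_d))

# Synthetic test signal (assumption): five sine periods over 256 samples.
test_signal = [math.sin(2 * math.pi * 5 * n / 256) for n in range(256)]

if __name__ == "__main__":
    for frac_bits in (8, 12, 16):
        error = worst_case_error(test_signal, frac_bits)
        print(f"{frac_bits} fractional bits: worst-case error {error:.2e}")
```

Sweeping the word length like this shows how few bits still give an acceptable error, which is what makes a narrower and slower-clocked data path possible.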





### Ultrasound steel plate inspection.

### ▲ Application field

- ▲ Steel plate inspection system based on ultrasound technology
- In-line detection of cracks and bubbles in metal plates during production
- ▲ Software application for visualization, management and controls

### ▲ Functionality

- ▲ Data acquisition, pre-processing and communication
- ERP integrated system software
- ▲ Integration with steel factory shop floor control

### Platform

- Ultrasound sensor array
- ▲ Carrier board design based on TOPIC SOM (Miami Zynq Plus)
- ▲ 96 channels analog ultrasound signal acquisition
- ▲ FPGA based signal processing/gigabit communication
- ▲ Linux supported data and communication management










# **Precision timing**





# Distributed Real Time Systems.

▲ Systems need to execute operations with ever tighter time constraints

- ▲ Larger distances between nodes give longer transmission delays
- ▲ Management/calibration of changes to the number of nodes is not easy



### Precision Timing.

▲ Synchronizes time between network connected devices within a specific time resolution

- ▲ PTP = Precision Time Protocol IEEE 1588-2019
  - See "tenart-timestamping-and-ptp-in-linux.pdf" for details on the integration in the Linux kernel
  - ▲ Support for both HW and SW based PTP synchronization
- ▲ SyncE = Synchronous Ethernet ITU-T Rec. G.8261/8262/8264
  - ▲ Based on ITU-T G.813 clocks
  - ▲ Key: accuracy, noise transfer, holdover performance, noise tolerance/generation
- ▲ White Rabbit = implementation of the IEEE 1588 High Accuracy (HA) default PTP profile
  - ▲ Common notion of time in the entire network
  - ▲ Synchronous Ethernet (SyncE) limited resolution and precision
  - ▲ PTP (enhanced PTP or White Rabbit PTP) as the principal base technology
  - ▲ Digital Dual-Mixer Time Difference (DDMTD) phase detection (unknown link symmetry compensation)
  - See "HighAccuracyDefaultPTPProfile.pdf" for a general context
  - ▲ See "WR\_Maciej\_ALBA.v0.3.pdf" for more depth
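
The PTP principle behind the bullets above rests on four timestamps from one Sync/Delay_Req exchange. The sketch below shows the standard offset and delay calculation and why an asymmetric link needs compensation. The timestamp values are made up for illustration, and the function name is not taken from any PTP library.

```python
# PTP two-way time transfer from four timestamps (all values in nanoseconds).
#   t1: master sends Sync          (master clock)
#   t2: slave receives Sync        (slave clock)
#   t3: slave sends Delay_Req      (slave clock)
#   t4: master receives Delay_Req  (master clock)

def ptp_offset_and_delay(t1, t2, t3, t4):
    """Return (slave clock offset from master, one-way delay).

    The calculation assumes the link delay is equal in both directions.
    """
    delay = ((t2 - t1) + (t4 - t3)) / 2
    offset = ((t2 - t1) - (t4 - t3)) / 2
    return offset, delay

if __name__ == "__main__":
    # Illustrative case: slave runs 1500 ns ahead, symmetric 400 ns link.
    print(ptp_offset_and_delay(t1=1000, t2=2900, t3=5000, t4=3900))

    # Same clocks, but 450 ns towards the slave and 350 ns back:
    # the estimated offset is wrong by half the asymmetry (50 ns).
    print(ptp_offset_and_delay(t1=1000, t2=2950, t3=5000, t4=3850))
```

The second case is the reason high-accuracy profiles such as White Rabbit measure and compensate link asymmetry instead of assuming a symmetric path.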

### Embedding White Rabbit.





### #1: Low-cost White Rabbit node.


### ▲ M.2 form factor NIC

- ▲ Target = 30mm x 110mm (V1R0 **not** yet compliant with this form factor)
- Alternative SYZYGY connector under investigation
- ▲ Next variant: without FPGA, as add-on for existing SOMs
- ▲ Based 100% on CERN WR
- ▲ Uses latest AMD low-cost FPGA technology

Board functionality:

- SFP(+) cage
- SFP(+) cage (optional)
- FPGA functionality: Grandmaster mode, Master mode, Slave mode
- M.2
- Configuration memory(s)
- Power supplies
- Debug infrastructure

### #2: White Rabbit switch converter.



### WR IP core building blocks.



### #3: Precision Timing and Miami SOMs.

- Programmable TCXO
- Accurate PLL
- 4x EMAC with IEEE1588 support
- 4x4 transceiver banks for SFP support
- Ethernet PHY with SyncE capabilities
- Dual core RT processor
- Quad core CPU + Linux
- External clock sync (SyncE)

### Conclusion and take-away.

▲ The value of using System-on-Modules is not in their cost

- ▲ Reduction of development time
- ▲ Focus on what really matters for your application
- ▲ Key drivers for SOM design-ins
  - ▲ A good means of design reuse
  - Reducing board/application design complexity
  - ▲ Simplified product life-cycle management
  - ▲ Focus on functionality instead of design complexity
  - ▲ Follow technology advancements more easily

### Contact us.

Materiaalweg 4, 5681 RJ BEST, The Netherlands



+31 499 336979



www.topicembedded.com



contact@topic.nl, dirk.van.den.heuvel@topic.nl



www.linkedin.com/company/topic-embedded-systems