26–30 Sept 2016
Karlsruhe Institute of Technology (KIT)
Europe/Zurich timezone

AM06: the Associative Memory chip for the Fast TracKer in the upgraded ATLAS detector

30 Sept 2016, 11:10
25m
Tulla Lecture Hall (Building 11.40)

Oral ASIC Plenary

Speaker

Valentino Liberali (Università degli Studi e INFN Milano (IT))

Description

This paper describes the AM06 chip, a highly parallel processor for
pattern recognition in high energy physics. AM06 contains memory banks
that store up to 2^17 patterns, each made up of eight 18-bit words, and
integrates SER/DES IP blocks for 2.4 Gb/s IO to avoid routing congestion.
AM06 combines custom memory arrays, standard logic cells and IP blocks
within a 168 mm^2 silicon area with 421 million transistors and can
perform bitwise comparisons at 1.6 Pbit/s, consuming ~2 fJ/bit per
comparison thanks to an optimized design based on XORAM cells.

Summary

In this paper we describe the AM06 chip, which is a highly parallel ASIC
processor for pattern recognition: its purpose is to find particle
tracks in real-time as part of the Fast TracKer (FTK) processor, which
is being installed in view of the next ATLAS upgrade. Version 6 of the
Associative Memory chip is designed in 65 nm CMOS technology and is
based on the XORAM cell architecture. The AM stores segmented data and
finds the addresses at which a combination of stored segments matches an
input data sample. More than a memory device, it is an engine able to
solve a class of combinatorial problems. The AM06 is tailored for
real-time track finding in high-energy physics (HEP) experiments;
however, it can also be used in many interdisciplinary applications
(e.g., general-purpose image filtering and analysis).
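
The matching step described above can be illustrated with a minimal software sketch. It is not the chip's implementation: the hardware compares all patterns in parallel, while this sketch loops over them. The function name, the toy data, and the `min_layers` threshold are assumptions introduced for illustration only.

```python
from typing import List, Sequence, Set

NUM_LAYERS = 8  # words per pattern, one per detector layer (from the text)


def find_matching_patterns(
    patterns: Sequence[Sequence[int]],
    hits_per_layer: Sequence[Set[int]],
    min_layers: int = NUM_LAYERS,  # hypothetical threshold, not from the source
) -> List[int]:
    """Return the addresses of patterns matched in at least `min_layers` layers."""
    matched = []
    for address, pattern in enumerate(patterns):
        # A layer matches when the stored word equals one of that layer's hits.
        layers_matched = sum(
            1 for layer, word in enumerate(pattern) if word in hits_per_layer[layer]
        )
        if layers_matched >= min_layers:
            matched.append(address)
    return matched


# Toy pattern bank: three patterns of 8 words each (values are made up).
patterns = [
    (1, 2, 3, 4, 5, 6, 7, 8),
    (1, 2, 3, 4, 5, 6, 7, 9),
    (10, 20, 30, 40, 50, 60, 70, 80),
]
# Hits received on each of the 8 layers for one event.
hits_per_layer = [{1, 10}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 80}]

print(find_matching_patterns(patterns, hits_per_layer))
```

Only the pattern at address 0 has a matching word in every layer; lowering the hypothetical `min_layers` to 7 would also accept address 1.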

The chip has been designed with a mixed approach. AM core cells are
fully customized to optimize area and power consumption. The remaining
logic was described in VHDL and synthesized in standard cells for
rapid design and verification. Finally, serializer and deserializer IP
blocks were used for data input and output, to avoid routing congestion
at the PCB level. The AM serial data rate is between 2 Gb/s and 2.4 Gb/s,
while the clock for parallel data inside the chip is 100 MHz.

The AM06 contains a large memory bank that stores all data of interest.
The basic memory unit is a word of 18 bits; a set of 8 words from 8
different layers of the detector is called a "pattern". The AM06 stores
up to 2^17 patterns. To reduce fake matches and to increase efficiency,
the AM06 implements "variable resolution patterns". De-serialized input
data are fed to all memory blocks in parallel. A priority read-out tree
serializes the output results.
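
As a rough illustration of the memory organization, the sketch below works out the raw pattern storage from the figures in the text and shows one way a variable-resolution word comparison could be modeled. The text does not say how variable resolution is implemented; the `dont_care` mask used here is an assumption for illustration, as are all function and variable names.

```python
WORD_BITS = 18           # bits per word (from the text)
WORDS_PER_PATTERN = 8    # one word per detector layer (from the text)
NUM_PATTERNS = 2 ** 17   # patterns per chip (from the text)

bits_per_pattern = WORD_BITS * WORDS_PER_PATTERN
total_pattern_bits = bits_per_pattern * NUM_PATTERNS
print(bits_per_pattern, total_pattern_bits)

WORD_MASK = (1 << WORD_BITS) - 1


def word_matches(stored: int, hit: int, dont_care: int = 0) -> bool:
    """Compare two words bit by bit with XOR.

    Bits set in `dont_care` are ignored, which lowers the resolution of
    the stored word. The mask is an assumed model, not taken from the source.
    """
    difference = stored ^ hit               # 1 wherever the words differ
    return (difference & ~dont_care & WORD_MASK) == 0


stored_word = 0b101100
print(word_matches(stored_word, 0b101101))                   # differs in bit 0
print(word_matches(stored_word, 0b101101, dont_care=0b1))    # bit 0 ignored
```

With these figures each pattern occupies 144 bits, so the full bank holds 18,874,368 pattern bits.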

The AM06 is a complex VLSI chip with several parameters comparable to
those of the Intel Core Duo processor. The chip contains 14 different
clock domains, 7 different power domains, about 20 million standard
cells, and about 421 million transistors. The AM06 performs synchronous
bitwise comparisons at a rate of about 1.6 Pbit/s. The latency from a
NEXT_EVENT signal to the first pattern read out is in the range of 25 to
30 cycles of the IO clock. An alternative operation mode reads out
patterns while hits are still being loaded. In this case the latency is
similar, and it is counted from the HIT that fires a pattern to the
first pattern out.

Power consumption has been a key concern in the development of the chip.
Extensive optimization reduced the energy use to ~2 fJ/bit per
comparison. In the future, we plan to design a more powerful and
flexible chip in 28 nm CMOS, with the aim of achieving 2^19 patterns
per chip with even lower energy consumption per bit per comparison.
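
The two headline figures can be combined into a back-of-the-envelope estimate of the power spent on comparisons. This is only a sketch: it assumes the comparison rate of about 1.6 Pbit/s is sustained continuously and that ~2 fJ/bit applies to every compared bit, and it ignores IO, clocking, and other logic. The function and variable names are illustrative.

```python
COMPARISON_RATE_BIT_PER_S = 1.6e15   # about 1.6 Pbit/s (from the text)
ENERGY_PER_BIT_J = 2e-15             # about 2 fJ per bit per comparison (from the text)


def comparison_power_watts(rate_bit_per_s: float, energy_per_bit_j: float) -> float:
    """Power = bits compared per second x energy per compared bit."""
    return rate_bit_per_s * energy_per_bit_j


power_w = comparison_power_watts(COMPARISON_RATE_BIT_PER_S, ENERGY_PER_BIT_J)
print(f"Estimated comparison power: {power_w:.1f} W")

# Capacity ratio between the planned 2^19-pattern chip and the AM06.
pattern_ratio = 2 ** 19 // 2 ** 17
print(f"Planned pattern capacity: {pattern_ratio}x the AM06")
```

Under these assumptions the comparisons alone account for roughly 3.2 W, and the planned chip would hold four times as many patterns, which is why a lower energy per bit per comparison matters for the next design.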
