The continuing advance in digital processing performance should allow CMS to build a much more powerful, yet simpler and easier-to-maintain, trigger than the current one. However, before embarking on such a project it would be wise to evaluate how this technology is best deployed, because its characteristics differ from those of the technologies used in the past. For example, high-speed serial links are an excellent way to bring large volumes of data into FPGAs, but they have a high latency, typically 100–200 ns. It is therefore essential that the number of serialisation stages be kept to a minimum. Consequently, all our new designs are based on just three stages (two Regional/Global Trigger stages followed by the Global Trigger), resulting in three serial links, excluding data transmission from the detector. Constraints such as these have led us to re-evaluate how best to implement a new trigger and what sort of architecture would best suit it.
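The latency cost of serialisation can be illustrated with simple arithmetic (a back-of-envelope sketch only, using the 100–200 ns per-link figure and three stages quoted above):

```python
# Back-of-envelope check (illustration only): total latency spent purely on
# serialisation/deserialisation for a trigger with a fixed number of
# high-speed serial-link stages.
LINK_LATENCY_NS = (100, 200)   # typical latency range of one serial link, in ns
N_STAGES = 3                   # serialisation stages in the proposed design

# Range of total serialisation latency across the trigger chain.
total_ns = tuple(n * N_STAGES for n in LINK_LATENCY_NS)
print(total_ns)  # → (300, 600)
```

Even at the pessimistic end, 600 ns of serialisation latency leaves room within a ~1 μs budget only if the number of stages is kept this small, which is why the three-stage constraint is central to the design.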
The Regional and Global Calorimeter Triggers are among the most challenging aspects of CMS because the large data volume of several Tb/s must not only be processed, but also (a) the data must be shared or duplicated between processing nodes to satisfy boundary constraints, (b) the resulting physics objects must be sorted in order of significance, and (c) all of this must be achieved within a latency budget of ~1 μs.
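The sort requirement in (b) amounts to ranking candidate objects by a significance measure and keeping the best few. A minimal sketch, assuming transverse energy as the significance measure and hypothetical `Candidate` fields chosen purely for illustration (this is not CMS firmware):

```python
# Illustrative sketch (not CMS code): rank candidate physics objects by
# transverse energy (Et) and keep only the most significant for the
# next trigger stage. Field names are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class Candidate:
    et: int      # transverse energy, used here as the significance measure
    region: int  # coarse position index of the candidate

def top_n(candidates, n=4):
    """Return the n most significant candidates, highest Et first."""
    return sorted(candidates, key=lambda c: c.et, reverse=True)[:n]
```

In hardware this ranking is typically realised as a pipelined compare-and-swap sorting network rather than a software sort, but the input/output contract is the same.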
Data sharing is a particularly significant constraint, which has in the past required complex systems and backplanes to share and duplicate data between processing nodes. An elegant way to reduce this problem was first proposed by John Jones (SLHC Workshop, Fermilab, 2009). It is based on a time-multiplexed trigger in which the data from a single bunch crossing (bx) are concatenated and delivered to a single processing system over several bx periods. This approach requires several processing systems operating in round-robin fashion (i.e. processing system 1 takes bx = n, processing system 2 takes bx = n+1, and so on). We currently envisage approximately 10 processing systems. The major advantage of this approach is that the whole system becomes much more efficient, because the ratio of the area processed to the boundary area is substantially increased. This results in fewer cards, which in turn simplifies the subsequent sort.
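The round-robin assignment above can be sketched in a few lines (an illustration only; the function name is an assumption, and 10 nodes is the approximate figure quoted above):

```python
# Illustrative sketch (not CMS code): round-robin assignment of bunch
# crossings (bx) to time-multiplexed processing nodes.
N_NODES = 10  # approximate number of processing systems currently envisaged

def node_for_bx(bx: int) -> int:
    """Processing node that receives the full detector data for this bx."""
    return bx % N_NODES

# Each node therefore sees every N_NODES-th crossing, giving it N_NODES
# bx periods to receive one crossing's worth of concatenated data.
print([node_for_bx(bx) for bx in range(12)])  # → [0, 1, ..., 9, 0, 1]
```

Because each node receives the entire detector for its crossing, no boundary data need be exchanged between nodes, which is precisely what removes the need for complex data-sharing backplanes.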
The obvious drawback of this approach is the immediate latency increase incurred in time multiplexing the data; however, we expect this to be offset by the ability to build a much more compact trigger requiring fewer serialisation stages. The system also has other advantages. For example, the entire trigger can be prototyped with just 10% of the hardware. It also offers redundancy: if one of the processing systems were to fail, the data could be redirected to a backup system.
To help establish whether such a scheme is feasible we have developed a double-width, full-height AMC card, MINI-T5, to prototype new trigger designs (e.g. what is the minimum practical latency of a 5–6 Gb/s link, and what FPGA resources are required). The card accepts either a Xilinx XC5VTX150T or XC5VTX240T FPGA and has 32 links routed to SNAP12/QSFP optics on the front panel. As soon as the prototype has been tested we intend to manufacture more cards so that we can populate a μTCA crate and thus build a demonstrator trigger system, which we can load with the new trigger algorithms currently under development. This hardware platform (MINI-T5) can also be used to develop and evaluate firmware for a traditional (non-time-multiplexed) trigger system.