Speaker
P. Sheldon
(VANDERBILT UNIVERSITY)
Description
The BTeV experiment, a proton/antiproton collider experiment at the Fermi National
Accelerator Laboratory, will have a trigger that will perform complex computations
(to reconstruct vertices, for example) on every collision (as opposed to the more
traditional approach of employing a first-level, hardware-based trigger). This
trigger requires large-scale, fault-adaptive embedded software: with thousands of
processors performing event filtering in the trigger farm, fault
conditions must be handled properly. Without fault mitigation, the trigger
system could fail often enough to
have an unacceptable impact on BTeV's physics goals. The RTES (Real Time
Embedded Systems) collaboration is a group of physicists, engineers, and computer
scientists working to address the problem of reliability in large-scale clusters with
real-time constraints such as this one. The resulting infrastructure must be highly scalable,
verifiable, extensible by users, and dynamically changeable. An initial prototype has
been built to test design ideas and methods for the final system, and a larger-scale,
more ambitious prototype is currently under construction. I will discuss the
lessons learned from these prototypes as well as the overall design and deliverables
for the BTeV experiment.
Author
P. Sheldon
(VANDERBILT UNIVERSITY)