Description
Scientific experiments rely on machine learning at the edge to process extreme volumes of real-time streaming data. Computation at the extreme edge must often tolerate faults, for example to function correctly in high-radiation environments or to limit the effects of transient errors, so it must be designed with fault tolerance as a primary objective. FKeras is a tool that assesses the sensitivity of machine learning parameters to faults. It uses a metric based on the Hessian of the neural network's loss function to rank neural network parameters, at the bit level, by their sensitivity to transient faults. This makes FKeras a valuable tool for co-designing robust and fast ML algorithms. It guides and accelerates fault injection campaigns, analyzes the resilience of a neural network under single- and multiple-bit-flip fault models, and helps evaluate the fault tolerance of a network architecture, enabling co-design that weighs fault tolerance alongside performance, power, and area. By quickly identifying the most sensitive parameters, FKeras helps determine which parameters to protect selectively.
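The idea of a Hessian-based, bit-level ranking can be illustrated with a small sketch. This is not FKeras's API or its exact metric. It is a hypothetical example assuming a toy linear model with mean-squared-error loss (whose gradient and Hessian have closed forms) and an assumed signed 8-bit fixed-point format with 4 fractional bits. Flipping bit $b$ of weight $w_i$ changes it by some $\delta$, and a second-order Taylor expansion estimates the loss change as $\Delta L \approx g_i \delta + \tfrac{1}{2} H_{ii} \delta^2$. The names `quantize`, `flip_bit`, and `rank_bits` are illustrative only.

```python
import numpy as np

# Assumed fixed-point format (illustrative, not taken from FKeras):
# signed 8-bit two's complement with 4 fractional bits.
TOTAL_BITS = 8
FRAC_BITS = 4
SCALE = 2 ** FRAC_BITS


def quantize(w):
    """Round real weights to signed fixed-point integers, clipped to the 8-bit range."""
    lo, hi = -(2 ** (TOTAL_BITS - 1)), 2 ** (TOTAL_BITS - 1) - 1
    return np.clip(np.round(w * SCALE), lo, hi).astype(np.int64)


def flip_bit(q, bit):
    """Flip one bit of a TOTAL_BITS-wide two's complement integer; return the signed result."""
    u = (int(q) & ((1 << TOTAL_BITS) - 1)) ^ (1 << bit)  # operate on the raw bit pattern
    return u - (1 << TOTAL_BITS) if u >= (1 << (TOTAL_BITS - 1)) else u


def loss(w, X, y):
    """Mean-squared-error loss of a toy linear model (stand-in for a real network)."""
    r = X @ w - y
    return 0.5 * np.mean(r ** 2)


def grad_and_hessian_diag(w, X, y):
    """Closed-form gradient and Hessian diagonal of the toy loss: g = X^T r / n, H = X^T X / n."""
    n = X.shape[0]
    g = X.T @ (X @ w - y) / n
    h = np.einsum("ij,ij->j", X, X) / n  # diagonal of X^T X / n
    return g, h


def rank_bits(w_q, X, y):
    """Rank every (parameter, bit) pair by the estimate dL ~= g_i*d + 0.5*H_ii*d**2.

    For this quadratic toy loss the estimate is exact for a single flip;
    for a real network it is only a second-order approximation.
    """
    g, h = grad_and_hessian_diag(w_q / SCALE, X, y)
    scores = []
    for i, q in enumerate(w_q):
        for b in range(TOTAL_BITS):
            d = (flip_bit(q, b) - q) / SCALE  # change in the real-valued weight
            scores.append((g[i] * d + 0.5 * h[i] * d * d, i, b))
    scores.sort(reverse=True)  # most damaging flips first
    return scores


# Usage: rank all bit flips, then confirm the top one with an actual injection.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))
w_true = np.array([0.5, -1.25, 2.0, 0.0])
y = X @ w_true
w_q = quantize(w_true)

ranking = rank_bits(w_q, X, y)
est, i, b = ranking[0]
w_faulty = w_q.copy()
w_faulty[i] = flip_bit(w_q[i], b)
actual = loss(w_faulty / SCALE, X, y) - loss(w_q / SCALE, X, y)
print(f"most sensitive: param {i}, bit {b}, estimated dL={est:.4f}, injected dL={actual:.4f}")
```

In this toy setting the sign bit dominates because its flip shifts the weight by $2^{7}/2^{4} = 8$, so $\delta^2$ is four times larger than for the next bit. Ranking first and injecting afterwards lets a fault injection campaign start with the flips most likely to matter instead of sweeping every bit uniformly.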