The traditional HEP analysis model uses successive processing steps to reduce the initial dataset to a size that permits real-time analysis. This iterative approach requires significant CPU time, relies on storing large intermediate datasets, and may take weeks or months to complete. Low-latency, query-based analysis strategies are being developed to enable real-time analysis of primary datasets by replacing conventional nested loops over objects with native operations on hierarchically nested, columnar data. Such queries are well suited to distributed processing using a strategy called function as a service (FaaS).
In this presentation we introduce funcX, a high-performance FaaS platform that enables intuitive, flexible, efficient, and scalable remote function execution on existing infrastructure including clouds, clusters, and supercomputers. A funcX function explicitly defines a function body and the dependencies required to execute it. Interacting via a REST API, users register and then execute such functions without regard for the physical location of the resource or the scheduler architecture on which the function runs, an approach we refer to as ``serverless supercomputing.'' We show how funcX can be used to parallelize a real-world HEP analysis that operates on columnar data and aggregates histograms of analysis products of interest in real time. Subtasks representing partial histograms are dispatched as funcX requests with expected runtimes of less than a second. Finally, we demonstrate efficient execution of such analyses on heterogeneous resources, including leadership-class computing facilities.
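The dispatch-and-aggregate pattern described above can be illustrated with a minimal sketch. It makes no funcX API calls: a local thread pool from the Python standard library stands in for remote function execution, and every name in it (`partial_histogram`, `merge_histograms`, `run_analysis`, the chunk size, and the binning) is a hypothetical chosen for illustration rather than part of the analysis presented here.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_histogram(values, n_bins, lo, hi):
    """Histogram one chunk of a flat column; returns a list of bin counts."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v < hi:
            # Clamp the index to guard against floating-point rounding at the upper edge.
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    return counts


def merge_histograms(partials, n_bins):
    """Sum partial histograms bin by bin into a single histogram."""
    total = [0] * n_bins
    for counts in partials:
        for i, c in enumerate(counts):
            total[i] += c
    return total


def run_analysis(column, chunk_size, n_bins, lo, hi, max_workers=4):
    """Split a column into chunks, histogram each as a short subtask, then merge."""
    chunks = [column[i:i + chunk_size] for i in range(0, len(column), chunk_size)]
    # Assumption: the local pool is a stand-in for a FaaS backend; in the setting
    # described above, each submit would instead be a request to a remote endpoint.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(partial_histogram, c, n_bins, lo, hi) for c in chunks]
        partials = [f.result() for f in futures]
    return merge_histograms(partials, n_bins)


if __name__ == "__main__":
    # Synthetic column: 1000 values cycling through 0.0 .. 99.0.
    column = [float(i % 100) for i in range(1000)]
    print(run_analysis(column, chunk_size=100, n_bins=10, lo=0.0, hi=100.0))
```

Because histogram merging is a bin-wise sum, the partial results can be combined in any order, which is what makes short, independent subtasks a natural fit for this kind of dispatch.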