Speaker
Dr Sebastien Binet (LBNL)
Description
LHC experiments are entering a phase where optimization in view of
data taking, as well as robustness improvements, is of major importance. Any
reduction in event data size can bring very significant savings in the
amount of hardware (disk and tape in particular) needed to process
data. Another area of concern and potential major gains is reducing
the memory size and I/O bandwidth requirements of processing nodes,
especially with increasing usage of multi-core CPUs.
LHC experiments are already collecting abundant
performance information about event size, memory and CPU usage, I/O
compression and bandwidth requirements. What is missing is a coherent
set of tools to present this information in a tailored fashion to
release coordinators, package managers and physics algorithm
developers.
This paper describes such a toolkit that we are developing in the
context of ATLAS computing to harvest performance monitoring
information from an extensible set of sources. The challenge is to
map performance data with an immediate impact on hardware costs into entities
which are relevant for the various users. For example, the toolkit allows an
ATLAS data model developer to evaluate the impact of their design decisions
for an event data object on resource usage throughout the entire software
pipeline.
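As an illustration only (the abstract does not describe the toolkit's actual
API; every name below is hypothetical), a minimal Python sketch of harvesting
performance records from an extensible set of sources and aggregating them per
user-relevant entity could look like this:

    from dataclasses import dataclass
    from typing import Iterable, Protocol

    @dataclass
    class PerfRecord:
        entity: str   # e.g. an event-data class or algorithm name
        metric: str   # e.g. "disk_size_kb", "cpu_ms", "vmem_mb"
        value: float

    class Source(Protocol):
        """Any pluggable producer of performance records."""
        def harvest(self) -> Iterable[PerfRecord]: ...

    class SizeLogSource:
        """Hypothetical source: parses per-entity metrics from a text log
        with lines of the form '<entity> <metric> <value>'."""
        def __init__(self, path: str) -> None:
            self.path = path
        def harvest(self) -> Iterable[PerfRecord]:
            with open(self.path) as f:
                for line in f:
                    entity, metric, value = line.split()
                    yield PerfRecord(entity, metric, float(value))

    def aggregate(sources: Iterable[Source]) -> dict[tuple[str, str], float]:
        """Sum each (entity, metric) pair across all registered sources,
        so totals can be presented per class, per package, or per release."""
        totals: dict[tuple[str, str], float] = {}
        for src in sources:
            for rec in src.harvest():
                key = (rec.entity, rec.metric)
                totals[key] = totals.get(key, 0.0) + rec.value
        return totals

New sources plug in by implementing the single harvest() method, which is one
plausible way to keep the set of sources extensible as the abstract describes.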
We present the data in a way that highlights potential areas of
concerns, allowing experts to drill down to the level of detail they
need (the size of a data member of a class, the CPU usage of a
component method). A configurable monitoring system raises alarms
when a quantity or a histogram goes out of a specified range.
This allows a release coordinator to monitor, for example, the global
size of a data stream throughout a development cycle and have the
developers correct a problem well before a release goes into
production.
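A sketch of the alarm logic under the same caveat (hypothetical names, not the
toolkit's real interface): each monitored quantity carries an allowed range,
and any out-of-range value produces an alarm record.

    from dataclasses import dataclass

    @dataclass
    class Alarm:
        quantity: str
        value: float
        low: float
        high: float

    def check_ranges(values: dict[str, float],
                     ranges: dict[str, tuple[float, float]]) -> list[Alarm]:
        """Return an Alarm for every monitored quantity outside its range."""
        alarms = []
        for name, (low, high) in ranges.items():
            v = values.get(name)
            if v is not None and not (low <= v <= high):
                alarms.append(Alarm(name, v, low, high))
        return alarms

    # Example: flag a data stream whose global size exceeds its budget.
    for a in check_ranges({"ESD_stream_size_mb": 1250.0},
                          {"ESD_stream_size_mb": (0.0, 1000.0)}):
        print(f"ALARM: {a.quantity}={a.value} outside [{a.low}, {a.high}]")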
Submitted on behalf of Collaboration: ATLAS
Primary author
Dr Sebastien Binet (LBNL)