Speaker
Description
Detailed analysis
Recently we introduced extensions to the CREAM Computing Element and the L&B service which unify the view on jobs executed by CREAM regardless of their submission path (via gLite WMS or directly to CREAM). CREAM is able to distinguish direct submission; in this case the incoming job is registered with L&B first. In both scenarios CREAM logs events on the progress of job execution, as well as any failures, to L&B. L&B processes this information into a view of the overall job state which is consistent between WMS and CREAM-only jobs.
On the other hand, the RTM was modified to receive notifications on job state changes from L&B, rather than extracting job state information from raw data in the L&B database as before. Besides improving overall reliability, this binding operates at the level of the L&B job state, which is already common to both WMS and CREAM-only jobs. Therefore the RTM need not make any further distinction between different job types.
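The flow described above (events logged by either submission path, folded into one common job state, with state changes pushed to subscribers) can be illustrated with a minimal sketch. This is not the L&B or RTM API: the class `JobStateService`, the `Event` record, the callback signature, and the simplified linear state list are all hypothetical names chosen for illustration, and the real L&B state machine is richer than shown here.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical, simplified linear ordering of job states (assumption:
# the real L&B state machine has more states and transitions).
STATE_ORDER = ["SUBMITTED", "WAITING", "READY", "SCHEDULED", "RUNNING", "DONE"]


@dataclass
class Event:
    job_id: str
    source: str      # which component logged the event, e.g. "WMS" or "CREAM"
    new_state: str   # state implied by the event


@dataclass
class JobStateService:
    """Toy stand-in for a service that turns logged events into job states."""
    states: Dict[str, str] = field(default_factory=dict)
    subscribers: List[Callable[[str, str], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        # A monitoring consumer registers once and is then notified of changes.
        self.subscribers.append(callback)

    def log_event(self, event: Event) -> None:
        current = self.states.get(event.job_id)
        # Ignore events that would not advance the job state (stale or duplicate).
        if current is not None and (
            STATE_ORDER.index(event.new_state) <= STATE_ORDER.index(current)
        ):
            return
        self.states[event.job_id] = event.new_state
        # The notification carries only the job id and the common state,
        # so consumers never need to know how the job was submitted.
        for notify in self.subscribers:
            notify(event.job_id, event.new_state)


if __name__ == "__main__":
    service = JobStateService()
    service.subscribe(lambda job_id, state: print(f"{job_id}: {state}"))
    service.log_event(Event("job-via-wms", "WMS", "SUBMITTED"))
    service.log_event(Event("job-direct", "CREAM", "SUBMITTED"))
    service.log_event(Event("job-direct", "CREAM", "RUNNING"))
```

In this sketch the subscriber sees the same kind of notification for both jobs, which mirrors the point that a consumer bound to the common job state does not have to treat WMS and direct submissions differently.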
Impact
Some grid users prefer to use their own workload management systems (e.g. Atlas Panda), bypassing gLite WMS and submitting jobs directly to CEs. The amount of workload distributed to grid sites in this way is not negligible.
Our work, by unifying the view on all grid jobs (going through WMS or directly to CE) at the level of L&B, enables uniform monitoring of all grid jobs with high-level tools like RTM. Consequently, the real-time view on the grid state is considerably improved. An additional benefit is that the time span over which CE job data remain available is extended (CREAM purges them soon after job completion), enabling better post-mortem analysis of problems.
The RTM is to date the only system which has access to distributed L&B servers worldwide. This makes it an important tool not only for dissemination purposes and individual users, but also for large experimental communities, as it provides a single monitoring entry point. Data collected by the RTM may also be analysed off-line, providing an opportunity to study the performance of the grid in greater detail. Adding direct submissions to the RTM monitoring system will also make this tool more attractive to communities not using WMS resources in their work.
Conclusions and Future Work
The described work is mostly integration. With relatively small effort we were able to put recently developed pieces of code together to bring considerable additional benefits. Besides the practical result of a more accurate RTM view on the EGEE grid, this is an improvement of the overall architecture, with L&B as the glue monitoring service between various job types and the widely exposed high-level tools. Therefore in the near future we will concentrate on hardening the existing prototype towards production quality, and on its wide-scale deployment on the infrastructure.
URL for further information | http://egee.cesnet.cz/cms/export/sites/egee/cs/info/{UF4-poster-rtm.pdf,CREAM-poster.pdf} |
---|---|
Keywords | CREAM, L&B, RTM |