System Analysis Working Group

40/R-D10 (CERN)



Julia Andreeva (CERN)
Job monitoring for analysis and production in ATLAS
During Benjamin's overview of the current UI for ATLAS production system monitoring:
Diana: Views that aggregate errors by category, region, etc. are missing. ROC managers found it difficult to use these pages to investigate errors.
Julia: Do you think the new interface should follow an approach similar to the one used for ATLAS DDM, going from a global view down to a very detailed one?
Diana: Yes, though the pages with very detailed information are still needed by the experts.
Diana agreed to summarize the modifications required to make the current UI convenient for ROC managers.
Dietrich: Before going to a concrete implementation, we need another meeting in which all interested parties take part and discuss a draft of the UI layout, so that people have an opportunity to express their opinions.

There was a discussion about the description of the ATLAS topology: how it is currently described, supported and published. Currently it is a Python module in CVS that describes the ATLAS tiers and the mapping between site names in the BDII and the ATLAS naming convention. It is maintained by several people. It is not published.
Ricardo: We pull it into the Dashboard every hour, and it can be published there.
Julia: What about application error codes? Are they described anywhere? Is there a common table shared by all ATLAS job submission tools, for both production and analysis?
Dietrich: No, nothing like this exists. There is only a set of job exit codes developed for production and stored in the production DB.
Julia: For effective application monitoring, there should be a common table of job error codes covering the whole of ATLAS.
Dietrich agreed to coordinate an effort to create such a table.
Comment from Diana about confusing colours in the Dashboard interactive UI: application failures should not be light green, and running jobs should not be orange.

Analysis job monitoring: Provided in Panda for Panda jobs, in Ganga for Ganga jobs, and in the Dashboard interactive UI. For Panda, however, it has not been working for quite a while because their ML server is down.

Accounting: Work by Raquel is in progress. The UI to the accounting information will be implemented in the Dashboard and will take the ATLAS topology into account, which is currently missing from the APEL UI.

Conclusions:
Another meeting is needed once the layout of the new production UI exists in draft form, on slides only.
Diana will send her suggestions.
Dietrich will coordinate the effort to create a description of job exit codes common to analysis and production.
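
For illustration only, the common exit-code table and the per-category error aggregation discussed above could look roughly like the following Python sketch. Nothing here reflects an existing ATLAS table: the codes, categories, descriptions, site names, and the helper names `ExitCode`, `COMMON_EXIT_CODES`, `categorize` and `aggregate_failures` are all hypothetical, and exit code 0 is assumed to mean success.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExitCode:
    """One row of a shared exit-code table (hypothetical layout)."""
    code: int
    category: str
    description: str


# Hypothetical entries: real codes and categories would have to be
# agreed between the production and analysis tools.
COMMON_EXIT_CODES = {
    0: ExitCode(0, "success", "job finished normally"),
    1001: ExitCode(1001, "application", "illustrative application failure"),
    2001: ExitCode(2001, "data-access", "illustrative input access failure"),
    3001: ExitCode(3001, "site", "illustrative worker node problem"),
}

UNKNOWN_CATEGORY = "unknown"


def categorize(code):
    """Return the category of an exit code, or 'unknown' if it is not in the table."""
    entry = COMMON_EXIT_CODES.get(code)
    return entry.category if entry is not None else UNKNOWN_CATEGORY


def aggregate_failures(jobs):
    """Count failed jobs per (site, category).

    `jobs` is an iterable of (site_name, exit_code) pairs; jobs with
    exit code 0 are treated as successful and skipped.
    """
    counts = Counter()
    for site, code in jobs:
        if code == 0:
            continue
        counts[(site, categorize(code))] += 1
    return counts


# Example input with made-up site names and exit codes.
jobs = [
    ("SITE_A", 0),
    ("SITE_A", 1001),
    ("SITE_B", 2001),
    ("SITE_B", 2001),
    ("SITE_B", 9999),  # not in the table, so counted as "unknown"
]
summary = aggregate_failures(jobs)
```

Keying the counts on (site, category) is one possible choice; the same table could equally be grouped by region or tier once the topology mapping is available.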
    • 10:00 10:40
      ATLAS Job monitoring for analysis and production 40m
      Speaker: Benjamin Gaidioz (CERN)