System Analysis Working Group Meeting

14/4-030 (CERN)



This was the first regular meeting after a break over the summer and autumn months, during which we met in smaller groups of developers and the people requesting the applications. At the last meeting before the break we discussed ATLAS job monitoring for analysis and production. Over the last months there has been considerable progress in the development of the new production monitoring for the ATLAS and CMS experiments. For ATLAS, the input information source is the ATLAS production system database. The user interface was completely redesigned by Benjamin Gaidioz in the Dashboard framework, preserving all the functionality of the old interface while adding new features requested by the users of the system. Both interfaces currently run in parallel; the complete switch should happen by the end of 2007. For CMS, the CMS Dashboard is now used as the central repository of CMS production monitoring data. All database-related work is handled by the Dashboard team, and the UI is developed by the CMS production development team. The web application will talk to the Dashboard via an API that the Dashboard team (Irina Sidorova) is currently working on.
There are minutes attached to this event.
    • 10:30 - 11:00
      Increasing reliability of the job monitoring application in the Experiment Dashboard

      Currently the Job Monitoring application in the Dashboard uses multiple sources of information. The primary source, which works across several infrastructures, is the job submission tool, reporting via MonALISA. For LCG, job status changes are taken from the LB, obtained via RGMA or ICRTM. More and more experiment users are using direct condor_g submission. In this case the Dashboard has no information source other than the job submission tool, which is not always sufficient. At the same time we noticed that gLite WMS instances are mostly not monitored by RGMA and ICRTM, so an alternative source of information has to be considered in this case.
      Two related topics to be discussed:
      1) How do we get job status change information in the case of direct condor_g submission?
      2) How do we get job status change information from the LB, for example using the subscription mechanism?
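The idea of combining job status changes from several sources can be illustrated with a minimal sketch. It is not the Dashboard implementation: the `StatusUpdate` record, the source labels, and the "most recent timestamp wins" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusUpdate:
    job_id: str
    status: str
    timestamp: float  # seconds since epoch, as reported by the source
    source: str       # hypothetical labels, e.g. "submission_tool" or "LB"


def latest_status(updates):
    """Return {job_id: StatusUpdate}, keeping the most recent update per job."""
    latest = {}
    for update in updates:
        current = latest.get(update.job_id)
        if current is None or update.timestamp > current.timestamp:
            latest[update.job_id] = update
    return latest


updates = [
    StatusUpdate("job-1", "Submitted", 100.0, "submission_tool"),
    StatusUpdate("job-1", "Running", 160.0, "LB"),
    # job-2 stands for a direct condor_g submission: only one source reports it.
    StatusUpdate("job-2", "Submitted", 120.0, "submission_tool"),
]
merged = latest_status(updates)
print(merged["job-1"].status, merged["job-2"].source)
```

In this sketch a job seen by only one source (the direct condor_g case) keeps whatever that source last reported, which is why an additional source of status changes is useful.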

      • 10:30
        Instrumenting of the condor_g submitter with Dashboard reporting 20m
        Speaker: Sergey Belov (Unknown)
        Sergey will document how to install the modified version of condor_g and which modifications are required in the job submission tools, and will put this documentation on the twiki of the System Analysis Working Group. Julia will coordinate further tests with the CMS people.
      • 10:50
        Discussion. How we proceed with testing of the LB subscription. 10m
        This work has just started. Enzo Miccio will run the first tests and keep Julia and Pablo informed of progress. If the tests succeed, the subscription mechanism is expected to be used by the Experiment Dashboards in the global VO scope, and by CRAB and ProdAgent to obtain job status changes for jobs submitted via a particular server.
        Speaker: all
    • 11:20 - 11:40
      Requirements for a simple page where CMS analysis users can find the necessary information about the sites where their jobs are supposed to run 20m
      Discussion. The CRAB support team considers that it would be useful to have a simple web page where CRAB analysis users, as well as the CRAB support team, can find in one place answers to questions such as: "My job was aborted because the corresponding resource was not found. Why?", "My jobs are pending at the site forever. Why?", "All my jobs are failing at a particular site. Is there a known problem with the site?" The content of the page is more or less clear: whether the site is in the BDII; whether the CMS-specific tests fail at the site; whether the site has the required CMSSW software; and whether the site is overloaded (number of pending/running jobs). Federica Fanzago will try to define how the page should look and be navigated, and we will discuss it at one of the next meetings.
      Speaker: Federica FANZAGO
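The four checks proposed for the page can be sketched as a small function that turns site information into a list of likely explanations. This is only an illustration: the `SiteInfo` fields, the `diagnose` helper, the overload threshold, and the example site and version names are hypothetical and do not come from CRAB or the Dashboard.

```python
from dataclasses import dataclass


@dataclass
class SiteInfo:
    name: str
    in_bdii: bool          # is the site published in the BDII?
    cms_tests_ok: bool     # do the CMS-specific tests pass?
    cmssw_versions: set    # CMSSW versions installed at the site
    pending_jobs: int
    running_jobs: int


def diagnose(site, required_cmssw, max_pending_ratio=2.0):
    """Return human-readable problems found for the site (empty if none)."""
    problems = []
    if not site.in_bdii:
        problems.append("site is not published in the BDII")
    if not site.cms_tests_ok:
        problems.append("CMS-specific tests are failing")
    if required_cmssw not in site.cmssw_versions:
        problems.append(f"CMSSW version {required_cmssw} is not installed")
    # Assumed heuristic: far more pending than running jobs suggests overload.
    if site.pending_jobs > max_pending_ratio * max(site.running_jobs, 1):
        problems.append("site looks overloaded (many pending jobs)")
    return problems


site = SiteInfo("EXAMPLE-SITE", True, True, {"CMSSW_X_Y_Z"}, 500, 100)
report = diagnose(site, "CMSSW_X_Y_Z")
print(report)
```

Each entry in the returned list corresponds to one of the user questions above, for example a missing BDII entry explaining a "resource not found" abort.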
    • 11:40 - 12:00
      New request from ATLAS to display access to data samples per site 20m
      ATLAS would like to get from the Dashboard an XML document containing information about which datasets are accessed at a given site and how often. It looks like this possibility already exists. Julia sent Dietrich instructions, and he will check whether this is enough. Two other items in Dietrich's slides, concerning information about jobs submitted via PANDA and jobs submitted to NorduGrid, require more detailed discussion and will be discussed at a dedicated meeting.
      Speaker: Dietrich Liko (CERN)
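As an illustration of how a client might consume such an XML document, here is a minimal sketch using Python's standard library. The XML layout (a `site` element containing `dataset` elements with `name` and `accesses` attributes) is an assumption made for this example; the actual Dashboard output format is not described in these notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the real schema may differ.
SAMPLE_XML = """
<site name="EXAMPLE-SITE">
  <dataset name="dataset.A" accesses="12"/>
  <dataset name="dataset.B" accesses="3"/>
</site>
"""


def dataset_access_counts(xml_text):
    """Map each dataset name to the number of times it was accessed."""
    root = ET.fromstring(xml_text.strip())
    return {d.get("name"): int(d.get("accesses")) for d in root.iter("dataset")}


counts = dataset_access_counts(SAMPLE_XML)
# List datasets from most to least frequently accessed.
for name, accesses in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(name, accesses)
```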