WLCG-OSG-EGEE Operations meeting
Date/Time: Monday, 24 September 2007 - 16:00 (Europe/Zurich)
Location: CERN conferencing service (joining details below) ( 28-R-15 )
Chairperson: Nick Thackray, Steve Traylen
Description: grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
  • OSG operations team
  • EGEE operations team
  • EGEE ROC managers
  • WLCG coordination representatives
  • WLCG Tier-1 representatives
  • other site representatives (optional)
  • GGUS representatives
  • VO representatives
  • To dial in to the conference:
    a. Dial +41227676000
    b. Enter access code 0157610

    OR click HERE

    NB: Reports were not received in advance of the meeting from:

  • ROCs: France, Northern Europe, South-Western Europe
  • VOs: CMS, ALICE and ATLAS
  • Material: Minutes link

     
     Monday, 24 September 2007
     16:00
    Feedback on last meeting's minutes (5')    
     16:01
    EGEE Items (29')    
    • Grid-Operator-on-Duty handover
      From: ROC Germany-Switzerland / ROC Central Europe
      To: ROC South-Western Europe / ROC France


      NB: Please can the grid ops-on-duty teams submit their reports no later than 12:00 UTC (14:00 Swiss local time).

      Issues:
      1. (CE ROC) Backup team
        Ticket summary 
        extended   : 41
        opened     : 55
        closed     : 12
        2-nd mail  : 4
        
        total        112
        

        Problem with the Dashboard: the ticket list has not been refreshing since Wednesday. A notification was sent.
     
    • PPS Report & Issues
      PPS reports were not received from these ROCs:
      AP, FR, IT, NE, SWE

      Issues from EGEE ROCs:
      1. Nothing to report
      Release News:
      • Release of gLite 3.1.0 Update06 to PPS done
        • new VOMS certificate for US-ATLAS server (sync to production)
        • uberftp for the glite-UI node
        • lcg-tags added to gLite 3.1 UI, WN
        • lcg-infosites added to the gLite 3.1 WN
      • Javier Lopez and Esteban Freire from PPS-CESGA have joined the PPS Coordination team. Their main task is to follow up the roll-out of the gLite middleware updates from Certification to PPS.
     
    • EGEE issues coming from ROC reports
      None.
     
    • The lcg-RB to be moved to no-maintenance mode. (15')
      JRA1 and SA3 have now moved the old lcg-RB to zero maintenance mode. The gLite 3.1 WMS,
      whether on SL3 or SL4, is the supported solution.
     
     16:30
    WLCG Items (30')    
    • Tier 1 reports
      • GridKa
        SRM lockups, possibly caused by memory subsystem interaction with the Java VM. Severity: moderate
      • BNL-LCG
        Problems:
        dCache had bad local account mappings for some critical users, which caused USATLAS/ATLAS data transfer failures.
        
        Cause:
        We did an in-place GUMS (Grid User Management System) upgrade on Thursday afternoon. Some critical users were mapped to the wrong accounts because of a configuration file problem. The problem went undetected because GUMS still provided the mapping service, and the generated grid map appeared to be "OK" when it was not. The symptom did not show up until several hours later, at midnight, when dCache regenerated its map file from the GUMS server and data transfers started to fail. Even if the GUMS update had been properly announced, we might not have been able to discover this type of problem.
        
        Severity: A fraction of USATLAS data transfers failed between midnight and 10:30AM on Friday morning.
        
        Solution:
        
        Short term, we corrected the configuration file errors and let dCache regenerate its map file, which resolved the data transfer problem. To fix this problem and prevent future occurrences, we will make two improvements:
        
        1) The dCache team might want to consider regenerating the grid map file during prime business hours, i.e. 9:00AM and 3:00PM.
        2) We will develop Nagios probes to validate the certificate mapping for the critical users: Nurcan's production certificate and Hiro's data transfer certificates. (Please let us know of any other critical certificates.)
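        As an illustration of the Nagios probe idea in item 2, here is a minimal sketch. It assumes the generated map is available in the classic grid-mapfile layout ("DN" account, one mapping per line); the DNs and account names are hypothetical placeholders, and the real dCache map file format may differ. It follows the standard Nagios plugin exit-code convention (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import shlex
import sys

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical DNs and accounts, for illustration only.
EXPECTED = {
    "/DC=org/DC=example/OU=People/CN=Production User": "usatlas1",
    "/DC=org/DC=example/OU=People/CN=Data Transfer": "usatlas2",
}


def parse_grid_mapfile(text):
    """Return {DN: account} from lines of the form: "DN" account."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            fields = shlex.split(line)
        except ValueError:
            continue  # skip lines with unbalanced quotes
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def check_mappings(mapping, expected):
    """Return (exit_code, message) in the Nagios plugin style."""
    problems = []
    for dn, account in expected.items():
        actual = mapping.get(dn)
        if actual is None:
            problems.append("%s: not mapped" % dn)
        elif actual != account:
            problems.append("%s: mapped to %s, expected %s" % (dn, actual, account))
    if problems:
        return CRITICAL, "CRITICAL - " + "; ".join(problems)
    return OK, "OK - %d critical DNs mapped correctly" % len(expected)


def main(argv):
    """Entry point: argv[1] is the path of the map file to validate."""
    try:
        with open(argv[1]) as handle:
            mapping = parse_grid_mapfile(handle.read())
    except (IndexError, OSError) as err:
        print("UNKNOWN - cannot read map file: %s" % err)
        return UNKNOWN
    code, message = check_mappings(mapping, EXPECTED)
    print(message)
    return code
```

        A wrapper script would call sys.exit(main(sys.argv)). Running such a check right after each GUMS change, rather than only at the scheduled midnight regeneration, would address the detection gap described above.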
        
        
        Main symptom was slower throughput to HPSS. Files kept being flushed
        to HPSS and stayed precious.
        Cause: due to a Solaris/Linux difference, the script that stages files
        to HPSS was not recognizing flushed files correctly and was not working
        reliably on the Thumper
        Severity: Apart from a minor slowdown, a few files got corrupted. If
        a file was written, deleted, and then rewritten with the same
        name, the old copy could be the one actually kept in
        HPSS.
        Remediation: Thumper assigned only as a read node
        Long term solution: more work is needed to guarantee that Thumpers in
        the write pool work reliably
        
        Friday 14 - Saturday 15
        Production had problems reading files.
        Cause: no pool was actually assigned to production, because of the
        reconfiguration done prior to the HPSS upgrade
        Severity: USATLAS production affected
        Remediation: reset the configuration
        
        dCache went down
        Cause: unknown - still investigating
        Severity: site down
        Remediation: restart dCache core servers
        
        Saturday 15
        User clients hung because of suspended requests
        Cause: due to HPSS upgrade instability, many requests were suspended
        Severity: USATLAS production affected
        Remediation: retry the requests
        
        Ongoing
        Files disappear, more so during the HPSS upgrade.
        Cause: still unknown - user activity is primary suspect
        Severity: some files are lost
        Remediation: manually retrieve the list of lost files, and clean up the
        data catalog so users do not request files that are not available
        
        Monday 17 - Thursday 20
        User jobs got stuck
        Cause: too few movers available on some pool nodes
        Severity: one user affected
        Remediation: increase the number of movers
        
        Friday 21
        dCache was down
        Cause: a power failure in the facility brought the rack down, as the UPS servicing
        that rack was not working properly. One of the machines in the rack was
        the PNFS server.
        Severity: system down for less than an hour
        Remediation: system restarted
        Long term solution: fix the UPS
        
      • USCMS-FNAL-WC1
        Tests show the SE and SRM as down, but this is not the case: many transfers are ongoing.
     
    • WLCG issues coming from ROC reports
      None.
     
     
    • FTS service review (paper: Word file, PDF file)

      Please read the report linked to the agenda.

    Gavin McCance (CERN) Steve Traylen  
    • ATLAS service
      See also https://twiki.cern.ch/twiki/bin/view/Atlas/TierZero20071 and https://twiki.cern.ch/twiki/bin/view/Atlas/ComputingOperations for more information.

      • From Rod Walker
        A user with EMail in the DN had problems accessing dCache at 2 sites, SARA and BNL. I think it's the old problem of Java GSI chewing up the EMail part. The workaround is to put various permutations in the dCache mapfile. The response from SARA indicates that they have fewer permutations than TRIUMF. TRIUMF has
        # rpm -qf /opt/d-cache/bin/grid-mapfile2dcache-kpwd
        d-cache-lcg-6.2.0-1.noarch
        It is still not clear what is in place at BNL. At SARA there may be a log saying which DN was refused. This issue is reported to clarify what should be in the mapfile, and whether all sites provide this.
      • From Campana et al.
        At many sites there is a mismatch between the all-inclusive information published for a CE and the information published in the VOViews. As an example, the queue shown below supports only ATLAS, so the number of waiting jobs in the inclusive view should be the same as in the ATLAS VOView. But it is not: the VOView publishes all zeroes. Moreover, there are some queues where the numbers of waiting jobs in the individual views do not add up to the total published in the inclusive view. In total more than 130 ATLAS queues are affected, including almost all T1s. Since the WMS uses the information in the VOView, and the latter is generally the wrongly published one, ATLAS is submitting jobs almost randomly, with jobs accumulating at small sites. The issue is extremely severe. We would like the operations team to investigate the reason for so many mismatches and chase site by site to have the problem cured. Attached below is a list of currently problematic CEs as of today.
      • The average ATLAS job requires 1.1GB of memory per core. CERN publishes 1GB of RAM, therefore CERN receives no ATLAS production jobs. We would like to ask CERN to evaluate whether 1GB of RAM is realistic. If it is, we should start thinking about publishing different subclusters for different types of machines.
      • The requirements for the size of the ATLAS SW installation area were discussed 4 years ago and are surely obsolete. The ATLAS SW manager would like to ask for 100 GB of shared space at every ATLAS T1 and T2 site in which to install ATLAS software.
        
        # tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas, RO-02-NIPNE, grid
        dn: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
        objectClass: GlueCETop
        objectClass: GlueCE
        objectClass: GlueSchemaVersion
        objectClass: GlueCEAccessControlBase
        objectClass: GlueCEInfo
        objectClass: GlueCEPolicy
        objectClass: GlueCEState
        objectClass: GlueInformationService
        objectClass: GlueKey
        GlueCEHostingCluster: tbat01.nipne.ro
        GlueCEName: atlas
        GlueCEUniqueID: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
        GlueCEInfoGatekeeperPort: 2119
        GlueCEInfoHostName: tbat01.nipne.ro
        GlueCEInfoLRMSType: torque
        GlueCEInfoLRMSVersion: 2.1.6
        GlueCEInfoTotalCPUs: 44
        GlueCEInfoJobManager: lcgpbs
        GlueCEInfoContactString: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
        GlueCEInfoApplicationDir: /opt/exp_soft
        GlueCEInfoDataDir: unset
        GlueCEInfoDefaultSE: tbat05.nipne.ro
        GlueCEStateEstimatedResponseTime: 390175
        GlueCEStateFreeCPUs: 3
        GlueCEStateRunningJobs: 29
        GlueCEStateStatus: Production
        GlueCEStateTotalJobs: 750
        GlueCEStateWaitingJobs: 721
        GlueCEStateWorstResponseTime: 186883200
        GlueCEStateFreeJobSlots: 0
        GlueCEPolicyMaxCPUTime: 2880
        GlueCEPolicyMaxRunningJobs: 93
        GlueCEPolicyMaxTotalJobs: 0
        GlueCEPolicyMaxWallClockTime: 4320
        GlueCEPolicyPriority: 1
        GlueCEPolicyAssignedJobSlots: 0
        GlueCEAccessControlBaseRule: VO:atlas
        GlueForeignKey: GlueClusterUniqueID=tbat01.nipne.ro
        GlueInformationServiceURL: ldap://tbat01.nipne.ro:2135/mds-vo-name=local,o=grid
        GlueSchemaVersionMajor: 1
        GlueSchemaVersionMinor: 2
        
        # atlas, tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas, RO-02-NIPNE, grid
        dn: GlueVOViewLocalID=atlas,GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
        objectClass: GlueCETop
        objectClass: GlueVOView
        objectClass: GlueCEInfo
        objectClass: GlueCEState
        objectClass: GlueCEAccessControlBase
        objectClass: GlueCEPolicy
        objectClass: GlueKey
        objectClass: GlueSchemaVersion
        GlueVOViewLocalID: atlas
        GlueCEAccessControlBaseRule: VO:atlas
        GlueCEStateRunningJobs: 0
        GlueCEStateWaitingJobs: 0
        GlueCEStateTotalJobs: 0
        GlueCEStateFreeJobSlots: 15
        GlueCEStateEstimatedResponseTime: 0
        GlueCEStateWorstResponseTime: 0
        GlueCEInfoDefaultSE: tbat05.nipne.ro
        GlueCEInfoApplicationDir: /opt/exp_soft/atlas
        GlueCEInfoDataDir: unset
        GlueChunkKey: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
        GlueSchemaVersionMajor: 1
        GlueSchemaVersionMinor: 2
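
        The comparison described in the report (inclusive CE counts versus the sum over its VOViews) can be sketched as follows. This is not the VOViewsConsist.py script used for the list below, whose source is not part of this agenda; it is a minimal illustration that assumes LDIF text as printed above, with VOView entries linked to their CE through GlueChunkKey. The helper names are made up for the example, and base64-encoded LDIF values are not handled.

```python
SAMPLE = """\
dn: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
objectClass: GlueCE
GlueCEUniqueID: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
GlueCEStateRunningJobs: 29
GlueCEStateWaitingJobs: 721

dn: GlueVOViewLocalID=atlas,GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
objectClass: GlueVOView
GlueVOViewLocalID: atlas
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueChunkKey: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
"""


def unfold(ldif_text):
    """Join LDIF continuation lines (a leading space continues the previous line)."""
    lines = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and raw.strip() and lines:
            lines[-1] += raw[1:]
        else:
            lines.append(raw)
    return lines


def parse_entries(ldif_text):
    """Return one {attribute: [values]} dict per LDIF entry."""
    entries, current = [], {}
    for line in unfold(ldif_text):
        if not line.strip():  # a blank line ends the current entry
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        current.setdefault(key.strip(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries


def first_int(entry, attribute):
    """First value of an attribute as an int, or 0 if it is absent."""
    return int(entry.get(attribute, ["0"])[0])


def find_mismatches(entries):
    """Map CE id -> counts for CEs whose VOView sums differ from the CE totals."""
    totals, view_sums = {}, {}
    for entry in entries:
        classes = entry.get("objectClass", [])
        counts = (first_int(entry, "GlueCEStateRunningJobs"),
                  first_int(entry, "GlueCEStateWaitingJobs"))
        if "GlueVOView" in classes:
            keys = [v for v in entry.get("GlueChunkKey", [])
                    if v.startswith("GlueCEUniqueID=")]
            if keys:
                ce_id = keys[0].split("=", 1)[1]
                run, wait = view_sums.get(ce_id, (0, 0))
                view_sums[ce_id] = (run + counts[0], wait + counts[1])
        elif "GlueCE" in classes:
            totals[entry["GlueCEUniqueID"][0]] = counts
    return {ce_id: {"ce": total, "voviews": view_sums.get(ce_id, (0, 0))}
            for ce_id, total in totals.items()
            if view_sums.get(ce_id, (0, 0)) != total}
```

        Calling find_mismatches(parse_entries(SAMPLE)) flags the tbat01.nipne.ro queue, since the CE entry reports 29 running and 721 waiting jobs while its only VOView reports zero for both.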
        

        [campanas@lxb0709 BDII]$ python VOViewsConsist.py | grep '==>'
        ===> CE:atlasce.lnf.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:27     TOTVOrun:0    TOTwait:531     TOTVOwait:0
        ===> CE:atlasce.phys.sinica.edu.tw:2119/jobmanager-lcgcondor-atlas TOTrun:1     TOTVOrun:0    TOTwait:20     TOTVOwait:0
        ===> CE:atlasce01.na.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:18     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:atlasce01.na.infn.it:2119/jobmanager-lcgpbs-atlas_short TOTrun:6     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:bigmac-lcg-ce.physics.utoronto.ca:2119/jobmanager-lcgcondor-atlas TOTrun:12     TOTVOrun:0    TOTwait:3     TOTVOwait:4444
        ===> CE:cclcgceli02.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:4     TOTVOrun:2    TOTwait:0     TOTVOwait:0
        ===> CE:cclcgceli04.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:6     TOTVOrun:3    TOTwait:0     TOTVOwait:0
        ===> CE:cclcgceli05.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:14     TOTVOrun:7    TOTwait:2     TOTVOwait:1
        ===> CE:ce.bfg.uni-freiburg.de:2119/jobmanager-pbs-atlas TOTrun:18     TOTVOrun:16    TOTwait:0     TOTVOwait:0
        ===> CE:ce.epcc.ed.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:3     TOTVOrun:0    TOTwait:29     TOTVOwait:0
        ===> CE:ce.gina.sara.nl:2119/jobmanager-pbs-medium TOTrun:141     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:ce.gina.sara.nl:2119/jobmanager-pbs-short TOTrun:14     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce.hpc.csie.thu.edu.tw:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:3     TOTVOwait:2
        ===> CE:ce.keldysh.ru:2119/jobmanager-lcgpbs-atlas TOTrun:6     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce.phy.bg.ac.yu:2119/jobmanager-pbs-atlas TOTrun:14     TOTVOrun:0    TOTwait:16     TOTVOwait:2
        ===> CE:ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-atlas TOTrun:11     TOTVOrun:9    TOTwait:0     TOTVOwait:0
        ===> CE:ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr TOTrun:177     TOTVOrun:3717    TOTwait:0     TOTVOwait:0
        ===> CE:ce001.grid.uni-sofia.bg:2119/jobmanager-lcgpbs-atlas TOTrun:6     TOTVOrun:5    TOTwait:0     TOTVOwait:0
        ===> CE:ce01-lcg.projects.cscs.ch:2119/jobmanager-lcgpbs-atlas TOTrun:8     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce01.afroditi.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:5     TOTVOrun:2    TOTwait:2     TOTVOwait:1
        ===> CE:ce01.ariagni.hellasgrid.gr:2119/jobmanager-lcgpbs-atlas TOTrun:6     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce01.athena.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:1     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce01.ific.uv.es:2119/jobmanager-pbs-atlas TOTrun:29     TOTVOrun:0    TOTwait:299     TOTVOwait:1
        ===> CE:ce01.ific.uv.es:2119/jobmanager-pbs-atlasL TOTrun:28     TOTVOrun:0    TOTwait:295     TOTVOwait:1
        ===> CE:ce01.kallisto.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:17     TOTVOrun:16    TOTwait:0     TOTVOwait:0
        ===> CE:ce01.marie.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:10     TOTVOrun:6    TOTwait:2     TOTVOwait:0
        ===> CE:ce02.athena.hellasgrid.gr:2119/blah-pbs-atlas TOTrun:1     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce02.lip.pt:2119/jobmanager-lcgsge-atlasgrid TOTrun:8     TOTVOrun:6    TOTwait:0     TOTVOwait:0
        ===> CE:ce02.marie.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:9     TOTVOrun:8    TOTwait:4     TOTVOwait:3
        ===> CE:ce03-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:29     TOTVOrun:6    TOTwait:0     TOTVOwait:0
        ===> CE:ce04-lcg.cr.cnaf.infn.it:2119/blah-lsf-atlas TOTrun:10     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce05-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-slc4_debug TOTrun:881     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-atlastest TOTrun:4     TOTVOrun:0    TOTwait:163     TOTVOwait:0
        ===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19     TOTVOrun:3    TOTwait:0     TOTVOwait:0
        ===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9     TOTVOrun:8    TOTwait:4     TOTVOwait:4
        ===> CE:ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:9     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-debug TOTrun:9     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ce06.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19     TOTVOrun:3    TOTwait:0     TOTVOwait:0
        ===> CE:ce06.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9     TOTVOrun:8    TOTwait:4     TOTVOwait:4
        ===> CE:ce07.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19     TOTVOrun:3    TOTwait:0     TOTVOwait:0
        ===> CE:ce07.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9     TOTVOrun:8    TOTwait:4     TOTVOwait:4
        ===> CE:ce1-egee.srce.hr:2119/jobmanager-sge-dteam TOTrun:12     TOTVOrun:12    TOTwait:0     TOTVOwait:4444
        ===> CE:ce1.egee.fr.cgg.com:2119/jobmanager-lcgpbs-atlas TOTrun:10     TOTVOrun:0    TOTwait:2     TOTVOwait:0
        ===> CE:ce1.triumf.ca:2119/jobmanager-lcgpbs-atlas TOTrun:18     TOTVOrun:9    TOTwait:0     TOTVOwait:0
        ===> CE:ce101.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10     TOTVOrun:7    TOTwait:2     TOTVOwait:2
        ===> CE:ce102.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10     TOTVOrun:7    TOTwait:2     TOTVOwait:2
        ===> CE:ce106.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10     TOTVOrun:7    TOTwait:0     TOTVOwait:0
        ===> CE:ce107.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas TOTrun:100     TOTVOrun:99    TOTwait:0     TOTVOwait:0
        ===> CE:ce107.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:6     TOTVOrun:3    TOTwait:11     TOTVOwait:5
        ===> CE:ce108.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10     TOTVOrun:7    TOTwait:0     TOTVOwait:0
        ===> CE:ce123.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10     TOTVOrun:7    TOTwait:0     TOTVOwait:0
        ===> CE:ce2.triumf.ca:2119/jobmanager-lcgpbs-atlas TOTrun:18     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ceitep.itep.ru:2119/jobmanager-lcgpbs-atlas TOTrun:2     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:clrlcgce02.in2p3.fr:2119/jobmanager-lcgpbs-atlas TOTrun:12     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:cs-grid0.bgu.ac.il:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:31     TOTVOwait:0
        ===> CE:cs-grid1.bgu.ac.il:2119/blah-pbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:31     TOTVOwait:0
        ===> CE:dgce0.icepp.jp:2119/jobmanager-lcgpbs-atlas TOTrun:16     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:egee.irb.hr:2119/jobmanager-lcgpbs-grid TOTrun:16     TOTVOrun:15    TOTwait:0     TOTVOwait:0
        ===> CE:epgce1.ph.bham.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:6     TOTVOrun:0    TOTwait:457     TOTVOwait:0
        ===> CE:epgce1.ph.bham.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:3     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:13     TOTVOrun:7    TOTwait:0     TOTVOwait:0
        ===> CE:fornax-ce.itwm.fhg.de:2119/jobmanager-lcgpbs-atlas TOTrun:8     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:g03n02.pdc.kth.se:2119/jobmanager-pbs-atlas TOTrun:1     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:gcn54.hep.physik.uni-siegen.de:2119/jobmanager-lcgpbs-atlas TOTrun:1     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:glite-ce-01.cnaf.infn.it:2119/blah-pbs-lcg TOTrun:3     TOTVOrun:3    TOTwait:3     TOTVOwait:0
        ===> CE:glite-ce01.marie.hellasgrid.gr:2119/blah-pbs-atlas TOTrun:10     TOTVOrun:6    TOTwait:2     TOTVOwait:0
        ===> CE:golias25.farm.particle.cz:2119/jobmanager-lcgpbs-lcgatlas TOTrun:2     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-dg_long TOTrun:0     TOTVOrun:0    TOTwait:2     TOTVOwait:0
        ===> CE:grid-ce.rzg.mpg.de:2119/jobmanager-sge-long TOTrun:1     TOTVOrun:0    TOTwait:0     TOTVOwait:26664
        ===> CE:grid-ce3.desy.de:2119/jobmanager-lcgpbs-default TOTrun:143     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:grid-ce3.desy.de:2119/jobmanager-lcgpbs-testing TOTrun:2     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:grid.uibk.ac.at:2119/jobmanager-lcgpbs-atlas TOTrun:6     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:grid0.fe.infn.it:2119/jobmanager-lcgpbs-lcg TOTrun:7     TOTVOrun:4    TOTwait:0     TOTVOwait:0
        ===> CE:grid001.fi.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:13     TOTVOrun:12    TOTwait:0     TOTVOwait:0
        ===> CE:grid002.ca.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:0     TOTVOrun:0    TOTwait:36     TOTVOwait:10
        ===> CE:grid002.jet.efda.org:2119/jobmanager-lcgpbs-atlas TOTrun:3     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:grid003.roma2.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:30     TOTVOrun:29    TOTwait:0     TOTVOwait:0
        ===> CE:grid01.cu.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:6     TOTVOrun:5    TOTwait:0     TOTVOwait:0
        ===> CE:grid109.kfki.hu:2119/jobmanager-lcgpbs-atlas TOTrun:4     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite TOTrun:51     TOTVOrun:0    TOTwait:287     TOTVOwait:0
        ===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long TOTrun:13     TOTVOrun:0    TOTwait:84     TOTVOwait:0
        ===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short TOTrun:0     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-atlas TOTrun:2     TOTVOrun:1    TOTwait:3     TOTVOwait:1
        ===> CE:gridce.pi.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:3     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-lcg TOTrun:3     TOTVOrun:3    TOTwait:3     TOTVOwait:0
        ===> CE:grim-ce.iucc.ac.il:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:hep-ce.cx1.hpc.ic.ac.uk:2119/jobmanager-pbs-heplt2 TOTrun:274     TOTVOrun:5206    TOTwait:353     TOTVOwait:6707
        ===> CE:heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:19     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:8     TOTVOrun:0    TOTwait:2     TOTVOwait:0
        ===> CE:heplnx207.pp.rl.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:19     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:heplnx207.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:8     TOTVOrun:0    TOTwait:2     TOTVOwait:0
        ===> CE:i101.hpc2n.umu.se:2119/jobmanager-lcgpbs-ngrid TOTrun:20     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:ifaece01.pic.es:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:1304     TOTVOwait:0
        ===> CE:ifaece01.pic.es:2119/jobmanager-lcgpbs-atlas2 TOTrun:14     TOTVOrun:0    TOTwait:162     TOTVOwait:0
        ===> CE:ituce.grid.itu.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:lapp-ce01.in2p3.fr:2119/jobmanager-pbs-atlas TOTrun:42     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:lcg-ce.lps.umontreal.ca:2119/jobmanager-lcgpbs-atlas TOTrun:10     TOTVOrun:0    TOTwait:171     TOTVOwait:0
        ===> CE:lcg-ce.rcf.uvic.ca:2119/jobmanager-lcgpbs-general TOTrun:14     TOTVOrun:3    TOTwait:0     TOTVOwait:0
        ===> CE:lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-atlas TOTrun:12     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:lcg-ce01.icepp.jp:2119/jobmanager-lcgpbs-atlas TOTrun:10     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:lcg-ce1.ifh.de:2119/jobmanager-lcgpbs-atlas_blade TOTrun:80     TOTVOrun:0    TOTwait:108     TOTVOwait:0
        ===> CE:lcg-lrz-ce.lrz-muenchen.de:2119/jobmanager-sge-atlas TOTrun:4     TOTVOrun:0    TOTwait:0     TOTVOwait:4444
        ===> CE:lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:1     TOTVOrun:0    TOTwait:2     TOTVOwait:0
        ===> CE:lcgce01.jinr.ru:2119/jobmanager-lcgpbs-atlas TOTrun:8     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:lcgce01.phy.bris.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:3     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:lcgce01.phy.bris.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:13     TOTVOrun:0    TOTwait:24     TOTVOwait:0
        ===> CE:lcgrid.dnp.fmph.uniba.sk:2119/jobmanager-lcgpbs-atlas TOTrun:8     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:lgdce01.jinr.ru:2119/jobmanager-lcgpbs-atlas TOTrun:2     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-10min TOTrun:1     TOTVOrun:18    TOTwait:0     TOTVOwait:0
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-12hr TOTrun:40     TOTVOrun:720    TOTwait:1     TOTVOwait:18
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-1hr TOTrun:6     TOTVOrun:108    TOTwait:0     TOTVOwait:0
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-24hr TOTrun:70     TOTVOrun:1260    TOTwait:33     TOTVOwait:594
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-30min TOTrun:3     TOTVOrun:54    TOTwait:0     TOTVOwait:0
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-3hr TOTrun:14     TOTVOrun:252    TOTwait:1     TOTVOwait:18
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-6hr TOTrun:25     TOTVOrun:450    TOTwait:1     TOTVOwait:18
        ===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-72hr TOTrun:99     TOTVOrun:1782    TOTwait:33     TOTVOwait:594
        ===> CE:mu6.matrix.sara.nl:2119/jobmanager-pbs-medium TOTrun:0     TOTVOrun:0    TOTwait:117     TOTVOwait:0
        ===> CE:mu9.matrix.sara.nl:2119/jobmanager-pbs-batch TOTrun:354     TOTVOrun:0    TOTwait:529     TOTVOwait:0
        ===> CE:node001.grid.auth.gr:2119/jobmanager-pbs-atlas TOTrun:3     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:paugrid1.pamukkale.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:pc90.hep.ucl.ac.uk:2119/jobmanager-lcgpbs-lcgatlas TOTrun:13     TOTVOrun:0    TOTwait:32     TOTVOwait:0
        ===> CE:serv03.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-atlas TOTrun:3     TOTVOrun:0    TOTwait:4     TOTVOwait:4444
        ===> CE:skurut17.cesnet.cz:2119/jobmanager-lcgpbs-atlas TOTrun:8     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:snowpatch.hpc.sfu.ca:2119/jobmanager-lcgpbs-atlas TOTrun:8     TOTVOrun:0    TOTwait:274     TOTVOwait:0
        ===> CE:spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:2     TOTVOwait:1
        ===> CE:svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:13     TOTVOrun:0    TOTwait:0     TOTVOwait:0
        ===> CE:t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:13     TOTVOrun:8    TOTwait:0     TOTVOwait:0
        ===> CE:t2-ce-02.lnl.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:3     TOTVOrun:0    TOTwait:7     TOTVOwait:1
        ===> CE:t2ce02.physics.ox.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:7     TOTVOrun:0    TOTwait:1     TOTVOwait:0
        ===> CE:t2ce02.physics.ox.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:0     TOTVOrun:0    TOTwait:2     TOTVOwait:0
        ===> CE:tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas TOTrun:30     TOTVOrun:0    TOTwait:659     TOTVOwait:0
        ===> CE:tbit01.nipne.ro:2119/jobmanager-lcgpbs-atlas TOTrun:20     TOTVOrun:19    TOTwait:0     TOTVOwait:0
        ===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-atlas TOTrun:5     TOTVOrun:3    TOTwait:0     TOTVOwait:0
        ===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-qlong TOTrun:43     TOTVOrun:41    TOTwait:0     TOTVOwait:0
        ===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-qshort TOTrun:7     TOTVOrun:6    TOTwait:0     TOTVOwait:0
        ===> CE:yildirim.grid.boun.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0     TOTVOrun:0    TOTwait:1     TOTVOwait:0
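
        To help chase the sites in the list above, the report lines can be grouped by the kind of mismatch. The sketch below is a hypothetical post-processing step, not part of VOViewsConsist.py. It assumes, from the field names only, that TOTrun and TOTwait are the inclusive CE totals and that TOTVOrun and TOTVOwait are the sums over the VOViews; the category labels are invented for the example.

```python
import re

# One report line: CE identifier followed by the four counters.
LINE = re.compile(
    r"^===> CE:(?P<ce>\S+)\s+TOTrun:(?P<run>\d+)\s+TOTVOrun:(?P<vorun>\d+)"
    r"\s+TOTwait:(?P<wait>\d+)\s+TOTVOwait:(?P<vowait>\d+)$"
)


def classify(line):
    """Return (ce, category) for a report line, or None if it does not parse."""
    match = LINE.match(line.strip())
    if match is None:
        return None
    run, vorun, wait, vowait = (
        int(match.group(name)) for name in ("run", "vorun", "wait", "vowait")
    )
    if (vorun, vowait) == (run, wait):
        category = "consistent"
    elif vorun == 0 and vowait == 0:
        category = "voview-zero"           # VOViews publish all zeroes
    elif vorun > run or vowait > wait:
        category = "voview-exceeds-total"  # VOView sum above the CE total
    else:
        category = "voview-below-total"    # VOView sum below the CE total
    return match.group("ce"), category


def summarize(lines):
    """Count the report lines in each category, ignoring unparsable lines."""
    counts = {}
    for line in lines:
        result = classify(line)
        if result is not None:
            counts[result[1]] = counts.get(result[1], 0) + 1
    return counts
```

        For instance, the atlasce.lnf.infn.it line (TOTVOrun:0, TOTVOwait:0 against 27 running and 531 waiting) falls into the all-zero group, while the ce00.hep.ph.ic.ac.uk line (TOTVOrun:3717 against TOTrun:177) falls into the group where the VOView sum exceeds the CE total. The two groups probably have different causes and may need different follow-up with the sites.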
        
     
    • CMS service
      • No report.
    Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY)  
    • LHCb service
      • Issue at CERN still waiting to be answered. (Remedy ticket from Philippe)

        When we run jobs reading files that are on lhcbdata (SRM endpoint srm-durable-lhcb.cern.ch), we expect the files to be actually on the lhcbdata pool and therefore immediately available for opening. However, when querying the stager for one of these files, its status is STAGEIN. We would like to know whether this is the expected behaviour of the CERN durable SE, in which case we will pass all our jobs through the DIRAC stager to cope with it. Our assumption was that most of the analysis jobs accessing TxD1 data would not need to unduly overload the service.

    Roberto Santinelli (CERN/IT/GD)  
    • ALICE service
      • No report.
    Patricia Mendez Lorenzo (CERN IT/GD)  
    • Service Coordination
      The CMS CSA07 service challenge Tier 0 reconstruction and Tier 1 data export phase should now start on Tuesday 25 September and run
      for 30 days. See https://twiki.cern.ch/twiki/bin/view/CMS/CSA07Plan
    Harry Renshall / Jamie Shiers  
     16:55
    OSG Items (5')    
      1. Discussion of open tickets for OSG.
      2. https://gus.fzk.de/pages/download_escalation_reports_roc.php
     17:00
    Review of action items (5')   list of actions link    
     17:10
    AOB (5')    