WLCG-OSG-EGEE Operations meeting
28-R-15
CERN conferencing service (joining details below)
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0157610
NB: Reports were not received in advance of the meeting from:
-
-
16:00
→
16:05
Feedback on last meeting's minutes 5m
-
16:01
→
16:30
EGEE Items 29m
-
Grid-Operator-on-Duty handover
From: ROC Germany-Switzerland / ROC Central Europe
To: ROC South-Western / ROC France
NB: Please can the grid ops-on-duty teams submit their reports no later than 12:00 UTC (14:00 Swiss local time).
Issues:
- (CE ROC) Backup team
Ticket summary: extended: 41, opened: 55, closed: 12, 2nd mail: 4; total: 112.
Problem with the Dashboard: the ticket list has not been refreshing itself since Wednesday. A notification was sent.
- (CE ROC) Backup team
-
PPS Report & Issues
PPS reports were not received from these ROCs:
AP, FR, IT, NE, SWE
Issues from EGEE ROCs:
- Nothing to report
- Release of gLite 3.1.0 Update06 to PPS done
- New VOMS certificate for the US-ATLAS server (to be synchronised to production)
- uberftp for the glite-UI node
- lcg-tags added to glite 3.1 UI, WN
- lcg-infosites added to the glite 3.1 WN
- Javier Lopez and Esteban Freire from PPS-CESGA have joined the PPS Coordination team. Their main task is to follow up the roll-out of the gLite middleware updates from Certification to PPS.
-
EGEE issues coming from ROC reports
None.
-
The lcg-RB to be moved to zero-maintenance mode (15m)
JRA1 and SA3 have now moved the old lcg-RB to zero-maintenance mode. The gLite 3.1 WMS, whether on SL3 or SL4, is the supported solution.
-
-
16:30
→
17:00
WLCG Items 30m
-
Tier 1 reports
- GridKa
SRM lockups, possibly caused by interaction between the memory subsystem and the Java VM. Severity: moderate.
- BNL-LCG
Monday 17 - Thursday 20: User jobs got stuck. Cause: too few movers available on some pool nodes. Severity: one user affected. Remediation: increase the number of movers.
dCache had bad local account mappings for some critical users, which caused USATLAS/ATLAS data transfer failures. Cause: an in-place GUMS (Grid User Management System) upgrade done on Thursday afternoon; some critical users were mapped to the wrong accounts because of a configuration file problem. The problem went undetected because GUMS still provided the mapping service and the generated grid map appeared to be "OK" while it was not. The symptom did not show up until several hours later, at midnight, when dCache regenerated its map file from the GUMS server and experienced data transfer failures thereafter. Even though the GUMS update had been properly announced, we might not have been able to discover this type of problem. Impact: a fraction of USATLAS data transfers failed between midnight and 10:30 AM on Friday. Solution: in the short term we corrected the configuration file errors, let dCache regenerate its map file, and recovered from the data transfer problem. To fix this problem and prevent future occurrences we will make two improvements: 1) the dCache team might want to consider regenerating the grid map file during prime business hours, i.e. 9:00 AM and 3:00 PM; 2) we will develop Nagios probes to validate the certificate mapping for the critical users: Nurcan's production certificate and Hiro's data transfer certificates (please let us know of any other critical certificates). A sketch of such a probe is given after the site reports below.
The main symptom was a slower throughput to HPSS: files were flushed to HPSS but stayed precious. Cause: due to a Solaris/Linux difference, the script that stages files to HPSS does not recognise flushed files correctly and was not working reliably on the Thumper. Severity: apart from a minor slowdown, a few files got corrupted; in the case that a file was written, deleted, and then rewritten with the same name, it could happen that the old copy was the one actually kept in HPSS. Remediation: the Thumper is assigned only as a read node. Long-term solution: more work is needed to guarantee that Thumpers in the write pool work reliably.
Friday 14 - Saturday 15: Production had problems reading files. Cause: no pool was actually assigned to production, because of the reconfiguration done prior to the HPSS upgrade. Severity: USATLAS production affected. Remediation: reset the configuration. In the same period dCache went down. Cause: unknown, still under investigation. Severity: site down. Remediation: restart the dCache core servers.
Saturday 15: User clients hung because of suspended requests. Cause: due to HPSS upgrade instability, many requests were suspended. Severity: USATLAS production affected. Remediation: retry the requests.
Ongoing: Files disappear, more so during the HPSS upgrade. Cause: still unknown; user activity is the primary suspect. Severity: some files are lost. Remediation: manually retrieve the list of lost files and clean up the data catalog so that users do not request files that are not available.
Friday 21: dCache was down. Cause: a power failure in the facility brought down a rack, and the UPS servicing that rack was not working properly; one of the machines in the rack was the PNFS server. Severity: system down for less than an hour. Remediation: system restarted. Long-term solution: fix the UPS.
-
USCMS-FNAL-WC1
The test shows the SE and SRM as down, but this is not true; many transfers are ongoing.
-
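Following up on the Nagios probes mentioned in the BNL report above, here is a minimal sketch, in Python, of what such a mapping check could look like. It is only an illustration: the grid-mapfile path, the DNs and the expected local accounts are placeholders, and a real probe would more likely query GUMS or the dCache kpwd file directly rather than the flat grid-mapfile.

```python
#!/usr/bin/env python
# Hypothetical Nagios-style probe: check that critical DNs are mapped to the
# expected local accounts in the generated grid-mapfile. The path, DNs and
# account names below are placeholders, not the real BNL configuration.
import sys

MAPFILE = "/etc/grid-security/grid-mapfile"              # assumed location
CRITICAL = {                                             # DN -> expected account
    "/DC=org/DC=example/OU=People/CN=Production Cert": "usatlas1",
    "/DC=org/DC=example/OU=People/CN=Data Transfer Cert": "usatlas2",
}

def load_mapfile(path):
    """Parse simple mapfile lines of the form: "/Some/DN" account"""
    mappings = {}
    for line in open(path):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, _, account = line.rpartition(" ")
        mappings[dn.strip().strip('"')] = account.strip()
    return mappings

def main():
    try:
        mappings = load_mapfile(MAPFILE)
    except IOError as err:
        print("CRITICAL: cannot read %s: %s" % (MAPFILE, err))
        return 2
    bad = [dn for dn, account in CRITICAL.items() if mappings.get(dn) != account]
    if bad:
        print("CRITICAL: wrong or missing mapping for: %s" % ", ".join(bad))
        return 2
    print("OK: all critical DNs mapped to the expected accounts")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```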
WLCG issues coming from ROC reports
None.
-
WLCG Service Interventions (with dates / times where known)
Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board.
Time at WLCG T0 and T1 sites.
-
ATLAS service
See also https://twiki.cern.ch/twiki/bin/view/Atlas/TierZero20071 and https://twiki.cern.ch/twiki/bin/view/Atlas/ComputingOperations for more information.
-
From Rod Walker
A user with an Email attribute in the DN had problems accessing dCache at two sites, SARA and BNL. I think it is the old problem of the Java GSI code chewing up the Email part of the DN. The workaround is to put the various permutations into the dCache mapfile. The response from SARA indicates that they have fewer permutations than TRIUMF. TRIUMF has:
# rpm -qf /opt/d-cache/bin/grid-mapfile2dcache-kpwd
d-cache-lcg-6.2.0-1.noarch
It is still not clear what is installed at BNL. At SARA there may be a log saying which DN was refused. This issue is reported to clarify what should be in the mapfile, and whether all sites provide it; a sketch of the kind of permutations involved follows below.
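As an illustration of the permutation workaround described above, here is a minimal Python sketch, with a made-up DN and account, of how the common spellings of the Email attribute could be enumerated for inclusion in a dCache mapfile. Which variants are actually needed at each site is exactly the open question raised here.

```python
#!/usr/bin/env python
# Hypothetical helper: given a certificate DN containing an Email-style RDN,
# print the common spelling variants so that all of them can be added to the
# dCache mapfile/kpwd. The DN and account below are illustrative only.
import re

EMAIL_KEYS = ["Email", "emailAddress", "E"]   # spellings commonly seen in DNs
EMAIL_RDN = r"/(Email|emailAddress|E)=[^/]+"

def email_permutations(dn):
    """Return the DN rewritten with each common Email attribute spelling,
    plus a variant with the Email part dropped entirely."""
    match = re.search(r"/(Email|emailAddress|E)=([^/]+)", dn)
    if not match:
        return [dn]
    address = match.group(2)
    variants = [re.sub(EMAIL_RDN, "/%s=%s" % (key, address), dn)
                for key in EMAIL_KEYS]
    variants.append(re.sub(EMAIL_RDN, "", dn))
    return variants

if __name__ == "__main__":
    dn = "/C=CA/O=Grid/CN=Some User/Email=some.user@example.org"  # placeholder
    for variant in email_permutations(dn):
        print('"%s" atlas001' % variant)   # mapfile-style line, placeholder account
```
-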
From Campana et al.
At many sites there is a mismatch between the all-inclusive information published for a CE and the information published in the VOViews. As an example, the queue shown below supports only ATLAS, so the number of waiting jobs in the inclusive view should be the same as the one in the ATLAS VOView. But it is not: the VOView publishes all zeroes. Moreover, there are some queues where the numbers of waiting jobs in the individual views do not add up to the total published in the inclusive view. In total more than 130 ATLAS queues are affected, among them almost all T1s. Since the WMS uses the information in the VOView, and the latter is generally the wrongly published one, ATLAS is submitting jobs almost randomly, with an accumulation of jobs at small sites. The issue is extremely severe. We would like the operations team to investigate the reason for so many mismatches and to chase site by site until the problem is cured. Attached below are an example of a problematic queue and the list of currently problematic CEs as of today (a sketch of the consistency check follows the list).
- The average ATLAS job requires 1.1 GB of memory per core. CERN publishes 1 GB of RAM, therefore CERN is empty of ATLAS production jobs. We would like to ask CERN to evaluate whether 1 GB of RAM is completely realistic. In case it is, we should start thinking about publishing different subclusters for different types of machines.
- The requirements for the size of the ATLAS SW installation area were discussed 4 years ago and are surely obsolete. The ATLAS SW manager would like to ask for 100 GB of shared space, in which to install software, at every ATLAS T1 and T2 site.
# tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas, RO-02-NIPNE, grid
dn: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
objectClass: GlueCETop
objectClass: GlueCE
objectClass: GlueSchemaVersion
objectClass: GlueCEAccessControlBase
objectClass: GlueCEInfo
objectClass: GlueCEPolicy
objectClass: GlueCEState
objectClass: GlueInformationService
objectClass: GlueKey
GlueCEHostingCluster: tbat01.nipne.ro
GlueCEName: atlas
GlueCEUniqueID: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
GlueCEInfoGatekeeperPort: 2119
GlueCEInfoHostName: tbat01.nipne.ro
GlueCEInfoLRMSType: torque
GlueCEInfoLRMSVersion: 2.1.6
GlueCEInfoTotalCPUs: 44
GlueCEInfoJobManager: lcgpbs
GlueCEInfoContactString: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
GlueCEInfoApplicationDir: /opt/exp_soft
GlueCEInfoDataDir: unset
GlueCEInfoDefaultSE: tbat05.nipne.ro
GlueCEStateEstimatedResponseTime: 390175
GlueCEStateFreeCPUs: 3
GlueCEStateRunningJobs: 29
GlueCEStateStatus: Production
GlueCEStateTotalJobs: 750
GlueCEStateWaitingJobs: 721
GlueCEStateWorstResponseTime: 186883200
GlueCEStateFreeJobSlots: 0
GlueCEPolicyMaxCPUTime: 2880
GlueCEPolicyMaxRunningJobs: 93
GlueCEPolicyMaxTotalJobs: 0
GlueCEPolicyMaxWallClockTime: 4320
GlueCEPolicyPriority: 1
GlueCEPolicyAssignedJobSlots: 0
GlueCEAccessControlBaseRule: VO:atlas
GlueForeignKey: GlueClusterUniqueID=tbat01.nipne.ro
GlueInformationServiceURL: ldap://tbat01.nipne.ro:2135/mds-vo-name=local,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2

# atlas, tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas, RO-02-NIPNE, grid
dn: GlueVOViewLocalID=atlas,GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueVOViewLocalID: atlas
GlueCEAccessControlBaseRule: VO:atlas
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 15
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0
GlueCEInfoDefaultSE: tbat05.nipne.ro
GlueCEInfoApplicationDir: /opt/exp_soft/atlas
GlueCEInfoDataDir: unset
GlueChunkKey: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2
[campanas@lxb0709 BDII]$ python VOViewsConsist.py | grep '==>'
===> CE:atlasce.lnf.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:27 TOTVOrun:0 TOTwait:531 TOTVOwait:0
===> CE:atlasce.phys.sinica.edu.tw:2119/jobmanager-lcgcondor-atlas TOTrun:1 TOTVOrun:0 TOTwait:20 TOTVOwait:0
===> CE:atlasce01.na.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:18 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:atlasce01.na.infn.it:2119/jobmanager-lcgpbs-atlas_short TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:bigmac-lcg-ce.physics.utoronto.ca:2119/jobmanager-lcgcondor-atlas TOTrun:12 TOTVOrun:0 TOTwait:3 TOTVOwait:4444
===> CE:cclcgceli02.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:4 TOTVOrun:2 TOTwait:0 TOTVOwait:0
===> CE:cclcgceli04.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:6 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:cclcgceli05.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:14 TOTVOrun:7 TOTwait:2 TOTVOwait:1
===> CE:ce.bfg.uni-freiburg.de:2119/jobmanager-pbs-atlas TOTrun:18 TOTVOrun:16 TOTwait:0 TOTVOwait:0
===> CE:ce.epcc.ed.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:29 TOTVOwait:0
===> CE:ce.gina.sara.nl:2119/jobmanager-pbs-medium TOTrun:141 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:ce.gina.sara.nl:2119/jobmanager-pbs-short TOTrun:14 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce.hpc.csie.thu.edu.tw:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:3 TOTVOwait:2
===> CE:ce.keldysh.ru:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce.phy.bg.ac.yu:2119/jobmanager-pbs-atlas TOTrun:14 TOTVOrun:0 TOTwait:16 TOTVOwait:2
===> CE:ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-atlas TOTrun:11 TOTVOrun:9 TOTwait:0 TOTVOwait:0
===> CE:ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr TOTrun:177 TOTVOrun:3717 TOTwait:0 TOTVOwait:0
===> CE:ce001.grid.uni-sofia.bg:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:5 TOTwait:0 TOTVOwait:0
===> CE:ce01-lcg.projects.cscs.ch:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce01.afroditi.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:5 TOTVOrun:2 TOTwait:2 TOTVOwait:1
===> CE:ce01.ariagni.hellasgrid.gr:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce01.athena.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce01.ific.uv.es:2119/jobmanager-pbs-atlas TOTrun:29 TOTVOrun:0 TOTwait:299 TOTVOwait:1
===> CE:ce01.ific.uv.es:2119/jobmanager-pbs-atlasL TOTrun:28 TOTVOrun:0 TOTwait:295 TOTVOwait:1
===> CE:ce01.kallisto.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:17 TOTVOrun:16 TOTwait:0 TOTVOwait:0
===> CE:ce01.marie.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:10 TOTVOrun:6 TOTwait:2 TOTVOwait:0
===> CE:ce02.athena.hellasgrid.gr:2119/blah-pbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce02.lip.pt:2119/jobmanager-lcgsge-atlasgrid TOTrun:8 TOTVOrun:6 TOTwait:0 TOTVOwait:0
===> CE:ce02.marie.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:3
===> CE:ce03-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:29 TOTVOrun:6 TOTwait:0 TOTVOwait:0
===> CE:ce04-lcg.cr.cnaf.infn.it:2119/blah-lsf-atlas TOTrun:10 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce05-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-slc4_debug TOTrun:881 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-atlastest TOTrun:4 TOTVOrun:0 TOTwait:163 TOTVOwait:0
===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:4
===> CE:ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:9 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-debug TOTrun:9 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce06.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:ce06.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:4
===> CE:ce07.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:ce07.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:4
===> CE:ce1-egee.srce.hr:2119/jobmanager-sge-dteam TOTrun:12 TOTVOrun:12 TOTwait:0 TOTVOwait:4444
===> CE:ce1.egee.fr.cgg.com:2119/jobmanager-lcgpbs-atlas TOTrun:10 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:ce1.triumf.ca:2119/jobmanager-lcgpbs-atlas TOTrun:18 TOTVOrun:9 TOTwait:0 TOTVOwait:0
===> CE:ce101.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:2 TOTVOwait:2
===> CE:ce102.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:2 TOTVOwait:2
===> CE:ce106.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:ce107.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas TOTrun:100 TOTVOrun:99 TOTwait:0 TOTVOwait:0
===> CE:ce107.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:6 TOTVOrun:3 TOTwait:11 TOTVOwait:5
===> CE:ce108.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:ce123.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:ce2.triumf.ca:2119/jobmanager-lcgpbs-atlas TOTrun:18 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ceitep.itep.ru:2119/jobmanager-lcgpbs-atlas TOTrun:2 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:clrlcgce02.in2p3.fr:2119/jobmanager-lcgpbs-atlas TOTrun:12 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:cs-grid0.bgu.ac.il:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:31 TOTVOwait:0
===> CE:cs-grid1.bgu.ac.il:2119/blah-pbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:31 TOTVOwait:0
===> CE:dgce0.icepp.jp:2119/jobmanager-lcgpbs-atlas TOTrun:16 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:egee.irb.hr:2119/jobmanager-lcgpbs-grid TOTrun:16 TOTVOrun:15 TOTwait:0 TOTVOwait:0
===> CE:epgce1.ph.bham.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:457 TOTVOwait:0
===> CE:epgce1.ph.bham.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:3 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:fornax-ce.itwm.fhg.de:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:g03n02.pdc.kth.se:2119/jobmanager-pbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:gcn54.hep.physik.uni-siegen.de:2119/jobmanager-lcgpbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:glite-ce-01.cnaf.infn.it:2119/blah-pbs-lcg TOTrun:3 TOTVOrun:3 TOTwait:3 TOTVOwait:0
===> CE:glite-ce01.marie.hellasgrid.gr:2119/blah-pbs-atlas TOTrun:10 TOTVOrun:6 TOTwait:2 TOTVOwait:0
===> CE:golias25.farm.particle.cz:2119/jobmanager-lcgpbs-lcgatlas TOTrun:2 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-dg_long TOTrun:0 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:grid-ce.rzg.mpg.de:2119/jobmanager-sge-long TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:26664
===> CE:grid-ce3.desy.de:2119/jobmanager-lcgpbs-default TOTrun:143 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid-ce3.desy.de:2119/jobmanager-lcgpbs-testing TOTrun:2 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:grid.uibk.ac.at:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid0.fe.infn.it:2119/jobmanager-lcgpbs-lcg TOTrun:7 TOTVOrun:4 TOTwait:0 TOTVOwait:0
===> CE:grid001.fi.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:12 TOTwait:0 TOTVOwait:0
===> CE:grid002.ca.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:0 TOTVOrun:0 TOTwait:36 TOTVOwait:10
===> CE:grid002.jet.efda.org:2119/jobmanager-lcgpbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid003.roma2.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:30 TOTVOrun:29 TOTwait:0 TOTVOwait:0
===> CE:grid01.cu.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:5 TOTwait:0 TOTVOwait:0
===> CE:grid109.kfki.hu:2119/jobmanager-lcgpbs-atlas TOTrun:4 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite TOTrun:51 TOTVOrun:0 TOTwait:287 TOTVOwait:0
===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long TOTrun:13 TOTVOrun:0 TOTwait:84 TOTVOwait:0
===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-atlas TOTrun:2 TOTVOrun:1 TOTwait:3 TOTVOwait:1
===> CE:gridce.pi.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-lcg TOTrun:3 TOTVOrun:3 TOTwait:3 TOTVOwait:0
===> CE:grim-ce.iucc.ac.il:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:hep-ce.cx1.hpc.ic.ac.uk:2119/jobmanager-pbs-heplt2 TOTrun:274 TOTVOrun:5206 TOTwait:353 TOTVOwait:6707
===> CE:heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:19 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:8 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:heplnx207.pp.rl.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:19 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:heplnx207.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:8 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:i101.hpc2n.umu.se:2119/jobmanager-lcgpbs-ngrid TOTrun:20 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ifaece01.pic.es:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1304 TOTVOwait:0
===> CE:ifaece01.pic.es:2119/jobmanager-lcgpbs-atlas2 TOTrun:14 TOTVOrun:0 TOTwait:162 TOTVOwait:0
===> CE:ituce.grid.itu.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:lapp-ce01.in2p3.fr:2119/jobmanager-pbs-atlas TOTrun:42 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce.lps.umontreal.ca:2119/jobmanager-lcgpbs-atlas TOTrun:10 TOTVOrun:0 TOTwait:171 TOTVOwait:0
===> CE:lcg-ce.rcf.uvic.ca:2119/jobmanager-lcgpbs-general TOTrun:14 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-atlas TOTrun:12 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce01.icepp.jp:2119/jobmanager-lcgpbs-atlas TOTrun:10 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce1.ifh.de:2119/jobmanager-lcgpbs-atlas_blade TOTrun:80 TOTVOrun:0 TOTwait:108 TOTVOwait:0
===> CE:lcg-lrz-ce.lrz-muenchen.de:2119/jobmanager-sge-atlas TOTrun:4 TOTVOrun:0 TOTwait:0 TOTVOwait:4444
===> CE:lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:lcgce01.jinr.ru:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcgce01.phy.bris.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcgce01.phy.bris.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:13 TOTVOrun:0 TOTwait:24 TOTVOwait:0
===> CE:lcgrid.dnp.fmph.uniba.sk:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lgdce01.jinr.ru:2119/jobmanager-lcgpbs-atlas TOTrun:2 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-10min TOTrun:1 TOTVOrun:18 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-12hr TOTrun:40 TOTVOrun:720 TOTwait:1 TOTVOwait:18
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-1hr TOTrun:6 TOTVOrun:108 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-24hr TOTrun:70 TOTVOrun:1260 TOTwait:33 TOTVOwait:594
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-30min TOTrun:3 TOTVOrun:54 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-3hr TOTrun:14 TOTVOrun:252 TOTwait:1 TOTVOwait:18
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-6hr TOTrun:25 TOTVOrun:450 TOTwait:1 TOTVOwait:18
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-72hr TOTrun:99 TOTVOrun:1782 TOTwait:33 TOTVOwait:594
===> CE:mu6.matrix.sara.nl:2119/jobmanager-pbs-medium TOTrun:0 TOTVOrun:0 TOTwait:117 TOTVOwait:0
===> CE:mu9.matrix.sara.nl:2119/jobmanager-pbs-batch TOTrun:354 TOTVOrun:0 TOTwait:529 TOTVOwait:0
===> CE:node001.grid.auth.gr:2119/jobmanager-pbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:paugrid1.pamukkale.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:pc90.hep.ucl.ac.uk:2119/jobmanager-lcgpbs-lcgatlas TOTrun:13 TOTVOrun:0 TOTwait:32 TOTVOwait:0
===> CE:serv03.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-atlas TOTrun:3 TOTVOrun:0 TOTwait:4 TOTVOwait:4444
===> CE:skurut17.cesnet.cz:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:snowpatch.hpc.sfu.ca:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:274 TOTVOwait:0
===> CE:spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:2 TOTVOwait:1
===> CE:svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:8 TOTwait:0 TOTVOwait:0
===> CE:t2-ce-02.lnl.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:3 TOTVOrun:0 TOTwait:7 TOTVOwait:1
===> CE:t2ce02.physics.ox.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:7 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:t2ce02.physics.ox.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:0 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas TOTrun:30 TOTVOrun:0 TOTwait:659 TOTVOwait:0
===> CE:tbit01.nipne.ro:2119/jobmanager-lcgpbs-atlas TOTrun:20 TOTVOrun:19 TOTwait:0 TOTVOwait:0
===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-atlas TOTrun:5 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-qlong TOTrun:43 TOTVOrun:41 TOTwait:0 TOTVOwait:0
===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-qshort TOTrun:7 TOTVOrun:6 TOTwait:0 TOTVOwait:0
===> CE:yildirim.grid.boun.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
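For reference, a minimal sketch of the kind of consistency check performed above. This is not the actual VOViewsConsist.py (which was not circulated); it is a re-implementation of the idea using python-ldap against a BDII, written for the Python 2 stack of the time, and the BDII host, port and base DN are assumptions that may need adjusting.

```python
#!/usr/bin/env python
# Sketch of a VOView consistency check against a BDII (not the original
# VOViewsConsist.py). Requires python-ldap; the BDII URL and base DN are
# assumptions and should be adapted to the local setup.
import ldap

BDII_URL = "ldap://lcg-bdii.cern.ch:2170"   # assumed top-level BDII
BASE_DN = "o=grid"

def first_int(attrs, name):
    """Return the first value of an LDAP attribute as an int (0 if absent)."""
    return int(attrs.get(name, ["0"])[0])

def main():
    con = ldap.initialize(BDII_URL)
    con.simple_bind_s()  # anonymous bind

    # Inclusive numbers published directly on each GlueCE entry.
    ces = {}
    for dn, attrs in con.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                                  "(objectClass=GlueCE)",
                                  ["GlueCEUniqueID", "GlueCEStateRunningJobs",
                                   "GlueCEStateWaitingJobs"]):
        ce_id = attrs["GlueCEUniqueID"][0]
        ces[ce_id] = (first_int(attrs, "GlueCEStateRunningJobs"),
                      first_int(attrs, "GlueCEStateWaitingJobs"))

    # Per-VO numbers summed over the VOViews attached to each CE.
    vo_sums = {}
    for dn, attrs in con.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                                  "(objectClass=GlueVOView)",
                                  ["GlueChunkKey", "GlueCEStateRunningJobs",
                                   "GlueCEStateWaitingJobs"]):
        for chunk in attrs.get("GlueChunkKey", []):
            if not chunk.startswith("GlueCEUniqueID="):
                continue
            ce_id = chunk.split("=", 1)[1]
            run, wait = vo_sums.get(ce_id, (0, 0))
            vo_sums[ce_id] = (run + first_int(attrs, "GlueCEStateRunningJobs"),
                              wait + first_int(attrs, "GlueCEStateWaitingJobs"))

    # Report CEs whose inclusive view disagrees with the sum of its VOViews.
    for ce_id, (tot_run, tot_wait) in sorted(ces.items()):
        vo_run, vo_wait = vo_sums.get(ce_id, (0, 0))
        if (tot_run, tot_wait) != (vo_run, vo_wait):
            print("===> CE:%s TOTrun:%d TOTVOrun:%d TOTwait:%d TOTVOwait:%d"
                  % (ce_id, tot_run, vo_run, tot_wait, vo_wait))

if __name__ == "__main__":
    main()
```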
-
From Rod Walker
-
CMS service
- No report.
Speaker: Mr Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY) -
LHCb service
- Issue at CERN still waiting to be answered. (Remedy ticket from Philippe)
When we run jobs reading files that are on lhcbdata (SRM endpoint srm-durable-lhcb.cern.ch), we expect that the files are actually on the lhcbdata pool and therefore immediately available to be opened. However, it seems that when querying the stager for one of these files, its status is STAGEIN. We would like to know whether this is the expected behaviour of the CERN durable SE, in which case we shall pass all our jobs through the DIRAC stager in order to cope with it (a sketch of such a check-and-stage step follows this item). Our assumption was that most of the analysis jobs accessing TxD1 data would not unduly overload the service.
Speaker: Dr Roberto Santinelli (CERN/IT/GD)
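To illustrate the workaround being considered, here is a minimal Python sketch of the kind of check-and-stage step that jobs, or the DIRAC stager, would have to perform if STAGEIN is indeed the expected state. The CASTOR client commands (stager_qry, stager_get) are the CASTOR2 tools of the time; the service class, file name and output parsing are assumptions.

```python
#!/usr/bin/env python
# Sketch: query the CASTOR stager for a file's status and trigger a stage-in
# request if it is not yet STAGED. The service class and file name are
# placeholders, and the string matching on stager_qry output is an assumption.
import os
import subprocess

os.environ["STAGE_SVCCLASS"] = "lhcbdata"        # placeholder service class

def stager_output(castor_path):
    """Return the stager_qry output for a single CASTOR file."""
    proc = subprocess.Popen(["stager_qry", "-M", castor_path],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return proc.communicate()[0]

def ensure_staged(castor_path):
    """If the file is not reported STAGED, submit an asynchronous stage-in."""
    if "STAGED" in stager_output(castor_path):
        return True                               # on disk, safe to open directly
    subprocess.call(["stager_get", "-M", castor_path])
    return False                                  # caller should wait and retry

if __name__ == "__main__":
    ensure_staged("/castor/cern.ch/grid/lhcb/some/file.dst")  # placeholder path
```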
-
ALICE service
- No report.
Speaker: Dr Patricia Mendez Lorenzo (CERN IT/GD) -
Service Coordination
The CMS CSA07 service challenge Tier 0 reconstruction and Tier 1 data export phase should now start on Tuesday 25 September and run for 30 days. See https://twiki.cern.ch/twiki/bin/view/CMS/CSA07Plan
Speaker: Harry Renshall / Jamie Shiers
-
-
16:55
→
17:00
OSG Items 5m
- Discussion of open tickets for OSG.
- https://gus.fzk.de/pages/download_escalation_reports_roc.php
- 17:00 → 17:05
-
17:10
→
17:15
AOB 5m
- .
-