WLCG-OSG-EGEE Operations meeting
28-R-15
CERN conferencing service (joining details below)
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0157610
NB: Reports were not received in advance of the meeting from:
-
-
16:00
→
16:05
Feedback on last meeting's minutes 5m
-
16:01
→
16:30
EGEE Items 29m
-
Grid-Operator-on-Duty handover
From: ROC Germany-Switzerland / ROC Central Europe
To: ROC South-Western / ROC France
NB: Please can the grid ops-on-duty teams submit their reports no later than 12:00 UTC (14:00 Swiss local time).
Issues:
- (CE ROC) Backup team
Ticket summary: extended: 41, opened: 55, closed: 12, 2nd mail: 4; total: 112.
Problem with the Dashboard: the ticket list has not been refreshing itself since Wednesday. A notification was sent.
- (CE ROC) Backup team
-
PPS Report & Issues
PPS reports were not received from these ROCs:
AP, FR, IT, NE, SWE
Issues from EGEE ROCs:
- Nothing to report
- Release of gLite 3.1.0 Update06 to PPS done
- New VOMS certificate for the US-ATLAS server (to be synchronised to production)
- uberftp for the glite-UI node
- lcg-tags added to glite 3.1 UI, WN
- lcg-infosites added to the glite 3.1 WN
- Javier Lopez and Esteban Freire from PPS-CESGA have joined the PPS Coordination team. Their main task is to follow up the roll-out of the gLite middleware updates from Certification to PPS.
-
EGEE issues coming from ROC reports
None.
-
The lcg-RB to be moved to zero-maintenance mode (15m)
JRA1 and SA3 have now moved the old lcg-RB to zero-maintenance mode. The gLite 3.1 WMS, whether on SL3 or SL4, is the supported solution.
-
-
16:30
→
17:00
WLCG Items 30m
-
Tier 1 reports
- GridKa
SRM lockups, possibly caused by interaction between the memory subsystem and the Java VM. Severity: moderate.
- BNL-LCG
Monday 17 - Thursday 20: User jobs got stuck. Cause: too few movers available on some pool nodes. Severity: one user affected. Remediation: increase the number of movers.
dCache had bad local account mappings for some critical users, which caused USATLAS/ATLAS data transfer failures. Cause: an in-place GUMS (Grid User Management System) upgrade done on Thursday afternoon; some critical users were mapped to the wrong accounts because of a configuration file problem. The problem went undetected because GUMS still provided the mapping service and the generated grid map appeared to be "OK" while it was not. The symptom did not show up until several hours later, at midnight, when dCache regenerated its map file from the GUMS server and experienced data transfer failures thereafter. Even though the GUMS update had been properly announced, we might not have been able to discover this type of problem. Impact: a fraction of USATLAS data transfers failed between midnight and 10:30 AM on Friday. Solution: in the short term we corrected the configuration file errors, let dCache regenerate its map file, and recovered from the data transfer problem. To fix this problem and prevent future occurrences we will make two improvements: 1) the dCache team might want to consider regenerating the grid map file during prime business hours, i.e. 9:00 AM and 3:00 PM; 2) we will develop Nagios probes to validate the certificate mapping for the critical users: Nurcan's production certificate and Hiro's data transfer certificates (please let us know of any other critical certificates). A sketch of such a probe is given after the site reports below.
The main symptom was a slower throughput to HPSS: files were flushed to HPSS but stayed precious. Cause: due to a Solaris/Linux difference, the script that stages files to HPSS does not recognise flushed files correctly and was not working reliably on the Thumper. Severity: apart from a minor slowdown, a few files got corrupted; in the case that a file was written, deleted, and then rewritten with the same name, it could happen that the old copy was the one actually kept in HPSS. Remediation: the Thumper is assigned only as a read node. Long-term solution: more work is needed to guarantee that Thumpers in the write pool work reliably.
Friday 14 - Saturday 15: Production had problems reading files. Cause: no pool was actually assigned to production, because of the reconfiguration done prior to the HPSS upgrade. Severity: USATLAS production affected. Remediation: reset the configuration. In the same period dCache went down. Cause: unknown, still under investigation. Severity: site down. Remediation: restart the dCache core servers.
Saturday 15: User clients hung because of suspended requests. Cause: due to HPSS upgrade instability, many requests were suspended. Severity: USATLAS production affected. Remediation: retry the requests.
Ongoing: Files disappear, more so during the HPSS upgrade. Cause: still unknown; user activity is the primary suspect. Severity: some files are lost. Remediation: manually retrieve the list of lost files and clean up the data catalog so that users do not request files that are not available.
Friday 21: dCache was down. Cause: a power failure in the facility brought down a rack, and the UPS servicing that rack was not working properly; one of the machines in the rack was the PNFS server. Severity: system down for less than an hour. Remediation: system restarted. Long-term solution: fix the UPS.
-
USCMS-FNAL-WC1
The test shows the SE and SRM as down, but this is not true; many transfers are ongoing.
-
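Following up on the Nagios probes mentioned in the BNL report above, here is a minimal sketch, in Python, of what such a mapping check could look like. It is only an illustration: the grid-mapfile path, the DNs and the expected local accounts are placeholders, and a real probe would more likely query GUMS or the dCache kpwd file directly rather than the flat grid-mapfile.

```python
#!/usr/bin/env python
# Hypothetical Nagios-style probe: check that critical DNs are mapped to the
# expected local accounts in the generated grid-mapfile. The path, DNs and
# account names below are placeholders, not the real BNL configuration.
import sys

MAPFILE = "/etc/grid-security/grid-mapfile"              # assumed location
CRITICAL = {                                             # DN -> expected account
    "/DC=org/DC=example/OU=People/CN=Production Cert": "usatlas1",
    "/DC=org/DC=example/OU=People/CN=Data Transfer Cert": "usatlas2",
}

def load_mapfile(path):
    """Parse simple mapfile lines of the form: "/Some/DN" account"""
    mappings = {}
    for line in open(path):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        dn, _, account = line.rpartition(" ")
        mappings[dn.strip().strip('"')] = account.strip()
    return mappings

def main():
    try:
        mappings = load_mapfile(MAPFILE)
    except IOError as err:
        print("CRITICAL: cannot read %s: %s" % (MAPFILE, err))
        return 2
    bad = [dn for dn, account in CRITICAL.items() if mappings.get(dn) != account]
    if bad:
        print("CRITICAL: wrong or missing mapping for: %s" % ", ".join(bad))
        return 2
    print("OK: all critical DNs mapped to the expected accounts")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```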
WLCG issues coming from ROC reports
None.
-
WLCG Service Interventions (with dates / times where known)
Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board.
Time at WLCG T0 and T1 sites.
-
ATLAS service
See also https://twiki.cern.ch/twiki/bin/view/Atlas/TierZero20071 and https://twiki.cern.ch/twiki/bin/view/Atlas/ComputingOperations for more information.
-
From Rod Walker
A user with an Email attribute in the DN had problems accessing dCache at two sites, SARA and BNL. I think it is the old problem of the Java GSI code chewing up the Email part of the DN. The workaround is to put the various permutations into the dCache mapfile. The response from SARA indicates that they have fewer permutations than TRIUMF. TRIUMF has:
# rpm -qf /opt/d-cache/bin/grid-mapfile2dcache-kpwd
d-cache-lcg-6.2.0-1.noarch
It is still not clear what is installed at BNL. At SARA there may be a log saying which DN was refused. This issue is reported to clarify what should be in the mapfile, and whether all sites provide it; a sketch of the kind of permutations involved follows below.
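As an illustration of the permutation workaround described above, here is a minimal Python sketch, with a made-up DN and account, of how the common spellings of the Email attribute could be enumerated for inclusion in a dCache mapfile. Which variants are actually needed at each site is exactly the open question raised here.

```python
#!/usr/bin/env python
# Hypothetical helper: given a certificate DN containing an Email-style RDN,
# print the common spelling variants so that all of them can be added to the
# dCache mapfile/kpwd. The DN and account below are illustrative only.
import re

EMAIL_KEYS = ["Email", "emailAddress", "E"]   # spellings commonly seen in DNs
EMAIL_RDN = r"/(Email|emailAddress|E)=[^/]+"

def email_permutations(dn):
    """Return the DN rewritten with each common Email attribute spelling,
    plus a variant with the Email part dropped entirely."""
    match = re.search(r"/(Email|emailAddress|E)=([^/]+)", dn)
    if not match:
        return [dn]
    address = match.group(2)
    variants = [re.sub(EMAIL_RDN, "/%s=%s" % (key, address), dn)
                for key in EMAIL_KEYS]
    variants.append(re.sub(EMAIL_RDN, "", dn))
    return variants

if __name__ == "__main__":
    dn = "/C=CA/O=Grid/CN=Some User/Email=some.user@example.org"  # placeholder
    for variant in email_permutations(dn):
        print('"%s" atlas001' % variant)   # mapfile-style line, placeholder account
```
-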
From Campana et al.
At many sites there is a mismatch between the all-inclusive information published for a CE and the information published in the VOViews. As an example, the queue shown below supports only ATLAS, so the number of waiting jobs in the inclusive view should be the same as the one in the ATLAS VOView. But it is not: the VOView publishes all zeroes. Moreover, there are some queues where the numbers of waiting jobs in the individual views do not add up to the total published in the inclusive view. In total more than 130 ATLAS queues are affected, among them almost all T1s. Since the WMS uses the information in the VOView, and the latter is generally the wrongly published one, ATLAS is submitting jobs almost randomly, with an accumulation of jobs at small sites. The issue is extremely severe. We would like the operations team to investigate the reason for so many mismatches and to chase site by site until the problem is cured. Attached below are an example of a problematic queue and the list of currently problematic CEs as of today (a sketch of the consistency check follows the list).
- The average ATLAS job requires 1.1 GB of memory per core. CERN publishes 1 GB of RAM, therefore CERN is empty of ATLAS production jobs. We would like to ask CERN to evaluate whether 1 GB of RAM is completely realistic. In case it is, we should start thinking about publishing different subclusters for different types of machines.
- The requirements for the size of the ATLAS SW installation area were discussed 4 years ago and are surely obsolete. The ATLAS SW manager would like to ask for 100 GB of shared space, in which to install software, at every ATLAS T1 and T2 site.
# tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas, RO-02-NIPNE, grid
dn: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
objectClass: GlueCETop
objectClass: GlueCE
objectClass: GlueSchemaVersion
objectClass: GlueCEAccessControlBase
objectClass: GlueCEInfo
objectClass: GlueCEPolicy
objectClass: GlueCEState
objectClass: GlueInformationService
objectClass: GlueKey
GlueCEHostingCluster: tbat01.nipne.ro
GlueCEName: atlas
GlueCEUniqueID: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
GlueCEInfoGatekeeperPort: 2119
GlueCEInfoHostName: tbat01.nipne.ro
GlueCEInfoLRMSType: torque
GlueCEInfoLRMSVersion: 2.1.6
GlueCEInfoTotalCPUs: 44
GlueCEInfoJobManager: lcgpbs
GlueCEInfoContactString: tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
GlueCEInfoApplicationDir: /opt/exp_soft
GlueCEInfoDataDir: unset
GlueCEInfoDefaultSE: tbat05.nipne.ro
GlueCEStateEstimatedResponseTime: 390175
GlueCEStateFreeCPUs: 3
GlueCEStateRunningJobs: 29
GlueCEStateStatus: Production
GlueCEStateTotalJobs: 750
GlueCEStateWaitingJobs: 721
GlueCEStateWorstResponseTime: 186883200
GlueCEStateFreeJobSlots: 0
GlueCEPolicyMaxCPUTime: 2880
GlueCEPolicyMaxRunningJobs: 93
GlueCEPolicyMaxTotalJobs: 0
GlueCEPolicyMaxWallClockTime: 4320
GlueCEPolicyPriority: 1
GlueCEPolicyAssignedJobSlots: 0
GlueCEAccessControlBaseRule: VO:atlas
GlueForeignKey: GlueClusterUniqueID=tbat01.nipne.ro
GlueInformationServiceURL: ldap://tbat01.nipne.ro:2135/mds-vo-name=local,o=grid
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2

# atlas, tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas, RO-02-NIPNE, grid
dn: GlueVOViewLocalID=atlas,GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas,mds-vo-name=RO-02-NIPNE,o=grid
objectClass: GlueCETop
objectClass: GlueVOView
objectClass: GlueCEInfo
objectClass: GlueCEState
objectClass: GlueCEAccessControlBase
objectClass: GlueCEPolicy
objectClass: GlueKey
objectClass: GlueSchemaVersion
GlueVOViewLocalID: atlas
GlueCEAccessControlBaseRule: VO:atlas
GlueCEStateRunningJobs: 0
GlueCEStateWaitingJobs: 0
GlueCEStateTotalJobs: 0
GlueCEStateFreeJobSlots: 15
GlueCEStateEstimatedResponseTime: 0
GlueCEStateWorstResponseTime: 0
GlueCEInfoDefaultSE: tbat05.nipne.ro
GlueCEInfoApplicationDir: /opt/exp_soft/atlas
GlueCEInfoDataDir: unset
GlueChunkKey: GlueCEUniqueID=tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 2
[campanas@lxb0709 BDII]$ python VOViewsConsist.py | grep '==>'
===> CE:atlasce.lnf.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:27 TOTVOrun:0 TOTwait:531 TOTVOwait:0
===> CE:atlasce.phys.sinica.edu.tw:2119/jobmanager-lcgcondor-atlas TOTrun:1 TOTVOrun:0 TOTwait:20 TOTVOwait:0
===> CE:atlasce01.na.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:18 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:atlasce01.na.infn.it:2119/jobmanager-lcgpbs-atlas_short TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:bigmac-lcg-ce.physics.utoronto.ca:2119/jobmanager-lcgcondor-atlas TOTrun:12 TOTVOrun:0 TOTwait:3 TOTVOwait:4444
===> CE:cclcgceli02.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:4 TOTVOrun:2 TOTwait:0 TOTVOwait:0
===> CE:cclcgceli04.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:6 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:cclcgceli05.in2p3.fr:2119/jobmanager-bqs-atlas_long TOTrun:14 TOTVOrun:7 TOTwait:2 TOTVOwait:1
===> CE:ce.bfg.uni-freiburg.de:2119/jobmanager-pbs-atlas TOTrun:18 TOTVOrun:16 TOTwait:0 TOTVOwait:0
===> CE:ce.epcc.ed.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:29 TOTVOwait:0
===> CE:ce.gina.sara.nl:2119/jobmanager-pbs-medium TOTrun:141 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:ce.gina.sara.nl:2119/jobmanager-pbs-short TOTrun:14 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce.hpc.csie.thu.edu.tw:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:3 TOTVOwait:2
===> CE:ce.keldysh.ru:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce.phy.bg.ac.yu:2119/jobmanager-pbs-atlas TOTrun:14 TOTVOrun:0 TOTwait:16 TOTVOwait:2
===> CE:ce.ulakbim.gov.tr:2119/jobmanager-lcgpbs-atlas TOTrun:11 TOTVOrun:9 TOTwait:0 TOTVOwait:0
===> CE:ce00.hep.ph.ic.ac.uk:2119/jobmanager-sge-72hr TOTrun:177 TOTVOrun:3717 TOTwait:0 TOTVOwait:0
===> CE:ce001.grid.uni-sofia.bg:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:5 TOTwait:0 TOTVOwait:0
===> CE:ce01-lcg.projects.cscs.ch:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce01.afroditi.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:5 TOTVOrun:2 TOTwait:2 TOTVOwait:1
===> CE:ce01.ariagni.hellasgrid.gr:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce01.athena.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce01.ific.uv.es:2119/jobmanager-pbs-atlas TOTrun:29 TOTVOrun:0 TOTwait:299 TOTVOwait:1
===> CE:ce01.ific.uv.es:2119/jobmanager-pbs-atlasL TOTrun:28 TOTVOrun:0 TOTwait:295 TOTVOwait:1
===> CE:ce01.kallisto.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:17 TOTVOrun:16 TOTwait:0 TOTVOwait:0
===> CE:ce01.marie.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:10 TOTVOrun:6 TOTwait:2 TOTVOwait:0
===> CE:ce02.athena.hellasgrid.gr:2119/blah-pbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce02.lip.pt:2119/jobmanager-lcgsge-atlasgrid TOTrun:8 TOTVOrun:6 TOTwait:0 TOTVOwait:0
===> CE:ce02.marie.hellasgrid.gr:2119/jobmanager-pbs-atlas TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:3
===> CE:ce03-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:29 TOTVOrun:6 TOTwait:0 TOTVOwait:0
===> CE:ce04-lcg.cr.cnaf.infn.it:2119/blah-lsf-atlas TOTrun:10 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce05-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-slc4_debug TOTrun:881 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-atlastest TOTrun:4 TOTVOrun:0 TOTwait:163 TOTVOwait:0
===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:ce05.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:4
===> CE:ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:9 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce06-lcg.cr.cnaf.infn.it:2119/jobmanager-lcglsf-debug TOTrun:9 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ce06.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:ce06.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:4
===> CE:ce07.pic.es:2119/jobmanager-lcgpbs-glong TOTrun:19 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:ce07.pic.es:2119/jobmanager-lcgpbs-gshort TOTrun:9 TOTVOrun:8 TOTwait:4 TOTVOwait:4
===> CE:ce1-egee.srce.hr:2119/jobmanager-sge-dteam TOTrun:12 TOTVOrun:12 TOTwait:0 TOTVOwait:4444
===> CE:ce1.egee.fr.cgg.com:2119/jobmanager-lcgpbs-atlas TOTrun:10 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:ce1.triumf.ca:2119/jobmanager-lcgpbs-atlas TOTrun:18 TOTVOrun:9 TOTwait:0 TOTVOwait:0
===> CE:ce101.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:2 TOTVOwait:2
===> CE:ce102.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:2 TOTVOwait:2
===> CE:ce106.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:ce107.cern.ch:2119/jobmanager-lcglsf-grid_2nh_atlas TOTrun:100 TOTVOrun:99 TOTwait:0 TOTVOwait:0
===> CE:ce107.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:6 TOTVOrun:3 TOTwait:11 TOTVOwait:5
===> CE:ce108.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:ce123.cern.ch:2119/jobmanager-lcglsf-grid_atlas TOTrun:10 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:ce2.triumf.ca:2119/jobmanager-lcgpbs-atlas TOTrun:18 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ceitep.itep.ru:2119/jobmanager-lcgpbs-atlas TOTrun:2 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:clrlcgce02.in2p3.fr:2119/jobmanager-lcgpbs-atlas TOTrun:12 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:cs-grid0.bgu.ac.il:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:31 TOTVOwait:0
===> CE:cs-grid1.bgu.ac.il:2119/blah-pbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:31 TOTVOwait:0
===> CE:dgce0.icepp.jp:2119/jobmanager-lcgpbs-atlas TOTrun:16 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:egee.irb.hr:2119/jobmanager-lcgpbs-grid TOTrun:16 TOTVOrun:15 TOTwait:0 TOTVOwait:0
===> CE:epgce1.ph.bham.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:457 TOTVOwait:0
===> CE:epgce1.ph.bham.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:3 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:fal-pygrid-18.lancs.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:7 TOTwait:0 TOTVOwait:0
===> CE:fornax-ce.itwm.fhg.de:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:g03n02.pdc.kth.se:2119/jobmanager-pbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:gcn54.hep.physik.uni-siegen.de:2119/jobmanager-lcgpbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:glite-ce-01.cnaf.infn.it:2119/blah-pbs-lcg TOTrun:3 TOTVOrun:3 TOTwait:3 TOTVOwait:0
===> CE:glite-ce01.marie.hellasgrid.gr:2119/blah-pbs-atlas TOTrun:10 TOTVOrun:6 TOTwait:2 TOTVOwait:0
===> CE:golias25.farm.particle.cz:2119/jobmanager-lcgpbs-lcgatlas TOTrun:2 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid-ce.physik.uni-wuppertal.de:2119/jobmanager-lcgpbs-dg_long TOTrun:0 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:grid-ce.rzg.mpg.de:2119/jobmanager-sge-long TOTrun:1 TOTVOrun:0 TOTwait:0 TOTVOwait:26664
===> CE:grid-ce3.desy.de:2119/jobmanager-lcgpbs-default TOTrun:143 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid-ce3.desy.de:2119/jobmanager-lcgpbs-testing TOTrun:2 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:grid.uibk.ac.at:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid0.fe.infn.it:2119/jobmanager-lcgpbs-lcg TOTrun:7 TOTVOrun:4 TOTwait:0 TOTVOwait:0
===> CE:grid001.fi.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:12 TOTwait:0 TOTVOwait:0
===> CE:grid002.ca.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:0 TOTVOrun:0 TOTwait:36 TOTVOwait:10
===> CE:grid002.jet.efda.org:2119/jobmanager-lcgpbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:grid003.roma2.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:30 TOTVOrun:29 TOTwait:0 TOTVOwait:0
===> CE:grid01.cu.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:6 TOTVOrun:5 TOTwait:0 TOTVOwait:0
===> CE:grid109.kfki.hu:2119/jobmanager-lcgpbs-atlas TOTrun:4 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-infinite TOTrun:51 TOTVOrun:0 TOTwait:287 TOTVOwait:0
===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-long TOTrun:13 TOTVOrun:0 TOTwait:84 TOTVOwait:0
===> CE:gridba2.ba.infn.it:2119/jobmanager-lcgpbs-short TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:gridce.ilc.cnr.it:2119/jobmanager-lcgpbs-atlas TOTrun:2 TOTVOrun:1 TOTwait:3 TOTVOwait:1
===> CE:gridce.pi.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:gridit-ce-001.cnaf.infn.it:2119/jobmanager-lcgpbs-lcg TOTrun:3 TOTVOrun:3 TOTwait:3 TOTVOwait:0
===> CE:grim-ce.iucc.ac.il:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:hep-ce.cx1.hpc.ic.ac.uk:2119/jobmanager-pbs-heplt2 TOTrun:274 TOTVOrun:5206 TOTwait:353 TOTVOwait:6707
===> CE:heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:19 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:heplnx206.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:8 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:heplnx207.pp.rl.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:19 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:heplnx207.pp.rl.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:8 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:i101.hpc2n.umu.se:2119/jobmanager-lcgpbs-ngrid TOTrun:20 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:ifaece01.pic.es:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1304 TOTVOwait:0
===> CE:ifaece01.pic.es:2119/jobmanager-lcgpbs-atlas2 TOTrun:14 TOTVOrun:0 TOTwait:162 TOTVOwait:0
===> CE:ituce.grid.itu.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:lapp-ce01.in2p3.fr:2119/jobmanager-pbs-atlas TOTrun:42 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce.lps.umontreal.ca:2119/jobmanager-lcgpbs-atlas TOTrun:10 TOTVOrun:0 TOTwait:171 TOTVOwait:0
===> CE:lcg-ce.rcf.uvic.ca:2119/jobmanager-lcgpbs-general TOTrun:14 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce0.ifh.de:2119/jobmanager-lcgpbs-atlas TOTrun:12 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce01.icepp.jp:2119/jobmanager-lcgpbs-atlas TOTrun:10 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcg-ce1.ifh.de:2119/jobmanager-lcgpbs-atlas_blade TOTrun:80 TOTVOrun:0 TOTwait:108 TOTVOwait:0
===> CE:lcg-lrz-ce.lrz-muenchen.de:2119/jobmanager-sge-atlas TOTrun:4 TOTVOrun:0 TOTwait:0 TOTVOwait:4444
===> CE:lcgce0.shef.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:1 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:lcgce01.jinr.ru:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcgce01.phy.bris.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lcgce01.phy.bris.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:13 TOTVOrun:0 TOTwait:24 TOTVOwait:0
===> CE:lcgrid.dnp.fmph.uniba.sk:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:lgdce01.jinr.ru:2119/jobmanager-lcgpbs-atlas TOTrun:2 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-10min TOTrun:1 TOTVOrun:18 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-12hr TOTrun:40 TOTVOrun:720 TOTwait:1 TOTVOwait:18
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-1hr TOTrun:6 TOTVOrun:108 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-24hr TOTrun:70 TOTVOrun:1260 TOTwait:33 TOTVOwait:594
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-30min TOTrun:3 TOTVOrun:54 TOTwait:0 TOTVOwait:0
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-3hr TOTrun:14 TOTVOrun:252 TOTwait:1 TOTVOwait:18
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-6hr TOTrun:25 TOTVOrun:450 TOTwait:1 TOTVOwait:18
===> CE:mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge-72hr TOTrun:99 TOTVOrun:1782 TOTwait:33 TOTVOwait:594
===> CE:mu6.matrix.sara.nl:2119/jobmanager-pbs-medium TOTrun:0 TOTVOrun:0 TOTwait:117 TOTVOwait:0
===> CE:mu9.matrix.sara.nl:2119/jobmanager-pbs-batch TOTrun:354 TOTVOrun:0 TOTwait:529 TOTVOwait:0
===> CE:node001.grid.auth.gr:2119/jobmanager-pbs-atlas TOTrun:3 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:paugrid1.pamukkale.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:pc90.hep.ucl.ac.uk:2119/jobmanager-lcgpbs-lcgatlas TOTrun:13 TOTVOrun:0 TOTwait:32 TOTVOwait:0
===> CE:serv03.hep.phy.cam.ac.uk:2119/jobmanager-lcgcondor-atlas TOTrun:3 TOTVOrun:0 TOTwait:4 TOTVOwait:4444
===> CE:skurut17.cesnet.cz:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:snowpatch.hpc.sfu.ca:2119/jobmanager-lcgpbs-atlas TOTrun:8 TOTVOrun:0 TOTwait:274 TOTVOwait:0
===> CE:spacin-ce1.dma.unina.it:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:2 TOTVOwait:1
===> CE:svr016.gla.scotgrid.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:0 TOTwait:0 TOTVOwait:0
===> CE:t2-ce-01.mi.infn.it:2119/jobmanager-lcgpbs-atlas TOTrun:13 TOTVOrun:8 TOTwait:0 TOTVOwait:0
===> CE:t2-ce-02.lnl.infn.it:2119/jobmanager-lcglsf-atlas TOTrun:3 TOTVOrun:0 TOTwait:7 TOTVOwait:1
===> CE:t2ce02.physics.ox.ac.uk:2119/jobmanager-lcgpbs-atlas TOTrun:7 TOTVOrun:0 TOTwait:1 TOTVOwait:0
===> CE:t2ce02.physics.ox.ac.uk:2119/jobmanager-lcgpbs-short TOTrun:0 TOTVOrun:0 TOTwait:2 TOTVOwait:0
===> CE:tbat01.nipne.ro:2119/jobmanager-lcgpbs-atlas TOTrun:30 TOTVOrun:0 TOTwait:659 TOTVOwait:0
===> CE:tbit01.nipne.ro:2119/jobmanager-lcgpbs-atlas TOTrun:20 TOTVOrun:19 TOTwait:0 TOTVOwait:0
===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-atlas TOTrun:5 TOTVOrun:3 TOTwait:0 TOTVOwait:0
===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-qlong TOTrun:43 TOTVOrun:41 TOTwait:0 TOTVOwait:0
===> CE:tbn20.nikhef.nl:2119/jobmanager-pbs-qshort TOTrun:7 TOTVOrun:6 TOTwait:0 TOTVOwait:0
===> CE:yildirim.grid.boun.edu.tr:2119/jobmanager-lcgpbs-atlas TOTrun:0 TOTVOrun:0 TOTwait:1 TOTVOwait:0
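For reference, a minimal sketch of the kind of consistency check performed above. This is not the actual VOViewsConsist.py (which was not circulated); it is a re-implementation of the idea using python-ldap against a BDII, written for the Python 2 stack of the time, and the BDII host, port and base DN are assumptions that may need adjusting.

```python
#!/usr/bin/env python
# Sketch of a VOView consistency check against a BDII (not the original
# VOViewsConsist.py). Requires python-ldap; the BDII URL and base DN are
# assumptions and should be adapted to the local setup.
import ldap

BDII_URL = "ldap://lcg-bdii.cern.ch:2170"   # assumed top-level BDII
BASE_DN = "o=grid"

def first_int(attrs, name):
    """Return the first value of an LDAP attribute as an int (0 if absent)."""
    return int(attrs.get(name, ["0"])[0])

def main():
    con = ldap.initialize(BDII_URL)
    con.simple_bind_s()  # anonymous bind

    # Inclusive numbers published directly on each GlueCE entry.
    ces = {}
    for dn, attrs in con.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                                  "(objectClass=GlueCE)",
                                  ["GlueCEUniqueID", "GlueCEStateRunningJobs",
                                   "GlueCEStateWaitingJobs"]):
        ce_id = attrs["GlueCEUniqueID"][0]
        ces[ce_id] = (first_int(attrs, "GlueCEStateRunningJobs"),
                      first_int(attrs, "GlueCEStateWaitingJobs"))

    # Per-VO numbers summed over the VOViews attached to each CE.
    vo_sums = {}
    for dn, attrs in con.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                                  "(objectClass=GlueVOView)",
                                  ["GlueChunkKey", "GlueCEStateRunningJobs",
                                   "GlueCEStateWaitingJobs"]):
        for chunk in attrs.get("GlueChunkKey", []):
            if not chunk.startswith("GlueCEUniqueID="):
                continue
            ce_id = chunk.split("=", 1)[1]
            run, wait = vo_sums.get(ce_id, (0, 0))
            vo_sums[ce_id] = (run + first_int(attrs, "GlueCEStateRunningJobs"),
                              wait + first_int(attrs, "GlueCEStateWaitingJobs"))

    # Report CEs whose inclusive view disagrees with the sum of its VOViews.
    for ce_id, (tot_run, tot_wait) in sorted(ces.items()):
        vo_run, vo_wait = vo_sums.get(ce_id, (0, 0))
        if (tot_run, tot_wait) != (vo_run, vo_wait):
            print("===> CE:%s TOTrun:%d TOTVOrun:%d TOTwait:%d TOTVOwait:%d"
                  % (ce_id, tot_run, vo_run, tot_wait, vo_wait))

if __name__ == "__main__":
    main()
```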
-
From Rod Walker
-
CMS service
- No report.
Speaker: Mr Daniele Bonacorsi (CNAF-INFN BOLOGNA, ITALY) -
LHCb service
- Issue at CERN still waiting to be answered. (Remedy ticket from Philippe)
When we run jobs reading files that are on lhcbdata (SRM endpoint srm-durable-lhcb.cern.ch), we expect that the files are actually on the lhcbdata pool and therefore immediately available to be opened. However, it seems that when querying the stager for one of these files, its status is STAGEIN. We would like to know whether this is the expected behaviour of the CERN durable SE, in which case we shall pass all our jobs through the DIRAC stager in order to cope with it (a sketch of such a check-and-stage step follows this item). Our assumption was that most of the analysis jobs accessing TxD1 data would not unduly overload the service.
Speaker: Dr Roberto Santinelli (CERN/IT/GD)
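To illustrate the workaround being considered, here is a minimal Python sketch of the kind of check-and-stage step that jobs, or the DIRAC stager, would have to perform if STAGEIN is indeed the expected state. The CASTOR client commands (stager_qry, stager_get) are the CASTOR2 tools of the time; the service class, file name and output parsing are assumptions.

```python
#!/usr/bin/env python
# Sketch: query the CASTOR stager for a file's status and trigger a stage-in
# request if it is not yet STAGED. The service class and file name are
# placeholders, and the string matching on stager_qry output is an assumption.
import os
import subprocess

os.environ["STAGE_SVCCLASS"] = "lhcbdata"        # placeholder service class

def stager_output(castor_path):
    """Return the stager_qry output for a single CASTOR file."""
    proc = subprocess.Popen(["stager_qry", "-M", castor_path],
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            universal_newlines=True)
    return proc.communicate()[0]

def ensure_staged(castor_path):
    """If the file is not reported STAGED, submit an asynchronous stage-in."""
    if "STAGED" in stager_output(castor_path):
        return True                               # on disk, safe to open directly
    subprocess.call(["stager_get", "-M", castor_path])
    return False                                  # caller should wait and retry

if __name__ == "__main__":
    ensure_staged("/castor/cern.ch/grid/lhcb/some/file.dst")  # placeholder path
```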
-
ALICE service
- No report.
Speaker: Dr Patricia Mendez Lorenzo (CERN IT/GD) -
Service Coordination
The CMS CSA07 service challenge Tier 0 reconstruction and Tier 1 data export phase should now start on Tuesday 25 September and run for 30 days. See https://twiki.cern.ch/twiki/bin/view/CMS/CSA07Plan
Speaker: Harry Renshall / Jamie Shiers
-
-
16:55
→
17:00
OSG Items 5m
- Discussion of open tickets for OSG.
- https://gus.fzk.de/pages/download_escalation_reports_roc.php
- 17:00 → 17:05
-
17:10
→
17:15
AOB 5m
- .
-