WLCG-OSG-EGEE Operations meeting
Europe/Zurich
28-R-15 (CERN conferencing service (joining details below))
Nick Thackray
Description
grid-operations-meeting@cern.ch
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
Attendees:
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
GGUS representatives
VO representatives
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
-
-
16:00
→
16:01
Feedback on last meeting's minutes 1m
-
16:01
→
16:30
EGEE Items 29m
-
Grid-Operator-on-Duty handover
From: UK/Ireland and Central Europe
To: Taiwan and France
Report from UKI COD:
- #8637 - couldn't get SAM results
- #8907 - site removed from GOCDB, but SAM tests still available -> unsolvable
- no other major issues
Report from CE COD:
- No issues for this week.
-
PPS Report & Issues
Please find issues from EGEE ROCs and general info in:
https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
-
gLite Release News
Please find gLite release news in:
https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingGliteReleases
Now in Production:
Soon in Production:
-
EGEE issues coming from ROC reports
- French ROC report:
  - Concerning the centralized distribution of gLite client software to EGEE sites, the site answers (5/15) were mainly disapproving. A common concern is that troubleshooting would become more complicated, because this mechanism introduces a third party (SA3). Other concerns are technical, for example the overload of NFS/AFS and the difficulty of taking site-specific configuration into account (dCache, rfio, MPI, etc.).
- Germany/Switzerland
  - Request from a DECH site: Is there a timescale for when LCG plans to integrate the latest VDT patches? The one of interest is the client upgrade for gridftp, as it solves a lot of issues.
    Answer: the present distribution includes VDT 1.6, with gridftp2-compatible clients. If important updates are needed, we will look at them and they will be backported (VDT is at 1.10 now).
- UKI
  - GGUS#40608 was submitted regarding the Gridview problems experienced on Saturday 6 Sep and Sunday 7 Sep (unscheduled downtime was not taken into account).
-
Top BDII Publishing 15m
A collection of Top BDIIs that are publishing is visible here. If you have a gLite 3.1 top level BDII then it should appear on this page. Please check.
-
-
16:30
→
17:00
WLCG Items 30m
-
WLCG issues coming from ROC reports
-
WLCG Service Interventions (with dates / times where known)
Link to CIC Portal (broadcasts/news), scheduled downtimes (GOCDB) and CERN IT Status Board
Many interventions scheduled this week. Please consult the URLs above for details.
Time at WLCG T0 and T1 sites.
-
WLCG Operational Review
Speaker: Harry Renshall / Jamie Shiers
-
Alice report
-
Atlas report
-
CMS report
- T0 workflows: In running mode. Highlights from the weekend. On Saturday: several small runs with special tests, no processing failures observed (just one run got stuck in the DAQ, but it was still repacked to 100%), then a couple of long runs (/BeamHalo and /Cosmics, all in apart from Pixel, Tracker, ECAL endcap). Activated new offline DQM harvesting. ALCARECO migrated to global DBS, injected into PhEDEx and subscribed to CAF and CERN_MSS. Transfers ongoing with no major problems. Some blind regions in Lemon monitoring (reported by shifters) [*1]. --- On Sunday: some more long cosmic runs; some stay in PromptReco for a long time (but e.g. one took 4.4E6 cosmics events). --- This morning: a couple of hours of slower data taking due to set-up problems in the trigger chain (DTTF), now OK.
- T1 workflows: ASGC: Typhoon in Taipei --- IN2P3: issue with the transfer of a custodial /Cosmics sample, seems to be related to PhEDEx Ops issues; a first diagnosis is available and will be out soon, being tracked internally as [*2]. --- FZK: small temporary glitches in SE/CE SAM tests, may be related to the problems they had during the weekend with the power supply of the dCache system (file-open errors were triggered).
- T2 workflows: some CMS JobRobot failures at some T2s, sites informed as appropriate. --- CMSSW installation problem at T2_UK_London_IC: being addressed.
[*1]
https://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Ft1transfer&cluster=1&type=host
https://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Ft0export&cluster=1&type=host
https://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Ft0input&cluster=1&type=host
http://lemonweb.cern.ch/lemon-status/info.php?time=0.0.5&offset=0&entity=c2cms%252Fcmscaf&cluster=1&type=host
[*2]
http://savannah.cern.ch/support/?105610
Speaker: Daniele Bonacorsi
-
LHCb report
-
Storage services: Recommended base versions
The recommended baseline versions for the storage solutions can be found here: https://twiki.cern.ch/twiki/bin/view/LCG/GSSDCCRCBaseVersions
-
Storage services: this week's updates
Refer to the wiki page here: https://twiki.cern.ch/twiki/bin/view/LCG/CCRC08StorageStatus
-
-
17:00
→
17:30
OSG Items 30m
Speaker: Rob Quick (OSG - Indiana University)
-
Discussion of open tickets for OSG
- https://gus.fzk.de/ws/ticket_info.php?ticket=37059
-
-
17:30
→
17:35
Review of action items 5m
-
17:35
→
17:36
AOB 1m
-