28-R-15 (CERN conferencing service (joining details below))
Weekly OSG, EGEE, WLCG infrastructure coordination meeting.
We discuss the weekly running of the production grid infrastructure based on weekly reports from the attendees. The reported issues are discussed, assigned to the relevant teams, followed up and escalated when needed. The meeting is also the forum for the sites to get a summary of the weekly WLCG activities and plans.
OSG operations team
EGEE operations team
EGEE ROC managers
WLCG coordination representatives
WLCG Tier-1 representatives
other site representatives (optional)
To dial in to the conference:
a. Dial +41227676000
b. Enter access code 0140768
<big>FTS SL4 - required by the experiments or tier-1 sites?</big>
ALICE: Neutral (as long as there is no disruption to the service).
ATLAS: Prefer not to, to avoid introducing problems this close to data taking.
CMS: The priority is stability during data taking. Anything that is scheduled in advance *and* allows some pre-testing can be negotiated, though. For the CERN migration, the PhEDEx /Prod and /Debug instances can be used to allow testing before going into production (discussed with Gavin).
LHCb: Neutral (as long as there is no disruption to the service).
ASGC:
BNL: Has a fairly pressing need to move to SL/RHEL4 because of our site security situation. If it is made available in production soon, we would definitely switch over.
CERN:
FNAL: Hardware is aging fast. There may be issues with maintenance.
FZK:
IN2P3:
INFN:
NDGF:
PIC:
RAL:
SARA/Nikhef:
TRIUMF:
<big>WLCG Service Interventions (with dates / times where known) </big>
General on CRUZET-4 and T0 workflows:
CRUZET-4 ended at ~8 am. About 38 million events were collected during the exercise; the most interesting part was from Thursday on, with more than 25 million events in the last weekend alone. The exercise gave plenty of valuable information and feedback from a real-life run. The CRUZET Jamboree is on Wednesday afternoon. CRUZET-like activities will restart with magnetic field at the end of the week.
SLS reported "CMS Online databases" at 0% availability due to a CMS DB intervention in the Online; the intervention is now over and the status is OK.
Distributed Data Transfers:
We see: 1) issues with the stager agent (experts are aware and investigating); 2) some Castor issues causing problems to the CAF (2 tickets to CERN-IT were still pending over the weekend, see [1] and [2]); 3) an issue with the download agents at at least 2 T1 sites. Overall, this causes the PhEDEx service to be labelled as 'degraded' in SLS. According to news from the WLCG daily call, these are being addressed and closed right now.
[1] http://remedy01.cern.ch/cgi-bin/consult.cgi?caseid=CT0000000546182
[2] http://remedy01.cern.ch/cgi-bin/consult.cgi?caseid=CT0000000546181
The high-profile Summer'08 production is ongoing, though still ramping up to full speed.
<big> LHCb report </big>
LHCb asks (and wants this to be seriously taken into account) whether it is valid that any downtime announced less than 24 hours in advance must be considered unscheduled rather than scheduled (with obviously different implications for the site reliability computation).
LHCb wants to remind all sites that the Shared Area is also a critical service and that sites must guarantee the required QoS. The problem at CNAF teaches us that this is important. How can this message be conveyed efficiently to all sites, and the quality improved by adopting or writing adequate fabric sensors?
Last week's SAM sensor results (http://lblogbook.cern.ch/Operations/375) pointed out a problem concerning the SAM critical services (used by the Gridview algorithms to compute reliability) and the services effectively used by the VOs. On 20 August, StoRM at CNAF stopped being published as an SRM sensor (it is now only an SRMv2 sensor in the SAM dictionary), so the SAM clients fail to publish results. The net effect is that, for the still-critical SRM service, no results have been available for CNAF since then. GGUS ticket opened for the GridView team: https://gus.fzk.de/pages/ticket_details.php?ticket=40087
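LHCb's question above about downtimes announced with less than 24 hours' notice matters because the classification changes the reliability figure. The following is only a minimal sketch of that effect, under stated assumptions: the 24-hour threshold is the rule LHCb is asking about (not a confirmed policy), reliability is taken in a simplified form as uptime divided by the time not covered by scheduled downtime, and the function names, dates and hours are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the 24-hour notice threshold is the rule LHCb asks about,
# not a confirmed WLCG policy.
NOTICE_THRESHOLD = timedelta(hours=24)


def classify_downtime(announced_at, starts_at, threshold=NOTICE_THRESHOLD):
    """Return "scheduled" if the notice period is at least the threshold."""
    notice = starts_at - announced_at
    return "scheduled" if notice >= threshold else "unscheduled"


def reliability(total_hours, up_hours, scheduled_down_hours):
    """Simplified reliability (assumption): uptime over the time that is
    not covered by scheduled downtime."""
    return up_hours / (total_hours - scheduled_down_hours)


# Hypothetical week (168 h) with a single 12 h outage announced 12 h ahead.
total, outage = 168.0, 12.0
announced = datetime(2008, 8, 25, 9, 0)
start = datetime(2008, 8, 25, 21, 0)

kind = classify_downtime(announced, start)
scheduled_hours = outage if kind == "scheduled" else 0.0
print(kind, round(reliability(total, total - outage, scheduled_hours), 3))
```

In this made-up case the outage counts as unscheduled and the site's weekly reliability drops to about 0.929; had the same outage been announced at least 24 hours ahead, the same formula would give 1.0.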
<big> Storage services: Recommended base versions </big>