-- JamieShiers - 07 Feb 2008

Week of 080204

Open Actions from last week:

Monday:

See the weekly joint operations meeting

Tuesday:

No meeting due to CCRC'08 face-to-face meeting. See http://indico.cern.ch/conferenceDisplay.py?confId=26922

elog review:

Experiment report(s):

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

Questions from sites/experiments:

AOB:

Wednesday:

No meeting due to GDB. See http://indico.cern.ch/conferenceDisplay.py?confId=20226

elog review:

Experiment report(s):

Core services (CERN) report:

DB services (CERN) report:

Monitoring / dashboard report:

Release update:

Questions from sites/experiments:

AOB:

Thursday:

elog review:

Four significant items, mostly fixed or understood:
1) CMS Tier0 staging stopped while afs31 was down, but there were also stager_get timeouts. Partial explanation: the stager job_Manager LSF job submission hung on a user with a home directory on afs31 (why this dependency?), and submission is serialised. To be followed up.
2) gfal failing with long VO.site.domain names, as seen at GRIF. Does not affect LHC. Said to be fixed in gfal 1.10.8 patch 1674.
3) myproxy-fts bind failing from IN2P3 to T2. The site must set a TCP port range.
4) myproxy-fts started failing with "credentials expired" from the Legnaro T2. Traced to two different versions of the UI having come into use.
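As an illustration of the TCP port range mentioned in item 3, the following is a minimal sketch of how a site might sanity-check a port-range setting before applying it. It assumes the "min,max" string form used by the Globus GLOBUS_TCP_PORT_RANGE environment variable. The function names and the example values are hypothetical and are not taken from the meeting.

```python
def parse_port_range(value):
    """Parse a 'min,max' (or 'min max') port range string into (low, high)."""
    parts = value.replace(",", " ").split()
    if len(parts) != 2:
        raise ValueError(f"expected two ports, got {value!r}")
    low, high = (int(p) for p in parts)
    if not (1 <= low <= high <= 65535):
        raise ValueError(f"invalid port range {low}-{high}")
    return low, high


def port_allowed(port, port_range):
    """Return True if 'port' falls inside the inclusive (low, high) range."""
    low, high = port_range
    return low <= port <= high


if __name__ == "__main__":
    # Hypothetical range; the real values are a site firewall decision.
    rng = parse_port_range("20000,25000")
    print(rng, port_allowed(20500, rng))
```

A check of this kind only validates the setting itself. Whether the site firewall actually admits the range has to be verified separately.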

Experiment report(s):

CMS (DB): CCRC activities are a bit behind following the stager blockage. Starting to activate T1 to T1 transfers of data received from T0. T1s not receiving T0 data this week (e.g. Italy) are exercising their T2 transfers.

ATLAS (SC): Started reprocessing at T0. Plan to start exports today, with all RAW, ESD and AOD going to all T1s and AOD to be replicated to T2s. This is a bit outside the ATLAS model (the later decision was to stick to the model!).

LHCb (RS): Up to the 18th they will be checking each component of the full data chain. Currently systematically checking SRM at various sites (e.g. found a case dependency in the space token description) and setting up for reprocessing.

Core services (CERN) report:

JvE reported a configuration file issue: the TRANSFER1 variable needs to be defined in the SRMV2 endpoints, and its absence is affecting CMS FTS transfers. This is easy to fix for ALICE and CMS but will be harder for LHCb and ATLAS. The FTS team is planning with CMS to reduce the number of streams in a tuning exercise. FTS patch 1671 (see below) is about to enter certification.

DB services (CERN) report:

Monitoring / dashboard report: Functional test results are being added to the CMS dashboard, and ATLAS shifters are to give feedback on what is needed in their dashboards.

Release update:

  • The FTS server does not correctly pass the space token information at the destination to the underlying SRM layer. A bug has been filed and a patch has been made - see issues page.

Questions from sites/experiments:

RAL (AS) announced they had recently had a power failure but were coming back with WLCG Services soon and a full Castor service tomorrow. One issue was that they could not send an EGEE broadcast. JDS reminded sites that this is a known problem and that CERN keeps a list of T1 phone contacts. He suggested RAL could phone the CERN console operator at 5011 to advertise the failure.

AOB:

Friday:

elog review:

There are ongoing authentication problems to Nikhef that are stopping FTS transfers for ATLAS and LHCb. The CERN SRM server is returning a TURL that is missing a leading / and this is (was) stopping some file transfers for CMS to FNAL.

Experiment report(s):

CMS (AS): Tier0 tests are continuing and the transfer tests are being reviewed this afternoon. T1 to T1 and T1 to T2 transfers are going well, but T0 to T1 transfers started late and are having some problems (e.g. to FNAL as above, data not going to tape at PIC, and transfers to FZK going by SRMv1). CMS will continue with its plan, so it will now move on to other Tier1s and decide later how to follow up on the Tier1s that failed to reach their desired metrics this week.

LHCb (RS): Export from the pit to T0 Castor continues. There was an interruption to this yesterday and LHCb will forward the resulting GGUS ticket number to JvE. They are setting up automated transfer T0 to T1 and the associated automatic job submission to the T1. They are seeing the intermittent voms-proxy failures to Nikhef and also a low transfer rate there.

ALICE (PM): Have been following up on failing sites this week. Storage at T1 is still their biggest issue. An xrootd expert will be helping out at GSI next week.

Core services (CERN) report: A workaround for yesterday's Legnaro myproxy-fts problem has been produced. The FTS tuning tests (reducing the number of streams) have had no effect on transfers to ASGC (there is not enough traffic to IN2P3 to tell).

DB services (CERN) report: Streams replication to RAL LFC was interrupted by their power cut. DM group will be taking over service responsibility for the experiment online databases starting with the CMS 6-node RAC cluster next week.

Monitoring / dashboard report:

Release update:

Questions from sites/experiments: RAL (AS) explained that their power cut was due to excessive vibration when one of two transformers was returned to service after maintenance, which brought both of them down. Most services were back by 14.00 Thursday, GOCDB this morning, and Castor is expected this afternoon (so a 24 hour Castor downtime). PIC (GM) said the load from CMS had caused them some server problems which are now solved. Data is being migrated to tape, and he was not aware that it had not been.

AOB: JDS reminded attendees that the Monday meeting will be joint with the weekly operations meeting, with this segment scheduled for 16.30. The agenda of the segment will now be largely the same as for this meeting.

Topic revision: r6 - 2008-02-08 - HarryRenshall
 