WLCG MW Readiness WG 13th meeting Minutes - October 28th 2015
WG twiki
Agenda
Summary
- The xrootd 4 monitoring plugin issue with dCache v. 2.13.8 was fixed by the developer immediately after the last MW Readiness meeting. ATLAS confirmed the plugin is important for them. CMS finds it 'nice to have', not indispensable.
- One host of the CERN FTS-3 pilot will be running CentOS7. This will allow ATLAS and CMS to test FTS3 via their workflows on both OS environments.
- We need a contact at DESY to arrange the dCache on Prometheus testing.
- NDGF is the only Volunteer Site where the pakiti client installation is still pending.
- The MW Readiness App will move to production real soon now. Volunteer Sites will be called to comment on its functionality.
- The failing PIC-CERN PhEDEx transfers are not yet understood. They may be due to a bug in one of the involved MW components (EOS, dCache, FTS-3, Globus, ...) or to a misconfiguration somewhere. The MW Officer and the EOS and FTS-3 experts at CERN are looking into this.
- There was an ARGUS Collaboration meeting on Oct. 9th. The next one will be on Nov. 6th. The periodic sudden bursts of high load on the CERN ARGUS servers still persist and are not yet explained. A number of other issues from which CMS suffered are now understood. One more FTE is now working in the ARGUS dev. team. Moving to CentOS7 is hoped to solve a number of issues currently caused by historical dependencies.
- Fewer participants than usual joined this meeting from the Volunteer Sites. The suggested date for the next one is Wednesday 2nd December at 4pm CET. Objections, with alternative dates, should be sent to wlcg-ops-coord-wg-middleware@cernNOSPAMPLEASE.ch
Attendance
- Local: Maria Dimou (chair & notes), Alberto Peon (T0), Maarten Litmaath (ALICE & notes), Andrea Manzi (MW Officer), Andrea Sciaba (CMS), Vincent Brillault (security), David Cameron (ATLAS), Lionel Cons (MW Package Reporter developer), Alberto Aimar (CERN IT management).
- Remote: Raul Lopes (Brunel Univ.), Antonio Maria Perez-Calero Yzquierdo (PIC), Vincenzo Spinoso (EGI Ops Officer), Raja Nandakumar (LHCb).
- Apologies: Jeremy Coles (GridPP), Alessandra Doria (Napoli)
Minutes of previous meeting
The minutes of the last (12th) meeting HERE were approved.
MW Officer report
ATLAS workflow Readiness Verification Status:
CMS workflow Readiness Verification Status:
WLCG MW Readiness Software Status
Decommissioning of the old MW PKG DB nodes:
- Only 2 nodes are still publishing to the old collectors (one test node from LHCb and one from LAL). We have contacted those responsible but have had no reply so far.
- The old MW PKG DB nodes are going to be decommissioned by the end of the month.
Recent and on-going development items by the MW Officer:
Next:
- Deployment of the Volunteer site view in production
Sites' feedback
- PIC report:
- Pre-production storage using dCache 2.13.9,
- Running with the most recent xrootd monitoring plugin (now available in the WLCG software repository).
- PhEDEx transfers:
- currently using PhEDEx 4.1.3-comp3; moving to 4.1.7, as requested by CMS
- agents configured to use FTS3 pilot at CERN
- links: PIC -> CERN and PIC -> GRIF_LLR not working for the past few weeks.
- links: GRIF_LLR -> PIC OK; CERN -> PIC not working for the past few weeks.
- HC jobs running fine
- CNAF
- Napoli
- Edinburgh
- QMUL
- Brunel: Raul reported that the glidein issues experienced so far are now fixed and test pilot jobs have started appearing.
- CERN
- ... other sites
Report from the ARGUS meeting
- Argus meeting held on Oct 9th: agenda with minutes. Maarten's notes of 12/10:
- Investigations of tickets related to Argus at CERN:
- SAM test failures due to gridmapdir corruption (GGUS:116468).
- CMS VOBOX unexpected usage of host proxy instead of user proxies (GGUS:116092).
- User mapping failures due to the lack of the lcg-expiregridmapdir cron job (also GGUS:116092).
- CREAM wrongly remembering Argus troubles for many hours (GGUS:116791).
- Argus meeting: the gridmapdir will become replaceable by a simple DB.
- main points for MW Readiness:
- Argus team expansion
- 1 FTE has started working with the ARGUS dev. team. The other position is not yet filled.
Actions
Action items done in past meetings can be found HERE.
- 20150916-01: Andrea S. & David C. to ask CMS and ATLAS whether the xrootd 4 monitoring plugin is important for them. DONE. Right after the last MW Readiness meeting, Ilija Vukotic developed a fix that solved the issue with dCache. ATLAS position: the xrootd monitoring plugin is important for them. CMS position: nice to have but not indispensable.
- 20150506-03: NDGF to install the pakiti client. Updated instructions here. Pending.
- 20150318-02: Ben to set up the ARGUS testbed at the T0. The testbed is in place; load testing is on the to-do list but POSTPONED.
- 20141119-03: Andrea M. to contact the GRIF site to proceed with WN testing via the CMS workflow POSTPONED
Next meeting
AOB
--
MariaDimou - 2015-10-12