WLCG MW Readiness WG 13th meeting Minutes - October 28th 2015

WG twiki

Agenda

Summary

  • The xrootd 4 monitoring plugin issue with dCache v. 2.13.8 was fixed by the developer immediately after the last MW Readiness meeting. ATLAS confirmed the plugin is important for them. CMS finds it 'nice to have', not indispensable.
  • One host of the CERN FTS-3 pilot will be running CentOS7. In this way ATLAS and CMS will be able to test FTS3, via their workflows, on both OS environments.
  • We need a contact at DESY to arrange the dCache on Prometheus testing.
  • NDGF is the only Volunteer Site where the pakiti client installation is still due.
  • The MW Readiness App will move to production soon. Volunteer Sites will be asked to comment on its functionality.
  • The failing PIC-CERN PhEDEx transfers are not yet understood; they are possibly due to a bug in one of the MW components involved (EOS, dCache, FTS-3, Globus, ...) or to a misconfiguration somewhere. The MW Officer and the EOS and FTS-3 experts at CERN are looking into this.
  • There was an ARGUS Collaboration meeting on Oct. 9th. The next one will be on Nov. 6th. The periodic sudden bursts of high load on the CERN ARGUS servers persist and are not yet explained. A number of other issues from which CMS suffered are now understood. One more FTE is now working in the ARGUS dev. team. The move to CentOS7 is hoped to solve a number of issues currently due to historical dependencies.
  • Fewer participants than usual from the Volunteer Sites joined this meeting. The suggested date for the next one is Wednesday 2nd December at 4pm CET. Objections with alternative dates should be sent to wlcg-ops-coord-wg-middleware@cern.ch

Attendance

  • Local: Maria Dimou (chair & notes), Alberto Peon (T0), Maarten Litmaath (ALICE & notes), Andrea Manzi (MW Officer), Andrea Sciaba (CMS), Vincent Brillault (security), David Cameron (ATLAS), Lionel Cons (MW Package Reporter developer), Alberto Aimar (CERN IT mgmt).
  • Remote: Raul Lopes (Brunel Univ.), Antonio Maria Perez-Calero Yzquierdo (PIC), Vincenzo Spinoso (EGI Ops Officer), Raja Nandakumar (LHCb).
  • Apologies: Jeremy Coles (GridPP), Alessandra Doria (Napoli)

Minutes of previous meeting

The minutes of the last (12th) meeting HERE were approved.

MW Officer report

ATLAS workflow Readiness Verification Status:

| MW Product | version | Volunteer Site(s) | pakiti client installation status | Comment on testing on CentOS7 | Other comments |
| DPM | 1.8.10 | Edinburgh | OK up-to-date | test machine setup at Glasgow for this, waiting to complete the Atlas specific configuration | upgraded, waiting for reconfiguration as per https://its.cern.ch/jira/browse/MWREADY-82 |
| StoRM | 1.11.9 | QMUL & CNAF | OK up-to-date | n/a | problems running panda jobs https://its.cern.ch/jira/browse/MWREADY-61 @CNAF |
| dCache | 2.10.42 & 2.13.8 | Triumf (2.10.42) & NDGF (2.13.8) | not yet installed at NDGF | n/a | Triumf all OK. The NDGF verification cannot progress due to a blocking issue with IPv6. A.O.B. Can someone help to make progress with the DESY offer to test dCache on Prometheus as per https://its.cern.ch/jira/browse/MWREADY-36? |
| HT-Condor (condor-g) | 8.3.2 | CERN | OK up-to-date | n/a | new HTCondor version released today 8.4.1. to ask for upgrade |
| FTS3 | 3.3.3 | CERN | OK up-to-date | n/a | new version 3.4.0 to be tested soon |
| other? | | | | | |

CMS workflow Readiness Verification Status:

| MW Product | version | Volunteer Site | pakiti client installation status | Comment for testing on CentOS7 | Other comments |
| DPM | 1.8.10 | GRIF | OK up-to-date | GRIF plans to run tests on CentOS7 in Q4 2015 as per https://its.cern.ch/jira/browse/MWREADY-71. What exactly can be done now? | problems showed-up with DPM-DSI 1.9.5-8 and gridftp redirection as per https://its.cern.ch/jira/browse/MWREADY-89 |
| dCache | 2.13.9 | PIC | OK up-to-date | n/a | PIC please comment on status in https://its.cern.ch/jira/browse/MWREADY-91 |
| EOS | 0.3.129-aquamarine | CERN | OK up-to-date | n/a | Issue with PIC -> CERN PhEDEX transfers as per https://its.cern.ch/jira/browse/MWREADY-81 |
| FTS3 | 3.3.3 | CERN | OK up-to-date | n/a | new version 3.4.0 to be tested soon |
| ARC-CE | 5.0.3 | Brunel | OK up-to-date | n/a | Issues in the glidein factory as per https://its.cern.ch/jira/browse/MWREADY-84 |

WLCG MW Readiness Software Status

Decommissioning of the old MW PKG DB nodes:

  • only 2 nodes are still publishing to the old collectors (one test node from LHCb and one from LAL); we have contacted the people responsible but have had no reply so far
  • the old MW PKG DB nodes are going to be decommissioned by the end of the month

Recent and on-going development items by the MW Officer:

Next:

  • Deployment of the Volunteer site view in production

Sites' feedback

  • PIC report:
    • Pre-production storage using dCache 2.13.9.
    • Running with the most recent xrootd monitoring plugin (now available in the WLCG software repository).
    • PhEDEx transfers:
      • currently using PhEDEx 4.1.3-comp3, moving to 4.1.7, as requested by CMS
      • agents configured to use the FTS3 pilot at CERN
      • links: PIC -> CERN and PIC -> GRIF_LLR have not been working for the last few weeks.
      • links: GRIF_LLR -> PIC OK; CERN -> PIC has not been working for the last few weeks.
    • HC jobs running fine

  • CNAF
  • Napoli
  • Edinburgh
  • QMUL
  • Brunel: Raul reported the glidein issues experienced so far are now fixed and the test pilot jobs started appearing.
  • CERN
  • ... other sites

Report from the ARGUS meeting

  • Argus meeting held on Oct 9th (agenda with minutes). Maarten's notes of 12/10:
    • Investigations of tickets related to Argus at CERN:
      • SAM test failures due to gridmapdir corruption (GGUS:116468).
      • CMS VOBOX unexpected usage of host proxy instead of user proxies (GGUS:116092).
      • User mapping failures due to lack of lcg-expiregridmapdir cron job (also GGUS:116092).
      • CREAM wrongly remembering Argus troubles for many hours (GGUS:116791).
    • Argus meeting: gridmapdir will become replaceable by a simple DB.

  • main points for MW Readiness:
    • Argus team expansion
      • 1 FTE started working with the ARGUS dev. team. The other position is not yet covered.

Actions

Action items Done from past meetings can be found HERE.

  • 20150916-01: Andrea S. & David C. to ask CMS and ATLAS whether the xrootd 4 monitoring plugin is important for them. DONE. Right after the last MW Readiness meeting, Ilija Vukotic developed a fix that solved the issue with dCache. ATLAS position: the xrootd monitoring plugin is important for them. CMS position: nice to have but not indispensable.
  • 20150506-03: NDGF to install the pakiti client. Updated instructions here. Pending.
  • 20150318-02: Ben to set up the ARGUS testbed at the T0. The testbed is there; the load testing is on the list but POSTPONED.
  • 20141119-03: Andrea M. to contact the GRIF site to proceed with WN testing via the CMS workflow. POSTPONED.

Next meeting

AOB

-- MariaDimou - 2015-10-12

Topic attachments

  • HC_test_T1_ES_PIC.png (32.0 K, 2015-06-16, AntonioPerezCalero): HC jobs reading from dcache validation storage at PIC
Topic revision: r54 - 2018-02-28 - MaartenLitmaath
 