WLCG MW Readiness WG 12th meeting Minutes - September 16th 2015

WG twiki

Agenda

Summary

  • The new DPM version is being tested via the ATLAS workflow by the Edinburgh Volunteer site.
  • Many new sites expressed interest in participating in MW Readiness testing with CentOS7. Such testing helps anticipate the MW behaviour ahead of new HW purchases. DPM validation on CentOS/SL7 is already ongoing at Glasgow.
  • ATLAS and CMS are asked to state whether the xrootd 4 monitoring plugin is important for them. At present it does not work with dCache v.2.13.8.
  • Although FTS3 runs at very few sites, we decided to test it for Readiness. The FTS3 pilot at CERN will be used for this purpose.
  • PIC successfully tested dCache v.2.13.8 for CMS.
  • CNAF has obtained Indigo-DataCloud effort to strengthen the ARGUS development team. The ARGUS collaboration will meet again early October. The problems faced at CERN with a CMS VOBOX are being investigated in ticket GGUS:116092.
  • The next MW Readiness WG vidyo meeting will take place on Wednesday 28 October at 4pm CET.

Attendance

  • Local: Maria Dimou (chair & notes), Ben Jones (T0), Maarten Litmaath (ALICE & notes), Andrea Manzi (MW Officer), Andrea Sciaba (CMS), Vincent Brillault (security), David Cameron (ATLAS), Lionel Cons (MW Package Reporter developer).
  • Remote: Marc Caubet (PIC Storage expert), Raul Lopes (Brunel Univ.), Jeremy Coles (GridPP), Antonio Maria Perez-Calero Yzquierdo (PIC), Samuel Cadellin Skipsey (Glasgow), Vincenzo Spinoso (EGI Ops Officer), Ievgen Sliusar (Kiev university, ALICE), Steve Jones (Liverpool), Yuri Ivanov (JINR), Anton Jose Gamel (Freiburg, ATLAS), Alessandra Doria (Napoli), Peter Gronbech (Oxford), Ewan Mac Mahon (Oxford), Karlis Dreimanis (Liverpool), Michel Jouvin (ARGUS collaboration), Daniele Cesini (CNAF).
  • Apologies: Joel Closier (LHCb)

Minutes of previous meeting

The minutes of the last (11th) meeting HERE were approved.

MW Officer report

Discussion:

  • ATLAS and CMS are asked to use the FTS-3 pilot in their transfer test workflows
    • so that the pilot gets steady (not only ad-hoc) exposure to realistic activities

  • CentOS/SL7 validation of the WN is desirable from the perspective of sites
    • it is the best OS for new HW
    • sites may be forced to run their WN as SL6 VMs instead of physical machines
      • not so easy for some fraction of the sites
      • possible performance issues
    • an SL6 build of the experiment SW may work just fine on the newer OS
      • possibly some compatibility libraries would have to be installed

  • DPM validation on CentOS/SL7 is already ongoing at Glasgow
    • rpms are available, but the configuration does not yet work out of the box
    • other interested sites can contact the DPM team

WLCG MW Readiness Software Status

Input by the developer, Lionel Cons: the most important point is the migration from the old collectors to the new collectors, which in practice only requires changing the configuration file. The work of asking users to make this change has been done by the MW Officer, Andrea Manzi. See also the actions for sites which do not yet run the pakiti client.

Sites' feedback

  • PIC report:
    • Preproduction SRM, srm-pps.pic.es, updated to dCache 2.13.8, which supports xrootd 4
    • Monitoring plugin running in the pool not working with 2.13.8:
      • Pool unstable
      • PIC to/from GRIF loadtest injections have been failing over the last few days because of this
      • Monitoring plugin disabled for now, loadtest transfers now OK
    • New PhEDEx Dev loadtest injections between PIC and CERN have been set up.
    • Xrootd 4 interaction with SRM from ui.pic.es (xrootd-client 4.2.1) and lxplus (xrootd-client 4.2.3)
      • xrdfs commands work fine from both. CMS TFC enabled ok, can use LFNs.
      • xrdcp not currently working from outside PIC, probably port misconfiguration

  • CNAF news:
    • Pakiti client installation should be done later this week
    • an issue with the StoRM info provider was fixed this Wed morning
      • the MWR endpoint currently looks OK from the site's perspective
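
PIC suspects a port misconfiguration behind the xrdcp failures from outside the site. A minimal sketch of how such a suspicion could be checked from an external host is given below. It only tests whether a TCP connection can be opened, and says nothing about the xrootd protocol itself. The function name port_is_open is hypothetical, and the port to probe is an assumption: 1094 is the usual xrootd default, but the minutes do not state which port the PIC door listens on.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    Hypothetical helper for illustration only; it does not speak xrootd,
    it merely checks that the port accepts TCP connections.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

Running port_is_open("srm-pps.pic.es", 1094) once from inside PIC and once from outside (for example from lxplus) would show whether the port, assumed here to be 1094, is reachable only internally.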

Discussion:

  • the developer of the dCache monitoring plugin (Ilija Vukotic of ATLAS) has already been contacted
  • we may want to formalize its support chain a bit better:
    • how important is this plugin for ATLAS, CMS and maybe LHCb?
      • what if a dCache production instance does not have it?
    • a support list ought to be advertised in a proper location in the WLCG Operations web

Report from the ARGUS meeting

  • Argus meeting held on Sep 4
  • also summarized in the GDB introduction of Sep 9

  • main points for MW Readiness:
    • Argus team expansion
      • 1 Indigo-DataCloud FTE started at CNAF on Sep 1
        • partially available for Argus
      • a 2nd FTE may start Oct 1
    • EL7 support looking good
      • a complete set of rpms is available and simple tests worked OK
      • all work with Java 8
      • the PAP component has the desired Java dependencies
      • the PDP and PEPd components run with older dependencies
        • code changes are needed for newer versions of jetty and bouncycastle
        • expected to take a number of days rather than weeks
      • all dependency versions can be used in parallel on the same host
      • since we are not in a hurry yet, we decided to wait for the "clean" release
        • a single set of dependencies
    • CERN is available for stress testing of the EL7 (pre-)release
      • one preprod EL7 node will be added to the site-argus.cern.ch alias
    • some progress has been made on the recurrent issue at CERN
      • on at least 2 occasions in the past weeks a correlation was observed between Argus overload and abnormally high rates from a few CMS VOBOXes
        • a DoS can kill any service
        • we may establish separately if Argus should be able to handle more
      • a set of particular failures experienced by CMS VOBOXes is being investigated through GGUS:116092
      • one client mistake on the CMS side has been identified
        • not yet clear how much it contributed to the problems

Actions

Action items Done from past meetings can be found HERE.

  • 20150916-02: David C. to ask ATLAS to run Rucio tests on the FTS3 pilot. Done
  • 20150916-01: Andrea S. & David C. to ask CMS and ATLAS whether the xrootd 4 monitoring plugin is important for them. New
  • 20150617-02: Andrea S. to discuss with CMS mgmt whether to stay with dCache testing with xrootd3 or move to xrootd4. JIRA:MWREADY-66 Done
  • 20150617-01: Antonio Y. (PIC) to follow progress on the xrootd monitoring plugin issue found via the dCache testing at PIC for CMS. JIRA:MWREADY-65 Done
  • 20150506-03: NDGF, Triumf, CNAF, PIC to install the pakiti client. Updated instructions here. PIC Done. CNAF will be done this week. NDGF, Triumf Pending.
  • 20150506-02: Joel and Stefan to state if and how they wish to participate in the MW Readiness verification effort. The voms client is their contribution. Done
  • 20150506-01: Maarten to check with ALICE which sites use which xrootd version and if they wish to participate in the MW Readiness verification effort. Less than 10 ALICE sites moved to xrootd >= 4.1 so far, but those newer versions have been shown to work in the meantime. MW Readiness activities are not a priority for the experiment at this point. Done
  • 20150318-02: Ben to set up the ARGUS testbed at the T0. The testbed is there; the load testing is on the list but Postponed
  • 20150318-01: Manuel to remind the EOS and FTS managers of the Pakiti client installation instructions here. FTS is done. EOS done on 2015/09/17. Done
  • 20141119-03: Andrea M. to contact the GRIF site to proceed with WN testing via the CMS workflow. Postponed
  • 20140702-06: Andrea M. & Lionel to discuss the visualization of testing results. Done

Next meeting

  • Wed 28 October 2015 at 4pm CET

AOB

-- MariaDimou - 2015-06-15

Topic attachments
  • HC_test_T1_ES_PIC.png (r1, 32.0 K, 2015-06-16 16:32, AntonioPerezCalero): HC jobs reading from dcache validation storage at PIC
Topic revision: r42 - 2018-02-28 - MaartenLitmaath
 