WLCG MW Readiness WG 19th meeting Minutes - November 2nd 2016
WG twiki
Agenda
Summary
- The pakiti client is in cvmfs now. Details here.
- LHCb will participate in the FTS verification effort, a way to avoid, as much as possible, surprises like the checksum problem (GGUS:124136) met on Sept. 28th. They will also participate in the verification of the CE and storage types that they use.
- CMS will discuss internally participation in the EL7 UI bundle/rpm testing.
- The experiment plans around EL7 migration will be discussed in this WG. Today's situation is mostly as reported at the dedicated Ops Coord meeting of Sept. 1st, with the exception of an ATLAS update.
- The WG Mandate was reviewed and confirmed as still valid.
- The date for the next meeting is not yet defined. Please email the e-group of the WG as soon as a vidyo meeting is desirable, or to accelerate exchanges in jira. Our tracker is https://its.cern.ch/jira/projects/MWREADY. The jira dashboard view always shows a snapshot of open tickets.
- Please observe the actions and communicate progress to the e-group.
Attendance
- local: Maria Dimou (chair & notes), Maarten Litmaath (ARGUS report), Andrea Manzi (MW Officer), Vincent Brillault (WLCG Security), Julia Andreeva (WLCG Ops), Stefan Roiser (LHCb).
- remote: Christoph Wissing & Daniele Bonacorsi (CMS), Matt Doidge (Lancaster), Vincenzo Spinoso (EGI), Frederique Chollet (LAPP).
- apologies: Andrea Sciabà (CMS), Raul Lopes (Brunel), Jeremy Coles (GridPP).
Minutes of previous meeting
The minutes of the last (18th) meeting HERE are accepted.
Verification status report
The MWREADY JIRA dashboard shows the latest status info of open tickets. A summary of progress since our last meeting is in the tables below.
Maria closed JIRA:MWR-36 and JIRA:MWR-100 as per last meeting's decision to close tickets that have been idle for many months.
ATLAS workflow Readiness Verification Status:
CMS workflow Readiness Verification Status
Verifications for both ATLAS & CMS
During the discussion about this table:
- LHCb is encouraged to participate in the MW Readiness verification effort, e.g. with FTS testing, to start with. The way to go is to:
- Announce to our WG a contact person in the experiment
- This person will contact those Volunteer Sites which support LHCb and prepare the test environment (dedicated batch queues, announcement of the end-points, as appropriate).
- The set-up will be sent to the WG chair, Maria for update of the Experiment workflows' section of the WG twiki.
- Christoph will discuss the EL7 UI testing internally in CMS. Some MW components are still missing from the bundle: the Data Management parts are included but, for example, CREAM CE is missing. Andrea M., MW Officer, will inform the e-group as additions arise.
During this discussion Stefan noticed that the section Tasks overview of the twiki is out-of-date. Maria will move this section to the archive part of the same twiki.
Discussions around EL7 following the Sept 1st Ops Coord theme
- ATLAS update from 25 October
- Stefan said LHCb has already been using SL6 binaries on EL7 in operation for a long time (simulation workflow).
- Maarten confirmed that ALICE has the same approach. They built every package on SL5 and this works on EL7.
- Christoph said there are no changes in CMS with respect to what was presented at WLCG Ops Coord on Sept 1st (all slides linked from the agenda).
- Maarten said that in case pure EL7 builds cannot be used for quite a while, CMS experts are looking into containers that present an SL6 environment to the jobs.
- Julia said the MW Readiness WG should be the forum where updates from the experiments on EL7 migration are reported.
- Maarten said we cannot hide behind the original statement that SL6 would be the official OS until the end of Run 2, because some sites (e.g. NDGF, IN2P3) will need to run EL7 on new HW and/or feel a steadily mounting pressure from other customers asking for the OS to be upgraded.
WLCG MW Readiness Software Status
- No more developments are planned. Info by Vincent and Andrea M.:
The pakiti client is now also available via CVMFS (grid.cern.ch). In order to send data to the MW Readiness collector, site managers can mount the CVMFS repository grid.cern.ch and use this command in their cron:
/cvmfs/grid.cern.ch/pakiti/bin/pakiti-client --site <site_name> --conf /cvmfs/grid.cern.ch/pakiti/conf/WLCG-MWR.conf
Andrea M. will update the Pakiti documentation accordingly. See also GGUS:124207.
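As a rough illustration of the cron set-up described above, a site could use an entry along the following lines. This is only a sketch: the file name `/etc/cron.d/pakiti-mwr`, the daily schedule and the choice of the `root` user are assumptions made for the example and are not prescribed by the WG. The `<site_name>` placeholder has to be replaced by the site's own name.

```shell
# /etc/cron.d/pakiti-mwr  (hypothetical file name; schedule and user are assumptions)
# Report the installed packages to the MW Readiness collector once a day at 03:17.
# Replace <site_name> with the name of the site before installing this file.
17 3 * * * root /cvmfs/grid.cern.ch/pakiti/bin/pakiti-client --site <site_name> --conf /cvmfs/grid.cern.ch/pakiti/conf/WLCG-MWR.conf
```

The entry only works on nodes where the grid.cern.ch repository is mounted, since both the client and its configuration file are read from CVMFS.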
Sites' feedback
- Brunel
- 3 issues encountered with ARC-CE 5.1.1 + HTCondor 8.5.6
- GGUS:123947: HTCondor 8.5.6 changed the default condor_q output such that only the current user's jobs are returned. This broke the job monitoring in ARC. Setting CONDOR_Q_ONLY_MY_JOBS=false fixes the issue.
- GGUS:124253 (ongoing investigation): It seems that, in the presence of job flocking, the ARC-CE that initially receives the job submission removes the job's directory. This affects APEL accounting.
- http://bugzilla.nordugrid.org/show_bug.cgi?id=3604, reported by Thomas Hartmann: job submission breaks when updating globus-gssapi-gsi from 11.22-1 to 12.5-2. The problem is fixed, but a rebuild of ARC-CE is needed.
- 1 issue reported to ARGUS
- GGUS:124315: Configuration problem when using pure IPv6 WN
- moved production DPM DB to MariaDB on EL7, no issues so far
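For the condor_q issue reported by Brunel (GGUS:123947), the workaround could be applied roughly as sketched below. The drop-in file name is a hypothetical choice for this example, and the directory `/etc/condor/config.d` is assumed to be the local configuration directory of the HTCondor installation. Only the setting itself comes from the report above.

```shell
# Hypothetical drop-in file; the name and location are assumptions for this sketch.
echo 'CONDOR_Q_ONLY_MY_JOBS = false' > /etc/condor/config.d/99-condor-q-all-jobs.conf

# Check which value HTCondor now reads for this setting.
condor_config_val CONDOR_Q_ONLY_MY_JOBS
```

With this setting condor_q lists the jobs of all users by default again, which is what the ARC job monitoring relied on before HTCondor 8.5.6.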
Special topic
Major releases coming out this year (that we are aware of)
- CREAM-CE (with EL7 support)
- dCache 3.0.0 (already released, running at NDGF-T1)
- DPM 1.9.0 (already released)
MW readiness mandate review and products review?
Feedback received until 24/10 is:
- The MW Readiness WG is still useful and should be kept alive with a meeting frequency 'dictated' by the MW products' changes.
- It is via this group that the info on EGI package status gets to the sites.
- The WLCG Ops community counts on the Volunteer Sites of the MW Readiness WG for CentOS7 testing
- We've been lucky to have a calm MW development in the past few months but we wouldn't be able to do without this verification process under a future intense release activity.
During the discussion, Maria re-read the WG Mandate, which was agreed as still valid. About active participation by the experiments, Stefan said that LHCb only tests services, as clients come from cvmfs. Individual package versions come at random times, so we can't decide on the frequency of this meeting because the testing process is continuous. The major milestones that need to be achieved should dictate the frequency of this meeting.
Julia and all agreed on this. Maria and Andrea said that our reports at the monthly WLCG Ops Coord meeting and at the Monday 3pm meeting regularly communicate progress on our work. Meetings will be called when there is something special to discuss.
Report from recent ARGUS meetings
- Argus meeting held Sep 2
- Next meeting Nov 4
- main items for MW Readiness:
- CERN has been running Argus 1.7 in production since Aug 11
- Release notes have been provided
- EGI has verified the update for inclusion in the upcoming UMD 4.3.0 (Nov)
- Staged Rollout reports have been provided by Brunel and CERN
Future releases will keep being tested on the QA nodes at CERN but the development will mostly concern new functionality not necessarily concerning us, so the tests will simply make sure that what we need still works.
Tests on IPv6-only deployment take place at Brunel.
Actions
Action items done from past meetings can be found HERE.
- 20161102-05: Christoph to investigate EL7 UI testing by CMS. Keep Andrea S. informed as maintainer of the workflow twiki.
- 20161102-04: Andrea M. to update the pakiti documentation.
- 20161102-03: Maria to remove the out-of-date Tasks overview from the WG twiki. DONE twiki up-to-date and announced on 20161201.
- 20161102-02: Stefan to appoint an LHCb member to join the WG. DONE Marcello is appointed.
- 20161102-01: Andrea S. to update the CMS workflow twiki.
- 20160518-02: EL7 experiments' intentions Done via Ops Coord on Sep 1st - see details on the agenda
Next meeting
- No meeting planned for now. MW Releases, updates in Ops Coord and the Mon 3pm will dictate when we should fix a date for a meeting.
AOB
Stefan said the existing Volunteer Sites which happen to support LHCb should be approached to also take care of LHCb services' verification. This is minuted in the relevant sections earlier in these minutes to keep the whole issue complete.
--
MariaDimou - 2016-09-21