- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in VidyoConnect: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone.
-- The London (UK) service is on +442030510622.
-- The meeting extension is 109308582. PIN 1234
Chair: Kashif M
Minutes:
Apologies:
Present:
Andrew McNab
Daniela Bauer
Darren Moore
Elena Korolkova
Emanuele (new at Glasgow)
Kashif M
Matt Doidge
Gordon Stewart
John Hill
Raja K
Rob Currie
Duncan Rand
Mark Slater
Winnie Lacesso
Sam Skipsey
Vip
Robert Frank
David Crooks
Alessandra Forti
Chris Brew
Steve Jones
Leo Rojas (joined late)
*** Review of weekly issues by experiment/VO
* LHCb (Raja): major problem with DIRAC infrastructure, VAC and vcycle
after an update to DIRAC. The pilot version was downgraded, which seemed to
fix it. Still working on it. Some minor problems at UK sites, but all under control.
* CMS (Daniela)
All sites green on monitoring. Brunel in Waiting Room after misdeclared
downtime, but should be out now. Smooth running over Christmas.
* ATLAS (Elena)
Two open tickets:
RAL-LCG2 and singularity: https://ggus.eu/?mode=ticket_info&ticket_id=138033
RALPP and ipv6 transfers:
https://ggus.eu/?mode=ticket_info&ticket_id=139127
Chris: ipv6: No external firewall, only affects Atlas, CMS seems to
work. Will have another look.
Chris (in chat window):
Got it! Suddenly occurred to me as I was speaking that a problem affecting Atlas and not CMS might be down to a problem on the pool node, not the GridFTP mover or higher up the network stack. And looking at the Pools I find a couple on SL6 nodes that only appeared to have working IPv6. Restarting the network on those seems to have solved the problem
Kashif: Local data disk is full, what to do? Elena: It's a known problem,
Atlas is working on it.
* T2K and the missing checksums
Some discussion about DPM without DOME. Conclusion: Dump all available
checksums from database and ask T2K if they are still interested in the
files without checksums. Attach this list to your t2k ticket, please.
Jeremy's activity webpage, please fill it out:
https://www.gridpp.ac.uk/wiki/Engagements_and_commitments
GDB 16th: ipv6, storage accounting
Tier1: All quiet.
EGI ops meeting (Kashif): No new releases that affect UK sites. Preparing
IPv6 report; Kashif updated UK status.
*** Security
David Crooks: systemd vulnerability; updates from Red Hat and CentOS are
available. Another advisory will be out. No reboot required.
Please attend workshop if you can:
WLCG Security Operations Center WG Workshop/Hackathon
https://indico.cern.ch/event/775579/
*** Tickets (Matt):
40 Open UK Tickets this week.
T2K DFC Migration on DPMs
Liverpool: https://ggus.eu/?mode=ticket_info&ticket_id=138648
Oxford: https://ggus.eu/?mode=ticket_info&ticket_id=138647
Sheffield: https://ggus.eu/?mode=ticket_info&ticket_id=138649
Lancaster: https://ggus.eu/?mode=ticket_info&ticket_id=138365
[Already discussed.]
A quick summing up of these tickets: to provide the information T2K need (namely adler32 checksums for files that don't already have them), it appears your DPM needs to be DOME'd. At Lancaster we seem to be having the most luck with this so far, so please feel free to prod me about it.
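For reference, a minimal sketch of how an adler32 checksum can be computed for a single file with Python's standard zlib module. This is only an illustration of the checksum itself, not the DPM/DOME mechanism; the function name and the 8-character lowercase hex output format are assumptions, so check what format T2K actually want before using it.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the adler32 checksum of a file as an 8-character hex string.

    Hypothetical helper for illustration; the output format is an assumption.
    """
    checksum = 1  # adler32 is defined to start from 1
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            checksum = zlib.adler32(chunk, checksum)
    # Mask to 32 bits and zero-pad to 8 hex digits.
    return format(checksum & 0xFFFFFFFF, "08x")
```

Feeding the running value back into zlib.adler32 gives the same result as checksumming the whole file in one go.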
v6-looking transfer problems
Liverpool (lhcb): https://ggus.eu/?mode=ticket_info&ticket_id=138943 (19/12) (fixed)
RALPP: (atlas): https://ggus.eu/?mode=ticket_info&ticket_id=139127 (10/11)
(discussed earlier)
Bristol LHCB Ticket
https://ggus.eu/?mode=ticket_info&ticket_id=138402 (21/11/18)
Are the issues described in this ticket still happening? That might be a
question for the VO rather than the site. (6/12/18)
Now works after Bristol disabled SL7 workernodes for LHCb. Bristol still
working on it.
Last Year's Tier 1 Tickets:
https://ggus.eu/?mode=ticket_info&ticket_id=138665 (LFC access issues)
https://ggus.eu/?mode=ticket_info&ticket_id=138500 (CMS transfer failures)
(needs an update)
https://ggus.eu/?mode=ticket_info&ticket_id=138361 (T2K DFC migration) (under
control -- Daniela)
Matt intends to go over all the ipv6 tickets next week, so please update them!
Site round table:
Manchester: (Alessandra) Storage upgrade, (Andrew): Vac and VCycle wrt IRIS
RALPP: (Chris): more nodes onto ipv6, nothing urgent happening, storage is dual
stack
Imperial (Daniela): Getting IRIS storage and compute into the racks up and running.
RAL-LCG2 (Darren): All good.
Sheffield (Elena): CentOS7.
Cambridge (John): perfSONAR, new CPU
Oxford (Kashif): move to CentOS7
Birmingham (Mark): ipv6 - dual-stacking perfSONAR, orders for new storage, last
DPM user evicted, should be able to decommission DPM. Goal: just EOS and VAC
Lancaster (Matt): Racking up new kit. Looking at HTCondorCEs. Updating systemd
:-)
Bristol (Winnie): Debug why LHCb doesn't work on SL7, replacing some hardware.
Edinburgh (Rob): DPM 1.11, Cloud, ipv6
Glasgow (Sam): waiting for DPM 1.11 to be stable, hope this will help for a
variety of problems, considering DOME
Liverpool (Steve): can't do anything on ipv6, HTCondorCE
Sussex (Leo): Nothing to report, Physics looking for someone to replace Leo,
buying kit for 10k, ipv6 works
[Raja had to leave at 11:45]