- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
-- The PIN is 1234. To join via phone, see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial-in numbers.
-- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002
-- The meeting extension is 109308582. PIN 1234
Chair: Jeremy C
Minutes: David C
Apologies:
Andrew McNab, Chris Brew, Daniela Bauer, Dan Traynor, David Crooks (minutes), Duncan Rand, Elena Korolkova, Gareth Roy, Gordon Stewart, Ian Loader, Jeremy Coles (chair), John Hill, Leo Rojas, Linda Cornwall, Mark Slater, Matt Doidge, Pete Gronbech, Raul Lopes, Robert Frank, Rob Currie, Sam Skipsey, Steve Jones, Vip Davda
Andrew: Looked at Ops meeting yesterday, nothing UK specific
Jeremy: Nothing in WLCG Ops either
Daniela: Fairly quiet. Bristol has 4 tickets (but it's only just after the holiday weekend).
IPv6, please go and get it.
Elena: Problem with VOMS proxy; many queues went into test mode. Wrote to the experts to disable the HammerCloud (HC) tests and put the queues back online. The problem was solved later (EPEL packages renamed?). Think it's fixed.
Everything working after 2 hours.
General UK sites:
- Sussex to go CPU only, use T1 or RHUL for storage
- Create SL7 queues for Brunel (for consistency, currently working OK)
- QMUL transfers, open ticket. Alessandra has checked settings for queues, not sure of situation.
- Sheffield situation: increasing the timeout in the gridftp config didn't help
Steve: Settings change didn't work for Liverpool?
Elena: Ask DDM experts
Discuss on Thursday
Jeremy: Re WLCG Ops, ATLAS EOS crashed on Friday. VOMS proxy: issue with renewal.
Other VO updates?
Pete G: Enabled access for LZ/SKA at Oxford. May need some testing, as we haven't added a new VO for a while. Will investigate this week.
Gareth R: Vac pool: updated 50% to V3 and added Vac pipe for all enabled VOs. Noticing an intermittent rate of jobs; fair share setting? Have seen LZ/LSST, mostly pheno, LHCb.
Andrew M: Mostly get pheno with this.
Daniela: LZ only send to targeted sites. They need 4 GB; if you have that, let me know. Also discovered a bug in the simulation, which halted work.
Mark Slater: Switched to V3, have seen MICE, small number of others, not that much running.
Gareth: OK, checking that I haven't messed it up.
Andrew M: The accounting portal is good for seeing shares. Permissions? (Which accounting portal, DIRAC?)
Daniela: Meant to be open
Andrew: Can't see VOs I'm not a member of?
Daniela: May have got lost on upgrade of web interface
Gareth R: Daniela, LZ wants 4 GB/VM? I think we can, but we need to change the config. Will experiment and get back to you.
Elena: LZ don't currently use VAC queue
Daniela: But could do if the config works
Jeremy: Check back on DIRAC portal next week.
- EOSC-Goc
- Transfer failures? No update, interesting topic
- OMB
- Hardware survey, 4 yet to come in
- Brunel (passed on to Duncan), Imperial, UCL, Manchester (being worked on)
NTR: Sam gives credit to Brian for gridftp timeout settings
- Vac 3.0/Vac pipes: https://www.gridpp.ac.uk/wiki/Vac_configuration_for_GridPP_DIRAC
Updates to Interoperation Key Docs
EGI IGTF CA update
Reminder that when doing updates, some worker nodes can get left behind
Gareth R: Do we know when perfsonar 4.1 is due?
Duncan: No, was meant to be Q1.
https://ggus.eu/?mode=ticket_info&ticket_id=134899 now closed
IPv6 in Condor (see transcript)
- Final stages of adding extra hardware
- Pretty much final setup of ARC service
- Bringing on storage
Nothing to add from Robert
- WN/storage in boxes, unpack this week
- 600 TB storage, although half of it replaces decommissioned estate
- IPv6: really need time to bring it to PerfSonar
Nothing to add from Ian Loader
- David, documentation and handover
- Gareth: VAC nodes to new version
- restructuring how compute is provided
- fill rates: poor fill rates on multicore, is it worth keeping?
- smaller vac pool supports small VOs
- WIP, need numbers
- opportunity to audit site
- not in huge rush to upgrade to C7
- generally tidy up. Changes with central services; pushing IPv6 but slow progress. DC delayed again, planning permission... rely on future plans
Imperial (Daniela)
- mostly working on DiRAC data mover
- LZ had prod run, resume soon
- couple of open DIRAC issues, workshop at end of May
- Duncan: webdav transfers with Brian Bockelman
- Going slowly, hired new local sysadmin, hasn't started yet.
- SL7, going slowly
- couldn't move most of the storage to C7: it is old/built on software RAID, so difficult to upgrade
Vac V3, C7 soon. Forward-looking services are already on C7. Those planned for decommissioning don't need to move.
I'm working on the perfsonar upgrade to CentOS 7
I am also working on our new CentOS 7 cluster, implementing Singularity. We are now able to send jobs to the grid using the Singularity image provided via CVMFS
(See transcript)
- Workers in place, takes central IT a while to sort out naming.
- Storage (200TB) in rack, needs names.
- Vac slowly upgrading to V3
- Storage, believe to be in place for ATLAS EOS
- reduce DPM, look to decommission in long term, waiting on ATLAS
- IPv6: got the gateway talking; next PerfSonar, then dual stacking.
- Don't have a timeline; new domain handling with central services. Almost at the point of just waiting on them
- Upgrading PerfSonar to C7
- trouble getting IPv6 to work, auto conf got turned off, lots of debugging
- Continue looking to move to C7
- WNs have been C7 for ages, services as they are added, next storage
- Singularity: had to move some WNs to C6 for one community, so trying Singularity
- need to build own copy of Singularity
- ATM configuring new VOs, debugging
- Moved ~5 WNs to C7
- taken small steps towards dual stacking storage (head node done, need local config for pool nodes)
- local monitoring
- T3 in a box
- Updated UMD3->4
- Installed new hardware, 600 slots, E5-2630v4
- HTCondor smaller VO scheduling, gave small slices, smaller VOs come through in fits and starts
- De facto cap
- Replaced by using large accounting group which works better
- Convert to C7 (close to) complete
- Need to do Vac v3/pipes
- Lots of certificates - currently use Cert Wizard, PeCR?
David: That's how we renew certificates, works very well
Chris B: Also suggest renewing all certs at the same time, even if some don't need it yet, so that renewals are synchronised on the same timeline
Steve: That's good advice.
- moving to new build system (same as the old build system but reimplemented)
- General cleanup
- XFS kernel crashes, under investigation
Revisit later
Proposal for Lightweight Sites WG: Andrew: bring together different initiatives into a WG that is site-oriented
David: Advert for Workshop, registration open now: https://indico.cern.ch/event/717615/
Pete: 10 people registered; would very much like people to register/suggest talks. Dan Traynor has suggested having a discussion about the role of HEPSYSMAN in light of the changing context/outsourcing/etc.
Daniela's Test account: (08/05/2018 12:00)
https://ggus.eu/?mode=ticket_search&show_columns_check%5B%5D=TICKET_TYPE&show_columns_check%5B%5D=AFFECTED_VO&show_columns_check%5B%5D=AFFECTED_SITE&show_columns_check%5B%5D=PRIORITY&show_columns_check%5B%5D=RESPONSIBLE_UNIT&show_columns_check%5B%5D=STATUS&show_columns_check%5B%5D=DATE_OF_CHANGE&show_columns_check%5B%5D=SHORT_DESCRIPTION&show_columns_check%5B%5D=SCOPE&ticket_id=&supportunit=&su_hierarchy=0&former_su=&vo=cms&user=&keyword=&involvedsupporter=&assignedto=&affectedsite=UKI-SOUTHGRID-BRIS-HEP&specattrib=none&status=open&priority=&typeofproblem=all&ticket_category=all&mouarea=&date_type=creation+date&tf_radio=1&timeframe=lastyear&from_date=08+May+2018&to_date=09+May+2018&untouched_date=&scope=&orderticketsby=REQUEST_ID&orderhow=desc&search_submit=GO%21
David Crooks: (12:05 PM)
Sorry. mic was open
Jeremy Coles: (12:14 PM)
https://www.gridpp.ac.uk/gridpp-dirac-sam?vo=skatelescope.eu
Steve Jones: (12:18 PM)
VO status at sites (not VAC): http://pprc.qmul.ac.uk/~lloyd/gridpp/votable.html
Robert Frank: (12:25 PM)
manchester working on it
raul: (12:26 PM)
I emailed it to Duncan
David Crooks: (12:27 PM)
I've lost Jeremy, is it just me?
John Hill: (12:28 PM)
No I did as well
Matt Doidge: (12:28 PM)
I lost him for a bit as well.
Jeremy Coles: (12:28 PM)
Sorry - am I back?
John Hill: (12:28 PM)
yes
David Crooks: (12:28 PM)
Yes, sorry, can hear you fine
raul: (12:33 PM)
IPv6 in condor in the CMS factory is not properly configured
Sorry no mic
Brunel: Storage is all on Centos6. I'm starting to move it to CentOS7. Everything else is CentOS 7
New hardware being commissioned this week
Leo Rojas (Sussex): (12:45 PM)
Hey, my mic is active and working but you cannot hear me
I'm working on perfsonar upgrade to centos 7
I am also working on our new centos 7 cluster, implementing singularity. We are now able to send jobs to the grid using the Singularity image provided via centos 7
Mark Slater: (12:47 PM)
Forgot to mention: Also updated perfsonar to CentOS 7 as well :)
That was Mark Slater (I obviously put the password in the wrong box when signing in!)
Leo Rojas (Sussex): (12:48 PM)
I mean (the singularity image) provided via CVMFS
David Crooks: (12:59 PM)
https://indico.cern.ch/event/717615/