- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in VidyoConnect: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone.
-- The London (UK) service is on +442030510622.
-- The meeting extension is 109308582. PIN 1234
Chair: JeremyC
Minutes: SamS
Apologies:
Minutes GridPP Operations Meeting 22 Jan 2019
Chair: Jeremy Coles
Minutes: Sam Skipsey
Attending [highest count]:
Alessandra Forti
Daniel Traynor
Daniela Bauer
Darren Moore
Elena Korolkova
Emanuele Simili
Gareth Roy
Gordon Stewart
Ian Loader
Jeremy Coles
John Hill
Kashif Mohammed
Linda Cornwall
Mark Slater
Matt Doidge
Winnie Lacesso
Pete Clarke
Raja Nandakumar
Raul Lopez
Robert Frank
Sam Skipsey
Ste Jones
Teng Li
Vip Davda
Raja:
LHCb - having problems with DIRAC-matcher (matcher seems to be under load, so not matching pilots with payloads). Possibly we'll see an increase in pilots exiting with no work / timeout.
Some GGUS tickets open in UK, but the bulk of them are resolved.
Still see aborted pilots sometimes at Liverpool. [Thanks to Catalin for fixing the RALPP issue]
[Raja gives apologies in advance for not being able to attend next week's Ops meeting]
DUNE still have an issue with the 5GB transfer limit into RAL [via dynafed/S3 - S3 base problem]. Darren Moore notes that confirmation of the fix will be in the liaison meeting.
Daniela:
"CMS is incredibly quiet"
NTR.
Elena:
ATLAS had a problem with HTCondorCE at Liverpool (due to misconfig in AGIS).
Alessandra sent email to the sites this morning re: CentOS 7 (ATLAS wants to force migration to CC7 resources by June); scratchdisk space (ATLAS would like the quota to be 100TB/1000 analysis slots); and an IPv6 reminder that there's an ongoing WLCG campaign towards this.
Site levels: no CC7 resources at Sheffield or Glasgow (or Birmingham?). Cambridge and Birmingham have VAC nodes, so "don't need to migrate batch system". [John Hill notes Cambridge VAC is CC7]
*ATLAS Jamboree at beginning of March - this is a sites Jamboree (5-8 March); the 5th will have discussion of "hyperconverged" resources.
Alessandra would like to understand migration for sites. Gareth notes that our planned migration at Glasgow is tied to our new machine room, but this would be after the June deadline (so we'd need to make new plans to hit this). Alessandra thinks a delay past the deadline is fine, with a good excuse (and a new machine room is a good excuse).
*all sites - can we update the batch systems status wiki page?
-
Jeremy ops updates: all basically well for experiments.
-
Other VOs updates:
NTR
-
GridPP DIRAC status:
Lancaster from yesterday looked v slightly slow to start jobs - Matt notes that power-issues caused them to lose a rack, with concomitant effect on slots.
-
Meetings and updates:
T1 update: Darren - issue with cvmfs over the weekend, which harmed our efficiency, but now recovering.
T2 evolution: new VM definitions.
Interoperation: EGI OMB last week - Kashif notes it was a short meeting, mostly about HTCondorCE effort.
Security: site patch status? Matt update for David: [Sites advised to patch as this is fairly trivial and doesn't need any dt]. Thanks to everyone who attended the security edition of the Technical Meeting.
Today is the last day to register for next month's SOC meeting at Cosener's House. https://indico.cern.ch/event/775579/
-
Services: NTR
-
Tickets (by Matt):
IPv6 tickets
Oxford (Kashif notes this is evolving - 1 router might be updated for IPv6, but DNS etc not so far)
Pete Clarke mentioned that GridPP was "headlined" by a JISC meeting recently due to all our work on IPv6 migration and perfsonar [thanks to Duncan?]
Tier 1 Mice LFC ticket - some kind of weird connection issue?
RALPP: Chris is debugging an error in the webdav test (ROD ticket) - error code of "7" (and there are no docs)
QMUL LHCb data transfers ticket.
-
GDB updates:
Upcoming meetings mentioned in GDB - WLCG HSF OSG workshop, HEPIX Spring, ISGC2019, DIRAC Users workshop, DPM Workshop, etc.
SKA-CERN collaboration update: [mostly updates we've seen before]. CERN/SKA collaborate on OpenStack, OpenLab, ESCAPE ("Exascale science"). Common interest: PRACE, GPGPU etc.
IPv6 deployment presentation: [timescales as mentioned]. Interesting notes on "reasons why sites have not moved" - the most common is waiting on the infrastructure in which the site is embedded.
WLCG Storage Accounting: [need a way of reporting storage space which is not based on SRM - we call this SRR]. All SEs can publish to SRR now - but there's dev work needed to implement this. (Lots of work mostly on the nice API for inspection)
Monitoring and Infrastructure: CERN moving to "MONIT" unified monitoring as a service. (Dashboards, Alarms, Search and Archiving all together). Impl based on Kafka/Spark for transport+processing. "need to impl. GDPR"
DOMA-QoS: progress on plans for this project - principle is based on abstracting out the "types" of data modality ("needs REPLICAs", "COLD", "needs FAST access", "just OUTPUT" etc) from our hardware-bound ideas of "DISK" and "TAPE" storage, to potentially save money by allowing infrastructure to automatically provide QoS characteristics by any mechanism which meets the QoS tag's performance/reliability/etc requirements. It was mentioned that the QoS group would welcome new members and input, including from Experiments which have not previously been strongly engaged.
Summary: Stashcache, an introduction to this, and motivation for it [this has fed into some DOMA-QoS and DOMA-ACCESS discussions]. GeoIP in CVMFS v effective.
-
AOB
No AOB.
-
Chat log:
John Hill: (22/01/2019 11:09)
Cambridge VAC CentOS7 already
Elena Korolkova: (11:13 AM)
All uk sites but Sheffield and Glasgow have Centos7 resources
and corresponding queues for Centos7
It's a matter of more resources to be moved Centos7
Matt Doidge: (11:23 AM)
https://indico.cern.ch/event/775579/
Mark Slater: (11:29 AM)
Afraid I've got to go - I still need to get IPV6 DNS entries in for perfsonar
Gareth Roy: (11:31 AM)
https://ps-dash.dev.ja.net/perfsonar-graphs/?source=ps-londhx1.ja.net&dest=ps001.gla.scotgrid.ac.uk&displaysetdest=&url=https://ps-londhx1.ja.net/esmond/perfsonar/archive&reverseurl=https://ps001.gla.scotgrid.ac.uk/esmond/perfsonar/archive&displaysetsrc=#start=1547551180&end=1548155980&summaryWindow=3600&timeframe=1w
Duncan Rand: (11:35 AM)
https://ps-dash.dev.ja.net/maddash-webui/details.cgi?uri=/maddash/grids/UK+Mesh+Config+-+UK+IPv6+Latency+-+Loss/ps002.gla.scotgrid.ac.uk/ps-londhx1.ja.net/Packet+Loss
Gareth Roy: (11:37 AM)
Thanks Duncan
Jeremy Coles: (11:39 AM)
https://www.gridpp.ac.uk/wiki/Batch_system_status