Operations team & Sites
EVO - GridPP Operations team meeting
- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in VidyoConnect: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone.
-- The London (UK) service is on +442030510622.
-- The meeting extension is 109308582. PIN 1234
Chair: Kashif
Minutes: Sam Skipsey
Apologies: Alessandra, Elena, Daniela
Attendees:
Andrew McNab
Brian Davies
Darren Moore
David Crooks
Duncan Rand
Emanuele Simili
Gordon Stewart
Ian Collier
John Hill
Kashif Mohamed
Linda Cornwall
Matt Doidge
Paige Winslowe Lacesso
Pete Clarke
Raja Nandakumar
Raul Lopez
Sam Skipsey
Ste Jones
Teng Li
Vip Davda
————————————
LHCb Update: Raja
QMUL is in downtime; waiting for an update.
Raja asked what happened at Lancs late last week: LHCb jobs dropped to 0 for a bit, then recovered.
[Matt in Chat: Matt Doidge (14/05/2019 11:05): We had an accident at Lancaster where most of our queues on our WNs went into an error state and the monitoring didn't pick it up]
T1 is fine.
DUNE Update: Raja
T1 has allocated Tape for DUNE, but needs tape robot.
No jobs from DUNE are running at T1 right now; T1 is investigating.
[Comments from Darren & Kashif; Darren will update Raja offline.]
CMS update:
Daniela sent apologies.
ATLAS update:
Elena and Alessandra sent apologies.
Matt notes Elena has said that people with issues should email UK Cloud support, as Tim/Stewart are holding the fort.
Other VOs:
NTR
——————————
General Updates
HEPSYSMAN registration still open.
- David Crooks notes the deadline for the Security day is the end of tomorrow [Wed 15th], especially if accommodation is needed.
ARC man meeting is next week?
Last week there was EGI conference & GDB.
- benchmarking updates.
Are people still using HEPSPEC06?
Steve notes that he doesn't know of anyone using anything else. (Matt notes that non-HEP people do, but of course in Grid we care about using the metric everyone else uses, for comparison.)
Ian Collier notes that we *have* to use it for comparison and because it is how compute is pledged. The Benchmarking working group's long-term task has been to resolve the divergence between HEPSPEC performance and the real scaling of various HEP workflows for different experiments, and to come up with an alternative which is also suitable for all the allocation tasks.
The Benchmarking working group was waiting for the next SPEC release, but this has the same issue as SPEC06 [and HEPSPEC06] did in terms of reflecting scaling. It is working on containerised experiment workflows for testing scaling, and then on working with SPEC to make a good benchmark.
Post-CREAM-CE - recommendation is now ARC CE and HTCondor CE.
Security status talk is similar to what was given by David at GridPP42.
General updates
There's a WLCG Ops Coordination meeting this Thursday. Who attends it for the UK now (Jeremy used to)? Matt has been volunteered.
EGI Ops May meeting has been cancelled.
On-Duty: Andrew McNab reports "status unremarkable"
Security update: David Crooks
People are encouraged to sign up to the HEPSYSMAN Security pre-workshop. DC hopes that this session will help to establish what we need sites to know, and have as foundation training in future.
People who can't make it should let David know so he can make other arrangements to support people. It's *possible* Vidyo access to some parts of the training session might be available.
Matt's tickets update:
IPv6 tickets (at various levels of progression, see Matt's notes in ticket roundup)
Bham CREAM CE decommissioning ticket - as far as we know this has gone well [but Mark not here to comment]
Glasgow ATLAS analysis ticket [which is really just ATLAS complaining at job throughput being low]
Glasgow MICE DFC ticket [waiting on MICE to confirm if user needs space]
ECDF tickets - Teng notes that all the ECDF tickets were due to our ARC CE being broken [and we're in the process of building a new one to stop this problem].
Sheffield tickets (Elena not here to comment)
Liverpool ticket with Biomed wanting "automatic spacetoken" - plan is to leave this until Liverpool moves to DOME and then use a quotatoken.
UCL have a "VAC-in-a-Box not working" ticket, but it looks like no-one in UCL is looking at it.
QMUL tickets are mostly due to the power cut yesterday.
Tier 1 tickets:
DUNE tickets [see discussion with Raja in DUNE update]
———————————
EGI Conference topics: Steve Jones giving overview.
CREAM CE "future" [what we move to from it]
HTCondor CE talks from Steve [on APEL accounting] and Brian Bockelman
ARC CE talk from Balazs
"Steve thinks that either could be used to replace CREAM"
Federated Data
[A discussion of the technologies presented.]
No other business.
[We should plan minute-takers in advance]
———————————————
Pete Clarke volunteers an update on GridPP6 progress. (Some things can't be said, especially because you can't pre-judge how a panel really feels.)
Two panels last week:
GridPP6 Review Panel "was fine". It was clear that the Panel and STFC understand and appreciate the effort applied by GridPP staff to support non-LHC work. Most of the questions were actually on the hardware fund [see DB's GridPP42 talk for the origins of this disparity].
Oversight Committee (with new membership). The Panel were supportive.
—————————————————
Chat log
Matt Doidge: (14/05/2019 11:05)
We had an accident at Lancaster where most of our queues on our WNs went into an error state and the monitoring didn't pick it up
Raja: (11:06 AM)
Oh - okay. Thanks!
Ste Jones: (11:12 AM)
EGI Conf Link (will explain later...) https://indico.egi.eu/indico/event/4431/timetable/#20190507
David Crooks: (11:25 AM)
Apologies for the phone ringing on my end
We've lost you Ste
Paige Winslowe Lacesso: (11:53 AM)
THANKS for hosting, Kashif!
Thanks for that update