Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description

- This is the weekly GridPP ops & sites meeting

- The intention is to run the meeting in VidyoConnect: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6

-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone.

-- The London (UK) service is on +442030510622.

-- The meeting extension is 109308582. PIN 1234

Andrew McNab

Brian Davies

Darren Moore

David Crooks

Duncan Rand

Emanuele Simili

Gordon Stewart

Ian Collier

John Hill

Kashif Mohamed

Linda Cornwall

Matt Doidge

Paige Winslowe Lacesso

Pete Clarke

Raja Nandakumar

Raul Lopez

Sam Skipsey

Ste Jones

Teng Li

Vip Davda

 

Chair: Kashif

Minutes: Sam Skipsey

 

Apologies: Daniela, Elena, Alessandra

————————————

 

LHCb Update: Raja

QMUL in D/T, waiting for update

Raja asked what happened at Lancaster late last week: LHCb jobs dropped to 0 for a while, then recovered.

 

[Matt Doidge in chat (14/05/2019 11:05): We had an accident at Lancaster where most of our queues on our WNs went into an error state and the monitoring didn't pick it up.]

T1 is fine.

 

DUNE update: Raja

 

T1 has allocated tape for DUNE, but needs a tape robot.

No DUNE jobs are running at T1 right now; T1 is investigating.

[Comments from Darren & Kashif; Darren will update Raja offline.]

 

CMS update:

Daniela sent apologies.

 

ATLAS update:

Elena and Alessandra sent apologies.

 

Matt notes Elena has said that people with issues should email UK Cloud support, as Tim/Stewart are holding the fort.

 

Other VOs:

 

NTR (nothing to report)

 

——————————

 

General Updates

 

HEPSYSMAN registration still open.

- David Crooks notes that the deadline for the Security day is the end of tomorrow [Wed 15th], especially if accommodation is needed.

ARC man meeting is next week?

 

Last week there were the EGI conference & the GDB.

 

- Benchmarking updates.

Are people still using HEPSPEC06?

  Steve notes that he doesn't know of anyone using anything else. (Matt notes that non-HEP people do, but of course in Grid we care about using the metric everyone else uses for comparison.)

Ian Collier notes that we *have* to use it for comparison and because it is how compute is pledged. The benchmarking working group's long-term task has been to resolve the divergence between HEPSPEC performance and the real scaling of various HEP workflows for different experiments, and to come up with an alternative that is also suitable for all the allocation tasks.

The benchmarking working group was waiting for the next SPEC release, but this has the same issue as SPEC06 [and HEPSPEC06] did in terms of reflecting scaling. The group is now working on containerised experiment workflows for testing scaling, and then on working with SPEC to make a good benchmark.

 

Post-CREAM-CE - recommendation is now ARC CE and HTCondor CE.

 

Security status talk is similar to what was given by David at GridPP42.

 

General updates

 

There's a WLCG Ops Coordination meeting this Thursday. Who attends it for the UK now (Jeremy used to)? Matt has been volunteered.

 

EGI Ops May meeting has been cancelled.

 

On-Duty: Andrew McNab reports "status unremarkable"

 

Security update: David Crooks

People are encouraged to sign up to the HEPSYSMAN Security pre-workshop. David hopes that this session will help to establish what we need sites to know and to have as foundation training in future.

People who can't make it should let David know so he can make other arrangements to support people. It's *possible* Vidyo access to some parts of the training session might be available.

 

 

Matt's tickets update:

 

IPv6 tickets (at various levels of progression, see Matt's notes in ticket roundup)

 

Birmingham CREAM CE decommissioning ticket - as far as we know this has gone well [but Mark is not here to comment]

 

Glasgow ATLAS analysis ticket [which is really just ATLAS complaining about low job throughput]

 

Glasgow MICE DFC ticket [waiting on MICE to confirm if user needs space]

 

ECDF tickets - Teng notes that all the ECDF tickets were due to their ARC CE being broken [and they're in the process of building a new one to stop this problem].

 

Sheffield tickets (Elena not here to comment)

 

Liverpool ticket with Biomed wanting "automatic spacetoken" - plan is to leave this until Liverpool moves to DOME and then use quotatoken.

 

UCL have a ticket about VAC-in-a-Box not working, but it looks like no-one at UCL is looking at it.

 

QMUL tickets are mostly due to the power cut yesterday.

 

Tier 1 tickets: 

DUNE tickets [see discussion with Raja in DUNE update]

 

———————————

 

EGI Conference topics: Steve Jones giving overview.

 

CREAM CE "future" [what we move to from it]

 

HTCondor CE talks from Steve [on APEL accounting] and Brian Bockelman

ARC CE talk from Balázs

"Steve thinks that either could be used to replace CREAM"

 

 

Federated Data

[A discussion of the technologies presented.]

 

 

No other business.

 

[We should plan minute-takers in advance]

 

———————————————

 

Pete Clarke volunteers an update on GridPP6 progress. (And we can't say some things, especially because you can't pre-judge how a panel really feels.)

There were two panels last week:

GridPP6 Review Panel "was fine". It was clear that the Panel and STFC understand and appreciate the effort applied by GridPP staff to support non-LHC work. Most of the questions were actually on the hardware fund [see DB's GridPP42 talk for the origins of this disparity].

Oversight Committee (with new membership): the panel was supportive.

 

—————————————————

 

Chat log

 

Matt Doidge: (14/05/2019 11:05)

We had an accident at Lancaster where most of our queues on our WNs went into an error state and the monitoring didn't pick it up

Raja: (11:06 AM)

Oh - okay. Thanks!

Ste Jones: (11:12 AM)

EGI Conf link (will explain later...) https://indico.egi.eu/indico/event/4431/timetable/#20190507

David Crooks: (11:25 AM)

Apologies for the phone ringing on my end

We've lost you Ste

Paige Winslowe Lacesso: (11:53 AM)

THANKS for hosting, Kashif!

Thanks for that update

 

    • 11:00 11:01
      Ops meeting minutes 1m
      • This is a reminder that this is an important task. The minute taker gives access to the discussions for those not present and provides a reference for others to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't the task will keep coming to them!).

      • Upcoming allocations:

    • 11:01 11:20
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb

      • CMS

      • ATLAS

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam

    • 11:20 11:40
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 12:20
      Discussion topics 40m

      EGI Conference 2019
      https://indico.egi.eu/indico/event/4431/timetable/#20190506

      1. Future after CREAM-CE https://indico.egi.eu/indico/event/4431/session/15/?slotId=0#20190507

      2.

    • 12:20 12:25
      Actions & AOB 5m