Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description

- This is the weekly GridPP ops & sites meeting

- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6

-- The PIN is 1234. To join by phone, see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial-in numbers.

-- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002

-- The meeting extension is 109308582. PIN 1234

Chair:  Jeremy C

Minutes: Gareth

Apologies:

Videoconference Rooms
GridPP-Operations
Name
GridPP-Operations
Description
- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the Janet(UK) Community area. Direct link: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=MDMaM82v2nD2Du999sD99D
- The phone bridge number is +44 (0)161 306 6802. The phone bridge ID is 1001002 with code: 4880.
Apologies:
Extension
109308582
Owner
Alessandra Forti

26th June 2018

Chair: Jeremy Coles
Minutes: Gareth Roy
Attending: 

Andrew McNab
Brian Davies
Chris Brew
Dan Traynor
Darren Moore
Daniela Bauer
David Crooks
Elena Korolkova
Gareth Roy
Gordon Stewart
Jeremy Coles
John Hill
Linda Cornwall
Matt Doidge
Paige Winslowe Lacesso
Peter Gronbech
Peter Clarke
Rob Currie
Raja Nandakumar
Robert Frank
Sam Skipsey
Steve Jones
Teng Li
Vip Davda

Experiment Reports
==================

LHCb
- Problem with pilot jobs failing at RAL
- A few disk servers at RAL are unstable; LHCb is waiting for ECHO hardware upgrades before completing migration.

CMS
- Nothing to report

ATLAS
- Computing and Software week
- IC storage decommissioning ongoing
    - Discussion about how to carry out file deletion.
    - Brian to reply to email and organise.
- Local group disk, reminder sent to UK ATLAS to check requirements before cleanup.
- Birmingham migration to EOS ongoing, some issues but Mark on holiday.

Ops meeting information:
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMeetingWeek180625

Other VOs
- Jeremy had looked at the incubator and had not identified any updates.
- LZ (Elena) finished production stage of mock data challenge
    - Produced 3 months of data
    - Now at reprocessing stage.
- DUNE (Andrew) now having weekly meetings with Fermilab DUNE
    - Now have a production payload working at Liverpool and Manchester (originally at IC)
    - Storage is now working from Fermilab, allowing 3rd party transfers.
    - Attempting to get 2 PB of storage in the UK.
    - Any site which wants to provide CPU is welcome.
    - https://www.gridpp.ac.uk/wiki/DUNE
    - Elena asked how storage is handled; Andrew stated they are running at Manchester outside of space tokens. Discussion about the necessity of space tokens and non-SRM access (xrootd).
- SKA (Jeremy) Rucio storage access and ACL issues.
- EUCLID (Jeremy) keen to get access to resources; Jeremy asked whether any contact has been made.
- GALDYN (Matt) no contact made so far; Matt will attempt to make contact.


Meetings and Updates
====================

General Updates
---------------

- July pre-GDB: Authorization & Authentication Infrastructure for WLCG.
- 100GbE networking workshop, London, 4th July
- Steve asked who knows how to use xrdcp.
    - Steve was able to get it working to and from Manchester; problems may be at the US end.
- 18th June WLCG ops meeting.
- 25th June WLCG ops meeting. 


WLCG Operations Co-ordination
-----------------------------
- Nothing to report


Tier-1 Status
-------------
- Pilot job issue for LHCb appears to be due to a machine that had run out of swap space.
- LHCb disk issues are due to old hardware.


Storage and Data Management
---------------------------
- Sam asked for people's opinions on Tier-2 storage for a CHEP presentation
- CMS now off Castor and fully on ECHO

Tier-2 Evolution
----------------
- Production testing for VAC LHCb Docker containers
- Universal DIRAC VMs for LHCb and GridPP in testing
- LZ GFAL fix due to upgraded Universal VMs


Accounting
----------
- Update at GDB 


Documentation
-------------
- Server awaiting kernel upgrades before documentation can be migrated.


Interoperation
--------------
- IRIS technical working group today, attempting to identify interfaces to resources
- Next EGI 9th July
    - GOC information to be updated and verified.

Monitoring
----------
- Nothing to report


On-duty
-------
- Nothing to report


Security
--------
- CSIRT Face to Face at Glasgow.
- EOSC-HUB transformations discussed.


Services
--------
- Nothing to report


Tickets
-------

38 Open UK Tickets this week.

BRUNEL 133956 (9/3) requires a kick.

TIER 1 135455 (31/5) closed by Chris
TIER 1 135293 (23/5) can this be closed?

SHEFFIELD 134947 (4/5) better now, ATLAS has DDM issue at present

BRISTOL 134820 (29/4) conversation restarted, can be closed?

RALPP 135552 (7/6) closed by Chris

ECDF 135404 (30/5) put on hold?

Tools
-----
- Nothing to report

VOs
---
- Nothing to report

Discussion
==========

HEPSYSMAN feedback
------------------

The hackathon was useful, examining the nuts and bolts of an Argus server. Steve found this quite useful; the rest of HEPSYSMAN was standard site reports.
Matt suggested that one of the outcomes was that another hackathon is scheduled at GridPP41 (room booked during PMB).


GridPP6 Position Documents
--------------------------
There will be an invitation to prepare a proposal for GridPP6, likely due Feb 2019. Dave Britton would like to understand what our positions are on various areas.
Over the summer period, position papers are to be prepared on the following areas (then discussed at GridPP41):
    - Storage 
    - Experiment Support
    - Security (all aspects)
    - Tier-1
    - Tier-2 (focus on operations/devops)


Storage Position Paper
----------------------
- New technologies (EOS vs Standard interfaces)
- DPM evolution/migration
- Wider role of Storage support positions
    - Value added to project needs to be identified (to justify posts)

GDB
---
Summary from https://indico.cern.ch/event/651354/


Actions & AOB
=============

Actions
-------

*** Remember to update count on actions page when you take minutes. ***

O-171031-01    No update.
O-171031-03    No update.
O-170711-04    No update.
O-170711-07    No update.
O-170131-01    No update.
O-160524-02    No update.
O-161108-00    No update.


AOB
---


Chat Window
=========== 
Linda Ann Cornwall: (26/06/2018 11:03:17)
I agree you are a bit quiet, I've turned the volume up so its O.K. though
Brian Davies @RAL-LCG2: (11:07 AM)
I ll reply to th eemail
Jeremy Coles: (11:07 AM)
https://twiki.cern.ch/twiki/bin/view/LCG/WLCGOpsMeetingWeek180625
https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator
https://www.gridpp.ac.uk/wiki/DUNE
https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel
http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
Brian Davies @RAL-LCG2: (11:23 AM)
please fill in my keydoc table on hardware for SEs
ill blog it
Steve Jones: (11:25 AM)
WWho are you askeing, Brian?
Andrew McNab?
Brian Davies @RAL-LCG2: (11:26 AM)
Anyone can add examples of storage purchases.
its a wiki...
Jeremy Coles: (11:37 AM)
https://www.gridpp.ac.uk/wiki/Suggestions_for_suitable_hardware_to_run_a_Grid_SE#Examples_of_Hardware_Purchases
Paige Winslowe Lacesso: (11:39 AM)
I've updated the Bristol CMS ticket to ask/suggest it be closed
Peter Gronbech: (11:43 AM)
sorry can't talk now
Jeremy Coles: (11:47 AM)
Sorry Pete... only just saw your message!
https://indico.cern.ch/event/651354/
Paige Winslowe Lacesso: (11:59 AM)
Sorry sorry must go

    • 11:00 11:01
      Ops meeting minutes 1m
      • This is a reminder that this is an important task. The minute taker gives access to the discussions for those not present and provides a reference for others to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't the task will keep coming to them!).

      • Upcoming allocations:

    • 11:01 11:20
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb

      • CMS
        T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
        T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel

      Please see attached notes.

      • ATLAS

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam

    • 11:20 11:40
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 12:20
      Discussion 40m
      • HEPSYSMAN feedback
      • GridPP6 position documents
    • 12:20 12:25
      Actions & AOB 5m
