Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting


Description

- This is the weekly GridPP ops & sites meeting

- The intention is to run the meeting in VidyoConnect: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6

-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone.

-- The London (UK) service is on +442030510622.

-- The meeting extension is 109308582. PIN 1234

Chair:  Matt

Minutes:

Apologies:

Videoconference Rooms

Name: GridPP-Operations
Description:
- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in EVO: http://evo.caltech.edu/evoGate/. Join the meeting in the Janet(UK) Community area. Direct link: http://evo.caltech.edu/evoNext/koala.jnlp?meeting=MDMaM82v2nD2Du999sD99D
- The phone bridge number is +44 (0)161 306 6802. The phone bridge ID is 1001002 with code: 4880.
Extension: 109308582
Owner: Alessandra Forti

Minutes 30/4/2019
=================

Present:
========
* Alessandra Forti
* Andrew McNab
* Brian Davies
* Chris Brew
* Daniel Traynor
* Daniela Bauer
* Darren Moore
* David Crooks
* Elena Korolkova
* Emanuele Simili
* Gareth Roy
* Gordon Stewart
* Ian Loader
* John Hill
* Kashif Mohammad
* Linda Cornwall
* Matt Doidge
* Paige Lacesso
* Pete Clarke
* Pete Gronbech
* Raja Nandakumar
* Robert Frank
* Sam Skipsey
* Steve Jones
* Teng

LHCB Report
===========
* Problems with aborted Pilots at ECDF
* Another storage pool has moved from CASTOR to ECHO at RAL; this leaves only the USER space to move.

CMS
===
* Brunel has data transfer problems; nothing else to report for CMS.

ATLAS
=====
* Brunel is having problems with file deletions for ATLAS. BD says this is an issue with WebDAV being unstable.
* RHUL has an issue with Squid servers being shown as red when one of the HA servers is available.


VO Incubator
============
* Discussion of DUNE queues at Lancaster: jobs are being submitted to the old SL6 queue. MD to email AMcN to have the queues moved over.
* LFC to DFC migration for T2K and MICE is ongoing; progress is being made by DB and SF.


Meetings and Updates
====================


General Updates
---------------
* Discussion of GridPP42 and associated presentations.
* Consultation on GridPP43 potential dates and locations (nominally located at Ambleside 20-22nd August).
    - Concerns were raised about the timing being close to the end of the summer holidays.
    - PC encouraged emails to DB to raise any questions/concerns.

* Technical meeting regarding DPM and its future within the UK.
    - AF comments that many sites are planning to move from DPM (whether as smaller sites or moves to different storage solutions).
    - KM comments that Oxford's plan is to update to DOME.

* HEPSYSMAN
    - Registration now appears to be open for HEPSYSMAN.
    - DC asks anyone who has security topics they would like covered as part of the training to let him know.

* TB-SUPPORT discussion about SW areas and whether or not they are needed. It appears that all VOs are now using either CVMFS or containers, so it is unlikely that a SW area is needed.


Tier-1 Status
-------------
* High CMS failures were seen, but this appears to have improved.


Storage
-------
* no report


Tier-2 Evolution
----------------
* no report


Accounting
==========
* no report


Documentation
=============
* SJ will check Fermilab VO information


Interop
=======
* no report


Monitoring
==========
* no report


On Duty
=======
* DB on duty, nothing to report


Security
========
* Nothing ongoing that sites need to be concerned about.
* Docker Hub breach: relevant to sites that may have been using it for any reason.
* Trust anchors have been updated; sites should install them as soon as they are able.


Services
========
* no report


Tickets
=======
* 131608 - needs an update, as it is being escalated to the VO Manager
* 139101 - no news
* 140679 - ongoing


Discussion & AOB
================
* Site Round Table

Manchester - going into downtime to upgrade DPM headnode
RALPP      - nothing to report
QMUL       - nothing to report
IC         - preparing for the move to Slough and IRIS cloud
Sheffield  - nothing to report
Glasgow    - CEPH and HTCondor-CE
Cambridge  - nothing to report
Oxford     - SL6 CE now in downtime to be retired (CentOS7 only)
Lancaster  - Robin leaving Lancaster
Bristol    - Condor and OS upgrades
Liverpool  - attending EGI to talk about CREAM-CE migration
Edinburgh  - working on a second CE as current is having problems


* HEPSYSMAN - registration is open; please register if you'd like to attend.

* GridPP6 - PC updates on current status, panel to take place next week.
* PC congratulates GridPP on its ability to aid LSST in getting its payloads running via GridPP's infrastructure.
* Indigo IAM vs EGI Check-In as an SSO solution: both AF and DC felt that IAM was a better solution than the EGI Check-In system.

 

Chat
====

Matt Doidge: (30/04/2019 11:03)

Gareth is kindly taking minutes

Elena Korolkova: (11:06 AM)

https://ggus.eu/index.php?mode=ticket_info&ticket_id=140848

https://ggus.eu/index.php?mode=ticket_info&ticket_id=140890

Matt Doidge: (11:12 AM)

https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

David Crooks: (11:31 AM)

https://success.docker.com/article/docker-hub-user-notification

Raja: (11:33 AM)

Apologies - got to go now

Paige Winslowe Lacesso: (11:33 AM)

apologies back soon

John Hill: (11:43 AM)

https://wiki.egi.eu/wiki/PROC12

David Crooks: (11:43 AM)

I was just about to post that as well :)

Alessandra: (11:51 AM)
>14k

 

    • 11:00 11:01
      Ops meeting minutes 1m
      • This is a reminder that this is an important task. The minute taker gives access to the discussions for those not present and provides a reference for others to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't the task will keep coming to them!).

      • Upcoming allocations:

    • 11:01 11:20
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb

      • CMS
        T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
        T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel

      • ATLAS

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      Also from Daniela:
      *T2K (LFC to DFC): We really, really need QMUL, which is a major T2K site, to deal with their storage. Details in:
      https://ggus.eu/?mode=ticket_info&ticket_id=138364
      We haven't quite got round to the three small sites (LIV, OX, SHEF) still missing (because I spend all my time setting up an IRIS cloud), but we haven't forgotten.

      *MICE (LFC to DFC): This is going much better (fewer sites, less data).

      *LZ changed one of their voms servers. The Operations Portal has updated now. If you support LZ, please check if:
      [root@gfe02 ~]# cat /etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc
      /DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu
      /C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA
      is up to date.
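
      For sites that would rather script the LZ check above than eyeball it, the following is a minimal sketch, not an official tool. It assumes the two DN lines quoted above are the complete, current contents of the .lsc file; the function names (parse_lsc, lsc_is_up_to_date, check_file) are hypothetical and introduced only for illustration.

```python
from pathlib import Path

# Expected contents, copied from the two DN lines quoted in the item above
# (assumption: these two lines are the whole, current file).
EXPECTED_LSC = [
    "/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu",
    "/C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA",
]

# Path taken from the cat command quoted above.
DEFAULT_PATH = "/etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc"


def parse_lsc(text):
    """Return the non-empty lines of an .lsc file with whitespace stripped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def lsc_is_up_to_date(text, expected=EXPECTED_LSC):
    """True if the file text holds exactly the expected DN lines, in order."""
    return parse_lsc(text) == list(expected)


def check_file(path=DEFAULT_PATH):
    """Read the .lsc file at path and compare it; a missing file counts as not up to date."""
    p = Path(path)
    if not p.is_file():
        return False
    return lsc_is_up_to_date(p.read_text())


if __name__ == "__main__":
    print("up to date" if check_file() else "needs updating (or file missing)")
```

      Run on a node that supports LZ, this reports whether the local file matches the lines quoted above; any mismatch simply means the file should be inspected by hand.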

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam
    • 11:20 11:40
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 12:20
      Discussion topics 40m
      • February GDB: https://indico.cern.ch/event/739875/
      • Site roundtable.
    • 12:20 12:25
      Actions & AOB 5m