Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description

- This is the weekly GridPP ops & sites meeting

- The intention is to run the meeting in VidyoConnect: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6

-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone.

-- The London (UK) service is on +442030510622.

-- The meeting extension is 109308582. PIN 1234

Chair:  Matt

Minutes:

Apologies:

GridPP Operations Team Meeting – 26th March 2019


 

Chair: Matt Doidge

Minutes: Vipul Davda

Present: Andrew McNab, Brian Davies, Dan T, Darren M, David C, Alastair D, Elena, Emanuele, Gareth R, Gordon S, Ian L, Kashif, Me, Winnie, Pete Clarke, Raja, Rob C, Robert F, Sam S, Steve Jones, Teng and Vip Davda.

 

Apologies: Daniela, Alessandra

 

Experiment Problems/Issues


 

LHCB - (Raja)

 

  • Still fixing a few small bits from the jumbo DIRAC update of 10 days ago

  • Transfer problem between QMUL and CNAF (Italian Tier-1): believed to be solved now (GGUS:140190)

  • ARC CEs losing track of pilots at ECDF: believed to be solved now (GGUS:140396)

  • Ongoing issue: pilots having thread issues in Glasgow (GGUS:140151)

  • Ongoing issue: LHCb migration to ECHO. Finding and fixing minor bugs in testing with the latest DIRAC.

 

CMS (Daniela Bauer) – Via Email

  • CMS: Brunel still has problems, Raul is working on it; other sites are fine.

  • The Imperial Phedex had a slight hiccup last night due to the disk being full; that is now fixed. Daniela submitted two tickets about file transfer issues at RAL, which are being worked on. They only affect a tiny bit of data, so the impact for the average user should be zero.

  • All other CMS sites are ok.

 

ATLAS (Elena Korolkova):

 

  • Sussex have requested to disable the analysis queue

  • RAL have requested to disable the SL6 queue

  • There was discussion of diskless sites

 

Other VOs (Daniela Bauer) – Via email

 

  • T2K (LFC to DFC): Storage issue at QMUL, which is a major T2K site; their storage needs to be fixed. Details in:

https://ggus.eu/?mode=ticket_info&ticket_id=138364

 

  • The three small sites (LIV, OX, SHEF) are still missing and will be worked on

 

  • MICE (LFC to DFC): This is going much better (fewer sites, less data).

 

  • LZ changed one of their VOMS servers. The Operations Portal has now been updated. If you support LZ, please check that the following is up to date:

[root@gfe02 ~]# cat /etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc

/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu

/C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA
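
The LZ check above could be scripted roughly as follows. This is only a sketch: it assumes the .lsc file holds a single DN chain (server DN first, then CA DN), and the helper names lsc_lines and lsc_is_current are made up for illustration, not part of any grid tool. The expected DNs are copied from the minutes.

```python
# Expected contents of the LZ .lsc file, as quoted in the minutes:
# the VOMS server DN followed by the DN of its issuing CA.
EXPECTED_LSC = [
    "/DC=org/DC=incommon/C=US/ST=WI/L=Madison/"
    "O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu",
    "/C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA",
]


def lsc_lines(text):
    """Return the non-empty lines of an .lsc file, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def lsc_is_current(text, expected=EXPECTED_LSC):
    """True if the file text holds exactly the expected DNs, in order.

    Hypothetical helper: assumes one DN chain per file.
    """
    return lsc_lines(text) == list(expected)
```

On a host that supports LZ, the text could be read with pathlib.Path("/etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc").read_text() and passed to lsc_is_current; a False result suggests the file still carries the old DNs.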

Meetings and Updates

Please refer to the bulletin at http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest for more details

 

General Updates:

 

 

  • There was a discussion on security and how to improve communication.

 

 

  • SLATE:

 

 

Next Tech meeting – Steve Jones will present HTCondor CE

 

  • WLCG ops Coordination

  • Tier1 (Darren Moore): Patching the batch farm. Adding more CPUs and Storage for next year’s pledge

  • Storage and Data Management (Sam) – there was discussion on Xcache at Birmingham

  • Tier2 Evolution – no update

  • Accounting – Please update the benchmarking page

  • Documentation – Changes to VOs and major update to HTCondor CE by Steve Jones

  • Interoperation – no updates

  • Monitoring – no updates

  • On-duty – Kashif is on duty - nothing to report.

  • Roll Out: Batch system

  • Services – no update.

 

  • Security – David Crooks – There was a long discussion of the latest security challenge. The challenge started on Tuesday 12th March, but the email to the sites was not sent until the afternoon of Friday 15th March; this was not well received. David said this was not intentional, but he had expected the sites to detect it well before then.

    • All sites to complete the report by Friday.

 

  • Tickets – Matt Doidge: There are a few open UK tickets; see Latest tickets for more details.

  • Tools – no update

  • VOs – no updates

  • AOB

GridPP42 meeting will be at RAL, please register - https://indico.cern.ch/event/780766/timetable/?view=standard

 

Group Chat

Matt

Vip is taking minutes - thanks Vip!

https://indico.cern.ch/event/803629/

(also David is recording - thanks David!)

Elena

https://ggus.eu/index.php?mode=ticket_info&ticket_id=140103

https://ggus.eu/index.php?mode=ticket_info&ticket_id=140350

https://ggus.eu/index.php?mode=ticket_info&ticket_id=139723

https://ggus.eu/index.php?mode=ticket_info&ticket_id=140134

Daniel

I keep on finding other things to do but do need to start the move

Matt

https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

Elena

/etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc

(currently: /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Service/CN=voms.hep.wisc.edu
/DC=org/DC=cilogon/C=US/O=CILogon/CN=CILogon OSG CA 1)

with
/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu
/C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA

Matt

https://indico.cern.ch/event/759388/

Dewhurst

The joint High Energy Physics Software Foundation, Open Science Grid and Worldwide Large Hadron Collider Computing Grid 2019 Workshop

Elena

etc/grid-security/vomses/lz
should now read: /etc/vomses/lz
"lz" "voms.hep.wisc.edu" "15001" "/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu" "lz" "24"
"lz" "lzvoms.grid.hep.ph.ic.ac.uk" "15001" "/C=UK/O=eScience/OU=Imperial/L=Physics/CN=lzvoms.grid.hep.ph.ic.ac.uk" "lz" "24"

David

The talk which Pete was referencing: https://indico.cern.ch/event/759388/sessions/295225/attachments/1813716/2963439/WLCGEvolutionJLAB.pdf

SLATE talk: https://indico.cern.ch/event/759388/contributions/3361774/attachments/1815564/2967154/Central_Ops_with_SLATE_and_PRP_3.pdf

https://indico.cern.ch/event/759388/sessions/295063/#20190321

(all that day's sessions)

Matt

https://indico.cern.ch/event/780766/ <-- GridPP42

Elena

https://indico.cern.ch/event/770307/contributions/3301647/attachments/1807906/2951426/Deploying_Services_with_SLATE_1.pdf


 

    • 11:00 - 11:01
      Ops meeting minutes 1m
      • This is a reminder that this is an important task. The minute taker gives access to the discussions for those not present and provides a reference for others to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't, the task will keep coming to them!).

      • Upcoming allocations:

    • 11:01 - 11:20
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb

      • CMS
        T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
        T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel

      From Daniela: CMS: Brunel still has problems, Raul is working on it, the other sites look fine.
      The Imperial Phedex had a slight hiccup last night due to the disk being full, that is now fixed. I submitted two tickets about file transfer issues at RAL, they are being worked on. They only affect a tiny bit of data, so the impact for the average user should be zero.
      Apart from Brunel, which is understood, all CMS sites look good in the monitoring.

      • ATLAS

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      Also from Daniela:
      *T2K (LFC to DFC): We really really need QMUL which is a major T2K site to deal with their storage. Details in:
      https://ggus.eu/?mode=ticket_info&ticket_id=138364
      We haven't quite got round to the three small sites (LIV, OX, SHEF) still missing (because I spend all my time setting up an IRIS cloud), but we haven't forgotten.

      *MICE (LFC to DFC): This is going much better (less sites, less data).

      *LZ changed one of their voms servers. The Operations Portal has updated now. If you support LZ, please check if:
      [root@gfe02 ~]# cat /etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc
      /DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu
      /C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA
      is up to date.

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam
    • 11:20 - 11:40
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 - 12:20
      Discussion topics 40m
      • February GDB: https://indico.cern.ch/event/739875/
      • Site roundtable.
    • 12:20 - 12:25
      Actions & AOB 5m