Operations team & Sites
EVO - GridPP Operations team meeting
- This is the weekly GridPP ops & sites meeting
- The intention is to run the meeting in VidyoConnect: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6
-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone.
-- The London (UK) service is on +442030510622.
-- The meeting extension is 109308582. PIN 1234
GridPP Operations Team Meeting – 26th March 2019
Chair: Matt Doidge
Minutes: Vipul Davda
Present: Andrew McNab, Brian Davies, Dan T, Darren M, David C, Alastair D, Elena, Emanuele, Gareth R, Gordon S, Ian L, Kashif, Me, Winnie, Pete Clarke, Raja, Rob C, Robert F, Sam S, Steve Jones, Teng and Vip Davda.
Apologies: Daniela, Alessandra
Experiment Problems/Issues
LHCb (Raja):
- Still fixing a few small issues from the jumbo DIRAC update of 10 days ago.
- Transfer problem between QMUL and CNAF (Italian Tier-1) believed solved now (GGUS:140190).
- ARC CEs losing track of pilots at ECDF believed solved now (GGUS:140396).
- Ongoing issue: pilots having thread issues at Glasgow (GGUS:140151).
- Ongoing issue: LHCb migration to ECHO. Finding and fixing minor bugs in testing with the latest DIRAC.
CMS (Daniela Bauer) – Via email
- Brunel still has problems; Raul is working on it.
- The Imperial PhEDEx had a slight hiccup last night because a disk was full; that is now fixed. Daniela submitted two tickets about file transfer issues at RAL, which are being worked on. They affect only a tiny amount of data, so the impact on the average user should be zero.
- All other CMS sites are OK.
ATLAS (Elena Korolkova):
- Outstanding tickets:
-- Lancaster: issue on hold because of the ongoing IPv6 infrastructure problems. ON HOLD: https://ggus.eu/index.php?mode=ticket_info&ticket_id=140103
-- Liverpool: DPM server keeps crashing because of the RAID controller. RESOLVED: https://ggus.eu/index.php?mode=ticket_info&ticket_id=140350
-- RAL: upload to scratch disk fails because of permissions. RESOLVED: https://ggus.eu/index.php?mode=ticket_info&ticket_id=139723
-- Oxford: one of the DPM SEs crashed. The system is back up again; however, there are still deletion issues. https://ggus.eu/index.php?mode=ticket_info&ticket_id=140134
- Sussex have requested that their analysis queue be disabled.
- RAL requested that the SL6 queue be disabled.
- There was a discussion of diskless sites.
Other VOs (Daniela Bauer) – Via email
- T2K (LFC to DFC): There is a storage issue at QMUL, which is a major T2K site; they need to fix their storage. Details in: https://ggus.eu/?mode=ticket_info&ticket_id=138364
- The three small sites (LIV, OX, SHEF) are still missing and will be worked on.
- MICE (LFC to DFC): This is going much better (fewer sites, less data).
- LZ changed one of their VOMS servers. The Operations Portal has now been updated. If you support LZ, please check that your .lsc file is up to date. It should read:
[root@gfe02 ~]# cat /etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc
/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu
/C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA
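For sites that would rather script the LZ check than compare the file by eye, the following is a minimal Python sketch, not an official GridPP tool. It assumes the .lsc file holds a single chain of exactly two lines (the server's subject DN followed by the issuing CA's DN), as in the listing from the minutes. The helper names read_lsc and lsc_is_current are hypothetical; the path and the expected DNs are copied from the minutes.

```python
from pathlib import Path

# Expected contents, copied from the minutes: subject DN first, then the CA DN.
# Assumption: the file holds one chain only (no "NEXT CHAIN" separator lines).
EXPECTED_LSC = [
    "/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu",
    "/C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA",
]

# Path as shown in the minutes.
LSC_PATH = Path("/etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc")


def read_lsc(text):
    """Return the non-empty lines of an .lsc file, with whitespace stripped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def lsc_is_current(text, expected=EXPECTED_LSC):
    """Return True if the text lists exactly the expected DNs, in order."""
    return read_lsc(text) == list(expected)


if __name__ == "__main__":
    if not LSC_PATH.exists():
        print(f"{LSC_PATH} not found: is LZ supported on this host?")
    elif lsc_is_current(LSC_PATH.read_text()):
        print("LZ .lsc file is up to date")
    else:
        print("LZ .lsc file differs from the expected contents:")
        print(LSC_PATH.read_text())
```

The comparison is deliberately strict (exact strings, exact order), so any leftover line from the previous certificate is reported as a mismatch rather than silently accepted.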
Meetings and Updates
Please refer to the bulletin at http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest for more details
General Updates:
- Joint HSF/OSG/WLCG Workshop HOW2019: see https://indico.cern.ch/event/759388/sessions/295063/#20190321 for details.
- There was a discussion on security and how to improve communication.
- Discussion on WLCG about talking to the wider community: https://indico.cern.ch/event/759388/sessions/295225/attachments/1813716/2963439/WLCGEvolutionJLAB.pdf
- Slate:
Next Tech meeting – Steve Jones will present HTCondor CE
- WLCG ops Coordination –
- Tier1 (Darren Moore): Patching the batch farm. Adding more CPU and storage for next year's pledge.
- Storage and Data Management (Sam): There was a discussion on XCache at Birmingham.
- Tier2 Evolution – no update
- Accounting – Please update the benchmarking page.
- Documentation – Changes to VOs and a major update to HTCondor CE by Steve Jones.
- Interoperation – no updates
- Monitoring – no updates
- On-duty – Kashif is on duty; nothing to report.
- Roll Out: Batch system
- Services – no update.
- Security (David Crooks): There was a long discussion of the latest security challenge. The challenge started on Tuesday 12th March, but the email to the sites was not sent until the afternoon of Friday 15th March; this was not well received. David said that the delay was not intentional, but that sites had been expected to detect the challenge well before then.
-- All sites to complete the report by Friday.
- Tickets (Matt Doidge): There are a few open UK tickets; see Latest tickets for more details.
- Tools – no update
- VOs – no updates
- AOB: The GridPP42 meeting will be at RAL; please register: https://indico.cern.ch/event/780766/timetable/?view=standard
Group Chat
Matt: Vip is taking minutes - thanks Vip! https://indico.cern.ch/event/803629/ (also David is recording - thanks David!)
Elena: https://ggus.eu/index.php?mode=ticket_info&ticket_id=140103
https://ggus.eu/index.php?mode=ticket_info&ticket_id=140350
https://ggus.eu/index.php?mode=ticket_info&ticket_id=139723
https://ggus.eu/index.php?mode=ticket_info&ticket_id=140134
Daniel: I keep on finding other things to do but do need to start the move
Matt: https://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest
Elena: /etc/grid-security/vomsdir/lz/voms.hep.wisc.edu.lsc
(currently: /DC=org/DC=opensciencegrid/O=Open Science Grid/OU=Service/CN=voms.hep.wisc.edu
/DC=org/DC=cilogon/C=US/O=CILogon/CN=CILogon OSG CA 1)
with
/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu
/C=US/O=Internet2/OU=InCommon/CN=InCommon IGTF Server CA
Matt: https://indico.cern.ch/event/759388/
Dewhurst: The joint High Energy Physics Software Foundation, Open Science Grid and Worldwide Large Hadron Collider Computing Grid 2019 Workshop
Elena: etc/grid-security/vomses/lz
should now read:/etc/vomses/lz
"lz" "voms.hep.wisc.edu" "15001" "/DC=org/DC=incommon/C=US/ST=WI/L=Madison/O=University of Wisconsin-Madison/OU=OCIS/CN=voms.hep.wisc.edu" "lz" "24"
"lz" "lzvoms.grid.hep.ph.ic.ac.uk" "15001" "/C=UK/O=eScience/OU=Imperial/L=Physics/CN=lzvoms.grid.hep.ph.ic.ac.uk" "lz" "24"
David: The talk which Pete was referencing: https://indico.cern.ch/event/759388/sessions/295225/attachments/1813716/2963439/WLCGEvolutionJLAB.pdf https://indico.cern.ch/event/759388/sessions/295063/#20190321 (all that day's sessions)
Matt: https://indico.cern.ch/event/780766/ <-- GridPP42