Operations team & Sites

Europe/London
EVO - GridPP Operations team meeting

Description

- This is the weekly GridPP ops & sites meeting

- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6

-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial in numbers.

-- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002

-- The meeting extension is 109308582. PIN 1234

Chair:  Jeremy C

Minutes: Duncan R

Apologies:

Ops meeting 20180814

Present: Davies, Traynor, Bauer, Darren, Crooks, Rand, Korolkova, Cole, Cornwall, Slater, Doidge, Lacesso, Raja, Jones, Tony, Brew, Vip, Forti, Gronbech, Frank.

LHCb (Raja): Echo is down. Opened a couple of tickets regarding data transfers at RAL (https://its.cern.ch/jira/browse/LBCORE-1410 and https://indico.cern.ch/event/657662/). Working on getting LHCb Dirac working with xrootd.

CMS (Daniela): Tier-1 has problems (Echo).

ATLAS (Elena): Last week there was a report on the Rucio issue. At the WLCG meeting ATLAS reported that a change in pilot use resulted in …. Tickets:
- ECDF: jobs are failing on RDF; the wrong protocol is used to access the files. Elena has checked this. The shifter thinks something is wrongly configured in AGIS.
- Lancaster: Matt is fighting with the new DPM head node.
- Liverpool (136667): deletion errors, but the DPM expert is away. Now looks resolved anyway.
- Manchester: problem with a disk server. Fixed.
- Sheffield: transfers time out; the failure rate has decreased significantly.
- Birmingham: EOS working happily. Now need to move site to EOS, create a new site or move storage in AGIS.
- RALPP: want to move a CE to SL7. Elena created a new SL7 site and queue.
- RHUL: problem, ticket opened; jobs are filling up the CREAM CE disk.

WLCG Ops report: many short jobs reducing CMS efficiency. Data loss at KIT. ATLAS: change in pilot

Other VOs (https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator):

DUNE (Raja): trying to get transfers to and from RAL for all sites including FNAL, CERN. EOS doesn’t support http as push. Other 4 sites are Ox, Liv, Mancs, Edinburgh. Will work on Imperial next.
Jeremy: Incubator page suggests attempting to recruit more sites. Also 2PB storage (at RAL, Mancs, Ed?). Not sure what status is at Manchester. Alessandra: DPM http/webdav 3rd party copy should work (works for SKA).

SKA regional centre: test of transformation system (VOI-SKA-007). Daniela: set up a test Dirac server, set up a test transformation system, they have done extensive testing. Latest version does allow multi-VO transfers. As regards the action the testing has been done. SKA transfers from South Africa don’t work - likely to be related to ports for gridftp. SKA (Rohini) have started to look at Panda (BNL instance on Amazon).

T2K (AndyM): made some changes to what is in the VMs. There seem to be some T2K jobs running.

LSST (Alessandra): jobs are running; some effort went into tuning (~400 jobs/day) and they have requested to run more. Had issues with Dirac. Wanted to use Manchester SL7 IRIS hardware. Jobs only go to sites with data, at QMUL etc. Now waiting for a new release of Ganga to fix a bug. Have run 4000 kHS06 hours. Very inefficient jobs. Steve: how bad? Answer: 30-40%.

GridPP Dirac SAM tests: AndyM: looks much the same as last time. Mostly OK apart from CERN with VCycle problems.

Bulletin: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

TB-SUPPORT notes: RHUL and KIT also seem to have issues. No update. Dan: the latest UMD release of the SL6 version of CREAM doesn’t work.

Tier-1: Echo is down (since Friday night). Were running routine ATLAS+CMS production and adding new hardware; the nodes are now swapping heavily, causing machines to drop out. Nodes seem to be underspecified with respect to memory.

Services (Duncan): perfSONAR version 4.1 is to be released tomorrow.

Tickets (Matt): See TB-SUPPORT email for details. Lots of IPv6 tickets. Birmingham ready for production with EOS.

GridPP41 agenda: a number of talks still to be scheduled, especially in Tier-2 and Storage sessions (which are based around the position papers for GridPP6):

Deployment: https://docs.google.com/document/d/1pfkcClU3a7eE9TtB-Q40BE9XTxN1y39aWheircDtXjs/edit?usp=sharing
https://docs.google.com/document/d/1zH-R9si2JUhrjHqk7rWKqWSp_BVDV61ql1KvZUj1wbM/edit?ts=5b3deb18

AOB: no AOB


Chat window:

Linda
I'm using the old system, I wasn't able to install the new, and I can hear fine

Jeremy
Anyone not hearing the chat?

Daniela
I'm on Vidyo Connect Fedora28 and it seems to work.

Jeremy
Good.

Daniela
I'm using Vidyo Connect

Mark
Me neither if it matters

Daniela
I can hear everyone.

Raja
I am using vidyowebrtc

Chris
I can hear Daniela

Daniela
I'm going to borrow Duncan's mike, give me a minute

Mark
Now I can hear Daniela!

Raja
That was Linda

Chris
Roll call?

David
Yeah, that might be a good idea at this point

Mark
Yes!

Linda
Microsoft Windows 10 with the old system
Can hear everyone

Raja
https://its.cern.ch/jira/browse/LBCORE-1410
https://indico.cern.ch/event/657662/

Linda
I can hear Daniela
I can still hear

Tony
I can hear Daniela. Using Vidyo Desktop

Chris
I can hear, VidyoDesktop on macOS

David
I'm on VidyoConnect and can't hear Daniela at 11:11

Andrew
Vidyo Connect. Cannot hear her. macOS. Installed VC yesterday

Daniela
VidyoDesktop doesn't work on my Fedora version

Tony
I changed from Web based Vidyo to Desktop because I wasn't able to hear everybody

Daniela
I'm just going to chat for the rest of the time, I can hear you all.
@Raja, the indico page you link must be LHCb specific, I'm not allowed to see it. I can see the Jira ticket.

Raja
Quite possibly Daniela. The JIRA ticket stemmed from the presentations made in the indico page - and essentially reflects it.

Jeremy
Access denied for me too.

Raja
But this is indeed a LHCbDirac issue rather than DIRAC

Daniela
@Raja, no worries, but I like to keep an eye on LHCb DIRAC, so I have an inkling what might come our way.
@Raja, what about Imperial?
:-)

Alessandra
We tried to start a panda instance at Lancaster but setting it up without even a medium term plan wasn't worth it

Daniela
I think the VAC problem is fixed.

Alessandra
so BNL offered to help

Daniela
t2k currently has about 200 jobs running.
A lot of them seem to finish successfully in VAC, so I think this issue can be closed.
VAC - Glasgow
- was meant to be ==

Linda
sorry I slipped away for a couple of mins but back

Jeremy
Please check your last update of https://www.gridpp.ac.uk/wiki/IPv6_site_status is less than 6 months ago! Thank you.

Paige
sorry must go

Jeremy
https://indico.cern.ch/event/736483/timetable/
Deployment: https://docs.google.com/document/d/1pfkcClU3a7eE9TtB-Q40BE9XTxN1y39aWheircDtXjs/edit?usp=sharing
https://docs.google.com/document/d/1zH-R9si2JUhrjHqk7rWKqWSp_BVDV61ql1KvZUj1wbM/edit?ts=5b3deb18

    • 11:00 - 11:01
      Ops meeting minutes 1m
      • This is a reminder that this is an important task. The minute taker gives access to the discussions for those not present and provides a reference for others to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't, the task will keep coming to them!).

      • Upcoming allocations:

    • 11:01 - 11:20
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb

      • CMS
        T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
        T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel

      Please see attached notes.

      • ATLAS

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam

    • 11:20 - 11:40
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 - 12:20
      Discussion 40m
      • Contributions to GridPP41
    • 12:20 - 12:25
      Actions & AOB 5m
      • Move to VidyoConnect: https://home.cern/cern-people/announcements/2018/07/video-conference-vidyoconnect-replace-current-clients