
EVO - GridPP Operations team meeting


Description

- This is the weekly GridPP ops & sites meeting

- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6

-- The PIN is 1234. To join via phone, see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial-in numbers.

-- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002

-- The meeting extension is 109308582. PIN 1234

Chair: Jeremy C

Minutes: Ian L

Apologies: David C, Daniela B, Andrew M

-- LHCb: NTR

-- CMS: SSB (Site Status Board) problems last week; campaign to update FTS. CMS is testing Singularity, which is raising false alarms.

-- ATLAS: several tickets. Problem with transfers at Lancaster due to files missing at the source. Declaring the files missing solves the problem, but we should involve DDM support. Problem with transfer timeouts at three sites. These sites run DPM on SL7, and there is a DPM JIRA ticket for this. The sites' GGUS tickets have been updated with this information. The JIRA ticket has low priority; we should ask the DPM developers to raise it. The bug also puts a strain on the network. The advice for now is not to upgrade the head node until this is fixed.

-- DUNE: things are starting to be up and running. They are not going to use Ganga.

-- Euclid: actions from 2016 to be closed.

-- LIGO: one open action from 2017 to set up a secure CVMFS. Glasgow is interested in news about what is happening, because they have local people who might want to run too.

-- LSST: UK people are setting up their CVMFS area in /cvmfs/gridpp.egi.eu because there is not yet a clear direction, so instead of setting up yet another repository they are using an existing one. LSST-PanDA should also start testing the sites this week or next, in preparation for what they want to do in June. AF isn't updating the wiki because there are no running jobs to report.

-- LZ: running mock data challenge 2 smoothly

-- NA62: seems to be running ok.

-- SKA: problems with high-memory jobs. The first Rucio tests were successful and transferred several TB. There was a problem at QMUL; Dan thinks he has fixed it but needs feedback. There are also still some problems with checksums.

-- SNO+: using RAL

-- SuperNemo: no news

-- Storage and data management

* Alastair will give a talk on Rucio at the storage meeting.

-- Feedback from HEPiX: Jupyter notebooks (sites are setting up clusters to do this type of analysis) and IPv6. Nebraska, which had long been known to work, turned out to have issues; setting up a site is not a done deal.

Jupyter notebooks will be used heavily by SKA, and ATLAS is also planning to expand their usage. Users are talking about having Kubernetes clusters on the grid.

-- Documentation: debate over wiki vs GitHub.

    • 11:00 - 11:01
      Ops meeting minutes 1m
      • This is a reminder that minute-taking is an important task. The minutes give those not present access to the discussions and provide a reference for everyone to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't the task will keep coming to them!).

      • Upcoming allocations:

    • 11:01 - 11:20
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb

      • CMS
        T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
        T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel

      Please see attached notes.

      • ATLAS

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam

    • 11:20 - 11:40
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 - 12:20
      Discussion 40m
      • HEPiX
        ** Immediate observations/feedback from HEPiX: https://indico.cern.ch/event/676324/timetable/#all.detailed
        ** List of tracks: https://indico.cern.ch/event/676324/program

      • Documentation
        ** Is GitHub or the GridPP wiki the right place?

    • 12:20 - 12:25
      Actions & AOB 5m
      • https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items