Operations team & Sites

EVO - GridPP Operations team meeting


- This is the weekly GridPP ops & sites meeting

- The intention is to run the meeting in Vidyo: https://vidyoportal.cern.ch/flex.html?roomdirect.html&key=zXhsqAxVnaT6

-- The PIN is 1234. To join via phone see http://information-technology.web.cern.ch/services/fe/howto/users-join-vidyo-meeting-phone for dial in numbers.

-- The London (UK) service is on +44 (0)161 306 6802. Phone bridge ID 1001002

-- The meeting extension is 109308582. PIN 1234

Chair:  Alessandra F

Minutes: Ian L or Chris B

Apologies: Jeremy C, Elena K

    • 11:00 11:01
      Ops meeting minutes 1m
      • This is a reminder that minute taking is an important task. The minutes give those not present access to the discussions and provide a reference for everyone to refer back to afterwards.

      • The team composition has been changing. If everybody contributes then the task comes around less often.

      • Please extract actions from the meeting and add them to our table here: https://www.gridpp.ac.uk/wiki/Operations_Team_Action_items#Action_list.

      • Recent allocations: See above link. The page should be updated each week by the minute taker (if they don't, the task will keep coming back to them!).

      • Upcoming allocations:

    • 11:01 11:20
      Experiment problems/issues 19m

      Review of weekly issues by experiment/VO

      • LHCb
      • Echo at RAL is still closed for LHCb. Minor issues are being handled via GGUS tickets.

      • CMS
        T1: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL
        T2: https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T2_UK_London_Brunel

      Please see attached notes.

      • ATLAS

      From Elena:

      • Configuring ECDF’s new storage
        -- DDM configuration was confirmed with AGIS experts.
        -- HC tests were OK and, after discussion, UKI-SCOTGRID-ECDF-RDF was put in production. Jobs started to fail (too many stage-out failures), so it was set to TestOnly mode again. HC jobs are successful again.
        -- It is not clear why real jobs tried to access the SRM though. The only configured mover is xrootd.

      • LocalGroupDisk consolidation
        -- Most datasets older than 2 years were removed on 4 July, which should free 488 TB. Tim will check the rest.
        -- Tim will check the default lifetime in R2D2 and whether an email reminder can be sent to users.

      • Pledge data on Peter's ukdata page needs updating (Peter)

      • BHAM migration to EOS, Mark requests to move now
        -- See ADCINFR-87.
        -- Bham queues are set to write output to Manchester storage. Jobs are running successfully.
        -- Cedric is cleaning DATADISK and moving LOCALGROUPDISK to Manchester. SCRATCHDISK has been removed.
        -- Bham storage will be switched to EOS after all files are deleted from DATADISK.

      • Singularity enabled at LANCS, configure and test
        -- Needed to upgrade Singularity. Should work now.

      • RALPP SL7 migration (Elena)
        -- SL7 queue is now working, jobs are running successfully.

      • FTS to Edinburgh (Rob)
        -- Very flaky performance with GridFTP, which fails when too many transfers are coming in.
        -- A limit on the maximum simultaneous FTS connections was requested in AGIS for ECDF-RDF; the same can be done for ECDF.
        -- Tim will make the change.

      • Moving UK Cloud to Harvester (Peter/Alessandra)
        -- Plan was for migration this week (10-14 Sep). U queues are already in PanDA.

      • Other: Updates should be recorded in https://www.gridpp.ac.uk/wiki/GridPP_VO_Incubator.

      • GridPP DIRAC status [Andrew McNab]
        -- https://www.gridpp.ac.uk/gridpp-dirac-sam

    • 11:20 11:40
      Meetings & updates 20m

      With reference to: http://www.gridpp.ac.uk/wiki/Operations_Bulletin_Latest

      • General updates
      • WLCG ops coordination
      • Tier-1 status
      • Storage and data management
      • Tier-2 Evolution
      • Accounting
      • Documentation
      • Interoperation
      • Monitoring
      • On-duty
      • Security
      • Services
      • Tickets
      • Tools
      • VOs
      • Site updates
    • 11:40 12:20
      Discussion topics 40m
      • The future (and need) of the BDII (Alastair to lead)

      • Tier-2 and storage background documents for GridPP6.

      • Further contributions and timeline (the aim is to have the drafts completed by the end of September).

      • Updating https://www.gridpp.ac.uk/wiki/IPv6_site_status.

      • See also https://twiki.cern.ch/twiki/bin/view/LCG/WlcgIpv6.

      The WLCG IPv6 statement in September 2017 was:
      * The WLCG management and the LHC experiments approved several months ago (+) a deployment plan for IPv6 (++) which requires that:
      ** all Tier-1 sites provide dual-stack access to their storage resources by April 1st 2018
      ** all Stratum-1 and FTS instances for WLCG need to be dual-stack by April 1st 2018
      ** the vast majority of Tier-2 sites provide dual-stack access to their storage resources by the end of Run2 (end of 2018).
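
      When updating the IPv6 site status page, a quick first check is whether a storage endpoint publishes both IPv4 and IPv6 addresses. The following is a minimal sketch using only the Python standard library; the function names and the default port are illustrative assumptions, not part of any GridPP or WLCG tool. It only inspects name resolution, so a positive result does not prove that the service is actually reachable or working over IPv6.

```python
import socket


def address_families(addrinfo):
    """Return the set of address families found in getaddrinfo-style results."""
    return {entry[0] for entry in addrinfo}


def is_dual_stack(addrinfo):
    """True if the results contain at least one IPv4 and one IPv6 address."""
    families = address_families(addrinfo)
    return socket.AF_INET in families and socket.AF_INET6 in families


def check_host(hostname, port=443):
    """Resolve a hostname and report whether it publishes both address families.

    The port is only passed through to the resolver (assumed default: 443).
    Returns False if the name does not resolve at all.
    """
    try:
        info = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return False
    return is_dual_stack(info)
```

      For example, `check_host("se.example.ac.uk")` (a hypothetical hostname) would return True only if the resolver returns both an IPv4 and an IPv6 address for that name. A transfer test over IPv6 is still needed to confirm real dual-stack access.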

    • 12:20 12:25
      Actions & AOB 5m
      • Move to VidyoConnect: https://home.cern/cern-people/announcements/2018/07/video-conference-vidyoconnect-replace-current-clients