RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)


Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:30 13:31
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Update on last week's reported SAM test issues:

      1. Timeout failures on svc20 (AAA server) on Friday - Jyothish removed it from the cluster. Telegraf and Icinga were also down. Jyothish has a ticket with Fabric. - UPDATE: the server was running out of RAM. Jyothish added the missing memory limits and reinstated it in the cluster just before the meeting.
      2. Network problems - continuing with several problem periods throughout the week. 
      3. After 2., the other AAA servers and the manager have failed the 'federation' test fairly consistently. Restarts of the usual services by Katy and Jyothish have not fixed it. - UPDATE: restarts on the UK redirector helped with this.
      4. ARC-CE xrootd-access test requires AAA. - Has not been a problem this week.
      5. New token tests for CEs are generally working, but the 'basic' test is in warning because jobs are landing almost entirely on 2018/9 WNs, which do not have IPv6 (Tom Birkett might comment). UPDATE: CMS said they are OK with the test being yellow.
      6. 'Connection' test for Antares endpoints in warning due to no IPv6 - how are the tests for the new EOS nodes going? UPDATE: perf tests ongoing but some improvement.

      CMS took advantage of other VOs dropping out and claimed a huge number of WNs over the weekend. In general job performance has been good, with just a couple of clear efficiency drops or failure spikes throughout the week. 

      Transfers:

      Periods of excellent transfer rate to buffer and tape. Some 'file exists' errors, likely due to network disruption - Katy is investigating whether clean-up is necessary in case the automatic mechanism is not effective.

      Disk transfer failures have calmed with Echo as destination (the cause could be the other end of the transfer in any case). With Echo as source, errors still look bad - investigating.

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Operational issues:

      • Network issues affect LHCb transfers (including transfers over LHCOPN), as well as jobs. GGUS ticket.
        • LHCOPN transfers were affected e.g. yesterday due to problems with hostname resolution (e.g. https://fts3-lhcb.cern.ch/fts3/ftsmon/#/job/43c41a4c-40af-11f0-b9e9-fa163e4e8fd9).


      CVMFS:

      • RAL Stratum-1 as an official EESSI repository mirror?
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • Data movement for DC2 now complete thanks to the WN lent for the purpose
      • Working on RC2 data ingestion into the Metadata service; multiple sections are now ingested and work continues on the others
        • Want to get this done soon for data from pipeline tests to be part of an LSST Technote next month
      • Some issues with SLAC infrastructure have meant the VOMS server at SLAC is not reliable - defaulting to the read-only VOMS at Lancaster
        • The transition to an IAM/VOMS server was raised, and it was asked whether RAL could run it, given its existing technical knowledge of the service
      • While neither data movement to RAL nor jobs have been greatly affected, LSST have noticed issues with RAL FTS transfers failing with 500 errors, and it's not clear whether this is now an FTS issue or a site core issue that FTS noticed around a month ago
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:15 14:25
      Antares Upgrade 10m

      New EOS nodes
      Repack Progress

      Speakers: George Patargias, Thomas Byrne
    • 14:25 14:35
      XRootD Development 10m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 14:35 14:45
      Utilizing GPUs 10m
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
      • The graduate student (Terence Lobo; LM: Tom Birkett) is here now, working on the integration of GPUs into the Tier1 batch farm
      • We need to devise a strategy to work as a team.  
    • 14:45 14:46
      AOB 1m
    • 14:46 14:55
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 14:55 15:00
      Any other Business 5m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore