RAL Tier1 Experiments Liaison Meeting

Timezone: Europe/London
Location: Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:30 13:34
      Site Operations 4m
    • 13:34 13:35
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Very little to report from CMS.

      I saw the GGUS ticket (https://helpdesk.ggus.eu/#ticket/zoom/1002467) from ATLAS reporting some failing transfers between Antares and Echo, but the problem does not appear to have affected CMS yet.

      Still on my to-do list: deal with the /unmerged/ files that are not being deleted.

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      There were a few issues at CERN that affected LHCb performance across the grid:

      • Late host certificate renewal on the CERN IAM server
        • We use long-lived proxies, usually with lifetimes of up to 5 days
        • The host certificate was renewed only 1 day before it expired
        • VOMS ACs have the IAM server certificate embedded in them
        • Therefore, proxies generated up to 4 days before the renewal expired at the same time as the old host certificate
          • This caused a lot of job and transfer failures
      • Problem with the pilot time-management system after the new DIRAC release last Thursday
        • Buggy pilots cannot manage their time properly, causing many job failures
        • The problem persists to some extent because our pilots are long-lived and some buggy ones remain in the system
      • Problems with the DIRAC FileCatalogue and SandboxSE this morning
        • Overloaded server
        • Resulted in a spike of completed and rescheduled jobs
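      The 4-day figure in the certificate item can be checked with a small model. This is a minimal Python sketch, not how IAM or VOMS actually validate credentials. It assumes a fixed 5-day proxy lifetime, that a proxy issued before the renewal carries a VOMS AC tied to the old host certificate and stops validating once that certificate expires, and that proxies issued after the renewal are signed by the new certificate. The function name and dates are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: the typical upper proxy lifetime quoted in the report.
PROXY_LIFETIME = timedelta(days=5)


def was_cut_short(issued: datetime, renewal: datetime, old_expiry: datetime,
                  lifetime: timedelta = PROXY_LIFETIME) -> bool:
    """Return True if a proxy issued at `issued` died early at `old_expiry`.

    Hypothetical model: proxies issued before `renewal` depend on the old
    IAM host certificate, so they stop working when it expires, even if
    their own lifetime has not run out.
    """
    signed_by_old_cert = issued < renewal
    return signed_by_old_cert and issued + lifetime > old_expiry


# Hypothetical dates: the old certificate expires 1 day after the renewal.
renewal = datetime(2024, 1, 10, 12, 0)
old_expiry = renewal + timedelta(days=1)

# Proxies issued in the window (old_expiry - lifetime, renewal) were affected.
window_start = old_expiry - PROXY_LIFETIME
print(renewal - window_start)  # 4 days, 0:00:00

print(was_cut_short(renewal - timedelta(days=3), renewal, old_expiry))   # True
print(was_cut_short(renewal - timedelta(days=5), renewal, old_expiry))   # False
print(was_cut_short(renewal + timedelta(hours=1), renewal, old_expiry))  # False
```

      Under these assumptions, the affected window is the proxy lifetime minus the 1-day renewal margin: renewing earlier than one proxy lifetime before expiry would have avoided the simultaneous expirations.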

       

      On the positive side, RAL Tier-1 performance was OK during the last two weeks.

       

    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      XRootD accounting is now fixed. Should we ask ALICE to remove excess data from Echo?

    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Thomas Birkett, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • Running the old pipeline (hsc_pdr2_multisite) because dp2_prep has been discontinued due to its excessive storage requirements for a single run; dp2_prep may be resumed once the output issue is addressed
      • Brief job interruptions when PanDA went down at SLAC; otherwise RAL is running the hsc_pdr2_multisite jobs well
      • Raw data is still flowing to RAL in preparation for DP2 and DR1 (14 TB in the last week); the total on Echo is now approaching 1 PB of the 10 PB quota
        • We still need to identify the files on Echo that are not needed; the US and UK CM teams have been busy dealing with the oversubscription of Lancs caused by dp2_prep outputs
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:08 14:13
      XRootD Development 5m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 14:14 14:19
      Utilizing GPUs 5m
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:25 14:26
      AOB 1m
    • 14:27 14:36
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore, Thomas Birkett
    • 14:45 14:50
      Any Other Business 5m
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore