RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID: 66811541532
Host: Alastair Dewhurst
    • 13:30
      Experiment Operational Issues
    • 1
      ATLAS Operations Report
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 2
      CMS Operations Report
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      What to do with pledges? 

      For CMS:

      CPU : 62800 -> 72767 (9967)

      Echo: 7686 -> 9394 (1708)

      Antares: 24528 -> 29438 (4910)
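
As a quick cross-check of the pledge figures above, the following minimal sketch recomputes each increase and its relative size. The names `pledges` and `pledge_change` are illustrative only, and no units are assumed because the notes do not state them.

```python
# Pledge figures copied from the CMS report above: (current, new).
# Units are not stated in the notes, so none are assumed here.
pledges = {
    "CPU": (62800, 72767),
    "Echo": (7686, 9394),
    "Antares": (24528, 29438),
}


def pledge_change(current, new):
    """Return (absolute increase, relative increase in percent)."""
    increase = new - current
    return increase, 100.0 * increase / current


# Hypothetical helper structure: resource name -> (increase, percent).
changes = {name: pledge_change(cur, new) for name, (cur, new) in pledges.items()}

for name, (increase, percent) in changes.items():
    print(f"{name}: +{increase} ({percent:.1f}%)")
```

The absolute increases should match the bracketed values in the notes; the percentages are an extra view that may help when comparing the three resources.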

       

      Another period of low-efficiency CMS jobs, coinciding with long read times (for small amounts of data). Many T1s see the same. The job failure rate was again fine. CMS CompOps investigated and there is some issue with remote reads, so remote reads were turned off (it is not clear how the data is then accessed). There is also a lot of discussion about the number of cores being used by the jobs.

      As discussed previously, on the subject of reading data from Echo using AAA: reading jobs were timing out because they could not get hold of data at RAL. After increasing the throttling level to allow more connections, IOPS went up higher than we are comfortable with. There is an associated ticket: https://helpdesk.ggus.eu/#ticket/zoom/2837

    • 3
      LHCb Operations Report
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      NTR, LHCbDIRAC is still down. The plan is to restart it on Monday.

    • 4
      ALICE Operations Report
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 5
      LSST Operations Report
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))

      Permissions are now being organised by Role and not by Group, allowing for finer-grained control (thanks to Jyothish for the suggestion).

      AuthDb is now correct and is accepting data and jobs from LSST again.

      The DC2 run by the CM team at SLAC has been failing due to OOM errors. This has finally been traced to a config error on their end that was causing entire data sets to be loaded and assessed rather than just the two small patches of sky.

          The new config means it now takes 6 minutes to do something that was previously failing after several hours.

       

      Nalin joined the team from Monday as a graduate, to work on monitoring and testing for the DRP jobs and infrastructure at RAL.

      Now successfully running DC2 weekly jobs at RAL for the first time, including creating and running a 'campaign' from scratch.

      RAL LSST jobs:

       

    • 14:00
      Tier-1 Projects
    • 6
      Antares Upgrade

      New EOS nodes
      Tape Robotics downtime

      Speakers: George Patargias, Thomas Byrne
    • 7
      XRootD Development
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 8
      Utilizing GPUs
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:45
      AOB
    • 9
      Summary of Operational Status and Issues
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 10
      Any other Business
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore