RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:30 13:31
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Apologies from Katy - attending the OTF.

      Good SAM tests this week. There was a short period of failures on Antares on Tuesday evening because the buffer filled up. Both write successes and write failures spiked; CMS still achieved very high throughput to the buffer, and files were quickly archived.

      SAM test warnings fixed for the 'squid' tests - thanks Alex for tracking this!

      Still seeing the 'basic' token test in warning when it lands on 18/19 WN tranches (currently around 50% of the time) - this is due to missing IPv6 on those nodes, but CMS seems happy to keep the test in warning. 

      Good performance for CMS jobs - best CPU efficiency among T1s for this week again!

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      News:

      • LHCb is going to run some test MC productions on ARM
        • at CERN and Glasgow

       

      Operational issues:

      • Jobs failed to get files from Echo via local gateways (GGUS 683524).
        • Caused by a connectivity issue between the 201[89] generations and the new storage node; now fixed
        • Last Friday a slow OSD also contributed to this issue
          • It was removed from the cluster, which fixed the issue
      • Failed uploads to Echo last Friday (GGUS 683588).
        • One of the gateways became problematic due to stuck connections. It was fixed by restarting the gateway. Ticket closed.
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))

       

      • RC2 runs have not been succeeding because jobs produce no output after 2 hours
        • This may be the same as, or at least similar to, the Quantum Graph issue seen previously
          • Jobs are not pulling data onto the WN; they remotely read small portions over and over (peaking at over 250,000 times for one job)
        • Coordinating with Jyothish on the XRootD monitoring, which died, so that file-by-file access patterns can be investigated
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:15 14:25
      Antares Upgrade 10m

      New EOS nodes
      Repack Progress

      Speakers: George Patargias, Thomas Byrne
    • 14:25 14:35
      XRootD Development 10m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)

      Streamed checksum data collection is in progress; ceph-gw9 is being prepared for addition to production.

       

    • 14:35 14:45
      Utilizing GPUs 10m
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:45 14:46
      AOB 1m
    • 14:46 14:55
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 14:55 15:00
      Any other Business 5m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore