RAL Tier1 Experiments Liaison Meeting

Access Grid (RAL R89)


Please attend via the following Zoom meeting:



    • 13:38 13:39
      Major Incidents Changes 1m
    • 13:39 13:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)) , Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 13:40 13:41
      GGUS /RT Tickets 1m


    • 13:41 13:42
      Site Availability 1m




    • 13:42 13:43
      Experiment Operational Issues 1m
    • 13:44 13:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)) , Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      New Rucio RSE: RAL-LCG2-ECHO_TEST for ATLAS functional tests (FT) of WebDAV transfers; points to /atlas:test/
      Initial set of transfers looks OK.

      Checksums from XRootD: xrootd-ceph does not by default add XrdCks.adler32 as an xattr.
      This may cause issues in, for example, FTS transfers, where the destination wants to verify the checksum.
      Propose to add this as default behaviour.
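For context, the checksum being compared here is a standard adler32 over the file contents, typically rendered as an 8-hex-digit string. A minimal sketch of computing it with Python's zlib is below; the function name is illustrative, and the exact string formatting XRootD stores in the XrdCks.adler32 xattr is an assumption here:

```python
import zlib

def adler32_hex(path: str) -> str:
    """Compute the adler32 checksum of a file, streamed in chunks,
    rendered as an 8-hex-digit lowercase string (illustrative formatting)."""
    value = 1  # adler32 starts from 1, not 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xFFFFFFFF:08x}"
```

If the source and destination strings disagree, an FTS-style transfer verification would fail; storing the value as an xattr at write time avoids recomputing it on every comparison.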

      Oxford almost ready to start initial tests with XCache; reminder that RAL will serve as the endpoint.
      RAL will need to accept the Oxford certificate for the forwarding proxy, with read-only access to ATLAS data.


      Are there continuing issues with the ARC-CEs? Occasional drops in jobs / changes in (idle) jobs, not easily explained by ATLAS observations.

    • 13:46 13:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      SAM tests on the AAA proxies and manager are no longer failing. AAA was quiet over the weekend so all SAM tests were green.

      AAA has become busier since yesterday, and interestingly the maximum cap, which previously always appeared to be ~1 GB/s, is now reaching ~1.5 GB/s. This may be a result of my change to the throttling last week, when I increased the limit from 200 to 400 concurrent connections on ceph-gw10 only. Puzzlingly, both gw10 and gw11 seem to have increased their maximum rate equally. I would like to experiment further with this setting, although I have two observations (perhaps related to each other) on making such changes:
          1. They don't seem to take effect until some days after the change is made.
          2. I am not sure whether there is a particular order in which I should restart services on the various redirectors.
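The throttling change above is a cap on concurrent connections per gateway. As a general illustration of that mechanism (not the actual xrootd/gateway configuration), a non-blocking semaphore cap looks like this; the class name and cap value are invented for the sketch:

```python
import threading

class ConnectionThrottle:
    """Illustrative cap on concurrent connections (cf. the 200 -> 400 change on ceph-gw10)."""

    def __init__(self, max_connections: int):
        self._sem = threading.Semaphore(max_connections)

    def try_acquire(self) -> bool:
        # Non-blocking: refuse the connection once the cap is reached.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        # Free a slot when a connection closes.
        self._sem.release()

throttle = ConnectionThrottle(2)
results = [throttle.try_acquire() for _ in range(3)]  # third attempt exceeds the cap
```

With a cap of 2, the first two acquisitions succeed and the third is refused until a slot is released, which is why raising the cap can raise the aggregate transfer rate.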

      Anyway, some of the WN tests are failing intermittently, and yesterday these were enough (10% or more) to make the whole day fail site readiness. It is not just the WN-xrootd-access test failing, although that is one; we are also seeing a couple of failures on WN-analysis and CONDOR-jobSubmit.

    • 13:48 13:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))


      1. Low number of running jobs
          a. Likely just a fairshare issue
          b. Would like to have some documentation about this
      2. Streaming issue for user jobs
          a. Waiting for development of a fix
              . What is the timescale? (Now in the second week of February.)
          b. Testing a lower block-size value as mitigation
              . Not clear whether it has helped; checks ongoing
      3. Failing user jobs unable to open files
          a. Related to the GGUS ticket that was opened in December and closed in January
          b. An edge case of what was fixed within LHCb software in January
             . Run 1 simulation cannot be fixed by this method (uses xroot v3.x)


      No complaints / running reasonably smoothly at RAL

    • 13:52 13:53
      VO Liaison Others 1m
    • 13:53 13:54
      Experiment Planning 1m
    • 13:54 13:55
      Dune/protoDune 1m
    • 13:55 13:56
      Euclid 1m
    • 13:56 13:57
      SKA 1m
    • 13:57 13:58
      AOB 1m
    • 13:58 13:59
      Any Other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)) , Darren Moore (Science and Technology Facilities Council STFC (GB))