RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)


Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:30 13:34
      Site Operations 4m
    • 13:34 13:35
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Generally smooth running while Katy was away for the last few weeks. A few periods of test failures were observed on the AAA machines; these seem to particularly affect the older machines (ceph-gw10/11). Still waiting for the old Antares machines to be recommissioned for AAA (the ticket for this is well over a year old).

      Since last weekend, CMS SAM tests have been intermittently failing the ARC-CE/token tests at sites around the world. This seems to be a recurring problem, and one I have repeatedly reported to those who run the tests ever since the test started running. I have taken CMS out of drain at RAL twice so far this week.

      For production jobs and transfers, everything looks good. 

      Reminder to self: Still to look at the /unmerged/ problem.

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Operational issues:

      • lbprod DB outage at CERN yesterday afternoon
        • Caused massive job failures across the grid
      • High failure rate for MCFastSimulation jobs
        • Jobs incorrectly assess how much time they need to execute
        • Fix applied, but it only affects new productions
        • Some old ones are still present in the system, so we have to wait for them to disappear
          • Failure rate is still a bit high, but gradually decreasing
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
      • FYI: nice new site dashboard from ALICE -- https://alimonitor.cern.ch/sitedashboard/
      • Data migration is ongoing; quota excess is decreasing.
      • ALICE requested mlsensor installation
        • Seems to be a local XRootD monitoring collector; I am looking at its documentation and source code
    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Thomas Birkett, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • Data transfer between RAL and LANCS completed
        • Rucio no longer source of truth - number of files moved matches the number at RAL, but Rucio claims the datasets are not 'OK'
          • Investigating, most likely not enough daemons in dev deployment
        • FTS connection number increased to 120 between RAL and LANCS on the LSST FTS (was set much lower for some reason)
        • Deletion campaign to follow with new connection number and less intervention
      • Jobs running well - new data transferred and ingested, and subsequent jobs run on new data:
      • Data into RAL 15 TB in last week
      • Data out of RAL to LANCS at a rate of 614 kB/s due to small file size of 1.27 million files for 4 TB - average file size of ~30 KB
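      The relation between file count, total volume, average file size and transfer rate quoted above can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, decimal units are assumed (1 TB = 10^12 bytes, 1 kB = 10^3 bytes), and the inputs are the figures exactly as quoted in the report. Taken at face value, 4 TB spread over 1.27 million files works out to a few MB per file rather than ~30 KB, so one of the quoted figures may be in different units; the minutes do not say which.

```python
# Illustrative check of the quoted LSST transfer figures.
# Assumptions (not from the minutes): decimal units, hypothetical helper names.

TB = 10**12  # bytes in a terabyte (decimal)
KB = 10**3   # bytes in a kilobyte (decimal)


def average_file_size_bytes(total_bytes: float, n_files: int) -> float:
    """Mean file size, given total volume and number of files."""
    if n_files <= 0:
        raise ValueError("n_files must be positive")
    return total_bytes / n_files


def transfer_time_seconds(total_bytes: float, rate_bytes_per_s: float) -> float:
    """Time needed to move total_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return total_bytes / rate_bytes_per_s


# Figures as quoted in the report.
total = 4 * TB
n_files = 1_270_000
rate = 614 * KB

avg = average_file_size_bytes(total, n_files)
duration = transfer_time_seconds(total, rate)

print(f"average file size: {avg / 1e6:.2f} MB")
print(f"time at quoted rate: {duration / 86400:.1f} days")
```

      Swapping in a different total volume or file count shows which combination of figures is consistent with a ~30 KB average.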
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:08 14:13
      XRootD Development 5m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 14:14 14:19
      Utilizing GPUs 5m
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:25 14:26
      AOB 1m
    • 14:27 14:36
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore, Thomas Birkett
    • 14:45 14:50
      Any other Business 5m
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore