RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 1
      Site Operations
    • 13:34
      Experiment Operational Issues
    • 2
      ATLAS Operations Report
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 3
      CMS Operations Report
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Generally smooth running while Katy was away for the last few weeks. A few periods of test failures were observed on the AAA machines; these seem to particularly affect the older machines (ceph-gw10/11). Still waiting for the old Antares machines to be recommissioned for AAA (the ticket on this is well over a year old).

      Since last weekend, CMS SAM tests have been intermittently failing the ARC-CE/token tests at sites around the world. This seems to be a recurring problem, and one I have repeatedly reported to those who run the tests ever since the test started running. I have taken CMS out of drain at RAL twice this week so far.

      For production jobs and transfers, everything looks good. 

      Reminder to self: Still to look at the /unmerged/ problem.

    • 4
      LHCb Operations Report
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Operational issues:

      • lbprod DB outage at CERN yesterday afternoon
        • Caused massive job failures across the grid
      • High failure rate for MCFastSimulation jobs
        • Jobs incorrectly assess how much time they need to execute
        • Fix applied, but it only affects new productions
        • Some old ones are still present in the system, so we have to wait for them to disappear
          • Failure rate is still a bit high, but gradually decreasing
    • 5
      ALICE Operations Report
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
      • FYI: nice new site dashboard from ALICE -- https://alimonitor.cern.ch/sitedashboard/
      • Data migration is ongoing, quota excess is decreasing.
      • ALICE requested mlsensor installation
        • Seems to be a local XRootD monitoring collector; I am looking at its documentation and source code
    • 6
      LSST Operations Report
      Speakers: Thomas Birkett, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • Data transfer between RAL and LANCS completed
        • Rucio is no longer the source of truth: the number of files moved matches RAL, but Rucio reports the datasets as not 'OK'
          • Investigating, most likely not enough daemons in dev deployment
        • FTS connection number increased to 120 between RAL and LANCS on the LSST FTS (was set much lower for some reason)
        • Deletion campaign to follow with new connection number and less intervention
      • Jobs running well - new data transferred and ingested, and subsequent jobs run on new data:
      • Data into RAL 15 TB in last week
      • Data out of RAL to LANCS at a rate of 614 kB/s due to small file size: 1.27 million files for 4 TB, average file size of ~30 KB
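
The average file size follows from the total volume and the file count. A minimal sketch of that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and using the totals quoted above as inputs; the variable and function names are hypothetical. With these inputs the average comes out in the MB range rather than ~30 KB, so one of the quoted figures may be worth rechecking.

```python
# Sanity check of the LSST transfer figures quoted in the minutes.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 KB = 10**3 bytes).

TB = 10**12
KB = 10**3


def average_file_size_bytes(total_bytes: float, file_count: int) -> float:
    """Return the mean file size in bytes."""
    return total_bytes / file_count


total_bytes = 4 * TB      # "4 TB" as quoted
file_count = 1_270_000    # "1.27 million files" as quoted

avg_bytes = average_file_size_bytes(total_bytes, file_count)
print(f"Average file size from quoted totals: {avg_bytes / 1e6:.2f} MB")

# For comparison: the total volume implied by the quoted ~30 KB average.
implied_total_bytes = file_count * 30 * KB
print(f"Total implied by a 30 KB average: {implied_total_bytes / 1e9:.1f} GB")
```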
    • 14:00
      Tier-1 Projects
    • 7
      XRootD Development
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 8
      Utilizing GPUs
      Speakers: Dr Brij Kishor Jashal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:25
      AOB
    • 9
      Summary of Operational Status and Issues
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore, Thomas Birkett
    • 10
      Any other Business
      Speakers: Brian Davies (Science and Technology Facilities Council STFC (GB)), Darren Moore