RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)


Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:30 13:31
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      The Antares tests in yellow warning mode were 'fixed' on the test side by tweaking the evaluation of the test, to make it more appropriate for sites with low numbers of links (such as RAL and CERN tapes). They have been green since.

      Busy period writing to tape. 

      Another period of low-efficiency CMS jobs, mostly doing I/O only. Many T1s see the same. The job failure rate was fine.

      Some issues with other sites reading data from Echo using AAA. Reading jobs were timing out because they could not get hold of data at RAL. Jyothish (again) raised the throttling level to allow more connections, but this time it seems to be too many for Echo to process easily. Remember that CMS jobs are likely making small, sparse reads. The IOPS went higher than we are comfortable with. There is an associated ticket: https://helpdesk.ggus.eu/#ticket/zoom/2837

       

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      News:

      • LHCbDIRAC downtime next week, drain starts this Friday
        • Database upgrade, may take up to 1 week (though more likely a few days)
      • Token-based ETF tests are being added to LHCb testing framework
        • Some issues so far, though we can expect working tests from the preprod ETF machine soon


      Operational issues:

      • A lot of failed transfers to/from ECHO last Friday, due to a Ceph rollback
      • This repeated yesterday, but on a significantly smaller scale
      • Some failed jobs, due to problems with CERN-EOS-PILOT, not RAL's fault.

       

    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))

      New IngestD versions deployed for automated ingestion when data movement happens for the CCMS1 test.

      Tim is working on getting the next dataset, RC2, to RAL so that all currently used datasets are at RAL (he now has access to the Lancs UI to get various metadata and exports from the butler). Once this is in place, there is a tentative intention to alternate the testing of weekly pipeline builds (i.e., the actual processing jobs) between Lancs and RAL.

      The DC2 dataset is now at RAL, after a brief hiccup with a bad file that I quickly found and replaced.

      Step 1 of the pipeline was completed over the weekend.
      Brian Yanny is now on leave and has passed responsibility to Jen Adelman-McCarthy, with whom I am now working.

      Step 2 failed to start with an out-of-memory error, but I think that was a miscommunication involving the CM team, and I believe it can be resolved on their side.
      However, creating the mapping for the step has been noted to take more memory and time than at other sites.

      For some reason pipetaskInit took 1.5 hours and used 6 cores and a lot of memory, even though this task usually takes just a couple of minutes.

       

    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:15 14:25
      Antares Upgrade 10m

      New EOS nodes
      Tape Robotics downtime

      Speakers: George Patargias, Thomas Byrne
    • 14:25 14:35
      XRootD Development 10m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)

      Packet marking (Scitag) testing in progress. 
      Currently in an email chain with Andy and Marion debugging path-based packet triggers.

    • 14:35 14:45
      Utilizing GPUs 10m
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:45 14:46
      AOB 1m
    • 14:46 14:55
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 14:55 15:00
      Any other Business 5m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore