RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:30 13:31
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      On Thursday and Friday we saw SAM test and transfer failures into Antares, apparently due to missing router ACLs (to be confirmed) on the new EOS front end.

      The discrepancy between the number of running cores reported in Vande and in CMS Monit has reappeared, although it seems much reduced today. We are running connection tests with FNAL to check whether there is (again) a problem connecting to the schedulers hosted at FNAL.

      CNAF has been testing transfers yesterday and today, using RAL as one of the destinations for reads from CNAF. We are investigating errors seen at RAL, whereas the other CMS T1 destinations used show much lower (or zero) error rates. One of the errors lasted only 2 minutes, so it may have been a network glitch. Tom Birkett has contacted DI.

      Overall, job performance is good relative to other CMS T1s.

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Operational issues:

      • There was a spike of failed WGProduction jobs last Sunday. This was not a site problem: the jobs used a buggy xrootd client.
      • There were some failed uploads from HLTFarm to Tier-1 sites (including RAL)
        • These errors can be ignored: HLTFarm has no external connectivity, so these transfers should never have been submitted (a bug in DIRAC caused them to be).
      • Low level of upload failures from other sites to ECHO
        • French sites seem to be the most affected
        • The transfers appear to be simply timing out because of low transfer speeds
      • Almost all transfers from RAL to Lanzhou are failing
        • So far it is not clear which side is at fault
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • RC2 pipeline now complete
        • Required amending the run to write log output every 10 minutes so that PanDA did not kill the jobs
        • Still need to investigate why RAL takes longer than other sites, since LANCS now stage data for jobs the same way we do (via HTTPS/DAVS through a gateway)
      • Now working with the CM team to enable data retrieval to the USDF for comparison and analysis of the sites' outputs
      • IngestD update deployed at RAL: a major version change, now running version 2.1
    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:15 14:25
      Antares Upgrade 10m

      New EOS nodes
      Repack Progress

      Speakers: George Patargias, Thomas Byrne
    • 14:25 14:35
      XRootD Development 10m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)
    • 14:35 14:45
      Utilizing GPUs 10m
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:45 14:50
      SSD Storage Evaluation 5m
    • 14:50 14:55
      Echo deployment 5m
    • 15:00 15:01
      AOB 1m
    • 15:01 15:10
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 15:10 15:15
      Any Other Business 5m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore