RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

Zoom Meeting ID
66811541532
Host
Alastair Dewhurst
    • 13:30 13:31
      Experiment Operational Issues 1m
    • 13:35 13:40
      ATLAS Operations Report 5m
      Speakers: Brij Kishor Jashal (Rutherford Appleton Laboratory), Jyoti Prakash Biswal (Rutherford Appleton Laboratory)
    • 13:40 13:45
      CMS Operations Report 5m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      Katy may or may not make the meeting, as she is in a 3-day CMS Computing Management meeting.

      Red day for SAM on Monday due to another network issue lasting a few hours. 

      Otherwise a very good week for CMS at T1 - many running cores at end of last week and since the Monday network issues; highest CPU efficiency among all T1s. 

      A few residual failures on the Echo->Antares link since Monday's network issues remain to be investigated; they may still be cleaned up automatically in time. Otherwise, transfer performance looks good. Wrote over 700TB to tape this week.

    • 13:45 13:50
      LHCb Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)

      Operational issues:

      • Network outage on Monday (GGUS:683544)
        • Site-wide network outage, affected LHCb as well (obviously)
        • Fixed on Monday evening
        • Is the reason behind this known?
      • New Echo storage node inaccessible from 201[89] gen WNs (GGUS:683524)
        • Caused local gateways to get stuck and, consequently, all downloads from Echo to fail
        • Fixed now: the network setup was changed to allow WNs to talk to the SN.
      • Redirector issues this morning
        • Wrong DNS alias?

      CVMFS:

      • (monitoring) issues with squid0[56] are still present 
        • The machines are correctly resolvable from RAL, but not outside RAL (e.g. lxplus)
          • Added to internal DNS, but not external one?
          • FAB-1101 is tracking the issue
    • 13:50 13:55
      ALICE Operations Report 5m
      Speaker: Alexander Rogovskiy (Rutherford Appleton Laboratory)
    • 13:55 14:00
      LSST Operations Report 5m
      Speakers: Mathew Sims, Timothy Noble (Science and Technology Facilities Council STFC (GB))
      • RC2 now ingested and running jobs
        • some hiccups, but seems to be there now
        • Step 3 still requests too little memory; this has not been fixed in the pipeline and requires alterations on a per-run basis. Once done, it runs as expected
      • Network blips seem to have extended run time of jobs, but jobs still succeeding
      • IngestD v20 now deployed
      • Nalin has workable demo deployments of the IngestD / LSST monitoring stack and will soon be deploying it for LSST monitoring, and will then look at more in-depth job monitoring

    • 14:00 14:01
      Tier-1 Projects 1m
    • 14:15 14:25
      Antares Upgrade 10m

      New EOS nodes
      Repack Progress

      Speakers: George Patargias, Thomas Byrne
    • 14:25 14:35
      XRootD Development 10m
      Speakers: Alexander Rogovskiy (Rutherford Appleton Laboratory), Jyothish Thomas (STFC)

      Streamed checksum validation is currently sitting at 99.94%. Rolling out next week in logging mode; CC prepared for switching to storage mode afterwards.

      Cause of error 500 possibly found: DNS entry mismatch.

    • 14:35 14:45
      Utilizing GPUs 10m
      Speakers: Jyoti Prakash Biswal (Rutherford Appleton Laboratory), Thomas Birkett
    • 14:45 14:46
      AOB 1m
    • 14:46 14:55
      Summary of Operational Status and Issues 9m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore
    • 14:55 15:00
      Any other Business 5m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore