RAL Tier1 Experiments Liaison Meeting

Europe/London
Access Grid (RAL R89)

    • 13:38 13:39
      Major Incidents Changes 1m
    • 13:39 13:40
      Summary of Operational Status and Issues 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))
    • 13:40 13:41
      GGUS/RT Tickets 1m

      https://tinyurl.com/T1-GGUS-Open
      https://tinyurl.com/T1-GGUS-Closed

    • 13:41 13:42
      Site Availability 1m

      https://lcgwww.gridpp.rl.ac.uk/utils/availchart/

      https://cms-site-readiness.web.cern.ch/cms-site-readiness/SiteReadiness/HTML/SiteReadinessReport.html#T1_UK_RAL

      http://hammercloud.cern.ch/hc/app/atlas/siteoverview/?site=RAL-LCG2&startTime=2020-01-29&endTime=2020-02-06&templateType=isGolden

    • 13:42 13:43
      Experiment Operational Issues 1m
    • 13:44 13:45
      VO-Liaison ATLAS 1m
      Speakers: James William Walder (Science and Technology Facilities Council STFC (GB)), Dr Tim Adye (Science and Technology Facilities Council STFC (GB))

      RT:
       - 244717: possible discrepancy in job RAM usage; JW to investigate (once the pilot issue below is resolved).

      - There is currently an issue with the core count reported by the latest pilot (this affects e.g. the site-oriented dashboard, which shows higher slot counts). It may also affect the memory calculation and (ATLAS-side) job killing (e.g. memory per core is no longer valid).
      To be fixed ASAP with a hot-fix.

      - Correction to last meeting:
        - HC AFT test 952 (analysis test job) *is running* with direct I/O (a flag is missing in the PanDA output report).
        - However, the conclusions still stand: test 952 (the one used) contains 1k events, and its (single) test file already exists on almost 50% of nodes.
        - Other analysis tests: 1013 uses copy mode; 883 uses direct I/O (but with caveats similar to test 952).
      Will follow up with HC support.
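      To illustrate why the pilot core-count issue above could also break memory-based job killing, here is a minimal sketch. It assumes (hypothetically) that a job's memory limit is a per-core allowance multiplied by the core count the pilot reports. The function names, the 2000 MB/core allowance and the job numbers are illustrative only, not ATLAS/PanDA settings.

```python
# Hypothetical sketch: a job's memory limit derived as (memory per core) x (cores).
# Names and numbers are illustrative assumptions, not ATLAS/PanDA configuration.


def memory_limit_mb(mem_per_core_mb: int, reported_cores: int) -> int:
    """Memory limit for a job, scaled by the core count the pilot reports."""
    if reported_cores <= 0:
        raise ValueError("core count must be positive")
    return mem_per_core_mb * reported_cores


def should_kill(rss_mb: int, mem_per_core_mb: int, reported_cores: int) -> bool:
    """Kill the job if its resident memory exceeds the scaled limit."""
    return rss_mb > memory_limit_mb(mem_per_core_mb, reported_cores)


if __name__ == "__main__":
    actual_cores = 8
    over_reported_cores = 16  # e.g. a pilot bug doubling the reported count
    rss = 20_000  # MB actually used by an 8-core job

    # True core count: limit is 16000 MB, so the job is killed.
    print(should_kill(rss, 2000, actual_cores))  # True
    # Over-reported count: limit becomes 32000 MB, so the job survives.
    print(should_kill(rss, 2000, over_reported_cores))  # False
```

      Under this assumption, an over-reported core count makes the memory check too lenient, while an under-reported one would kill jobs that are within their real allowance.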

       

    • 13:46 13:47
      VO Liaison CMS 1m
      Speaker: Katy Ellis (Science and Technology Facilities Council STFC (GB))

      I was on annual leave last week.

      Slow IPv4 transfers to WNs: the network team have not responded while I was away, so I have prompted them. Before I left, I ran a couple of tests for them; for one test they saw no evidence at all of my (very slow) transfer in their logs, which is rather unsatisfactory. The ticket may have been updated in the meantime.

      CMS jobs regularly have low efficiency (frequently ~30%) and almost never reach a 'good' value (e.g. the 70-80%+ seen by other VOs).
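      For reference, job efficiency is commonly computed as CPU time divided by (wall-clock time x allocated cores). The minimal sketch below assumes that definition; the minutes do not state which metric the ~30% figure uses, and the example numbers are invented for illustration.

```python
# Minimal sketch, assuming efficiency = CPU time / (wall time x allocated cores).
# The numbers below are invented for illustration; they are not RAL or CMS data.


def job_efficiency(cpu_seconds: float, wall_seconds: float, cores: int) -> float:
    """Fraction of the allocated core-time actually spent on CPU."""
    if wall_seconds <= 0 or cores <= 0:
        raise ValueError("wall time and core count must be positive")
    return cpu_seconds / (wall_seconds * cores)


if __name__ == "__main__":
    # An 8-core job running for 10 hours that accumulated 24 CPU-hours.
    print(f"{job_efficiency(24 * 3600, 10 * 3600, 8):.0%}")  # 30%
    # The same job with 60 CPU-hours would fall in the 70-80% range.
    print(f"{job_efficiency(60 * 3600, 10 * 3600, 8):.0%}")  # 75%
```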

      https://ggus.eu/index.php?mode=ticket_info&ticket_id=148374 : AAA SAM tests are now green, but I intend to check whether the same errors are still appearing. Various people were following up; if necessary I need to push them to continue before the ticket is closed.

      I am very busy with CMS/Rucio testing of various types: multihop transfers to CTA, tape approval scripts and 'on tape' functionality. We are currently synchronising the existing data from PhEDEx into Rucio (running for 1 week so far, ~1 week remaining). We are also planning communication with the collaboration.

       

    • 13:48 13:49
      VO Liaison LHCb 1m
      Speaker: Raja Nandakumar (Science and Technology Facilities Council STFC (GB))
    • 13:52 13:53
      VO Liaison Others 1m
    • 13:53 13:54
      Experiment Planning 1m
    • 13:54 13:55
      Dune/protoDune 1m
    • 13:55 13:56
      Euclid 1m
    • 13:56 13:57
      SKA 1m
    • 13:57 13:58
      AOB 1m
    • 13:58 13:59
      Any Other Business 1m
      Speakers: Brian Davies (Lancaster University (GB)), Darren Moore (Science and Technology Facilities Council STFC (GB))