US-ALICE Grid operations review

US/Pacific
Thursday 70-191, Friday 2-100, Monday Pers-Hall Annex (Lawrence Berkeley Lab)

Zoom link: https://lbnl.zoom.us/my/jeffersonporter
By Phone: US: +1 669 900 6833 or +1 646 558 8656
Meeting ID: 352 692 1875
Jeff Porter (Lawrence Berkeley National Lab. (US)), Latchezar Betev (CERN)
Description
Yearly review of the Grid operations in the ALICE US sites
    • 1
      Intro, Goals & Agenda tuning
      Speaker: Jeff Porter (Lawrence Berkeley National Lab. (US))
    • 2
      ALICE-USA Computing project summary

      Overview of project status, with pointers to specific items we hope to address during the meeting.

      Speaker: Jeff Porter (Lawrence Berkeley National Lab. (US))
    • 10:30
      coffee break
    • 3
      ALICE Computing requirements 2019-2021

      https://docs.google.com/document/d/1IDV0UE1oVkHrheToJjl1fQycFIWbzQ-c4ULRkMwrv6o/edit?ts=5caf6a69

      Review of new requirements and expected changes in the job flux into the T2s for Run 3. Also, any new infrastructure on the horizon ... Singularity?

      Speaker: Latchezar Betev (CERN)
    • 4
      US T2 layout details, 2019 task list, evolution plans

      Fill out a Google doc on layout details, items on our to-do list, and how the project expects the clusters to grow/evolve. Note any site issues that may impact these plans.

      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Karen Fernsler (LBNL), Pete Eby (Oak Ridge National Laboratory - (US))
    • 12:15
      Lunch
    • 5
      US T2 layout, 2019 task lists, evolution plans continued
      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Karen Fernsler (LBNL), Pete Eby (Oak Ridge National Laboratory - (US))
    • 6
      Quick Check of site health: review from last year

      List of items from ALICE monitoring that one should check, and a quick review of how to identify problems via aliensh.

      Speaker: Latchezar Betev (CERN)
    • 7
      T2 Site Test Infrastructure

      Identify a set of tasks that can be scripted via aliensh or pyJalien.py to help verify that site behavior is normal relative to ALICE use and/or to identify specific issues: site-to-site FW block, CMreport not working, Grid cert expired, ...

      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), Pete Eby (Oak Ridge National Laboratory - (US))
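
      As a starting point for discussion of item 7, here is a minimal sketch of two of the checks named above (site-to-site FW block, Grid cert expired), written with the Python standard library only. It does not use aliensh or pyJalien.py. The function names, the endpoint list, and the 14-day warning threshold are hypothetical and chosen for illustration.

```python
import socket
import ssl
import time


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result is only a hint of a firewall block; the service may
    simply be down.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, DNS failures
        return False


def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter date.

    not_after uses the OpenSSL text form, e.g. "Jan  1 00:00:00 2030 GMT".
    now is seconds since the epoch; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400.0


def run_checks(endpoints, not_after, warn_days=14):
    """Return a list of human-readable problems (empty if all checks pass).

    endpoints is a list of (host, port) pairs; warn_days is an assumed
    threshold, not a project policy.
    """
    problems = []
    for host, port in endpoints:
        if not port_reachable(host, port):
            problems.append(f"unreachable: {host}:{port}")
    if days_until_expiry(not_after) < warn_days:
        problems.append("Grid certificate expires soon or has expired")
    return problems
```

      A site could call run_checks with the storage endpoints of the other site and the notAfter string of its own Grid certificate, then report whatever list comes back. The CMreport check is left out because it depends on site-specific details.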
    • 15:15
      Coffee Break
    • 8
      NERSC HPC status, current model, future plans, summer students
      Speaker: Jeff Porter (Lawrence Berkeley National Lab. (US))
    • 9
      ESnet Review prep & overview

      Overview of review process and objective, with opportunity to fill out some numbers.

      Report as it currently exists:

      https://docs.google.com/document/d/1_5jIzpTFfJJgzEiokYsxb3Ked4eKYx5MZVR_Ll-DnFs

    • 10
      WAN: Requirements & Monitoring plans

      Review expectations/requirements for RTT/bandwidth,
      and the monitoring capabilities of perfSONAR and its installation.
      Build a reply to Shawn McKee on which instances to include in the OSG monitoring network.

      Notes from Pete:

      https://docs.google.com/presentation/d/1ic0lgYWHGXtGAN7UyQGFQoOXM0e0X9qokOqISVgBC7k

      Speakers: Costin Grigoras (CERN), Pete Eby (Oak Ridge National Laboratory - (US))
    • 11
      EOS drain summary & EOS or XRootD test infrastructure

      Debrief on draining our 2 EOS storages. Lessons learned?

      Could EOS/XRootD test infrastructure help? Better network / server monitors?

      Speakers: Costin Grigoras (CERN), Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Pete Eby (Oak Ridge National Laboratory - (US))
    • 11:10
      coffee discussion
    • 12
      Analysis Jobs: overview of common features, efficiency

      General summary of how analysis jobs work (or don't!), where efficiency loss arises, and how to detect it.

      Speaker: Costin Grigoras (CERN)
    • 12:15
      Lunch
    • 13
      HPCS Efficiency evaluation

      Review monitors of HPCS efficiency and the contributions to it, and determine what is reasonable to expect given the data layout. If not reasonable, how to fix?

      Speakers: Costin Grigoras (CERN), Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Karen Fernsler (LBNL)
    • 14
      Site configurations to support Run 3

      We're starting to buy / configure T2 hardware for Run 3. What should we be doing? Will we support analysis?

      Speaker: Latchezar Betev (CERN)
    • 15:15
      coffee
    • 15
      T2 Site Test Infrastructure, revisited
      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), Pete Eby (Oak Ridge National Laboratory - (US))
    • 16
      AOB
    • 17
      WP15 status & plans for US participation