US-ALICE Grid operations review

US/Pacific
Thursday 70-191, Friday 2-100, Monday Pers-Hall Annex (Lawrence Berkeley Lab)

Zoom link: https://lbnl.zoom.us/my/jeffersonporter
By Phone: US: +1 669 900 6833 or +1 646 558 8656
Meeting ID: 352 692 1875
Jeff Porter (Lawrence Berkeley National Lab. (US)), Latchezar Betev (CERN)
Description
Yearly review of the Grid operations in the ALICE US sites
    • 09:15 09:30
      Intro, Goals & Agenda tuning 15m
      Speaker: Jeff Porter (Lawrence Berkeley National Lab. (US))
    • 09:30 10:30
      ALICE-USA Computing project summary 1h

      Overview of project status, with pointers to specific items we hope to address during the meeting.

      Speaker: Jeff Porter (Lawrence Berkeley National Lab. (US))
    • 10:30 10:45
      Coffee break 15m
    • 10:45 11:45
      ALICE Computing requirements 2019-2021 1h

      https://docs.google.com/document/d/1IDV0UE1oVkHrheToJjl1fQycFIWbzQ-c4ULRkMwrv6o/edit?ts=5caf6a69

      Review of new requirements and expected changes in the job flux into the T2s for Run 3. Also, any new infrastructure on the horizon ... Singularity?

      Speaker: Latchezar Betev (CERN)
    • 11:45 12:15
      US T2 layout details, 2019 task list, evolution plans 30m

      Fill out a Google doc on layout details, items on our to-do list, and how the project expects the clusters to grow/evolve. Note any site issues that may impact these plans.

      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Karen Fernsler (LBNL), Pete Eby (Oak Ridge National Laboratory - (US))
    • 12:15 13:45
      Lunch 1h 30m
    • 13:45 14:10
      US T2 layout, 2019 task lists, evolution plans continued 25m
      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Karen Fernsler (LBNL), Pete Eby (Oak Ridge National Laboratory - (US))
    • 14:10 14:30
      Quick Check of site health: review from last year 20m

      List of items from ALICE monitoring that one should check, and a quick review of how to identify problems via aliensh.

      Speaker: Latchezar Betev (CERN)
    • 14:30 15:15
      T2 Site Test Infrastructure 45m

      Identify a set of tasks that can be scripted via aliensh or pyJalien.py to help verify that site behavior is normal relative to ALICE use and/or to identify specific issues: site-to-site FW block, CMreport not working, Grid cert expired, ...

      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), Pete Eby (Oak Ridge National Laboratory - (US))
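      As a starting point for discussion, two of the issues named above (a site-to-site firewall block and an expired Grid certificate) can be probed with the Python standard library alone. This is a minimal sketch, not an existing ALICE tool: the function names, the peer list, and the warning threshold are all hypothetical, and it does not use aliensh or pyJalien.py. The certificate end date is assumed to be supplied in the format printed by `openssl x509 -enddate -noout`.

```python
import socket
import ssl
import time


def port_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "not reachable".
        return False


def days_until_expiry(not_after, now=None):
    """Days remaining before a certificate's notAfter date.

    not_after is a string such as "Jan  5 09:34:43 2030 GMT".
    now is a Unix timestamp; it defaults to the current time.
    """
    expiry = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expiry - now) / 86400.0


def run_checks(peers, not_after, warn_days=14, now=None):
    """Return a list of human-readable problems (empty if all checks pass).

    peers is a list of (host, port) tuples for remote services to probe.
    """
    problems = []
    for host, port in peers:
        if not port_reachable(host, port):
            problems.append(f"cannot reach {host}:{port} (possible FW block)")
    remaining = days_until_expiry(not_after, now)
    if remaining < 0:
        problems.append("Grid cert expired")
    elif remaining < warn_days:
        problems.append(f"Grid cert expires in {remaining:.1f} days")
    return problems


if __name__ == "__main__":
    # Hypothetical peer and end date, for illustration only.
    for line in run_checks([("se.example.org", 1094)],
                           "Jan  5 09:34:43 2030 GMT"):
        print(line)
```

      Returning a list of problem strings rather than raising on the first failure lets a single run report every issue found at a site.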
    • 15:15 15:35
      Coffee Break 20m
    • 15:35 16:00
      NERSC HPC status, current model, future plans, summer students 25m
      Speaker: Jeff Porter (Lawrence Berkeley National Lab. (US))
    • 16:00 17:00
      ESnet Review prep & overview 1h

      Overview of review process and objective, with opportunity to fill out some numbers.

      Report as it currently exists:

      https://docs.google.com/document/d/1_5jIzpTFfJJgzEiokYsxb3Ked4eKYx5MZVR_Ll-DnFs

    • 09:30 10:30
      WAN: Requirements & Monitoring plans 1h

      Review expectations/requirements for RTT/bandwidth and the
      monitoring capabilities of perfSONAR and its installation.
      Build a reply to Shawn McKee on which instances to include in the OSG monitoring network.

      Notes from Pete:

      https://docs.google.com/presentation/d/1ic0lgYWHGXtGAN7UyQGFQoOXM0e0X9qokOqISVgBC7k

      Speakers: Costin Grigoras (CERN), Pete Eby (Oak Ridge National Laboratory - (US))
    • 10:30 11:10
      EOS drain summary & EOS or XRootD test infrastructure 40m

      Debrief on draining our 2 EOS storages. Lessons learned?

      Could EOS/XRootD test infrastructure help? Better network / server monitors?

      Speakers: Costin Grigoras (CERN), Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Pete Eby (Oak Ridge National Laboratory - (US))
    • 11:10 11:30
      coffee discussion 20m
    • 11:30 12:15
      Analysis Jobs: overview of common features, efficiency 45m

      General summary of how analysis jobs work (or don't!), where efficiency loss arises, and how to detect it.

      Speaker: Costin Grigoras (CERN)
    • 12:15 13:45
      Lunch 1h 30m
    • 13:45 14:30
      HPCS Efficiency evaluation 45m

      Review monitors of HPCS efficiency, identify the contributions, and determine what is reasonable to expect given the data layout. If not reasonable, how to fix?

      Speakers: Costin Grigoras (CERN), Jeff Porter (Lawrence Berkeley National Lab. (US)), John White (LBNL), Karen Fernsler (LBNL)
    • 14:30 15:15
      Site configurations to support Run 3 45m

      We're starting to buy / configure T2 hardware for Run 3. What should we be doing? Will we support analysis?

      Speaker: Latchezar Betev (CERN)
    • 15:15 15:30
      coffee 15m
    • 15:30 16:30
      T2 Site Test Infrastructure, revisited 1h
      Speakers: Jeff Porter (Lawrence Berkeley National Lab. (US)), Pete Eby (Oak Ridge National Laboratory - (US))
    • 16:30 17:30
      AOB 1h
    • 09:30 12:30
      WP15 status & plans for US participation 3h