WLCG Open Technical Forum (OTF) #5

Europe/Zurich
31/3-004 - IT Amphitheatre (CERN)

Alessandro Di Girolamo (CERN), James Letts (Univ. of California San Diego (US))
Description

Welcome to the WLCG Open Technical Forum (OTF).

The WLCG Technical Coordination Board (TCB) is responsible for the technical evolution of WLCG services in line with the needs of the experiments and the capabilities of the infrastructure providers. The TCB defines a multi-year roadmap for such evolution and is responsible for its implementation. The TCB achieves these goals with a bottom-up approach through an Open Technical Forum (OTF) which welcomes the participation of all contributors to the technical evolution in the WLCG community.

Organizational matters:

  • Please register only if you plan to attend in person.
  • If you require a visitor pass, please contact lcg.office AT cern.ch at least 48 hours before you travel.

 

Participants
  • Andreu Pacheco Pages
Zoom Meeting ID
63997284703
Host
James Letts
    • 14:00 - 17:30
      Resource Provisioning with k8s

      Resource Provisioning in WLCG - Problem Statement
      The WLCG community has long relied on traditional batch systems at computing centers to provision resources for data processing and analysis. However, the increasing complexity and diversity of HL-LHC workloads (including AI/ML training and interactive analysis) on the one side, and the growing popularity of heterogeneous providers (such as HPC and Cloud) offering sizable amounts of resources and specialized/heterogeneous hardware (e.g., GPUs) on the other side, are pushing the boundaries of what these systems can efficiently support. All this clearly poses challenges for effective integration, but may also open the door to new patterns of data access and processing.
      Emerging technologies such as Kubernetes (k8s) offer new models for dynamic and scalable resource provisioning. Although k8s-based platforms are particularly well suited to exposing and managing service-oriented applications, lightweight job scheduling systems (such as Kueue) are now gaining traction.
      The problem we face is how to evolve our provisioning models to support this broader ecosystem of resources and to enable new workload and processing patterns, while maintaining interoperability, efficiency, and sustainability (i.e., reducing operational costs) across WLCG. Some of the points we would like to understand:

      • What are the concrete use cases driving this evolution, and what role can we foresee for k8s-native scheduling systems?
      • What are the technical and organizational barriers to adoption?
      • What role can k8s have in enhancing ML/AI development within the communities' computing systems?
      • How can we bridge traditional batch systems with k8s-based or hybrid models? What are the opportunities for systems and applications to scale out?
      • What role can community-developed solutions (e.g., interLink, SONIC) play?

      This session aims to frame these questions, share early implementations, and identify common challenges and opportunities for collaboration.

      • 14:00
        Introduction and problem statement 10m
      • 15:30
        Coffee/tea break 20m
  • Wednesday 25 June
    • 14:00 - 17:00
      FTS Evolution toward Run4
      • 14:00
        Introduction and problem statement 5m
        Speakers: Alessandro Di Girolamo (CERN), James Letts (Univ. of California San Diego (US))
      • 14:05
        FTS Evolution - draft of the plan 10m
        Speakers: Mihai Patrascoiu (CERN), Steven Murray (CERN)
      • 14:15
        FTS and LHCb 20m
      • 14:35
        FTS and CMS 20m
      • 14:55
        FTS and ATLAS 20m
      • 15:15
        Coffee and tea 20m
      • 15:35
        Discussion 50m