Second K8s-HEP Meetup

America/Chicago
Virtual (Zoom)

Lincoln Bryant
Description

This is the second "meetup" of folks dealing with the challenges of applying Kubernetes to computing in high energy physics. The first (in-person) meeting was held at UChicago in January 2020: https://indico.cern.ch/event/882955/. The meetup offers an opportunity to share experience, expertise, and tips for Kubernetes and cloud-native technologies from both application and infrastructure perspectives. While the context is high energy physics, many contributions may be generally applicable to scientific computing and the creation of cyberinfrastructure. All are welcome to attend.

You must register to attend; we will send out Zoom connection details on the morning of the event.

Live notes: https://docs.google.com/document/d/1s0KAl-LNnn1vvkH-Twiu909sLsaQdj6wo0vWJ9WUv-E/edit?usp=sharing

Registration
Participants
  • Alessandra Forti
  • Andrew Eckart
  • Anthony Richard Tiradani
  • Armen Vartapetian
  • Benjamin Galewsky
  • Brandon White
  • Brian Lin
  • Brian Paul Bockelman
  • Burt Holzman
  • Carl Lundstedt
  • Ceyhun Uzunoglu
  • Christophe Bonnaud
  • Christopher Hollowell
  • Christopher Weaver
  • Costin Caramarcu
  • Daniele Spiga
  • David Jordan
  • Diego Ciangottini
  • Doug Benjamin
  • Federica Legger
  • Fernando Harald Barreiro Megino
  • Fernando Meireles
  • Glenn Cooper
  • Gordon Watts
  • Horst Severini
  • Humaira Abdul Salam
  • Ilija Vukotic
  • Ivo Jimenez
  • Jason Stidd
  • Jayjeet Chakraborty
  • Jeff LeFevre
  • Joe Breen
  • John Graham
  • John Thiltges
  • Judith Stephen
  • Kenyi Paolo Hurtado Anampa
  • Lincoln Bryant
  • Lindsey Gray
  • Lorena Lobato Pardavila
  • Luis Fernandez Alvarez
  • Maria Acosta Flechas
  • Mark Neubauer
  • Matyas Selmeci
  • Michael Schuh
  • Muhammad Akhdhor
  • Muhammad Imran
  • Oksana Shadura
  • Panos Paparrigopoulos
  • Pascal Paschos
  • Philippe Laurens
  • Ricardo Brito Da Rocha
  • Robert William Gardner Jr
  • Ryan Taylor
  • Shawn McKee
  • Sinclert Perez Castano
  • Soundar Rajendran
  • Spyridon Trigazis
  • Thomas George Hartland
  • Todd Tannenbaum
  • Tommaso Tedeschi
  • Valentin Y Kuznetsov
  • William Strecker-Kellogg
  • Xin Zhao
  • Zhifei Yang
  • Tuesday, 1 December
    • 09:00–11:00
      Block 1: Presentations
      • 09:00
        Welcome! What's this? 10m
        Speakers: Lincoln Bryant (University of Chicago (US)), Robert William Gardner Jr (University of Chicago (US))
      • 09:10
        Fermilab Experience with OKD (OpenShift) 20m

        Fermilab has made the strategic decision to deploy OKD, the open source version of Red Hat OpenShift, for Kubernetes container management. We will discuss our experience so far with OKD and describe some of the challenges we faced deploying a variety of applications.

        Speaker: Anthony Richard Tiradani (Fermi National Accelerator Lab. (US))
      • 09:30
        Debugging Kubernetes pod throughput with Calico CNI 20m

        This talk explores how the kubelet, with Calico as the CNI plugin, depends on the performance of the Kubernetes API server in order to start pods quickly.

        Speaker: Mr Thomas George Hartland (CERN)
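
        For context, the kubelet's client-side rate limits toward the API server are one of the knobs that surface in this kind of investigation. A minimal sketch of a KubeletConfiguration raising them (values are illustrative only, not recommendations from the talk):

        # kubelet config sketch: the client-side API rate limits can throttle
        # pod startup at scale (defaults around this era were QPS 5 / burst 10)
        apiVersion: kubelet.config.k8s.io/v1beta1
        kind: KubeletConfiguration
        kubeAPIQPS: 50
        kubeAPIBurst: 100
        serializeImagePulls: false   # pull images in parallel
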
      • 09:50
        K8s autoscaling based on custom metrics: two example applications, CMSWEB and HTCondor in the CMS Analysis Facility@INFN 20m

        Out of the box, Kubernetes supports horizontal pod autoscaling based only on predefined pod metrics (CPU and memory usage). Therefore, to achieve a truly green, elastic cloud model that optimizes resource usage, a key step is to integrate autoscaling based on custom metrics, which requires third-party components.
        In this work we demonstrate horizontal pod autoscaling based on custom metrics: metrics are collected by a Prometheus server, then manipulated and made available to the Kubernetes-native Horizontal Pod Autoscaler (HPA) resources.
        We show how we apply this feature to two HEP use cases: in the first, the solution is applied to the CMSWEB (CMS web services) infrastructure; in the second, it is used to enhance the elasticity of an analysis facility prototype on INFN-Cloud by automatically scaling HTCondor instances.

        Speaker: Tommaso Tedeschi (Universita e INFN, Perugia (IT))
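
        For reference, a minimal sketch of such a custom-metric HPA, assuming a Prometheus Adapter exposes a hypothetical condor_idle_jobs metric for a hypothetical htcondor-worker Deployment (all names are illustrative, not taken from the talk):

        apiVersion: autoscaling/v2beta2
        kind: HorizontalPodAutoscaler
        metadata:
          name: htcondor-worker
        spec:
          scaleTargetRef:
            apiVersion: apps/v1
            kind: Deployment
            name: htcondor-worker        # hypothetical worker Deployment
          minReplicas: 1
          maxReplicas: 50
          metrics:
            - type: Pods
              pods:
                metric:
                  name: condor_idle_jobs   # hypothetical metric via Prometheus Adapter
                target:
                  type: AverageValue
                  averageValue: "10"       # target ~10 idle jobs per pod
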
      • 10:10
        Lightweight integration of Kubernetes clusters for ATLAS batch processing 20m

        The PanDA team has evaluated native Kubernetes job submission as a way to process ATLAS workloads and to enable immediate integration of major cloud computing providers. This model also offers a novel way to set up lightweight compute sites, without the need to deploy a full Grid stack.

        During the last year we have been running several queues on clusters set up by institutes associated with ATLAS (ASGC, CERN, University of Chicago, University of Victoria) and on cloud providers (Amazon and Google), focusing on increasing stability and efficiency.

        This contribution will discuss the advantages and challenges we have encountered, and briefly introduce ongoing work to integrate less trivial (not pleasingly parallel) workloads.

        Speaker: Fernando Harald Barreiro Megino (University of Texas at Arlington)
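
        As an illustration of the model, submitting a pilot as a native Kubernetes Job amounts to a manifest along these lines (image, queue name, and resource figures are hypothetical, not the actual PanDA/Harvester configuration):

        apiVersion: batch/v1
        kind: Job
        metadata:
          generateName: pilot-
        spec:
          backoffLimit: 0                 # do not let Kubernetes retry a failed pilot
          template:
            spec:
              restartPolicy: Never
              containers:
                - name: pilot
                  image: example.org/atlas/pilot:latest   # hypothetical pilot image
                  env:
                    - name: PANDA_QUEUE
                      value: SOME_QUEUE                   # hypothetical queue name
                  resources:
                    requests:
                      cpu: "8"
                      memory: 16Gi
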
      • 10:30
        Lazy Image Pulling with Stargz 20m

        Container images provide reproducible environments, and container orchestration lets users parallelize and create elaborate workflows with tools like Argo or plain Kubernetes Jobs. It is easy to create very large images, and when parallelizing jobs the time and cost of pulling container images can increase significantly. Go developers proposed the seekable tar.gz format (stargz) to address this issue for their CI by downloading files from the container registry only when they are needed. This presentation describes the current state of lazy image loading with containerd and stargz, and presents the results of our benchmarks.

        Speaker: Spyridon Trigazis (CERN)
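
        On the cluster side the pod spec itself is unchanged; lazy pulling happens inside containerd, which must be configured to use the stargz snapshotter on each node, and the image must be converted to the eStargz format (e.g. with the ctr-remote tool from the stargz-snapshotter project). A rough sketch with a hypothetical image name:

        apiVersion: v1
        kind: Pod
        metadata:
          name: stargz-demo
        spec:
          containers:
            - name: app
              # hypothetical image previously converted to eStargz; files are
              # fetched from the registry lazily as the container reads them
              image: example.org/analysis:latest-esgz
              command: ["python", "run.py"]
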
    • 11:00–11:20
      Coffee break 20m
    • 11:20–13:00
      Overflow & Open Discussion
      • 11:20
        Reproducible and Scalable workflows for SkyhookDM experimentation on Kubernetes 20m

        Preparing a systems experiment environment requires setting up infrastructure, baselining it, installing dependencies and tools, running experiments, and plotting results, which, if done by hand, is cumbersome and error-prone. The same applies to researchers starting to experiment with Ceph or SkyhookDM, an extension of Ceph for running queries on tabular datasets stored as objects. To address this, we used Popper, a container-native workflow engine, to build scalable and reproducible workflows that automate an end-to-end pipeline for experimenting with Ceph and SkyhookDM deployed on Kubernetes via Rook.

        Speaker: Jayjeet Chakraborty (University of California, Santa Cruz)
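
        For flavor, a Popper workflow is a small YAML file whose steps each run in a container; a minimal sketch in the spirit of the pipeline described above (images and commands are placeholders, not the actual SkyhookDM workflow):

        steps:
          - id: deploy-rook
            uses: docker://example/kubectl:latest          # placeholder image
            runs: [sh, -c, "kubectl apply -f rook/"]
          - id: run-benchmark
            uses: docker://example/skyhook-bench:latest    # placeholder image
            runs: [sh, -c, "./run_queries.sh"]
          - id: plot-results
            uses: docker://example/python:3.8              # placeholder image
            runs: [python, plot_results.py]
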
      • 11:40
        Discussion 1h 20m
  • Wednesday, 2 December
    • 09:00–11:45
      Block 3: Presentations
      • 09:00
        Multi Cluster / Cloud Kubernetes for GPU Evaluation 20m

        GPUs are scarce resources in many of our centers, including CERN.

        This talk will briefly describe a multi-cloud deployment whose goal is to evaluate the performance of different workloads on all GPU types offered by GCP, Azure and AWS.

        It will include some details about setting up clusters and GPUs in each of these clouds, along with some preliminary results.

        Speaker: Ricardo Brito Da Rocha (CERN)
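
        Across providers the workload side stays uniform: a pod requests the nvidia.com/gpu extended resource and a node label selects the GPU model. A generic sketch (the label shown is the GKE one; Azure and AWS use their own, and the benchmark image is hypothetical):

        apiVersion: v1
        kind: Pod
        metadata:
          name: gpu-benchmark
        spec:
          restartPolicy: Never
          nodeSelector:
            cloud.google.com/gke-accelerator: nvidia-tesla-v100   # GKE-style label
          containers:
            - name: bench
              image: example.org/gpu-bench:latest                 # hypothetical image
              resources:
                limits:
                  nvidia.com/gpu: 1   # one GPU via the NVIDIA device plugin
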
      • 09:20
        Running a multi-tenant Kubernetes with GitOps 20m

        Starting in October 2020, the PATh project is making a concerted effort to transition the centrally-run OSG services (such as websites, software repositories, information services) from ad-hoc deployment models to Kubernetes.

        To do so, we needed a Kubernetes "home" and an operational model! In this talk, we'll give an overview of the work going on in the Tiger cluster at Morgridge, our current GitOps-based workflow with Flux, and how we see things fitting into the larger ecosystem of distributed services and federated Kubernetes.

        Speaker: Brian Paul Bockelman (University of Wisconsin Madison (US))
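
        In a Flux v2-style GitOps setup, the cluster reconciles itself from a Git repository through a pair of custom resources; a minimal sketch (repository URL and path are hypothetical, not the actual Tiger configuration):

        apiVersion: source.toolkit.fluxcd.io/v1beta1
        kind: GitRepository
        metadata:
          name: services
          namespace: flux-system
        spec:
          interval: 5m
          url: https://github.com/example/cluster-config   # hypothetical repo
          ref:
            branch: main
        ---
        apiVersion: kustomize.toolkit.fluxcd.io/v1beta1
        kind: Kustomization
        metadata:
          name: services
          namespace: flux-system
        spec:
          interval: 10m
          path: ./clusters/tiger     # hypothetical path in the repo
          prune: true                # remove resources deleted from Git
          sourceRef:
            kind: GitRepository
            name: services
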
      • 09:40
        Overview of CMSWEB Cluster in Kubernetes 20m

        The CMS experiment relies heavily on the CMSWEB cluster to host critical services for its operational needs. The cluster is deployed on virtual machines (VMs) from the CERN OpenStack cloud and is maintained manually by operators and developers. The release cycle is composed of several steps: building RPMs, deployment, validation, and integration tests. To enhance the sustainability of the CMSWEB cluster, CMS decided to migrate it to a containerized solution based on Docker, orchestrated with Kubernetes (k8s). This allows us to significantly shorten the release upgrade cycle, standardize the end-to-end deployment procedure, and reduce operational costs.

        Recently, we migrated some CMSWEB services from the VM cluster to Kubernetes. This talk gives an overview of the current CMSWEB cluster. We describe the new architecture of the CMSWEB cluster in Kubernetes and its implementation strategy. We'll discuss how we create Docker images of the services and deploy them in this cluster following the service deployment cycle, and cover the monitoring of these services. Finally, we'll present our future plans for this cluster.

        Speaker: Muhammad Imran (National Centre for Physics (PK))
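
        The deployment unit for each such service is the usual Deployment-plus-Service pair; a generic sketch (service name, image, and port are placeholders, not the actual CMSWEB manifests):

        apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: example-service              # placeholder service name
        spec:
          replicas: 2
          selector:
            matchLabels:
              app: example-service
          template:
            metadata:
              labels:
                app: example-service
            spec:
              containers:
                - name: example-service
                  image: registry.example.org/cmsweb/example-service:v1.0.0
                  ports:
                    - containerPort: 8443
        ---
        apiVersion: v1
        kind: Service
        metadata:
          name: example-service
        spec:
          selector:
            app: example-service
          ports:
            - port: 8443
              targetPort: 8443
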
      • 10:00
        Experience with K8s at Coffea-Casa AF@UNL 20m

        In this contribution we share our experience designing an Analysis Facility for columnar analysis using the Coffea package at the University of Nebraska-Lincoln, and describe our experience deploying different workloads and services on the UNL Kubernetes cluster (JupyterHub with Traefik integration, HTCondor, ServiceX and other infrastructure deployments).

        Speaker: Carl Lundstedt (University of Nebraska Lincoln (US))
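
        As a flavor of how such services are wired together, a JupyterHub deployment exposed through Traefik might be configured along these lines (a sketch against the zero-to-jupyterhub Helm chart; image, hostname, and values are illustrative, not the actual coffea-casa configuration):

        # values.yaml sketch for the zero-to-jupyterhub Helm chart
        singleuser:
          image:
            name: example.org/coffea-base   # hypothetical analysis image
            tag: latest
        ingress:
          enabled: true
          annotations:
            kubernetes.io/ingress.class: traefik
          hosts:
            - coffea.example.org            # hypothetical hostname
        proxy:
          service:
            type: ClusterIP                 # traffic enters via the Traefik ingress
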
      • 10:20
        Test REANA Deployment at BNL 20m

        In this presentation we'll discuss our experiences deploying a test REANA instance on a k8s cluster at BNL.

        Speaker: Christopher Henry Hollowell (Brookhaven National Laboratory (US))
      • 10:40
        What's new with SLATE? 20m

        We will review progress with SLATE over the past year, including new containerized applications, a storage provisioner, and security policies for federated operations.

        Speaker: Lincoln Bryant (University of Chicago (US))
      • 11:00
        Kubernetes at UVic 20m

        I will describe Kubernetes cluster deployment at UVic, including batch computing and APEL accounting for ATLAS.

        Speaker: Ryan Taylor (University of Victoria (CA))
      • 11:20
        Packaging and using services in Kubernetes 20m

        Lessons learned in OSG from distributing service container images, and experiences contributing to and deploying services with SLATE.

        Speaker: Brian Hua Lin (University of Wisconsin - Madison)
    • 11:45–12:05
      Coffee break 20m
    • 12:05–13:45
      Overflow & Open Discussion
      • 12:05
        Discussion 55m