Batch Operations Weekly

Europe/Zurich
31/S-023 (CERN)

    • 14:00 - 15:00
      Agenda 1h
      • [Luis] Discuss the enforcement of code review in our repositories/infra.
        • Traceability?: why is setting X applied in hostgroup A?
        • Quality?: why are these sub-hostgroups a copy/paste of that one?
        • Standards?: why isn’t this patch merged into master after 6 months?
        • Knowledge sharing?: avoid “golden boy” anti-pattern.
        • Technical debt!
        • Related pointers: CODEOWNERS file (useful in bi).
        • CI/CD Testing?: offloading quality/functionality checks to automated pipelines for container images and helm charts.
        • Needs a process for out-of-band changes.
        • Is there a threshold to consider?
        • Agreed: MRs come with a required approver. This can be unchecked in an emergency (can the runner notify on that?). We need a bot for open requests. Need agreement with the HPCers for bi. Protect QA & Master. Features should be tickets if you can't describe them in basically the commit message.
      • BBC-2109: Migration to CC7 (50%):
        • Only ~500 cores (CMS T0) pending in gva_project_004 (networking issues in OpenStack).
      • BBC-2028: Provisioning more 24cpu nodes in Geneva Project 041 (BE-ABP).
        • A total of 2400 cores is desired.
        • Using standard naming convention.
        • Testing Terraform on mixed flavor environments.
        • related:
          • build some new 32-core bigmcore nodes.
          • create a new wholemachine/fullnode hostgroup to consolidate sixteen and bigmcore: a hostgroup that only accepts full-node jobs. This also requires some magic to convert user jobs to the right sizes (18->16, 28->24, 48->32).
      • Fifemon probes:
        • New version almost ready. The probes have been running for the last few days, successfully sending data to fifecarbon02 (test Graphite instance).
        • condorstats-t0, condorstats-prod, condorstats-test. condorstats-vcpool ?
        • Migration: change hieradata to point to fifecarbon01 (prod instance), stop condorstats01 and run puppet on the new instances.
        • next:
      • Exploring how to use the idle resources in the central managers more efficiently (Ben?)
      • AMS public now allows u_va submission
        • AMS can now exit LSF going to 50/50 VMs in share / whole nodes in t0
        • Move all resources to CC7? Yes, good point; will confirm.
      • Kubernetes:
        • Do we want or already have chaos tools?
          • We have users.
        • Consul: use case for wtfis (egroups and schedds), workers, terraform states. Anything else?
          • replace other k/v use cases
          • maybe roger / drain state
      • Haggis
        • API wrapper:
          • Implementing the backend API call logic
          • Creating unit-tests
        • Website:
          • New version has a new implementation of a right-side drawer that works on every screen resolution and supports independent scrolling (very useful in the Compute tab)
          • Grafana monitoring of errors and access time per page using the Prometheus client for Go - tested and fully functional!
          • New data-table fixed-headers feature is now available in 2.0 Alpha - currently waiting for the next major release
          • Please feel free to suggest any changes or report bugs here!
      • Kubernetes & Condor: CHEP?
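
Sketch for the full-node hostgroup item under BBC-2028: the job-size conversion mentioned there (18->16, 28->24, 48->32) could look roughly like the lookup below. This is only an illustration, not the agreed mechanism. The function names and the plain dict standing in for a job description are hypothetical; the three size pairs are the ones quoted in the minutes, and any other size is assumed to pass through unchanged.

```python
# Size pairs quoted in the minutes (requested cores -> full-node cores).
# Assumption: sizes not listed here are left untouched.
FULL_NODE_SIZES = {18: 16, 28: 24, 48: 32}


def full_node_cpus(requested_cpus: int) -> int:
    """Return the core count a job would be resized to for the full-node hostgroup."""
    return FULL_NODE_SIZES.get(requested_cpus, requested_cpus)


def resize_job(job: dict) -> dict:
    """Return a copy of a job description with RequestCpus adjusted.

    `job` is a hypothetical stand-in for a submitted job; only the
    RequestCpus key is read, and the input dict is not modified.
    """
    resized = dict(job)
    resized["RequestCpus"] = full_node_cpus(job["RequestCpus"])
    return resized


if __name__ == "__main__":
    for cpus in (18, 28, 48, 8):
        print(cpus, "->", full_node_cpus(cpus))
```

Where such a conversion would actually live (submit side, schedd, or elsewhere) is left open here.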