Ceph/CVMFS/Filer Service Meeting

Europe/Zurich
600/R-001 (CERN)

Description

Zoom: Ceph Zoom

    • 14:00 14:15
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN), Fabrizio Furano (CERN)

      Enrico:

      • Recurrent no_contact and snapshot_error (lhcb.cern.ch backup Stratum 0) over the weekend on lxcvmfs127; now fixed (HW problem on the hypervisor). This was *not* user-visible.
      • Bad TCP_REFRESH_FAIL_OLD on front caches of Stratum1 for ProxyPass-ed repos
        • Tracked at CVMFSOPS-245
        • Seems to be fixed (fingers-crossed) with some header manipulation
      • Review of engineering tools repositories with IT-CDA
      • Useful meeting within IT (ST, CM, CDA) about container distribution on the grid:
        • Prototype with Harbor (OIDC, garbage collection, security scans) using webhooks
        • Integration with the GitLab registry is planned
      • Plan to submit a paper on lhcbdev.cern.ch at (virtual) CHEP with SFT and LHCb people
    • 14:15 14:30
      Ceph: Operations 15m
      • Notable Incidents/Requests 5m
      • Upgrades, Migrations, (De-)Commissioning 5m
        Speaker: Enrico Bocchi (CERN)
        • Gabe data migrations now paused until January. The balancer will remain off.
        • The latest hw delivery is in its final burn-in test, so we can expect it in January.
      • Hardware Repair Liaison 5m
        Speaker: Julien Collet (CERN)

        Julien

        • Lots of disks awaiting replacement!
        • Update of the script is finished, but the rollout will wait until January
        • Disk replacements will carry on during the Christmas break
          • Will stay in touch with the repair team over that period in case they need anything

        Enrico

        • Memory module replaced on CEPHNETHUB-DATA-32672F0985 on 10/12. Smooth since then.
      • Puppet, Tools, Monitoring 5m
        Speaker: Theofilos Mouratidis (CERN)
        • Enabling PSI (Pressure Stall Info) on el8 machines. Needs a psi=1 kernel param, plus a reboot.
          • Goal is to better understand how heavily loaded a machine is.
          • This gives three new metrics, similar in concept to loadavg:
        [13:50][root@cephdata20b-b7e4a773b6 (qa:ceph/flax/osd*48) ~]# cat /proc/pressure/cpu 
        some avg10=0.00 avg60=0.08 avg300=0.15 total=1216096
        [13:50][root@cephdata20b-b7e4a773b6 (qa:ceph/flax/osd*48) ~]# cat /proc/pressure/io 
        some avg10=5.80 avg60=21.48 avg300=13.74 total=56284995
        full avg10=5.69 avg60=20.00 avg300=12.73 total=52108387
        [13:50][root@cephdata20b-b7e4a773b6 (qa:ceph/flax/osd*48) ~]# cat /proc/pressure/memory 
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
        full avg10=0.00 avg60=0.00 avg300=0.00 total=0
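
A minimal sketch of how the output above could be read programmatically, for example to feed the values into monitoring. The function names `parse_psi` and `read_psi` are hypothetical and not part of any existing tool. The sketch assumes the standard kernel PSI file format shown above: `avg10`/`avg60`/`avg300` are percentages of time stalled over the last 10, 60 and 300 seconds, and `total` is the cumulative stall time in microseconds.

```python
from pathlib import Path


def parse_psi(text):
    """Parse the contents of a /proc/pressure/<resource> file.

    Returns a dict keyed by line type ("some", and "full" when present),
    each mapping to a dict of avg10/avg60/avg300 (float) and total (int).
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = {}
        for field in fields:
            key, _, raw = field.partition("=")
            # "total" is an integer counter; the averages are percentages.
            values[key] = int(raw) if key == "total" else float(raw)
        result[kind] = values
    return result


def read_psi(resource, root="/proc/pressure"):
    """Read and parse one PSI file; resource is "cpu", "io" or "memory"."""
    return parse_psi(Path(root, resource).read_text())


# Sample taken from the io output above.
sample_io = """\
some avg10=5.80 avg60=21.48 avg300=13.74 total=56284995
full avg10=5.69 avg60=20.00 avg300=12.73 total=52108387
"""

io = parse_psi(sample_io)
print(io["some"]["avg10"], io["full"]["avg60"])
```

On a machine booted with psi=1, `read_psi("io")` would return the same structure from the live file. The `some` line counts time in which at least one task was stalled on the resource, while `full` counts time in which all non-idle tasks were stalled at once.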
      • Upstream News 5m
        Speaker: Dan van der Ster (CERN)
    • 14:30 14:45
      Ceph: Ongoing Projects 15m
      • Kopano/Dovecot 5m
        Speaker: Dan van der Ster (CERN)
        • All users have been migrated to Dovecot. I'll start a discussion to understand what will become of the hw (kelly cluster).
      • REVA/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
    • 14:45 14:55
      S3 10m
      Speakers: Enrico Bocchi (CERN), Julien Collet (CERN)
      • CEPH-974: I have managed to reproduce the "nameless" object issue in a test bucket, now trying to clean up. (I sent a mail to the ML, but no response yet). This ticket might be related: https://tracker.ceph.com/issues/48363

      Enrico:

      • Traefik/Filebeat finished and ready:
        • Painful log ingestion via filebeat+logstash (marathon)+ES works!
        • Propose to spawn a few bigger RGWs at the beginning of January to replace the existing front-end
      • cephgabe-rgwxl-ac460f6b70 (s3.cern.ch for CVMFS) back in the LB after reboot

      Giuliano

      • Warp dashboard:
        • Now pointing to metrictanks
    • 14:55 15:05
      Filer/CephFS 10m
      Speakers: Dan van der Ster (CERN), Enrico Bocchi (CERN), Theofilos Mouratidis (CERN)
      • CEPH-1014: all the client.manila users were given `mgr r` capabilities, so that OpenStack can query the list of subvolumes (to see if they have been deleted correctly).
    • 15:05 15:10
      AOB 5m

      Unbeatable CEPH dream team won the IT-ST PubQuiz, by far.