Ceph/CVMFS/Filer Service Meeting

600/R-001 (CERN)




Zoom: Ceph Zoom

    • 2:00 PM – 2:15 PM
      CVMFS 15m
      Speakers: Enrico Bocchi (CERN), Fabrizio Furano (CERN)


      • backup-cvmfs01 (RJ59) to be decommissioned (ETA: TBD) -- CEPH-972
        • Very easy; we can do it at any time.
      • cvmfs-backend (RA13) to be decommissioned -- ETA Dec 2021 (or sooner) -- CEPH-1087
        • Very critical -- need to change the squid configs and review properly
        • We can use the hot-spare Stratum 1 p06636710y99625.cern.ch
        • Getting full on ZFS (~82%) -- trying to recover some space
      • Moved 5 machines to "IT CVMFS Ironic"
        • 1 machine (out of 6) does not boot. Cloud is investigating...
        • Prospective usage: new hot-spare backend, HA Stratum 1, ...
        • @fabrizio, @dan, please check you have access and can manage machines.
      • CVMFSOPS-277 -- Fixed abandoned nested catalogs in unpacked.cern.ch


      • The test machine in the "angel" cluster runs out of memory every week; changed the formula that calculates the size of Squid's memory cache
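For context, the knob in question is Squid's memory-cache sizing; a purely illustrative squid.conf fragment (the actual formula and values used on the angel machine are not recorded in these notes):

```
# squid.conf -- illustrative values only, not the deployed configuration
cache_mem 512 MB
maximum_object_size_in_memory 1 MB
```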
    • 2:15 PM – 2:35 PM
      Ceph Operations Reports 20m
      • Teo (cta, erin, kelly, levinson) 5m
        Speaker: Theofilos Mouratidis (CERN)
        • Created ceph/next cluster on C8 with ceph/pacific (16.2.4)
          • Using 250G (x3) of hyperc volumes
        • It will be used mainly to test future versions that have specific components we need
        • For example, Pacific has the snapshot scheduler
        • This cluster will be used for now as a backend for Reva to:
          • First, run testing suites against it, like smashbox and ocis
          • Then run user tests with a small number of people
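For reference, the Pacific snapshot scheduler mentioned above is driven by the snap_schedule mgr module; an illustrative sketch (path, schedule, and retention values here are made up, not what will be deployed):

```
ceph mgr module enable snap_schedule
ceph fs snap-schedule add / 1h
ceph fs snap-schedule retention add / 24h
ceph fs snap-schedule list /
```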
      • Enrico (barn, beesly, gabe, meredith, nethub, vault) 5m
        Speaker: Enrico Bocchi (CERN)

        Barn, Meredith, Vault: NTR



        • Reshard CVMFS buckets (50k objects per shard, prime number of shards): cvmfs-cms, cvmfs-cms-ib, cvmfs-atlas-nightlies, cvmfs-ams, cvmfs-unpacked-v20200814, cvmfs-alice
        • Logs for CSIR on ES in progress (Logstash struggles to keep up with the manipulations and multiple outputs)
        • Traefik (w/o consul/nomad) in progress


        • OTG0063859 -- "Network Validation Tests: 773 Site Backbone Failure" was OK

        • Next Monday (or is it the 28th?) it will be more fun -- OTG0064352
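The bucket resharding noted above is done with radosgw-admin; an illustrative sketch (the shard count here is a made-up prime, not the value actually used):

```
radosgw-admin bucket stats --bucket=cvmfs-cms        # check object counts first
radosgw-admin bucket reshard --bucket=cvmfs-cms --num-shards=101
radosgw-admin reshard list                           # monitor pending reshards
```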

      • Dan (dwight, flax, kopano, jim, upstream) 5m
        Speaker: Dan van der Ster (CERN)
        • CEPH-1155, CEPH-1159: CS8 RAID1 issue understood; the patch proposed by mdadm upstream fixes it. Feedback given to the Linux team for better CS8 testing: https://its.cern.ch/jira/projects/LOS/issues/LOS-759
        • CEPH-1154: puppet and cephadm still ongoing. 
        • FILER-141: migrate Filer cp1/cpio1 volumes to barn-500 using ZFS add/remove functionality. Done for twiki over the weekend.
        • Still studying the beesly unfound-object issue. This morning there was an I/O error on the primary during scrub, which repaired correctly as usual, so the weekend issue was something unique/rare.
        • Met with Jakob B about CephFS tests for ROOT: they see poor Ceph performance on a remote cluster somewhere outside CERN; asking for some help to understand it.
          • Also, future ROOT I/O will prefer writing to a single file from multiple writers. This is a use case for LAZY_IO.
      • Arthur (pam) 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
        • Tested loss of network connectivity between ceph nodes on pam
          • mostly problems with renewing cephx keys on the OSDs (went away after a restart)
        • Triggered the system stray file limits on pam while deleting a snapshotted file
          • Still have not found a way to purge the system stray directory (apart from restarting the MDS)
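For the record, stray counts can be watched via the MDS perf counters; a hedged sketch (the MDS name is illustrative):

```
# MDS name is illustrative; strays are tracked in the mds_cache perf section
ceph tell mds.pam-mds-1 perf dump mds_cache | grep strays
```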
      • Jose (OpenStack) 20m
        Speaker: Jose Castro Leon (CERN)


        • Upgrade to Wallaby postponed, waiting for 14.2.22 to be deployed on all CephFS clusters
        • Clients on controllers will run the Octopus release using the repo defined in the ceph puppet module


        • Upgrade to Wallaby is ready, no rush to push it
        • As part of the change to use the ceph puppet module, shall I move it to Octopus?


        • I'll prepare the clients to use the ceph puppet module. Shall I update the clients to Octopus later?


        Related to the Availability Zone discussions, we'll go forward with the proposal to have exactly the same AZs on storage as on compute (cern-geneva-a, cern-geneva-b, cern-geneva-c, gva-critical)

        Machines in any of the AZs will be able to mount any volume in any other AZ. This may not be the case in the PCC, as compute and storage zones will map to existing rooms.

    • 2:35 PM – 2:45 PM
      R&D Projects Reports 10m
      • Reva/CephFS 5m
        Speaker: Theofilos Mouratidis (CERN)
        • Use cases (to be discussed this Thursday, time TBA):
          • Static path, User -> /dir: (e.g. /project-geant4)
            • Everyone will mount the path
              • Create, view, edit according to their permissions
            • On CERNBox everyone will view and sync /dir
          • Static path + home, User -> /dir/<username>: (e.g. /home/tmourati)
            • each user can either mount /home or /home/<username>
            • each user has full access to their user folder
            • each user has access to other folders if given permission
            • On CERNBox they will view /home/<username>
          • Static Manila, User -> /volumes/_no_group/<common_manila_share>/<uid>:
            • Can be served by the first bullet by setting dir to:
              • /volumes/_no_group/<common_share>/<uid>
          • Manila + home, User -> /volumes/_no_group/<user_share>/<uid>:
            • Can be served by the second bullet but:
              • Create subvolume for user if it doesn't exist
              • User <--> Share, 1:1 ratio
              • Use fsadmin to store
          • Existing Manila (TBD)
        • File versions
          • We need global scope snapshots instead of just using subvolume snapshots to cover all cases.
          • If we let users snapshot their own directories we will have the following problem:
            • mkdir /home/tmourati/.snap/new_snap
            • mv /home/tmourati/a.txt /home/tmourati/b.txt (ino: 1588)
            • rm /1588; ln -s /home/tmourati/b.txt /1588 (to update reverse index)
            • To see the previous version of the file now:
              • Checking the path will not work: file b.txt does not exist in the snap
              • Checking the link won't work either, because it points to b.txt, same as above
              • We would have to brute-force it by implementing `find -inum 1588` in go-ceph
            • But, by snapshotting the whole FS we can get the previous version of the link
              • So /.snap/new_snap/1588 -> /home/tmourati/a.txt
              • And then we retrieve it from /.snap/new_snap/home/tmourati/a.txt
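The whole-FS-snapshot argument above can be reproduced with a tiny self-contained sketch on any ordinary filesystem, using a directory copy to stand in for the global snapshot /.snap/new_snap (no CephFS involved; paths are illustrative):

```shell
# Plain-filesystem stand-in for the reasoning above: a full copy plays
# the role of the global snapshot /.snap/new_snap.
tmp=$(mktemp -d)
mkdir -p "$tmp/home/tmourati"
echo v1 > "$tmp/home/tmourati/a.txt"
cp -r "$tmp/home" "$tmp/snap"                      # "global snapshot"
mv "$tmp/home/tmourati/a.txt" "$tmp/home/tmourati/b.txt"
# Live tree: only b.txt exists; the snapshot still resolves the old name.
cat "$tmp/snap/tmourati/a.txt"                     # prints: v1
```

This is exactly the retrieval path in the last bullet: the previous version is reachable under the snapshot by the old name even after the rename.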
        • Shares
          • In the current implementation we assume the user has visibility from their user folder and down the FS
          • Every operation is done with relative paths
          • The problem arises when a user wants to access a share (either a dir or a file)
            • The context will point to "/home/alice/"
            • The link will point to "shared.txt"
            • But the previous file belongs to Bob
            • So, reading shared.txt with Alice's mount handle will lead to an error
              • The mount handle would try to read /home/alice/shared.txt
              • It will work only for Bob
          • To fix this problem (not implemented yet):
            • Each user will have to mount root (/) and automatically cd to their user folder
              • To still do operations with relative paths
            • Symbolic links will point to files with an absolute path
            • A user will now read a link that points immediately to the file (e.g. /user/bob/shared.txt)
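The relative-vs-absolute link problem described above can be demonstrated on any POSIX filesystem (paths and file names are illustrative):

```shell
tmp=$(mktemp -d)
mkdir -p "$tmp/user/bob" "$tmp/user/alice"
echo hello > "$tmp/user/bob/shared.txt"
# Relative target: resolves against Alice's folder, where the file is absent.
ln -s shared.txt "$tmp/user/alice/rel_link"
# Absolute target: resolves from anywhere, as in the proposed fix.
ln -s "$tmp/user/bob/shared.txt" "$tmp/user/alice/abs_link"
cat "$tmp/user/alice/abs_link"                          # prints: hello
cat "$tmp/user/alice/rel_link" 2>/dev/null || echo dangling   # prints: dangling
```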
        • Checksums
          • Implemented checksums for Reva through xattrs
          • They are calculated only when GetMD is run
            • This is the method that provides info about a file, used by PROPFIND for the etag, etc.
          • Will use MD5 hashes for now
            • We can check later whether to switch to SHA-256 or something else
          • Added a timestamp for the checksum that is equal to the file's mtime
          • If "file mtime != checksum mtime", we recalculate the checksum
          • For direct mounts we should warn users that the checksum mtime xattr may not be up to date
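The staleness rule above ("recompute iff file mtime != checksum mtime") in a self-contained sketch; the real implementation stores these values in xattrs rather than shell variables, and all names here are illustrative:

```shell
f=$(mktemp)
echo data > "$f"
sum=$(md5sum "$f" | cut -d' ' -f1)       # checksum computed at GetMD time
sum_mtime=$(stat -c %Y "$f")             # timestamp stored alongside it
# Later, on the next GetMD:
if [ "$(stat -c %Y "$f")" != "$sum_mtime" ]; then
    sum=$(md5sum "$f" | cut -d' ' -f1)   # file changed: recalculate
    sum_mtime=$(stat -c %Y "$f")
fi
echo "$sum"
```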
      • Disaster Recovery 5m
        Speaker: Arthur Outhenin-Chalandre (CERN)
    • 2:45 PM – 2:50 PM
      AOB 5m
      • Enrico
        • Absent this Friday (18/06)
        • I have started compiling some slides (codimd) for potential Ceph@ASDF (volume retyping + replication). I will share them with you this week.