Ceph/CVMFS/Filer Service Meeting

600/R-001 (CERN)



    • 14:00 14:05
      CVMFS 5m
      Speaker: Enrico Bocchi (CERN)
      • The majority of RMs (silently) updated to 2.6.0
      • compass.cern.ch moved to CC7
      • atlas-online-nightlies created (CC7 + S3, no gateway)
      • Enabled GC for na61.cern.ch (required bugfix from developers)
      • GC was run manually for atlas-nightlies on S3
        • new APIs will be provided to run GC from RMs
        • ETA: ~June
      • Still see some 502s when pushing to S3


      • Lemon metrics and alarms: Opt-in by May 6th
      • Deploy config change to lxplus/lxbatch to use the dedicated atlas proxies
      • Mail from Ben: LHCb jobs with "low efficiency" get killed in Wigner


      • Waiting for the procedure to migrate repos to S3
      • Waiting for clarification about GC_TIMESPAN, GC_LAP, AUTOTAG, etc.
    • 14:05 14:10
      Ceph Upstream News 5m

      Releases, Tickets, Testing, Board, ...

      Speaker: Dan van der Ster (CERN)
      • CEPH-705: built v12.2.12-0.1, which includes the fixes for awsv4 authentication (pre-signed URLs with / or %2d, etc.)
        • This required a python3 hack, because EPEL changed its py3 from py34 to py36. Upstream had already fixed the mimic builds, so I had to backport the fixes to luminous.
    • 14:10 14:15
      Ceph Backends & Block Storage 5m

      Cluster upgrades, capacity changes, rebalancing, ...
      News from OpenStack block storage.

      Speaker: Theofilos Mouratidis (National and Kapodistrian University of Athens (GR))


      • ceph/gabe: all data moved to the bluestore machines; 3 racks reformatted and added to the cluster
      • ceph/flax: received 2 racks, formatted to bluestore, waiting for pg_split to move into the default root
    • 14:15 14:20
      Ceph Disk Management 5m

      OSD Replacements, Liaison with CF, Failure Predictions

      Speaker: Julien Collet (CERN)


      • Meeting with repair team on May 8th on Ceph Disk replacements
      • Script improvements (input protection, error messages, cluster details, ...)
    • 14:20 14:25
      S3 5m

      Ops, Use-cases (backup, DB), ...

      Speakers: Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))


      • S3 account migration/removal campaign has started, will send reminder this week.
    • 14:25 14:30
      CephFS/Manila/FILER 5m

      Filer Migration, CephFS/Manila status and plans.

      Speaker: Dan van der Ster (CERN)


      • Incident INC1973961: itnfs30 was not allowing OpenShift to mount anything. Loadavg was normal and zfs looked fine, but there were many 'osadmin' processes and ~1000 zfs filesystems, which is abnormal. Alex Lossent found that a process on their side had created 800 volumes in a loop. He manually removed them, I restarted nfs-server, and things recovered.


      • User issue from HPC:

      I have, however, observed that the modification time of some data files remains unchanged when small amounts of data are appended to them. The modification time reported to the user stays the same even after days of running.

      In the cases where this occurred, the simulation program opened the data file with fopen() in "ab" mode, wrote a few numbers and closed the file with fclose(). One can see that the file grows and it is possible to check that the data are correctly stored.

      The same does not happen when larger amounts of data (a few KB at least) are appended. And it did not happen before the software update. There is no indication of data losses or data corruption.
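
A minimal sketch of how the reported behaviour might be reproduced, assuming it can be triggered by a plain small append from any client of the affected mount. The function name `append_and_stat` and the `MTIME_TEST_DIR` environment variable are hypothetical, introduced only for this illustration. Python's `open(path, "ab")` is used as a stand-in for the `fopen()`/`fclose()` sequence in the report. The script records size and modification time before and after the append.

```python
import os
import tempfile
import time


def append_and_stat(path, payload=b"1.0 2.0\n"):
    """Append a small payload in "ab" mode; return (size, mtime_ns) before and after."""
    before = os.stat(path)
    with open(path, "ab") as f:  # same open mode as the fopen() call in the report
        f.write(payload)
    after = os.stat(path)
    return (before.st_size, before.st_mtime_ns), (after.st_size, after.st_mtime_ns)


if __name__ == "__main__":
    # Hypothetical setting: point MTIME_TEST_DIR at a directory on the affected mount.
    directory = os.environ.get("MTIME_TEST_DIR", tempfile.gettempdir())
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        time.sleep(2)  # let wall-clock time advance clearly past file creation
        (size0, mtime0), (size1, mtime1) = append_and_stat(path)
        print(f"size: {size0} -> {size1}")
        print(f"mtime changed: {mtime1 != mtime0}")
    finally:
        os.remove(path)
```

If the size grows while the script reports that the mtime did not change, that would match the user's description. Checking the mtime from a second client as well may help separate client-side caching from what the server has recorded. Repeating the run with a payload of a few KB would cover the case the user reports as working.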

    • 14:30 14:35
      HPC 5m

      Performance testing, HPC storage status and plans

      Speaker: Pablo Llopis Sanmillan (CERN)
    • 14:35 14:40
      HyperConverged 5m
      Speakers: Jose Castro Leon (CERN), Julien Collet (CERN), Roberto Valverde Cameselle (Universidad de Oviedo (ES))
    • 14:40 14:45
      Monitoring 5m


      • Prophetstore got in touch
      • Report on our experience with the tool in progress
    • 14:45 14:50
      AOB 5m