EOS DevOps Meeting

Europe/Zurich
513/R-068 (CERN)

Luca Mascetti (CERN)
Description

Weekly meeting to discuss progress on EOS rollout.

  • please keep content relevant to (most of) the audience, explain context
  • Last week: major issues, preferably with ticket
  • This week/planning: who, until when, needs what?

Add your input to the "contribution minutes" before the meeting; otherwise it will be handled under "AOB".

 

● EOS production instances (LHC, PUBLIC, USER)

(Luca)

  • ongoing capacity shuffling in eospublic/eosalice
  • need to schedule a ns compact on eosuser (files this time)
    • approaching memory limit -> need a clean restart
  • Homes instances
    • need to update for symlink issue via propfind
    • migration script hiccup for aborted files and symlinks
    • investigating German pop-up
    • put hourly FST log rotation in place

● EOS clients, FUSE(X)

(Luca)

 


● Development issues

(Georgios)

  • Last week, network interventions affected a QDB node in a strange way: it was able to establish TCP connections to other machines, but other machines could not connect to it.
    • Even though the two unaffected nodes should have been able to form a quorum, this one node managed to disrupt the rest by repeatedly starting elections, since it was not receiving heartbeats.
    • QDB 0.3.2 contains a protection against such a scenario.
  • The symlink resolution for relative symlinks in the new namespace had a bug that a gap in the tests did not catch. Path resolution in general has been improved and refactored; the changes will be available in the next MGM release.
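
For context on the QDB election issue above, here is a toy sketch of how a single node that never receives heartbeats can keep unseating a healthy leader in a Raft-style election. It is not QDB code: the `Node` class, the `simulate` function and the `protect` guard are hypothetical, and the guard shown (ignoring vote requests while a leader is in contact) is only one commonly described mitigation. The minutes do not say which protection QDB 0.3.2 actually uses.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    term: int = 0
    role: str = "follower"
    heard_from_leader: bool = False  # heartbeat seen within the election timeout

    def on_vote_request(self, candidate_term: int, protect: bool) -> bool:
        """Handle a vote request; return True if this node was disrupted."""
        if protect and (self.role == "leader" or self.heard_from_leader):
            # Hypothetical guard: a healthy leader is in contact, so ignore it.
            return False
        if candidate_term > self.term:
            # Plain behaviour: a higher term forces this node back to follower.
            self.term = candidate_term
            self.role = "follower"
            return True
        return False


def simulate(rounds: int, protect: bool) -> int:
    """Count how often the healthy pair loses its leader."""
    leader = Node("a", term=1, role="leader")
    follower = Node("b", term=1, heard_from_leader=True)
    isolated = Node("c", term=1)  # can connect out, never receives heartbeats
    disruptions = 0
    for _ in range(rounds):
        # c times out without a heartbeat and starts an election in a new term.
        isolated.term = max(isolated.term, leader.term) + 1
        isolated.role = "candidate"
        for node in (leader, follower):
            node.on_vote_request(isolated.term, protect)
        if leader.role != "leader":
            disruptions += 1
            # The healthy pair still has a quorum and re-elects a leader.
            leader.term = follower.term = isolated.term + 1
            leader.role = "leader"
    return disruptions


if __name__ == "__main__":
    print("without guard:", simulate(5, protect=False))
    print("with guard:", simulate(5, protect=True))
```

In this model the healthy pair always recovers, but without the guard it loses its leader in every round, which matches the repeated disruption described above.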

(Elvin)

  • EOSBACKUP
    • together with Cristi, fixed the configuration (i.e. some FSTs had headroom set to 0)
    • fixed issues with TPC transfers failing in mixed IPv4/IPv6 setups, also observed in EOSPUBLIC
  • Fixed a quota bug that did not account for touched files; it also exists in beryl_aquamarine
  • Reviving the Coverity builds and static analysis tools in the pipeline
  • Fixes to eosd bugs reported in SWAN
There are minutes attached to this event.
    • 16:00 16:20
      EOS production instances (LHC, PUBLIC, USER) 20m
      • major events last week
      • planned work this week
      Speakers: Cristian Contescu (CERN), Herve Rousseau (CERN), Hugo Gonzalez Labrador (CERN), Luca Mascetti (CERN)

    • 16:20 16:25
      EOS clients, FUSE(X) 5m
      • (major) issues seen
      • Rollout of new versions and FUSEX
      Speakers: Dan van der Ster (CERN), Jan Iven (CERN)

    • 16:25 16:35
      Development issues 10m
      • New namespace
      • Testing
      • Xrootd
      Speakers: Andreas Joachim Peters (CERN), Elvin Alin Sindrilaru (CERN), Georgios Bitzes (CERN), Jozsef Makai (CERN), Michal Kamil Simon (CERN)

    • 16:35 16:50
      AOB 15m