EOS DevOps Meeting

Europe/Zurich
513/R-068 (CERN)

Luca Mascetti (CERN)
Description

Weekly meeting to discuss progress on EOS rollout.

  • please keep content relevant to (most of) the audience, explain context
  • Last week: major issues, preferably with ticket
  • This week/planning: who, until when, needs what?

Add your input to the "contribution minutes" before the meeting; otherwise it will be handled under "AOB".

 

● EOS production instances (LHC, PUBLIC, USER)

- need to compact eosuser

- update pps to latest release

- backup to be updated - balancing ongoing - crash on fsts

- 1 TB machine can be given back: mail to Gavin

- media and uat need intervention for IPMI issues (powercycle) - scheduled in 3 weeks

- 4 machines to backup and 5 to alicedaq (now in castorspare)


● EOS clients, FUSE(X)

- 4.3.6 ready; needs to be tagged in koji, and the CRM ticket needs to be updated

- batch test queue has the correct config (to be verified)


● Development issues

(Andreas)

  • fixed up 'rm -r' console command to automatically disable '-r' if a file is given
    • still needs to distinguish whether * is a wildcard match or a literal character in a file name
  • fixed Fileinfo::DirJSON function for AARNET to use fine-grained ns locks (their queries last seconds, and during these queries no FUSE client can write); will build 4.2.29 for AARNET
  • with help of Elvin, Google gRPC RPMs are now built on all 4 EOS platforms
  • implemented Ping/Fileinfo via gRPC, now adding SSL and client identification
  • changing locking in 'find', which takes one ns lock per directory; this is 'not good enough' for long listings within a single directory. The same strategy could be considered for opendir, propfind, fuse listing etc.
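
A minimal sketch of the chunked-locking idea for long listings, written in Python rather than in the EOS C++ code base. The `Namespace` class, `list_chunked` and `chunk_size` are hypothetical names used for illustration only. The sketch assumes a listing can resume from the last name it returned, so the namespace lock is released between chunks instead of being held for the whole directory.

```python
import bisect
import threading
from typing import Dict, Iterator, List, Optional


class Namespace:
    """Toy in-memory namespace guarded by one lock (hypothetical, not EOS code)."""

    def __init__(self, dirs: Dict[str, List[str]]) -> None:
        self._dirs = dirs
        self.lock = threading.Lock()
        self.lock_acquisitions = 0  # counts how often the lock was taken

    def list_chunked(self, path: str, chunk_size: int = 1000) -> Iterator[str]:
        """Yield entry names in sorted order, holding the lock only per chunk."""
        cursor: Optional[str] = None  # last name already returned
        while True:
            with self.lock:
                self.lock_acquisitions += 1
                # Re-sorting on every pass keeps the toy short; a real
                # namespace would keep entries in an ordered structure.
                names = sorted(self._dirs.get(path, ()))
                start = 0 if cursor is None else bisect.bisect_right(names, cursor)
                chunk = names[start:start + chunk_size]
            # The lock is released here, so writers can run between chunks.
            if not chunk:
                return
            yield from chunk
            cursor = chunk[-1]
```

Between two chunks the lock is free, so a concurrent writer can proceed. The price is that the listing is no longer a consistent snapshot of the directory.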

(Georgios)

  • Implemented quota recomputation command for QDB namespace. To try it: "eos ns recompute_quotanode /path/to/quota/node/". Let me know of any bugs.
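
As a rough illustration of what recomputing a quota node involves, the sketch below re-derives per-user usage from the files under a node path. It is a conceptual toy, not the QDB implementation: `FileEntry`, `recompute_quota` and the flat list of entries are assumptions made for the example, and nested quota nodes are ignored.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class FileEntry(NamedTuple):
    """Hypothetical file record: path, owner uid and size in bytes."""
    path: str
    uid: int
    size: int


def recompute_quota(entries: Iterable[FileEntry],
                    node_path: str) -> Dict[int, Dict[str, int]]:
    """Sum bytes and file counts per uid for all files below node_path."""
    # Normalise to a trailing slash so "/eos/a/" does not match "/eos/ab/...".
    prefix = node_path.rstrip("/") + "/"
    usage: Dict[int, Dict[str, int]] = defaultdict(lambda: {"bytes": 0, "files": 0})
    for entry in entries:
        if entry.path.startswith(prefix):
            usage[entry.uid]["bytes"] += entry.size
            usage[entry.uid]["files"] += 1
    return dict(usage)
```

Comparing such a recomputed total with the stored counters is one way to spot accounting drift on a quota node.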

● EOShome migration

Massimo

  • Procedure to migrate (prototype) being tested
    • Need more volunteers
  • Migration continuing (close to completion apart from home01, which had to be scratched, i.e. we should be close to 80%)
    • Migration script (Eddie) much improved on corner cases (parsing difficult paths)
    • Some server-side improvements also in place (some problems with directory names)
    • Bottom line: the script is sufficiently lean to print problems in a simple way
  • Batch (batchtest) got the new fusex for home
    • I had no time to test it :(

 


● AOB

Massimo

  • Data migration ahead of schedule by ~1 week
  • Today/tomorrow we are migrating ST
  • Next week IT (volunteers)
  • Next week also new users?
    • Originally foreseen for Aug 27. Try to bring it forward as much as possible
  • Backup
    • Return 1TB head node (asap)
  • Next stops
    • Aug27: ITMM discussing storage
    • Sep5: possible second "review"
    • 16:00 16:20
      EOS production instances (LHC, PUBLIC, USER) 20m
      • major events last week
      • planned work this week
      Speakers: Cristian Contescu (CERN), Herve Rousseau (CERN), Hugo Gonzalez Labrador (CERN), Luca Mascetti (CERN)


    • 16:20 16:25
      EOS clients, FUSE(X) 5m
      • (major) issues seen
      • Rollout of new versions and FUSEX
      Speakers: Dan van der Ster (CERN), Jan Iven (CERN)


    • 16:25 16:35
      Development issues 10m
      • New namespace
      • Testing
      • Xrootd
      Speakers: Andreas Joachim Peters (CERN), Elvin Alin Sindrilaru (CERN), Georgios Bitzes (CERN), Jozsef Makai (CERN), Michal Kamil Simon (CERN)

    • 16:40 17:00
      EOShome migration 20m


    • 17:00 17:15
      AOB 15m
