
EOS DevOps Meeting

Europe/Zurich
513/R-068 (CERN)

Jan Iven (CERN)
Description
Weekly meeting to discuss progress on EOS rollout

● production instances

EOSATLAS

Crash at the end of last week: the headnode ran out of memory, plus a user doing nasty things.

Now back to "reasonable" memory usage.

EOSCMS

+2 PB of usable capacity to be added:

  • by harvesting some nodes from EOSALICE (to be checked with Roberto)
  • EOSLHCB and EOSATLAS also have some surplus capacity

Timeframe: urgent (confirmed by Bernd)


● CERNBOX and EOSUSER

Doing an MD5 scan (for the security team).
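
For context, a minimal sketch of what such a scan could look like (hedged illustration only; the /eos/user path and plain-text output are assumptions, not the actual tool used):

import hashlib
import os

def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 of a file without loading it fully into memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Walk a FUSE-mounted EOS tree (hypothetical path) and print one digest per file.
for root, _dirs, files in os.walk("/eos/user"):
    for name in files:
        path = os.path.join(root, name)
        try:
            print(md5_of_file(path), path)
        except OSError:
            pass  # unreadable or vanished file: skip it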

Today the backup caused trouble on EOSUSER (NO_CONTACT, call from the operator).


● FUSE and client versions

puppet-eosclient support for eosxd going into production tomorrow.

(puppet stdlib has been rolled back - except for the things this module needs).


● Citrine rollout

 

EOSALICE+EOSPUBLIC

Very high activity on both instances, leading to crashes because of a known bug and file-descriptor exhaustion.

Required XRootD 4.8.0-rc1 (now on -rc2), which raises the 32k file-descriptor limit to 64k.
This helped quite a bit.

In the meantime the origin of the sudden load increase on EOSPUBLIC has been identified and mitigated (thanks to the ALICE computing team; it was a fallback location).

Unfortunately, there is a regression in XRootD 4.8 that prevents headnodes from talking to each other (failover itself still works fine). Confirmed to be an auth issue, under investigation.


Q: Can we somehow use the EOS test infrastructure to provide better testing for XRootD?

Q: What is the strategy to go well beyond 64k file descriptors? Being looked into by the XRootD team/Andy; these may be XRootD-internal structures (a fixed-size memory structure - it was a signed short, so it should be easy to change).
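
To illustrate the signed-short ceiling mentioned above (my reading of the remark, not XRootD's actual code): a 16-bit signed index tops out at 32767, reinterpreting it as unsigned gives 65535, and anything well beyond 64k needs a wider type.

import struct

# Pack 16 bits and read them back as a signed short ("h") vs unsigned ("H").
print(struct.unpack("h", struct.pack("H", 32767))[0])  # 32767: last valid signed value
print(struct.unpack("h", struct.pack("H", 32768))[0])  # -32768: signed wrap-around at ~32k
print(struct.unpack("H", struct.pack("H", 65535))[0])  # 65535: unsigned doubles the range (~64k)
# Going far beyond 64k file descriptors would need a wider field, e.g. "i" (32-bit).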


● SWAN

A user reported an issue (cannot save notebooks) due to an FST being full.

Will update SWAN to 4.2.4 (client-side change to deal with this: launch the converter); in parallel, Luca cleaned up the affected disk.

A: EOSFUSEX should also get the same behaviour.

 


● nextgen FUSE

CLIENT

  • tagged 4.2.5 today
    • not useful to tag a version before visible issues are sorted out
    • took me 3 days to figure out the Rainer problem (building the EOS RPM with eosxd)
      • three main issues:
        • removing MD records too early, before they could be executed upstream (e.g. deletions), resulting in ENOTEMPTY messages
        • different behaviour of the kernel cache on EL7, resulting in 'no such file or directory' during compilation
        • recreation of an identical name after deletion has to wait for the asynchronous deletion

      • added the Rainer test to the eos-fusex-certify script
        • tested on EL7 and SLC6

      • inode 1 bug
        • when you mount 'your' home directory path, the top-level directory of the local mount has inode 1
          • when the cache capability expires for the first time, all directories in the top-level mount become invisible forever. The same happened when the first call on the top-level directory was not 'ls'

eosxd bugs fixed in 4.2.5:
* [EOS-2146] - symlinks have to show the size of the target string
* [EOS-2147] - listxattr creates SEGV on OSX
* [EOS-2148] - eosxd on OSX creates empty file when copying with 'cp'
* [EOS-2159] - An owner of a directory has to get always chmod permissions
* [EOS-2161] - rm -rf on fusex mount fails to remove all files/subdirectories
* [EOS-2174] - Running out of FDs when using a user mount

 

SERVER

  • tagged 0.3.270 today (problem with the build system to be sorted out by Jozsef)
    • made the server-side open atomic

 

AOB

- the implementation of listing large directories has to be changed on the server side to hold NS locks for only 10k entries at a time and then re-lock (avoids write starvation when Massimo lists 10M dirs inside a dir); see the sketch after this list

- when the MGM is down while eosxd mounts, the XrdCl object never tries to re-establish the connection, although eosxd replays commands according to the local timeout configuration

- UAT and PPS should be updated to the tagged versions (-> Luca)

- need to check the YUM repo for Aquamarine releases (should point to storage-ci, not dss-ci).

- access can now be limited to domains ("cern.ch") à la AFS (for BE software distribution).
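
A sketch of the chunked re-locking idea from the first AOB item above (illustrative only; the lock and the entry list stand in for EOS's actual namespace interfaces):

import threading

CHUNK = 10000  # drop and re-take the NS lock every 10k entries

ns_lock = threading.Lock()  # stand-in for the namespace lock

def list_directory_chunked(entries):
    """Yield directory entries in 10k chunks, releasing the lock between
    chunks so that writers are not starved by a listing of 10M entries."""
    pos = 0
    while pos < len(entries):
        with ns_lock:                      # re-acquire for each chunk
            chunk = entries[pos:pos + CHUNK]
        pos += len(chunk)
        yield from chunk                   # emitted without holding the lock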


● new Namespace

 

 

New-catalogue tests (Massimo)

EOSPPS, xroot:4.8.0-0.rc1,  fusex: 4.8.0-0.rc1

Client machines in Wigner (eospluswig701.cern.ch) to minimise the MGM-client latency

Xmas tree (ladder of directories, 1000 levels, 5000 directories):

  • This test uses FUSEX as well
  • Factor of 3 faster than old FUSE (both creation and rm -fr) with client and server in Meyrin
  • Comparable to AFS (client and server in Meyrin):

#-bash-4.2$ ./ladder.py /eos/pps/users/laman
#ladder is going to create 5050 directories
#Dir creation: 5.528661 s (5050 dirs)    AFS erratic between 10 s and 23 s
#Dir removal: 15.508470 s (5050 dirs)    AFS erratic between 8 s and 26 s
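
ladder.py itself is not attached; a plausible reconstruction of such a ladder benchmark follows (the tree shape is a guess: a chain where level i holds i sibling directories yields 1+2+...+100 = 5050 dirs, matching the reported count; the real script may differ):

import os
import shutil
import sys
import time

def build_ladder(base, levels=100):
    """Create a 'ladder': at depth i, make i sibling directories and
    descend into the first one. 100 levels -> 1+2+...+100 = 5050 dirs."""
    path = base
    count = 0
    for level in range(1, levels + 1):
        for i in range(level):
            os.makedirs(os.path.join(path, "d%d" % i))
            count += 1
        path = os.path.join(path, "d0")
    return count

base = sys.argv[1]  # e.g. a directory on an EOS FUSE mount, assumed empty
t0 = time.time()
n = build_ladder(base)
print("Dir creation: %.6f s (%d dirs)" % (time.time() - t0, n))

t0 = time.time()
for entry in os.listdir(base):
    shutil.rmtree(os.path.join(base, entry))
print("Dir removal: %.6f s (%d dirs)" % (time.time() - t0, n))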

Large directory

  • This test is based on eos commands (command line and via the python binding)
    • One dir is created containing a lot of subdirectories at the same level (d/1, d/2, ..., d/1000000)
  • mkdir runs at ~13kHz
  • In the past, bad behaviour of eos fileinfo d (when d contains >>1M files)
    • 10^5 files
      • eos mkdir for the last dir takes 22 ms
      • eos ls takes 1.8 s
      • eos rm -r takes 3.5 s
      • eos fileinfo d directory: 17 ms
    • 1M files
      • eos mkdir for the last dir takes 30 ms
      • eos ls takes 2.4 s (1M directories; output truncated at 50000); see AOB
      • eos rm -r takes 12.7 s (second test: 10.9 s)
      • eos fileinfo d directory: 17 ms
    • 10M files
      • eos mkdir for the last dir takes 36 ms
      • eos ls takes 4.8 s (1M directories; output truncated at 50000); see AOB
      • eos rm -r aborts (the dir is in an unclear status: visible from fusex, not visible from eos ls: (errc=2) (No such file or directory))
      • eos fileinfo d directory: 20 ms

Rate tests

Parallel mkdir from 4 (5) nodes. A single stream runs at ~1300 Hz, 4 streams at ~2700 Hz, 5 streams at ~2000 Hz.
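
A sketch of how such a rate test could be driven (hedged: the MGM URL and target path are placeholders, and this shells out to the eos CLI rather than using the python binding):

import subprocess
import sys
import time
from multiprocessing import Pool

MGM = "root://eospps.cern.ch"  # placeholder MGM URL
BASE = "/eos/pps/users/test"   # placeholder target directory

def make_dirs(args):
    """One stream: create `count` directories with sequential eos mkdir calls."""
    stream, count = args
    for i in range(count):
        subprocess.run(["eos", MGM, "mkdir", "-p",
                        "%s/s%d/d%d" % (BASE, stream, i)],
                       check=True, stdout=subprocess.DEVNULL)
    return count

if __name__ == "__main__":
    streams, per_stream = int(sys.argv[1]), 1000
    t0 = time.time()
    with Pool(streams) as pool:
        total = sum(pool.map(make_dirs,
                             [(s, per_stream) for s in range(streams)]))
    dt = time.time() - t0
    print("%d mkdirs in %.1f s -> %.0f Hz" % (total, dt, total / dt))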

AOB

Since (as I understand it) eos find runs server-side, why do I get this?

-bash-4.2$ eos find --count /eos/pps/users/laman/largedir
nfiles=0 ndirectories=50001
warning: find results are limited for you to ndirs=50000 -  result is truncated!
 (errc=7) (Argument list too long)

(not an admin? need special powers for this)

 


● AOB

  • Kuba has created test cases, passed on to Jozsef.
    • should test the multi-client case more
  • mount /scratch on PLUS/AIADM "qa" machines (Dan to do a merge request).

 

    • 16:00-16:05
      overall planning 5m
      Speaker: Jan Iven (CERN)
    • 16:05-16:30
      operations: production
      • 16:05
        production instances 5m
        Speaker: Herve Rousseau (CERN)

      • 16:10
        CERNBOX and EOSUSER 5m
        Speaker: Luca Mascetti (CERN)

      • 16:15
        FUSE and client versions 5m
        Speaker: Dan van der Ster (CERN)

      • 16:20
        Citrine rollout 5m
        Speaker: Herve Rousseau (CERN)

         

      • 16:25
        SWAN 5m
        Speaker: Jakub Moscicki (CERN)

    • 16:30-16:50
      development: near-term
      • 16:30
        nextgen FUSE 5m
        Speaker: Andreas Joachim Peters (CERN)

      • 16:35
        new Namespace 5m
        Speaker: Elvin Alin Sindrilaru (CERN)

    • 16:50-17:45
      other: pilot services, long-term dev, external
      • 16:50
        Webservice 5m
        Speaker: Luca Mascetti (CERN)
      • 16:55
        Backup 5m
        Speaker: Luca Mascetti (CERN)
      • 17:00
        Samba 5m
        Speaker: Luca Mascetti (CERN)
      • 17:05
        $HOME structure 5m
        Speaker: Luca Mascetti (CERN)
      • 17:10
        BATCH integration 5m
        Speaker: Massimo Lamanna (CERN)
      • 17:15
        Xrootd 5m
        Speaker: Michal Kamil Simon (CERN)
      • 17:20
        AOB 5m