Crash at the end of last week: headnode ran out of memory, plus a user doing nasty things.
Now back to "reasonable" memory usage.
+2PB usable to be added:
Timeframe: urgent (confirmed by Bernd)
Doing MD5 scan (for sec team).
Today's backup caused trouble on EOSUSER (NO_CONTACT, call from the operator).
puppet-eosclient support for eosxd going into production tomorrow.
(puppet stdlib has been rolled back - except for the things this module needs).
Very high activity on both instances, leading to crashes due to a known bug and FD exhaustion.
Required XRootD 4.8.0-rc1 (now on -rc2), which raises the 32k FD limit to 64k.
This helped quite a bit.
In the meantime the origin of the sudden load increase on EOSPUBLIC has been identified and mitigated (thanks to ALICE computing team, was a fallback location).
Unfortunately, there's a regression in XRootD 4.8 that prevents headnodes from talking to each other (but failover still works just fine). Confirmed auth issue, under investigation.
Q: Can we somehow use the EOS test infrastructure to provide better testing for XRootD?
Q: what is the strategy to go well beyond 64k file descriptors? Being looked into by the XRootD team/Andy; these may be internal XRootD data structures (a fixed-size memory structure - was a signed short, should be easy to change).
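The signed-short point above is worth making concrete. A minimal illustration (not XRootD code; the field names and widths are assumptions) of why a signed 16-bit index caps the FD table at 32k, and why widening it to unsigned 16-bit gives the 64k ceiling:

```python
import ctypes

# A signed 16-bit field can only index file descriptors up to 32767.
signed_short_max = ctypes.c_int16(32767).value
print(signed_short_max)   # 32767: the old 32k limit

# One past the limit wraps around to a negative value.
overflowed = ctypes.c_int16(32768).value
print(overflowed)         # -32768

# Switching the field to unsigned 16-bit lifts the ceiling to 64k.
unsigned_max = ctypes.c_uint16(65535).value
print(unsigned_max)       # 65535
```

Going "well beyond" 64k would require widening the structure further (e.g. to 32 bits), which is the part under investigation.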
User reported an issue (cannot save notebooks) due to FST being full.
Will update to 4.2.4 on SWAN (client-side change to deal with this: launch the converter); in parallel, Luca cleaned up the affected disk.
A: EOSFUSEX should also get the same behaviour.
CLIENT
eosxd bugs fixed 4.2.5
* [EOS-2146] - symlinks have to show the size of the target string
* [EOS-2147] - listxattr creates SEGV on OSX
* [EOS-2148] - eosxd on OSX creates empty file when copying with 'cp'
* [EOS-2159] - An owner of a directory has to get always chmod permissions
* [EOS-2161] - rm -rf on fusex mount fails to remove all files/subdirectories
* [EOS-2174] - Running out of FDs when using a user mount
SERVER
AOB
- implementation of listing of large directories has to be changed on the server side to hold NS locks only for 10k entries at a time and then re-lock (to avoid write starvation when e.g. Massimo lists 10M entries inside a directory)
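The chunked-locking idea can be sketched as follows; this is an illustrative model, not EOS code, and the names (ns_lock, entries, emit, CHUNK) are assumptions:

```python
import threading

CHUNK = 10_000  # hold the namespace lock for at most this many entries

def list_directory(entries, ns_lock, emit):
    """Emit all entries, releasing and re-acquiring the lock between
    chunks so that writers can make progress during a huge listing."""
    i = 0
    while i < len(entries):
        with ns_lock:                        # re-acquired per chunk
            for entry in entries[i:i + CHUNK]:
                emit(entry)
        i += CHUNK                           # writers may run here

out = []
list_directory(list(range(25_000)), threading.Lock(), out.append)
print(len(out))  # 25000 - three lock cycles of at most 10k entries each
```

The trade-off is that the listing is no longer an atomic snapshot: entries created or removed between chunks may or may not appear.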
- when the MGM is down while eosxd mounts, the XrdCl object never tries to re-establish the connection, although eosxd replays commands according to the local timeout configuration
- UAT and PPS should be updated to the tagged versions (-> Luca)
- Need to check the YUM repo for aquamarine releases (point to storage-ci, not dss-ci).
- can now limit access to domains ("cern.ch") à la AFS (BE software distribution).
New-catalogue tests (Massimo)
EOSPPS, xroot:4.8.0-0.rc1, fusex: 4.8.0-0.rc1
Client machines in Wigner (eospluswig701.cern.ch) to minimise the MGM-client latency
Xmas tree (ladder of directories: 1000 levels, 5050 directories):
-bash-4.2$ ./ladder.py /eos/pps/users/laman
ladder is going to create 5050 directories
Dir creation: 5.528661 s (5050 dirs)   [AFS erratic between 10 s and 23 s]
Dir removal: 15.508470 s (5050 dirs)   [AFS erratic between 8 s and 26 s]
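The real ladder.py is not reproduced in the minutes; the sketch below is a hypothetical reconstruction that matches the reported 5050-directory count (level i of the ladder holds i subdirectories, so 100 levels give 100*101/2 = 5050). The directory-naming scheme and level count are assumptions:

```python
import os
import shutil
import tempfile
import time

LEVELS = 100  # sum(1..100) = 5050 directories in total

def ladder(base):
    """Create a 'ladder': at each level, make `level` sibling
    directories, then descend into the first one."""
    total = 0
    path = base
    for level in range(1, LEVELS + 1):
        for i in range(level):
            os.makedirs(os.path.join(path, "d%d_%d" % (level, i)))
            total += 1
        path = os.path.join(path, "d%d_0" % level)  # descend one rung
    return total

# The real script takes the target path as an argument; use a temp dir here.
base = tempfile.mkdtemp()
t0 = time.time()
n = ladder(base)
print("Dir creation: %f s (%d dirs)" % (time.time() - t0, n))
t0 = time.time()
shutil.rmtree(base)
print("Dir removal: %f s (%d dirs)" % (time.time() - t0, n))
```

Metadata-heavy workloads like this exercise MGM namespace operations almost exclusively, which is why client placement near the MGM (the Wigner nodes above) matters for the timings.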
Large directory
Rate tests
Parallel mkdir from 4 (5) nodes: single stream ~1300 Hz, 4 streams ~2700 Hz, 5 streams ~2000 Hz.
AOB
Since (I understand) eos find runs server side, why do I get this?
-bash-4.2$ eos find --count /eos/pps/users/laman/largedir
nfiles=0 ndirectories=50001
warning: find results are limited for you to ndirs=50000 - result is truncated!
(errc=7) (Argument list too long)
(not an admin? need special powers for this)