EOS DevOps Meeting
Weekly meeting to discuss progress on EOS rollout.
- please keep content relevant to (most of) the audience, explain context
- Last week: major issues, preferably with ticket
- This week/planning: who, until when, needs what?
Add your input to the "contribution minutes" before the meeting; anything not covered there will be handled under "AOB".
● EOS production instances (LHC, PUBLIC, USER)
LHC instances
- ATLAS
- Moved lxfs* boxes to read-only while investigating a possible fix (network interface issue, to be checked)
- LHCb and ALICE
- Updated headnodes to 4.2.22
- Slow update of the FSTs is in progress...
- Finished migrating HTTP and GridFTP servers to CentOS7
EOSPUBLIC: updated to 4.2.22 (this morning)
EOSUSER: some FSTs had a full /var, which triggered corruption. Since the write cache was removed, a different code path got used; this triggered a loop producing many messages in the client /var/log -> full client filesystem (BenJones). Note: the FST automatically goes read-only, but the local SQL database nevertheless got corrupted.
● EOS clients, FUSE(X)
(Dan) 4.2.22 (el6 + el7) and xrootd 4.8.3 tagged into production koji repos today. Reminder:
- http://linuxsoft.cern.ch/internal/repos/eos6-stable/x86_64/os/Packages/
- http://linuxsoft.cern.ch/internal/repos/eos7-stable/x86_64/os/Packages/
Luca reported a fresh crash seen by CMS (on their machine) - no core dump (but it might have been the previous version?).
Massimo would like /eos/pps to be mounted everywhere. Luca: careful, it holds all stored logs and might have unsafe permissions; perhaps mount only a sub-folder (/eos/pps/users)? Dan and Massimo will check. Could perhaps just use a user mount instead? Will check.
(Andreas)
- no update on eosxd development (many pending feature branches, see last meeting)
- q: who decides what gets merged? The admin of the instance where the feature is required.
(Jan):
- cross-check - no blocking issues in FUSEX that would prevent wider testing (AFS phase-out)?
- what are the steps required to get FUSEX enabled on other instances (server-side, open port)?
- Luca: wants a scale test - more than the current Massimo test (e.g. using user mounts, at least 1k concurrent mounts).
- Cristi: need to test against a "citrine" instance (eoslegacy?)
● Development issues
(Georgios)
- Request: we need more space for PPS; we're at 97% capacity on eospps-ns* (already with compression).
- Could we add a second SSD, so that we can test rocksdb with multiple data directories?
- Luca: won't happen - complicated workflow to add HW.
- Dan: servers would already have 2 SSDs, perhaps put OS on spinning disk.
- Elvin: would like to verify that RocksDB can actually split over several devices (see the sketch below)
- Massimo: take offline, will try to get test machine
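A minimal sketch of such a verification, assuming a plain RocksDB build; the paths and target sizes below are made up for illustration (this is not QuarkDB code). RocksDB's Options::db_paths places SST files on the first path until its target size is reached, then spills over to the next one:

```cpp
// Verify that RocksDB can spread its SST files over several devices
// via Options::db_paths (paths/sizes are illustrative assumptions).
#include <cassert>
#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // SST files fill the first path up to its target size, then
  // spill over to the next one (e.g. a second SSD).
  options.db_paths.emplace_back("/ssd1/quarkdb-test", 400ULL << 30);  // ~400 GiB
  options.db_paths.emplace_back("/ssd2/quarkdb-test", 400ULL << 30);
  // The WAL can be kept on yet another device if desired.
  options.wal_dir = "/ssd1/quarkdb-test-wal";

  rocksdb::DB* db = nullptr;
  rocksdb::Status st = rocksdb::DB::Open(options, "/ssd1/quarkdb-test", &db);
  assert(st.ok());
  delete db;
}
```

Note the manifest and other bookkeeping files stay under the DB name path; only the SST data files are distributed across db_paths, which is the part that needs the space.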
- Could we move away from VMs for the NS machines? Virtualization severely limits the amount of IOPS we can get from SSDs (only a couple of thousand; a real SSD can do ~100k, a real disk ~300 - all seen with both QuarkDB and a synthetic "fio" test; see the sketch after this list).
- Luca: please benchmark the machine he prepared
- Dan: (->offline) please talk to Arne, he might be investigating such limits.
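For reference, a rough stand-in for that synthetic test (a sketch, not a replacement for fio: single thread, queue depth 1, and the default path is a made-up assumption). It measures 4 KiB O_DIRECT random-read IOPS against a device or a pre-created large file, which is enough to compare a VM-attached SSD with a physical one:

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // for O_DIRECT on Linux
#endif
#include <fcntl.h>
#include <unistd.h>
#include <chrono>
#include <cstdio>
#include <cstdlib>
#include <random>

int main(int argc, char** argv) {
  const char* path = argc > 1 ? argv[1] : "/tmp/testfile";  // hypothetical default
  int fd = open(path, O_RDONLY | O_DIRECT);
  if (fd < 0) { perror("open"); return 1; }
  constexpr size_t kBlock = 4096;
  void* buf = nullptr;
  if (posix_memalign(&buf, kBlock, kBlock) != 0) return 1;  // O_DIRECT needs aligned buffers
  off_t blocks = lseek(fd, 0, SEEK_END) / kBlock;
  if (blocks <= 0) return 1;
  std::mt19937_64 rng(42);
  std::uniform_int_distribution<off_t> block(0, blocks - 1);
  auto end = std::chrono::steady_clock::now() + std::chrono::seconds(10);
  long ops = 0;
  while (std::chrono::steady_clock::now() < end) {
    if (pread(fd, buf, kBlock, block(rng) * kBlock) != static_cast<ssize_t>(kBlock)) break;
    ++ops;
  }
  printf("%.0f IOPS (4k random read, O_DIRECT)\n", ops / 10.0);
  free(buf);
  close(fd);
  return 0;
}
```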
- Two weeks ago, PPS stopped working upon reaching 2^32 files. A type confusion (id_t) in the code caused file and container IDs, which should normally be 64 bits, to be truncated to 32 bits (see the sketch below).
- Safety checks detected that something was wrong, and the NS refused to boot.
- Elvin: would also affect aquamarine. IDs only ever increase and are never re-used - will this affect EOSUSER? Only if somebody creates+deletes files in a loop.
- Jan: could this affect FUSE clients? In principle yes, but Georgios did check and found nothing.
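A minimal sketch of the bug class (illustrative, not the actual EOS code): a 64-bit ID passed through a narrower type silently wraps around once the counter exceeds 2^32.

```cpp
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for the offending typedef (the real one was id_t).
using narrow_id = uint32_t;

int main() {
  uint64_t next_id = (1ULL << 32) + 1;  // first ID after 2^32 files
  narrow_id truncated = static_cast<narrow_id>(next_id);
  std::cout << next_id << " -> " << truncated << std::endl;  // 4294967297 -> 1
}
```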
- Experimenting with optimizing how metadata is laid out on disk, to reduce the high amount of IOPS incurred when listing a directory (see the sketch after this item).
- So that file metadata within the same directory is physically colocated on disk.
- Listings would thus benefit greatly from the kernel page cache and the RocksDB block cache.
- Jan: worried about intrusive changes close to the production roll-out? This would probably be the last major change (and would need a conversion campaign on EOSPPS; prod instances will have this from the start).
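A hedged sketch of the layout idea (not the actual QuarkDB schema): prefix each file-metadata key with the parent container ID in big-endian form, so that all entries of one directory sort next to each other in the SST files and a listing becomes one short sequential scan.

```cpp
#include <cstdint>
#include <string>

// Hypothetical key encoding: 8-byte big-endian parent container ID + file name.
// Big-endian so that lexicographic key order matches numeric ID order,
// which is what colocates a directory's entries on disk.
std::string fileKey(uint64_t parentContainerId, const std::string& fileName) {
  std::string key(8, '\0');
  for (int i = 0; i < 8; ++i) {
    key[i] = static_cast<char>((parentContainerId >> (56 - 8 * i)) & 0xff);
  }
  return key + fileName;
}
```

With such a scheme a directory listing is a prefix scan over consecutive blocks, which is exactly the access pattern the page cache and block cache reward.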
(Andreas)
- Changed the behaviour of atomic uploads to avoid any file-loss scenario caused by overlapping/replayed opens/uploads (EOS-2571); see the sketch below
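An illustrative sketch of the atomic-upload pattern (not the EOS-2571 patch itself; the temporary naming scheme is an assumption): data goes to a hidden temporary name and is only rename()d to the final name on a successful close, so an overlapping or replayed upload can never leave a half-written file under the visible name.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <fcntl.h>
#include <unistd.h>

bool atomicUpload(const std::string& target, const char* data, size_t len) {
  const std::string tmp = target + ".upload.tmp";  // hypothetical temp name
  int fd = ::open(tmp.c_str(), O_CREAT | O_WRONLY | O_TRUNC, 0644);
  if (fd < 0) return false;
  bool ok = ::write(fd, data, len) == static_cast<ssize_t>(len) &&
            ::fsync(fd) == 0;
  ::close(fd);
  // rename() is atomic on POSIX: readers see either the old or the new file,
  // never a partially written one.
  return ok && std::rename(tmp.c_str(), target.c_str()) == 0;
}
```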
- Fixed a looping bug in the FST (MgmSyncer.cc): a bogus mtime made a thread run in a tight loop, filling up the /var partition on EOSUSER/GENOME (see the sketch below). Fix is in both CITRINE & AQUAMARINE branches.
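A sketch of the bug class (illustrative; the actual fix lives in MgmSyncer.cc): a bogus mtime, e.g. far in the future, keeps the wait condition permanently true, so the thread spins without sleeping and logs on every iteration, filling /var.

```cpp
#include <chrono>
#include <cstdio>
#include <ctime>
#include <thread>

void waitForFileToAge(time_t mtime, time_t minAgeSec) {
  while (time(nullptr) - mtime < minAgeSec) {  // never false if mtime is in the future
    fprintf(stderr, "file not old enough yet\n");  // one log line per spin -> /var fills up
    // Fix sketch: sanity-check the mtime and always back off between checks.
    if (mtime > time(nullptr)) return;                     // reject bogus future mtimes
    std::this_thread::sleep_for(std::chrono::seconds(1));  // avoid a tight loop
  }
}
```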
(Massimo)
- Memory leaks in the (old) namespace, linked to file creation - seems to also affect the new NS (QuarkDB starts paging). Other issues: every hour, on the hour, a memory increase is seen - log truncation? The last issue is runaway memory consumption.
(Jan): new NS roll-out - status (EOSHOME has 1 filesystem)?
- have a Foreman hostgroup, puppet config (MGM and QuarkDB on the same box), a redirector, and the NS for the first instance (but will wait for the new QuarkDB on-disk layout)