EOS DevOps Meeting

Europe/Zurich
513/R-068 (CERN)
Jan Iven (CERN)
Description

Weekly meeting to discuss progress on EOS rollout.

  • Please keep content relevant to (most of) the audience, and explain context
  • Last week: major issues, preferably with a ticket
  • This week/planning: who, until when, needs what?

Add your input to the "contribution minutes" before the meeting. Anything else will be handled under "AOB".

 

● EOS production instances (LHC, PUBLIC, USER)

Speakers: Cristian Contescu (CERN), Herve Rousseau (CERN), Hugo Gonzalez Labrador (CERN), Luca Mascetti (CERN)

LHC instances

  • ATLAS
    • Moved lxfs* boxes to read-only while investigating a possible fix (network interface issue, to be checked)
  • LHCb and ALICE
    • Updated headnodes to 4.2.22
    • Slow rolling update of the FSTs is in progress...
  • Finished migrating HTTP and GridFTP servers to CentOS7

EOSPUBLIC: updated to 4.2.22 (this morning)

EOSUSER: some FSTs had a full /var partition, which triggered corruption. Since the write cache had been removed, a different code path was exercised; this triggered a loop with many messages in the client /var/log, filling up the client filesystem (reported by Ben Jones). Note: the FST automatically goes into read-only mode, but the SQL database nevertheless got corrupted.

● EOS clients, FUSE(X)

Speakers: Dan van der Ster (CERN), Jan Iven (CERN)

(Dan) EOS 4.2.22 (el6 + el7) and XRootD 4.8.3 were tagged into the production Koji repos today. Reminder:

  • http://linuxsoft.cern.ch/internal/repos/eos6-stable/x86_64/os/Packages/
  • http://linuxsoft.cern.ch/internal/repos/eos7-stable/x86_64/os/Packages/

Luca reported a fresh crash seen by CMS (on their machine) - no core dump (but it might have been the previous version?).

Massimo would like /eos/pps to be mounted everywhere. Luca: careful, it holds all the stored logs and might have unsafe permissions; perhaps mount only a sub-folder (/eos/pps/users)? Dan and Massimo will check. Could perhaps use a user mount instead? To be checked.

(Andreas)

  • No update on eosxd development (many pending feature branches, see last meeting)
    • Q: who decides what gets merged? The admin of the instance where the feature is required.

(Jan):

  • Cross-check: are there any blocking issues in FUSEX that would prevent wider testing (AFS phase-out)?
  • What are the steps required to get FUSEX enabled on other instances (server-side, open port)?
    • Luca: wants a scale test - more than the current Massimo test (e.g. using user mounts, at least 1k concurrent mounts).
    • Cristi: need to test against a "citrine" instance (eoslegacy?)

 


● Development issues

Speakers: Andreas Joachim Peters (CERN), Elvin Alin Sindrilaru (CERN), Georgios Bitzes (CERN), Jozsef Makai (CERN), Michal Kamil Simon (CERN)

(Georgios)

  • Request: we need more space for PPS; we are at 97% capacity on eospps-ns* (already with compression).
    • Could we add a second SSD, so that we can test RocksDB with multiple data directories?
      • Luca: won't happen - complicated workflow to add HW.
      • Dan: the servers would already have 2 SSDs; perhaps put the OS on a spinning disk.
      • Elvin: would like to verify that RocksDB can actually split over several devices (see the RocksDB sketch after this list).
      • Massimo: take offline, will try to get a test machine.
    • Could we move away from VMs for the NS machines? Virtualization severely limits the amount of IOPS we can get from SSDs (only a couple of thousand, whereas a real SSD can do ~100k and a real spinning disk ~300 - seen both with QuarkDB and with a synthetic "fio" test).
      • Luca: please benchmark the machine he has prepared.
      • Dan: (-> offline) please talk to Arne, he might be investigating such limits.
  • Two weeks ago, PPS stopped working upon reaching 2^32 files. A type confusion (id_t) in the code caused file and container IDs, which should be 64 bits, to be truncated to 32 bits (a minimal illustration follows after this list).
    • Safety checks detected that something was wrong, and the NS refused to boot.
    • Elvin: this would also affect Aquamarine. IDs only ever increase and are never re-used - will this affect EOSUSER? Only if somebody creates and deletes files in a loop.
    • Jan: could this affect FUSE clients? Yes in principle, but Georgios did check and found nothing.
  • Experimenting with optimizing how metadata is laid out on disk, to reduce the high number of IOPS incurred when listing a directory (see the key-layout sketch after this list).
    • The goal is that file metadata within the same directory is physically colocated on disk.
    • Listings would thus benefit greatly from the kernel page cache and the RocksDB block cache.
    • Jan: worried about intrusive changes this close to the production roll-out? This would probably be the last major change (and it would need a conversion campaign on EOSPPS; production instances will have this layout from the start).
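
For reference on Elvin's point: RocksDB itself can spread SST files over several paths via its db_paths option. The snippet below is only a sketch with made-up paths and sizes (not the actual eospps-ns* layout), and whether QuarkDB exposes this option in its configuration is a separate question to check.

// Sketch: open a RocksDB instance whose SST files may be spread over two
// mount points (e.g. two SSDs). Paths and target sizes are illustrative.
#include <rocksdb/db.h>
#include <rocksdb/options.h>
#include <iostream>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // RocksDB distributes SST files across these paths, filling each one up to
  // its target size before moving on to the next.
  options.db_paths.emplace_back("/ssd1/quarkdb-data", 400ull << 30);  // ~400 GiB
  options.db_paths.emplace_back("/ssd2/quarkdb-data", 400ull << 30);  // ~400 GiB

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/ssd1/quarkdb-meta", &db);
  if (!s.ok()) {
    std::cerr << "open failed: " << s.ToString() << std::endl;
    return 1;
  }
  s = db->Put(rocksdb::WriteOptions(), "hello", "world");
  std::cout << (s.ok() ? "write ok" : s.ToString()) << std::endl;
  delete db;
  return 0;
}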
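
For illustration of the 2^32 incident, a minimal, self-contained sketch (not the actual EOS code; narrow_id_t is a hypothetical stand-in for the mistaken 32-bit type): once the ID counter passes 2^32 - 1, passing it through a 32-bit type silently wraps it around.

// A 64-bit ID passed through a 32-bit type wraps around past 2^32 - 1.
#include <cstdint>
#include <cstdio>

using narrow_id_t = uint32_t;                          // hypothetical 32-bit alias

narrow_id_t lookup_key(narrow_id_t id) { return id; }  // truncates silently

int main() {
  uint64_t next_file_id = (1ull << 32) + 5;            // just past 2^32 files
  uint64_t truncated = lookup_key(next_file_id);       // implicit narrowing
  std::printf("real id = %llu, after 32-bit round-trip = %llu\n",
              (unsigned long long)next_file_id, (unsigned long long)truncated);
  // Prints: real id = 4294967301, after 32-bit round-trip = 5
  return 0;
}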
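
The colocation idea, sketched with a hypothetical key encoding (the actual QuarkDB/EOS layout may differ): if each file's metadata key is prefixed by its parent container ID, all entries of one directory sit next to each other in the sorted key space, so a listing becomes one contiguous range scan instead of many scattered lookups - which is what lets the kernel page cache and the RocksDB block cache help.

// Keys sort by (container id, file name), so listing a directory is a
// contiguous range scan in a sorted key-value store.
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical encoding: fixed-width big-endian container id, then file name.
static std::string make_key(uint64_t container_id, const std::string& name) {
  std::string key(8, '\0');
  for (int i = 0; i < 8; ++i)
    key[i] = static_cast<char>((container_id >> (56 - 8 * i)) & 0xff);
  return key + name;
}

int main() {
  // std::map stands in for the sorted on-disk key space (e.g. RocksDB SSTs).
  std::map<std::string, std::string> kv;
  kv[make_key(42, "a.root")] = "metadata of a.root";
  kv[make_key(42, "b.root")] = "metadata of b.root";
  kv[make_key(7,  "other")]  = "metadata in another directory";

  // Listing container 42 = scanning the half-open range [42-prefix, 43-prefix).
  const std::string lo = make_key(42, "");
  const std::string hi = make_key(43, "");
  for (auto it = kv.lower_bound(lo); it != kv.end() && it->first < hi; ++it)
    std::printf("entry: %s\n", it->first.substr(8).c_str());
  return 0;
}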

(Andreas)

  • Changed the behaviour of atomic uploads to avoid any file-loss scenario caused by overlapping/replayed opens/uploads (EOS-2571); a generic sketch of the pattern is shown after this list.
  • Fixed a looping bug in the FST (MgmSyncer.cc): a bogus mtime made a thread run in a tight loop, filling up the /var partition on EOSUSER/GENOME. The fix is in both the CITRINE and AQUAMARINE branches.
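
As a generic illustration of the overlapping/replayed-open problem (a sketch of the general pattern only, not EOS's actual implementation; all names are made up): each open gets a generation number, the upload is written to a temporary name, and the commit renames it over the target only if no newer upload has committed in the meantime.

// Atomic upload with a commit-time staleness check (single-process sketch;
// a real implementation would do the check-and-rename under a lock).
#include <atomic>
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

static std::atomic<uint64_t> open_counter{0};    // generation issued at open
static std::atomic<uint64_t> last_committed{0};  // generation of last commit

struct Upload {
  uint64_t generation;
  fs::path target;
  fs::path tmp;
};

Upload open_upload(const fs::path& target) {
  uint64_t gen = ++open_counter;
  return {gen, target, fs::path(target.string() + ".atomic." + std::to_string(gen))};
}

bool commit_upload(const Upload& u) {
  if (u.generation < last_committed.load()) {
    fs::remove(u.tmp);                 // stale or replayed upload: drop it
    return false;
  }
  fs::rename(u.tmp, u.target);         // atomic replace on the same filesystem
  last_committed.store(u.generation);
  return true;
}

int main() {
  Upload a = open_upload("data.bin");  // older open (e.g. a replayed request)
  Upload b = open_upload("data.bin");  // newer open
  std::ofstream(a.tmp) << "old contents";
  std::ofstream(b.tmp) << "new contents";

  commit_upload(b);                    // newer upload wins
  bool ok = commit_upload(a);          // must be rejected, or the new data is lost
  std::printf("stale commit accepted? %s\n", ok ? "yes" : "no");  // prints "no"
  return 0;
}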

(Massimo)

  • Memory leaks in the (old) namespace, linked to file creation - these seem to also affect the new NS (QuarkDB starts paging). Another issue: every hour, on the hour, a memory increase is seen - log truncation? The last issue is runaway memory consumption.

(Jan): new NS roll-out - status (EOSHOME has 1 filesystem)?

  • We have the Foreman hostgroup, the Puppet config (MGM and QuarkDB on the same box), the redirector, and the NS for the first instance (but will wait for the new QuarkDB on-disk layout).

 

● AOB