EOS DevOps Meeting

Europe/Zurich
513/R-068 (CERN)

Jan Iven (CERN)
Description
Weekly meeting to discuss progress on EOS rollout

● overall 2017 planning

  • ITUM-22: see slides (mail). Promises that EOSFUSE will be available for testing next week.

● production instances

Production instances

A slow upgrade of the FSTs on EOSPUBLIC (to 4.1.29) is ongoing.

EOSALICE is on 4.1.30 and EOSLHCB on 4.1.31. 4.1.31 was supposed to fix an FSCK crash but does not; investigation into the "ghost" FS is ongoing.

EOSATLAS (mail about a disappearing file): the files lost due to the balancing bug need to be recovered (re-registered in the namespace). This requires recovering the log file from CASTOR.


● CERNBOX and EOSUSER

Update planned for Oct 10th (CERNBox web service, not EOS)

EOS: need to compact, then validate and eventually deploy a newer EOS version.

CERNBox (web) - being puppetized (Hugo)

EOS MGM: logrotate (copy+truncate) causes hiccups (processes in "D" state). Need to look at alternatives:

  • hourly logrotate (as done on other instances) would cause many small hiccups.
  • standard Xrootd logging (with a fixed crontab entry)?
  • or write EOS-level logs to a separate file?
  • or reduce the Xrootd log level. AFS does not log file-level operations (just trace).
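One way to avoid copy+truncate is rename-and-reopen rotation. The log file is renamed, which is a metadata-only operation, and the writer then reopens a fresh file at the old path. This avoids copying the whole (large) log file, which is one plausible source of the hiccups. The minimal C++ sketch below assumes the MGM, or a separate EOS-level logger, could be told to reopen its log after an external rename. The class name, function name, and file paths are hypothetical.

```cpp
// Sketch only: rename-based log rotation as an alternative to copy+truncate.
// Assumption: the writer can be asked to reopen its log file (in practice via
// a signal or an admin command, not shown). All names are hypothetical.
#include <cassert>
#include <cstdio>
#include <string>
#include <utility>

class ReopenableLog {
 public:
  explicit ReopenableLog(std::string path) : path_(std::move(path)) { open(); }
  ~ReopenableLog() {
    if (fp_ != nullptr) std::fclose(fp_);
  }
  ReopenableLog(const ReopenableLog&) = delete;
  ReopenableLog& operator=(const ReopenableLog&) = delete;

  // Appends one line and flushes it immediately.
  void write(const std::string& line) {
    std::fputs(line.c_str(), fp_);
    std::fputc('\n', fp_);
    std::fflush(fp_);
  }

  // Closes the (possibly renamed) old file and creates a fresh one at path_.
  void reopen() {
    if (fp_ != nullptr) std::fclose(fp_);
    open();
  }

 private:
  void open() {
    fp_ = std::fopen(path_.c_str(), "a");
    assert(fp_ != nullptr && "could not open log file");
  }

  std::string path_;
  std::FILE* fp_ = nullptr;
};

// Rotates by renaming (cheap metadata operation, no data copy), then reopens.
inline bool rotate(ReopenableLog& log, const std::string& path,
                   const std::string& rotated) {
  if (std::rename(path.c_str(), rotated.c_str()) != 0) return false;
  log.reopen();
  return true;
}
```

Here the rename and the reopen happen in one process. With logrotate, the reopen would instead be triggered from a postrotate hook.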

● FUSE and client versions

QA for 4.1.30 didn't show any issues, so it was tagged for production yesterday: CRM-2426

4.1.30 has also been tagged in the eos7-pending tag, which means LinuxSoft will pick it up for their next desktop update cycle.

The eosclient puppet module has a small improvement in qa: CRM-2437


● Citrine rollout

CentOS7 Support

Around half of the EOSPPS disk servers are now running CentOS 7.
Improvements to systemd support will be pushed to EOS in the coming days.


● SWAN

Old EOS nodes will be re-used for SWAN tests before being retired.

BE (meeting) 2017-09-25: BE would like SWAN to be a production service, with a Spark adapter.


● nextgen FUSE

New Fuse Beta Release status

The FUSEX code has been carved out of the Aquamarine Git repo into:

https://gitlab.cern.ch/eos/eos-fusex/

It is now included in Citrine and Aquamarine as a submodule.

[submodule "fusex"]
    path = fusex
    url = https://:@gitlab.cern.ch:8443/eos/eos-fusex.git
    branch = master

 

Jozsef is adding it to the Citrine master build pipeline in GitLab; I am adding it to the Aquamarine build in Jenkins. Once the builds succeed, a new RPM eos-fusex containing eosxd will be created.

The server-side implementation is currently only available in the Aquamarine code (initial testing will be on the EOSUAT instance). Elvin can start merging it into master when he is back (this is slightly more complicated because of namespace API changes). The first version containing server-side support will be EOS Aquamarine 0.3.270. Both MGM and FST need to be updated for the new functionality.
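Because both MGM and FST must carry the server-side support, a deployment check has to look at both versions. The C++ sketch below shows such a check, assuming plain "major.minor.patch" version strings and 0.3.270 as the minimum. The function names are hypothetical and this check is not part of EOS.

```cpp
// Sketch: can this instance serve new-FUSE (fusex) clients?
// Assumption: both MGM and FST must run at least Aquamarine 0.3.270.
#include <array>
#include <cassert>
#include <sstream>
#include <string>

using Version = std::array<int, 3>;

// Parses "major.minor.patch"; returns false on malformed input.
inline bool parse_version(const std::string& s, Version& out) {
  std::istringstream in(s);
  char dot1 = 0, dot2 = 0;
  if (!(in >> out[0] >> dot1 >> out[1] >> dot2 >> out[2])) return false;
  if (dot1 != '.' || dot2 != '.') return false;
  return in.peek() == std::char_traits<char>::eof();  // no trailing garbage
}

inline bool supports_fusex(const std::string& mgm, const std::string& fst) {
  const Version required{0, 3, 270};
  Version m{}, f{};
  if (!parse_version(mgm, m) || !parse_version(fst, f)) return false;
  // std::array compares lexicographically, i.e. major, then minor, then patch.
  return m >= required && f >= required;
}
```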

We are still fixing both builds.

New Fuse Development Status

Versioned Protocol

If a client mounts an instance with a different fusex protocol version, it receives an evict message (i.e. it is unmounted).
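The versioning rule amounts to a strict equality check at connection time. The C++ sketch below shows that check, assuming versions are plain integers. The enum, the EvictMessage type, and the message text are all hypothetical; this is not the eosxd wire format.

```cpp
// Sketch of the versioned-protocol rule: a client whose fusex protocol
// version differs from the server's gets evicted (unmounted).
#include <cassert>
#include <cstdint>
#include <string>

enum class HandshakeResult { kAccept, kEvict };

struct EvictMessage {  // hypothetical message type
  std::string reason;
};

inline HandshakeResult check_protocol(std::uint32_t server_version,
                                      std::uint32_t client_version,
                                      EvictMessage* evict) {
  if (client_version == server_version) return HandshakeResult::kAccept;
  if (evict != nullptr) {
    evict->reason = "fusex protocol mismatch: server " +
                    std::to_string(server_version) + ", client " +
                    std::to_string(client_version);
  }
  return HandshakeResult::kEvict;
}
```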

Quota Interface

Quota Interface via mount point

[root@eos-aufs fuse]# getfattr --only-values -n eos.quota /eos/dev/fuse/
getfattr: Removing leading '/' from absolute path names
instance             uid     gid        vol-avail        ino-avail        max-fsize                         endpoint
dev                   99      99       7269240152           997505     549755813888             apeters.cern.ch:1094
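A tool that consumes the eos.quota attribute needs to turn the header line and value line into named fields. The C++ sketch below does this, assuming whitespace-separated columns exactly as in the output above. The function names are hypothetical, and no units are assumed for the numeric columns.

```cpp
// Sketch: map the header/value lines of the eos.quota attribute to a
// field -> value table. Assumes whitespace-separated columns as shown above.
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

inline std::vector<std::string> split_ws(const std::string& line) {
  std::istringstream in(line);
  std::vector<std::string> out;
  for (std::string tok; in >> tok;) out.push_back(tok);
  return out;
}

inline std::map<std::string, std::string> parse_quota(
    const std::string& header, const std::string& values) {
  const auto keys = split_ws(header);
  const auto vals = split_ws(values);
  std::map<std::string, std::string> out;
  if (keys.size() != vals.size()) return out;  // malformed: return empty
  for (std::size_t i = 0; i < keys.size(); ++i) out[keys[i]] = vals[i];
  return out;
}
```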

 

The quota status on the client side is not yet real-time enough. FUSE clients only see changes to a quota node for which they currently hold a CAP. A regular update happens on listing, or at the latest after 5 minutes. We need to add a server-side push of the quota node when an authenticated ID runs out of quota.
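The refresh policy above can be summarized as: refresh on listing, or when the cached value is 5 minutes old. The proposed server push would force an immediate refresh. The C++ sketch below encodes that rule. The class and method names are hypothetical, and the push handler stands in for a feature that does not exist yet.

```cpp
// Sketch of the client-side quota refresh rule: refresh on listing, or at
// the latest after 5 minutes; a (proposed) server push forces a refresh.
#include <cassert>
#include <chrono>

class QuotaView {  // hypothetical client-side quota cache entry
 public:
  using Clock = std::chrono::steady_clock;
  static constexpr std::chrono::minutes kMaxAge{5};

  explicit QuotaView(Clock::time_point now) : last_update_(now) {}

  bool needs_refresh(Clock::time_point now, bool listing) const {
    return listing || push_pending_ || now - last_update_ >= kMaxAge;
  }

  // Proposed server-side push when an authenticated ID runs out of quota.
  void on_server_push() { push_pending_ = true; }

  void refreshed(Clock::time_point now) {
    last_update_ = now;
    push_pending_ = false;
  }

 private:
  Clock::time_point last_update_;
  bool push_pending_ = false;
};
```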

Cache Consistency

A flush call from one FUSE client triggers a cache invalidation on all other clients that have the file open, i.e. cache invalidation also happens while file descriptors are already open.
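On the server side, this fan-out needs a registry of which clients have which inode open. The C++ sketch below shows the bookkeeping. The class is hypothetical and the real MGM data structures may differ. On a flush, every other opener of the inode is returned as an invalidation target.

```cpp
// Sketch: track which clients have an inode open; a flush from one client
// yields the list of *other* clients that must invalidate their cache.
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

class OpenFileRegistry {  // hypothetical server-side bookkeeping
 public:
  void open(std::uint64_t ino, const std::string& client) {
    openers_[ino].insert(client);
  }

  void close(std::uint64_t ino, const std::string& client) {
    auto it = openers_.find(ino);
    if (it == openers_.end()) return;
    it->second.erase(client);
    if (it->second.empty()) openers_.erase(it);
  }

  // Clients (other than the flusher) that must drop cached data for ino.
  std::vector<std::string> on_flush(std::uint64_t ino,
                                    const std::string& flusher) const {
    std::vector<std::string> targets;
    auto it = openers_.find(ino);
    if (it == openers_.end()) return targets;
    for (const auto& c : it->second)
      if (c != flusher) targets.push_back(c);
    return targets;
  }

 private:
  std::map<std::uint64_t, std::set<std::string>> openers_;
};
```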

O_SYNC

open(..., O_SYNC) disables the local file start cache/journaling; for large file uploads this gives better throughput.
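From the application's point of view, using this is just a matter of passing O_SYNC to open(2). The sketch below is plain POSIX; the eosxd behaviour it relies on is only what the notes above state. The helper name and the path are hypothetical.

```cpp
// Sketch: write a buffer through a file opened with O_SYNC, as a large
// upload to the mount might do (plain POSIX calls).
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Writes len bytes from buf to path; returns 0 on success, else errno.
inline int write_sync(const std::string& path, const char* buf,
                      std::size_t len) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
  if (fd < 0) return errno;
  std::size_t done = 0;
  while (done < len) {
    ssize_t n = ::write(fd, buf + done, len - done);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted: retry the write
      int err = errno;
      ::close(fd);
      return err;
    }
    done += static_cast<std::size_t>(n);
  }
  return ::close(fd) == 0 ? 0 : errno;
}
```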

FSYNC

fsync syncs everything from the local journal, and other clients are guaranteed to see the updated contents at that moment. Internally, fsync calls the flush logic explained above.
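The fsync semantics can be modelled as "drain the journal, then run the normal flush path". The C++ sketch below shows this, assuming writes are buffered as chunks. The class is hypothetical, and the uploader and flusher callbacks stand in for the real server interaction.

```cpp
// Sketch of fsync semantics: writes accumulate in a local journal; fsync
// pushes the whole journal to the server and then runs the flush logic
// (which invalidates caches on other clients). Hypothetical types.
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

class JournaledFile {
 public:
  using Uploader = std::function<void(const std::string&)>;
  using Flusher = std::function<void()>;

  JournaledFile(Uploader up, Flusher fl)
      : upload_(std::move(up)), flush_(std::move(fl)) {}

  void write(const std::string& data) { journal_.push_back(data); }
  std::size_t pending() const { return journal_.size(); }

  void fsync() {
    for (const auto& chunk : journal_) upload_(chunk);  // drain the journal
    journal_.clear();
    flush_();  // same path as an explicit flush
  }

 private:
  Uploader upload_;
  Flusher flush_;
  std::vector<std::string> journal_;
};
```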

TODO

fusebind

After some back and forth, Georgios has found a 'solution' that avoids the need for eosfusebind; he can explain it best. When his work is finished, he needs to register his new class in a single function:

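// XrdCl::URL is declared in XrdCl/XrdClURL.hh; fuse_req_t and fuse_ino_t come
// from the FUSE low-level API (include path and FUSE_USE_VERSION are set by
// the build).
#include <XrdCl/XrdClURL.hh>
#include <fuse_lowlevel.h>

class fusexrdlogin {
public:
  // Presumably sets the login part of 'url' for the caller behind req/ino.
  static int loginurl(XrdCl::URL& url, fuse_req_t req,
                      fuse_ino_t ino,
                      bool root_squash = false,
                      int connectionid = 0);
};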

 

master/slave failover

The async ZMQ connection does not fail over automatically on master/slave redirection.
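The missing piece is a reconnect step triggered by a redirect. The C++ sketch below shows that logic with no ZMQ calls. The connect callback is a hypothetical stand-in for the real socket setup. The class name and the endpoints used in the usage example are placeholders, not the actual MGM hosts or port.

```cpp
// Sketch: an async channel that re-targets itself when the MGM answers with
// a redirect to the new master. 'Connect' stands in for the real socket setup.
#include <cassert>
#include <functional>
#include <string>
#include <utility>

class FailoverChannel {  // hypothetical
 public:
  using Connect = std::function<bool(const std::string& endpoint)>;

  FailoverChannel(std::string endpoint, Connect connect)
      : endpoint_(std::move(endpoint)), connect_(std::move(connect)) {}

  bool start() { return connect_(endpoint_); }

  // Called when a reply indicates that the master has moved.
  bool on_redirect(const std::string& new_master) {
    if (new_master == endpoint_) return true;  // already connected there
    endpoint_ = new_master;
    return connect_(endpoint_);  // re-establish on the new master
  }

  const std::string& endpoint() const { return endpoint_; }

 private:
  std::string endpoint_;
  Connect connect_;
};
```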

fixing of known issues

 


Rollout:

  • (not built yet)
  • will only support "mount -t eosfuse"
    • need to check with the people using permanent mounts (gateways, web), who mount via /etc/fstab; may need to include this in our "eosclient" module
    • options come via a JSON config but can be set on the fly; see README.
  • automount: will lose the "small file cache" each time we unmount
    • where does this need to go on local disk?
    • what happens in case of a crash? per-file "cookie" (mtime, checksum) comparison
      • unflushed data will be recovered as a new namespace entry.
    • might need to increase the idle timeout
  • first round: EOS internal testing
    • should run microtest (but not in the CI environment)
    • may use the "eostest" cluster (bagplus)
  • prepare puppet profiles for roll-out on experiment/plus machines
  • MGM also needs a new port to be opened
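The crash-recovery point in the list above can be read as a three-way decision per cached file. Cached data is reused only if its (mtime, checksum) cookie still matches the server. Otherwise it is dropped, and unflushed data becomes a new namespace entry. The C++ sketch below encodes that reading. The ordering (unflushed data wins over the cookie check), the Cookie layout, and the checksum strings are assumptions, not the eosxd design.

```cpp
// Sketch of a post-crash check for one cached file, based on a per-file
// cookie (mtime, checksum). Layout and decision order are assumptions.
#include <cassert>
#include <cstdint>
#include <string>

struct Cookie {  // hypothetical layout
  std::int64_t mtime_ns;
  std::string checksum;
  bool operator==(const Cookie& o) const {
    return mtime_ns == o.mtime_ns && checksum == o.checksum;
  }
};

enum class Recovery { kReuseCache, kDropCache, kRecoverAsNewEntry };

inline Recovery recover(const Cookie& local, const Cookie& remote,
                        bool unflushed) {
  // Unflushed local data is never silently merged: keep it as a new entry.
  if (unflushed) return Recovery::kRecoverAsNewEntry;
  return local == remote ? Recovery::kReuseCache : Recovery::kDropCache;
}
```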

● new Namespace

Georgios is waiting for a "testbed". Candidates: EOSUAT (but it will not get the new namespace), or EOSBACKUP (it will get the new namespace, but needs to move to Citrine first and also needs an additional node; could we use a 32GB VM with a local, non-throttled SSD?).

  • for a development-only testbed: use VMs. Even the EOSALICE namespace takes only 80GB on disk, so it would again fit on "local" disk.
  • will "find" completely trash the cache?
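The "find" concern is the classic scan problem for LRU caches: a full traversal touches every entry once and pushes the hot working set out. The generic C++ LRU sketch below demonstrates the effect. It is not the new-namespace cache implementation, whose eviction policy is not described here. Scan-resistant policies (for example, not promoting entries touched by a bulk scan) are a common mitigation.

```cpp
// Sketch: a generic LRU cache, showing how one long scan evicts a hot set.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <unordered_map>

class LruCache {
 public:
  explicit LruCache(std::size_t capacity) : cap_(capacity) {
    assert(capacity > 0);
  }

  // Returns true on a hit. On a miss, inserts key, evicting the least
  // recently used entry if the cache is full.
  bool access(std::uint64_t key) {
    auto it = index_.find(key);
    if (it != index_.end()) {
      order_.splice(order_.begin(), order_, it->second);  // move to front
      return true;
    }
    if (order_.size() == cap_) {
      index_.erase(order_.back());
      order_.pop_back();
    }
    order_.push_front(key);
    index_[key] = order_.begin();
    return false;
  }

  bool contains(std::uint64_t key) const { return index_.count(key) != 0; }

 private:
  std::size_t cap_;
  std::list<std::uint64_t> order_;  // most recent at the front
  std::unordered_map<std::uint64_t, std::list<std::uint64_t>::iterator> index_;
};
```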

● Samba

Samba was updated (from CC74); this messed up the not-quite-production video conversion workflow. There were also disconnects that are not yet understood.


● Xrootd

(mail from Michal, "FYI"):

[..] in 4.7.0 the xrootd client started enforcing the correct format of kXR_login response (meaning that the session id has to be present in the response). This turned out to cause problems for dcache, as its implementation of the xroot protocol is inaccurate. However, it shouldn't be a problem for EOS as it has native support for xroot protocol.
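The stricter client behaviour boils down to refusing a login response that is too short to carry a session id. The C++ sketch below illustrates it, assuming (as in the XRootD protocol specification) that the kXR_login response body starts with a 16-byte session id, optionally followed by security information. The function name is hypothetical and this is not the XrdCl code.

```cpp
// Sketch: extract the session id from a kXR_login response body, rejecting
// bodies too short to contain one.
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

using SessionId = std::array<std::uint8_t, 16>;

inline std::optional<SessionId> parse_login_response(
    const std::vector<std::uint8_t>& body) {
  if (body.size() < 16) return std::nullopt;  // no session id: reject
  SessionId sid{};
  std::memcpy(sid.data(), body.data(), sid.size());
  return sid;  // any trailing bytes (security info) are ignored here
}
```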

We should ask the CERN security team for help or an audit. We should also discuss whether to enable "request signing" (available in 4.7): enforce it between MGM and FST, and make it optional for clients (new clients would be protected against request hijacking)?


● AOB

Tests from EOSUAT to AARNET achieved 50Gb; the bottleneck is inside the network.

    • 16:00 16:05
      overall 2017 planning 5m
      Speaker: Jan Iven (CERN)
    • 16:05 16:30
      operations: production
      • 16:05
        production instances 5m
        Speaker: Herve Rousseau (CERN)
      • 16:10
        CERNBOX and EOSUSER 5m
        Speaker: Luca Mascetti (CERN)
      • 16:15
        FUSE and client versions 5m
        Speaker: Dan van der Ster (CERN)
      • 16:20
        Citrine rollout 5m
        Speaker: Herve Rousseau (CERN)
      • 16:25
        SWAN 5m
        Speaker: Jakub Moscicki (CERN)
    • 16:30 16:50
      development: near-term
      • 16:30
        nextgen FUSE 5m
        Speaker: Andreas Joachim Peters (CERN)
      • 16:35
        new Namespace 5m
        Speaker: Elvin Alin Sindrilaru (CERN)
    • 16:50 17:45
      other: pilot services, long-term dev, external
      • 16:50
        Webservice 5m
        Speaker: Luca Mascetti (CERN)
      • 16:55
        Backup 5m
        Speaker: Luca Mascetti (CERN)
      • 17:00
        Samba 5m
        Speaker: Luca Mascetti (CERN)
      • 17:05
        $HOME structure 5m
        Speaker: Luca Mascetti (CERN)
      • 17:10
        BATCH integration 5m
        Speaker: Massimo Lamanna (CERN)
      • 17:15
        Xrootd 5m
        Speaker: Michal Kamil Simon (CERN)
      • 17:20
        AOB 5m