EOS DevOps Meeting

Europe/Zurich
513/R-068 (CERN)

Jan Iven (CERN)
Description
Weekly meeting to discuss progress on EOS rollout

● overall 2017 planning

Please note the re-ordered items (suggested by Massimo): the idea is to make everybody aware of major operational/dev items first, and possibly skip quickly over "nothing to report" entries near the end.


● production instances

GSI saga

Gerri's plugin is in xrootd-3.3.6-CERN-5; also took the opportunity to update EOS to 0.3.258.

EOSCMS updated yesterday
EOSATLAS updated today

Rolling upgrade of the FSTs is ongoing; the number of crashes should decrease.

EOSCMS T0 access via AAA

CMS complained that this "fallback" is not performant enough, even though they will use it only once or twice a year.
We currently proxy the traffic through the AAA VMs because of the path rewrite.

Opened a task (EOS-1900) to see if we could have a plugin similar to the rucio-lfn2pfn plugin for CMS that redirects traffic to the headnode.
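
A minimal sketch of the kind of path rewrite plus redirect that such a plugin would have to do, so that clients go straight to the headnode instead of through a proxy. This is not the rucio-lfn2pfn plugin or any XRootD API; the `/eos/cms` prefix, the host name `eoscms.example.org`, and both function names are assumptions made for illustration only.

```python
def lfn_to_pfn(lfn, prefix="/eos/cms"):
    """Map a logical file name to a physical path by prepending a prefix.

    The prefix is a hypothetical value, not taken from the minutes.
    """
    if not lfn.startswith("/"):
        raise ValueError("LFN must be an absolute path: %r" % lfn)
    return prefix.rstrip("/") + lfn


def redirect_url(lfn, headnode="eoscms.example.org", port=1094):
    """Build the root:// URL a client would be redirected to.

    The headnode name is a placeholder; 1094 is the default xrootd port.
    """
    # root:// URLs carry the absolute path after the host:port separator,
    # hence the double slash in the result.
    return "root://%s:%d/%s" % (headnode, port, lfn_to_pfn(lfn))


if __name__ == "__main__":
    print(redirect_url("/store/data/file.root"))
```

With a rewrite of this kind done at redirect time, the data traffic itself would no longer need to pass through the AAA VMs.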

 

Also getting rid of the LXFS* machines since we have some spare capacity (the HW is no longer supported by the AI tools and cannot be reinstalled). Also found some leftover uninstalled boxes and revived them.

Also found a bug in the group scheduler: it cannot be enabled for now, but would be nice to have in EOSALICE.

JIRA: please assign tickets directly to a likely candidate. Do not leave SW bugs on Hervé.

 


● FUSE and client versions

4.1.25 is in qa: CRM-2349 (CMS is eagerly waiting for this, but their "qa" machine cannot validate whether the memory leak is gone).


it-puppet-module-eosclient has a change in qa: CRM-2348

Allows users to override the eosclient config for a subset of instances (e.g. "webservices"). Example syntax:

---
eosclient_instance_config:
  user:
    EOS_FUSE_MGM_ALIAS: eosuser-web.cern.ch
  project:
    EOS_FUSE_MGM_ALIAS: eosuser-web.cern.ch

Note: this uses "hiera_hash", which might be deprecated after puppet-4.10/5.0. Class parameters could be used instead.
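
To illustrate the intended effect of the override, here is a small sketch of layering per-instance settings on top of defaults. It is not the code of it-puppet-module-eosclient; the function name `merge_instance_config` and the default aliases under `example.org` are hypothetical, and only the override values come from the syntax example above.

```python
from copy import deepcopy


def merge_instance_config(defaults, overrides):
    """Return per-instance settings with overrides layered over defaults."""
    merged = deepcopy(defaults)  # leave the caller's defaults untouched
    for instance, settings in overrides.items():
        # Only the listed instances change; all others keep their defaults.
        merged.setdefault(instance, {}).update(settings)
    return merged


# Hypothetical defaults, for illustration only.
defaults = {
    "user": {"EOS_FUSE_MGM_ALIAS": "eosuser.example.org"},
    "project": {"EOS_FUSE_MGM_ALIAS": "eosproject.example.org"},
    "atlas": {"EOS_FUSE_MGM_ALIAS": "eosatlas.example.org"},
}

# Override values as in the eosclient_instance_config example.
overrides = {
    "user": {"EOS_FUSE_MGM_ALIAS": "eosuser-web.cern.ch"},
    "project": {"EOS_FUSE_MGM_ALIAS": "eosuser-web.cern.ch"},
}

merged = merge_instance_config(defaults, overrides)
```

Here only "user" and "project" pick up the new alias, while "atlas" keeps its default, which matches the "subset of instances" intent.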


● Citrine rollout

New namespace

We have to install a couple of boxes with CentOS 7 so that Elvin can test and validate the new namespace on PPS.

Production instances migration to Citrine

On hold for now; will re-assess before the next TS (in September).


● new Namespace

Integrating patches from Georgios.

Would like proposals for roll-out in production.


● Xrootd

Gerri's GSI patches will be integrated into xrootd-3.3.X and 4.7.


● AOB

Note: the meeting next week was going to be in B600-R-001 (kicked out for pre-GDB). Update: no, back to the usual room.

 

Discussion: unify the GSI multi-process setup across the other production instances (like LHCb); experience with this can be gained on Citrine.

    • 16:00 16:05
      overall 2017 planning 5m
      Speaker: Jan Iven (CERN)

    • 16:05 16:30
      operations: production
      • 16:05
        production instances 5m
        Speaker: Herve Rousseau (CERN)


      • 16:10
        CERNBOX and EOSUSER 5m
        Speaker: Luca Mascetti (CERN)
      • 16:15
        FUSE and client versions 5m
        Speaker: Dan van der Ster (CERN)


      • 16:20
        Citrine rollout 5m
        Speaker: Herve Rousseau (CERN)


      • 16:25
        SWAN 5m
        Speaker: Jakub Moscicki (CERN)
    • 16:30 16:50
      development: near-term
      • 16:30
        nextgen FUSE 5m
        Speaker: Andreas Joachim Peters (CERN)
      • 16:35
        new Namespace 5m
        Speaker: Elvin Alin Sindrilaru (CERN)


    • 16:50 17:45
      other: pilot services, long-term dev, external
      • 16:50
        Webservice 5m
        Speaker: Luca Mascetti (CERN)
      • 16:55
        Backup 5m
        Speaker: Luca Mascetti (CERN)
      • 17:00
        Samba 5m
        Speaker: Luca Mascetti (CERN)
      • 17:05
        $HOME structure 5m
        Speaker: Luca Mascetti (CERN)
      • 17:10
        BATCH integration 5m
        Speaker: Massimo Lamanna (CERN)
      • 17:15
        Xrootd 5m
        Speaker: Michal Kamil Simon (CERN)


      • 17:20
        AOB 5m
