EOS devops meeting

Europe/Zurich
600/R-001 (CERN)

    • 15:00 15:10
      Communications
      • New releases, with a description of the main bugs and new features added.
      • Planned upgrades and/or interventions
      • 15:00
        New releases, highlighting the main bugs and new features added. 5m
      • 15:05
        Planned upgrades and/or interventions and their impact 5m

        (maria)

        • EOSATLAS: Fusex in QA lxplus machines
        • Roll-out of xrootd-dsi 3.11 in atlas, cms, lhcb, eulake, pps
        • EOSLHCB: planned upgrade 4.7.X, 24th of March (OTG0055312)

        (cristi)

        • eulake: MGM upgraded to 4.7.0 two weeks ago
        • backup: upgraded to 4.7.0 two weeks ago (4.7.1 on the FST)
          • RAIN layout draining issue found in 4.7.0 and solved in 4.7.1 (EOS-4027)
        • alicedaq: both MGM and FST upgraded to 4.7.1
        • backup: went to 4.7.2 yesterday for further testing
          • MQ segfaulted last night: EOS-4036
        • alice: upgrade next week (most probably directly to 4.7.2; announcement pending)

        (roberto)

        • eosproject-i01: MGM running 4.7.2; the FST update is currently in progress.
        • eosproject-i00: MGM running 4.7.0, FSTs running 4.7.1.
        • If everything is fine, I will upgrade eoshome-i04 and eosproject-i02 tomorrow.
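
The version reports above can be cross-checked against the release that fixed the RAIN layout draining issue (EOS-4027, solved in 4.7.1). The following is a minimal sketch, not an existing EOS tool: the `REPORTED` dictionary layout and the helper names are hypothetical, and only versions stated unambiguously in these minutes are listed. The minutes do not say whether the fix lives in the MGM, the FST or both, so the sketch simply flags every component below the fixed release.

```python
# Versions as reported in the minutes; the dictionary layout is an assumption.
REPORTED = {
    "eulake": {"mgm": "4.7.0"},
    "alicedaq": {"mgm": "4.7.1", "fst": "4.7.1"},
    "eosproject-i00": {"mgm": "4.7.0", "fst": "4.7.1"},
    "eosproject-i01": {"mgm": "4.7.2"},  # FST update still in progress, so omitted
}


def parse_version(text):
    """Turn a dotted version such as '4.7.1' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def components_below(reported, fixed_in):
    """List (instance, component, version) entries older than `fixed_in`."""
    threshold = parse_version(fixed_in)
    flagged = []
    for instance in sorted(reported):
        for component in sorted(reported[instance]):
            version = reported[instance][component]
            if parse_version(version) < threshold:
                flagged.append((instance, component, version))
    return flagged


if __name__ == "__main__":
    for instance, component, version in components_below(REPORTED, "4.7.1"):
        print(f"{instance} {component} is on {version}, below 4.7.1")
```

Comparing integer tuples rather than raw strings keeps the ordering correct once a patch number reaches two digits (for example 4.7.10 versus 4.7.2).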
    • 15:10 15:20
      Main Issues (Incidents/bugs)

      e.g.:
      - Crash:
      - When did it happen?
      - What happened?
      - How was it solved?
      - How can it be avoided?

      (maria)

      • EOSLHCB:
        • Problem: the gridftp doors were saturated because the eoslhcb online farm was shut down and the nodes were then draining.
        • Action performed: restarted eosd, bestman and gridftp on srm-eoslhcb and on the gridftp doors; also restarted eosd and bestman on srm-eoslhcb-bis.
        • How to avoid this: they use xrootd and fall back to SRM when there is an error. However, they run a very old xrootd version (4.8.3) and were already hitting some bugs. They will upgrade to 4.11 in two weeks.

      (andreas)

      • EOSLHCB:
        • misconfiguration in /etc/xrd.cf.mgm
          all.export / nolock

       

      - Maria fixed the misconfiguration in Puppet thanks to Andreas's debugging. She also applied the fix in Eulake so that all instances are on the same track. She is contacting Joel and Chris to arrange a quick restart of the MGM.

    • 15:20 15:30
      Main Success (also great features/bugs solved in the software)

      e.g.:
      - what happened?
      - when it happened?
      - who is happy about it? :)

    • 15:30 16:00
      Round table (Max 2 min per person)

      -> 1-2 Objectives for the week - short description
      -> Anything blocking?
      -> 1-2 Objectives accomplished