Deployment

Europe/Zurich
513/R-068 (CERN)

Description: A more in-depth look at the information deployment layer
Videoconference room: WLCG_monitoring_consolidation
Description: Kick-off meeting for the WLCG monitoring consolidation project
Extension: 109258925
Owner: Pablo Saiz

In the room: Stefan, Alessandro, Lionel, Luca, Andrea, Julia, Pablo, Mike, Ivan, Eddie, Costin, Alberto

On the phone: Alessandra

Apologies from: Maarten, Marian, Pedro

 

Minutes taken by Costin


Pablo: Clean-up procedure. We have agreed that test results older than 3 months are not important. They have already been cleaned up from preproduction. If there are no objections, we will also clean them up from production in two weeks.

1. ATLAS probes

    Slide 5:

        Alessandro: same code is called with different attributes, several probes produce many different attributes

    Slide 6:

        Julia: a future version will offer the option of custom services or site-wide values. If that is good enough, then no effort will be spent on backporting this feature

        Alessandro, Alessandra: ok

    Slide 9:

        Pablo: the same storage probe would be usable by ATLAS, CMS and LHCb, right?

        Alessandro: it should indeed be very easy to implement the experiment-specific function to get the list of services

        Julia: do you want to take this occasion to get rid of the grid monitoring framework in the Nagios probes?

        Alessandro: yes

        Julia: do you reuse any components from this, like communication libraries?

        Alessandro: Nagios should report directly to the message queue

        Alessandro, Julia, Luca: discussion of active versus passive probes. Since different files are used, and get and put are not synchronous (they rely on pre-placed files), it might be complicated to publish the test results in a consistent way
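
The idea raised under Slide 9, one storage probe shared by several experiments with only the service list being experiment-specific, could be sketched roughly as follows. This is a minimal illustration, not the actual probe code: the function names, the registry, and the example.org endpoints are all hypothetical, and the real check would be a storage operation rather than the stand-in callable used here.

```python
from typing import Callable, Dict, List

# Hypothetical type: a function returning the storage endpoints one experiment wants tested.
ServiceLister = Callable[[], List[str]]


def atlas_services() -> List[str]:
    # Placeholder endpoints, not real hosts.
    return ["se1.example.org", "se2.example.org"]


def lhcb_services() -> List[str]:
    # Placeholder endpoint, not a real host.
    return ["se3.example.org"]


# Hypothetical registry: the only experiment-specific part of the probe.
SERVICE_LISTERS: Dict[str, ServiceLister] = {
    "atlas": atlas_services,
    "lhcb": lhcb_services,
}


def run_storage_probe(experiment: str, check: Callable[[str], bool]) -> Dict[str, str]:
    """Run the same check against every endpoint listed by the given experiment."""
    lister = SERVICE_LISTERS[experiment]
    # Map each endpoint to a Nagios-style status string.
    return {endpoint: ("OK" if check(endpoint) else "CRITICAL") for endpoint in lister()}
```

Under this sketch, supporting another experiment would only mean registering one more service-list function; the check itself stays shared.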

    Slide 11:

        Stefan: To clarify: LHCb sees the system as a black box. The API is essential to publish/retrieve results, so the internals are not important for LHCb. The actual queue (Nagios or SAM) is also not important, since the message is the same.

        Alessandro: is it important to see the results in the Nagios box? We think not. So far they are visible.

        Alessandra, Julia: if the tests are passively imported, it might be difficult to rerun them locally. Looking for alternatives. If the experiments inject messages directly into any Nagios box, then the site ones or even the CERN one are not important.

        Alessandro: re-running metrics: not possible if they are not active and in the local box. Notifications: in theory possible even if the tests are not local. Import by sites: useful, but could be replaced by an API, which sounds like a good option. Still, re-running the metrics is the important issue.

            Services should be instrumented to publish error codes in Nagios. A Nagios box reimplementing APF doesn't make sense.

        Costin: ALICE has the same approach, to publish status from the VoBox

        Alberto: can we understand if the error is site or experiment-specific from a single instance?

        Stefan: important point for LHCb: since the experiment brings the entire middleware with it, this might complicate the picture

        Alessandro: submission is independent of the payload and there will be two different error states, one for the site and another for the experiment software

        Andrea: this development would mean deciding from the beginning that the Nagios boxes have to be decommissioned. Since the current system is working well and CMS is happy with it, we need more than an idea of change before committing to this.

        Alessandro: ATLAS wants to automate and simplify the environment as much as possible.

        Julia: functional tests are run once per hour and would not interfere at all with the automation

        Alessandro: to be investigated
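
Alessandro's point that submission is independent of the payload, with one error state for the site and another for the experiment software, can be illustrated with a small sketch. The mapping used here (a failed submission counts as a site error, a failed payload as an experiment error) is an assumption made for illustration, and the names `Outcome` and `classify` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    OK = "ok"
    SITE_ERROR = "site_error"              # assumed: the job could not be submitted or run at the site
    EXPERIMENT_ERROR = "experiment_error"  # assumed: the job ran, but the experiment payload failed


def classify(submission_ok: bool, payload_ok: bool) -> Outcome:
    """Separate site problems from experiment-software problems for one test job."""
    if not submission_ok:
        # The payload never ran, so nothing can be said about the experiment software.
        return Outcome.SITE_ERROR
    if not payload_ok:
        return Outcome.EXPERIMENT_ERROR
    return Outcome.OK
```

Whether a single test instance can always make this distinction is the question Alberto and Stefan raised; the sketch only shows how the two states would be reported once the distinction is available.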

2. From Quattor to Agile Infrastructure Deployment

- postponed to the next meeting, which should be held next week, on Friday morning, in order to catch up

    • 14:00 14:20
      ATLAS probes 20m
      Speaker: Alessandra Forti (University of Manchester (GB))
    • 14:20 14:40
      From Quattor to Agile Infrastructure Deployment 20m
      Speaker: Michael John Kenyon (University of Glasgow)
    • 14:40 14:55
      Discussion 15m
    • 14:55 15:00
      Next meeting 5m