SAM3 in production

Europe/Zurich
513/R-068 (CERN)

Description

Main topics to discuss:

* Open WLCGMONCON tickets

* SAM3 in production

Videoconference Rooms
WLCG_monitoring_consolidation
Description: Kick-off meeting for the WLCG monitoring consolidation project
Extension: 109258925
Owner: Pablo Saiz

Attended:

David, Alessandra, Julia, Pablo, Hector, Andrea, Nicolo, Maarten, Marian, Salvatore, Pepe

Apologies: Lionel

 

Open tickets

_____________________________________________________

Of 65 tickets, 12 are still open.

We walked through the open tickets.

Tickets relating to the SAM/Nagios modification and simplification should move to a different tracker. Marian will take care of this.

The test scheduling timeout issue will be discussed at the December GDB. Marian will present statistics on probe timeouts for condorG and cream-CE submissions. Compared to February this year, the situation has improved substantially. However, there is no ideal solution for it.

 

SAM3 in production

Round table

CMS (Andrea)

Andrea is working on a script that will extract site availability from SAM3 and import it into SSB. The comparison of the availability results between the two systems is not yet automated, though a manual comparison looks good. Andrea intends to spend several days on the comparison, after which SAM3 would become part of the production monitoring flow (integrated with site readiness, SSB, etc.).
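
An automated comparison of this kind could be sketched roughly as follows. This is only an illustration, not Andrea's script: it assumes the availability of each site has already been extracted from both systems into plain mappings of site name to availability fraction, and it does not call any SAM3 or SSB API. The function name, the tolerance value and the site names are all hypothetical.

```python
def compare_availability(first, second, tolerance=0.01):
    """Return the sites whose availability disagrees between two systems.

    `first` and `second` map a site name to an availability fraction (0.0-1.0).
    A site is reported if it is missing from one system or if the two values
    differ by more than `tolerance` (an assumed threshold, not an agreed one).
    """
    report = {}
    for site in sorted(set(first) | set(second)):
        a = first.get(site)
        b = second.get(site)
        if a is None or b is None:
            # Site known to only one of the two systems.
            report[site] = (a, b)
        elif abs(a - b) > tolerance:
            # Both systems know the site but disagree on the value.
            report[site] = (a, b)
    return report


if __name__ == "__main__":
    # Made-up example values, not real measurements.
    system_one = {"SITE_A": 0.95, "SITE_B": 0.90, "SITE_C": 1.0}
    system_two = {"SITE_A": 0.95, "SITE_B": 0.80}
    for site, values in compare_availability(system_one, system_two).items():
        print(site, values)
```

Reporting only the disagreeing sites keeps the output short enough to review by hand, which matches the manual check described above.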

 

ALICE (Maarten)

No complaints. ALICE does not use SAM as much as ATLAS and CMS. Maarten does not foresee any problems.

 

 

ATLAS (Alessandra and Salvatore)

Salvatore does not see any issues apart from the fact that some of the tests are not visible in the UI, but this is because they were not added to the profile. Comparisons of the availability metrics were performed and look fine; where there were differences, they were for good reason.

 

Nobody from LHCb was present; Pablo will ask Stefano after the meeting.

 

No show stoppers so far.

We will decide by the end of the month whether to keep running SAM2 for one more month. The current plan is to block web access to SAM2 by the middle of December and, if nobody complains, to stop all other components after that.

People should now investigate any dependencies on SAM2 in the experiment systems and upgrade if needed.

 

ATLAS asked for a programmatic interface like the one provided by SAM2. Pablo said that he would prefer to know which APIs are needed and enable those, rather than making a full copy of all the APIs that existed in SAM2.

 

Access to the SAM3 UI: the UI can contain sensitive information about the sites, so access will be protected with a certificate. This would also be relevant for the APIs. We should check that the experiments are ready to use the APIs with secure access.

Pablo pointed out to Pepe that the site Nagios plugin has to be modified as well to be able to use secure access.

 

SAM3 recomputations

____________________________________________________

Pablo is happy that recomputations will no longer be done centrally. The instructions for doing recomputations were validated by Andrea, who went through the exercise. According to Andrea, it is pretty straightforward; he has already passed the instructions to the site support team.

 

How recomputations are requested and handled inside the experiments is up to the experiments. However, since sites do request recomputations, they should know how to request one and what the acceptance procedure is. The suggestion is to continue handling requests through GGUS tickets, but the tickets should be assigned to the experiment teams rather than the WLCG monitoring team. It was decided to keep a common tracker, at least in the beginning. There are 10 days for recomputations; reports have to be ready before the MB meeting. The entry point should stay the same for the sites: the person on monitoring shift re-assigns the ticket to the corresponding team in the experiment.

Andrea mentioned that in CMS the request may be submitted directly in the CMS internal tracker and so would not be visible outside CMS, but this is not a problem.

 

AOB

__________________________

What should be done if someone asks for the history of the last n hours (days) and the range includes an incomplete hour (day)?

Possible alternatives:

a) Exclude the current bin

b) Include the current bin, so that you get a slightly longer time interval than you requested

c) Count the latest incomplete bin as a complete one (slightly shorter time range than requested)

 

Currently SAM2 returns only complete bins, ignoring the latest incomplete time bin. This is not optimal, since it does not show the history including the latest evaluation.

The last option (c) was chosen.
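
The three alternatives can be illustrated with a small sketch. It is not the SAM3 implementation: the function name, the alignment of bins to a fixed epoch and the example timestamp are assumptions made only to show how the returned time range differs between (a), (b) and (c).

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)  # assumed reference point for aligning bins


def history_bins(now, n, bin_size=timedelta(hours=1), policy="c"):
    """Return (start, end) pairs for a request of the last n bins.

    policy "a": n complete bins, the current incomplete bin is excluded.
    policy "b": n complete bins plus the current incomplete bin (n + 1 bins).
    policy "c": n bins in total, the current incomplete bin counts as one.
    Assumes `now` is a naive datetime that is not exactly on a bin boundary.
    """
    # Start of the bin that contains `now`, i.e. the incomplete bin.
    current_start = now - (now - EPOCH) % bin_size
    if policy == "a":
        first, count = current_start - n * bin_size, n
    elif policy == "b":
        first, count = current_start - n * bin_size, n + 1
    elif policy == "c":
        first, count = current_start - (n - 1) * bin_size, n
    else:
        raise ValueError("policy must be 'a', 'b' or 'c'")
    bins = []
    for i in range(count):
        start = first + i * bin_size
        # The incomplete bin ends at `now` rather than at its nominal end.
        end = min(start + bin_size, now)
        bins.append((start, end))
    return bins


if __name__ == "__main__":
    now = datetime(2014, 1, 1, 14, 40)  # arbitrary example time
    for policy in "abc":
        bins = history_bins(now, 3, policy=policy)
        print(policy, len(bins), bins[0][0], bins[-1][1])
```

For a request of 3 hours at 14:40, option (a) covers 11:00 to 14:00, option (b) covers 11:00 to 14:40, and option (c) covers 12:00 to 14:40, so only (b) and (c) show the latest evaluation, and (c) keeps the number of bins equal to what was requested.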

 

Nicolo

The way CMS SEs are discovered needs to change; the list is currently taken from SiteDB. This was taken offline.

Pablo suggested holding the next meeting on the 5th of December.

 

 

 

 

 

    • 14:00 - 14:15
      Open JIRA actions (15m)
      Review of the JIRA actions scheduled for this and next month: https://its.cern.ch/jira/issues/?filter=13902
      Speakers: Dr Edward Karavakis (CERN), Hector Martin De Los Rios Saiz (Universidad Complutense (ES)), Julia Andreeva (CERN), Lionel Cons (CERN), Luca Magnoni (CERN), Marian Babik (CERN), Pablo Saiz (CERN)
    • 14:15 - 14:35
      SAM3 in production (20m)
      Speakers: Alessandra Forti (University of Manchester (GB)), Dr Andrea Sciaba (CERN), Maarten Litmaath (CERN), Nicolo Magini (CERN), Dr Stefan Roiser (CERN)
    • 14:35 - 14:55
      SAM3 recomputations (20m)
      Discussion on the procedures to request recomputations and perform them. See https://twiki.cern.ch/twiki/bin/view/ArdaGrid/ProfileCorrections for information about how to perform them.
    • 14:55 - 15:00
      Next meeting (5m)