Minutes from WLCG Mon consolidation meeting on 17th of Jan 2014
People in the room: Pablo, Julia, Luca, David T., Eddie, Marian, Maarten, Ale, Simone, Lionel, Nicolo
Remote: Salvatore and David C.
Apologies: Stefan, Pedro
Pablo: From now on, the last person to arrive at a meeting will take the minutes of the next meeting. If anybody disagrees, please speak up.
(There was consensus in the room.)
Pablo: So, Ale will take the minutes at the next meeting.
Pablo: Version 1.0 of the document is out. We have added all the tasks of the project to JIRA. The timeline for the tasks extends all the way to the end of the project.
Maarten: Did you identify any worrisome points? Would you be able to deliver by the end of the project?
Pablo: Time estimation was difficult; we only have a rough estimate at the moment. We will stay flexible: as the tasks move forward, we might need to assign more people to a task or change its duration.
Questions on Luca’s presentation:
Ale on slide 6: I think it would be useful, during development, to contact ATLAS and CMS experts about the error codes.
Maarten: The machine that runs all those submissions has to be very stable and properly configured. That’s where the expertise of ATLAS and CMS would be needed. The probe can base its logic on the official documentation.
Pablo: Regarding the timeout, the test validity in SAM was 24 hours. Initially we tried changing it to 2 hours, and we saw many gaps in the tests. For the record, the current test validity is set to 6 hours, and we will most probably reduce it step by step in the future.
Ale: If we want to test the status of a site, 6 hours is enough. We all agree on that.
Ale: We would like to add the queue name to the VO feed; we don’t want to rely on the BDII.
Julia: The problem is that NCG does not support queue names. We are working on providing new probes and a new SAM system (SAM 3). For the time being, we could live with this limitation and get the queues from the BDII. The new configuration system will take the VO feed into account, but this is future work for SAM 3. We should also keep in mind that the availability calculation cannot currently handle queues either: it cannot take into account two queues on the same CE. This has to be considered. Also, the timeline for Nagios is not clear at the moment.
Luca: The target is to have a proposal for Nagios in February.
Ale: Are you also testing the OSG-CE? You should test it against an OSG-CE that doesn’t have a default queue.
Maarten: We do not lose anything with the new probe; it does what the WMS probes currently do. We still have the same constraints: we have changed the machinery of the probe, but we rely on the same information system as the WMS probes.
Pablo: We won’t gain much by patching and hacking the old system, as it is going away. We should think about how to do it properly for the new system.
Maarten: Don’t retire the WMS probes if it causes problems; don’t rush it. Sure, the experiments do not use them in real-case environments, but we have lived with them for quite some time. We could retire them when the current SAM system is retired.
Simone: If possible, please retire the WMS probes.
Julia: Will these changes (introducing new probes) have an impact on POEM and on the profiles?
Marian: They need to be added.
Julia: Will it impact the availability calculation?
Luca: It is explained in the next slides.
Pablo: Would other experiments be able to reuse this probe as well?
Luca: Yes, it is not specific to CMS; it will be identical for ATLAS.
Maarten: The WN metrics stay the same but the CE metrics will change their name and this will affect the availability calculation.
Julia: Does this mean that we register a new profile in preproduction and compare the two profiles there?
Simone: I think you are missing a step between steps 2 and 3: there is no full WN test step. Step 2 is in pre-production, so there is a big jump between steps 2 and 3.
Nicolo: It is a list of changes.
Maarten: Exactly; we will see if we can avoid it. Step 1 could also be done in production.
Ale: As of today, this seems to be the plan. Let’s start and see how it goes.
Maarten: I have a question about the server side of Condor-G: there is an instance at the moment, but it also needs to be of production quality.
Julia: It is sitting on the same box. It shouldn’t be a problem.
Pablo: Let’s postpone the REBUS talk and discuss it next week, together with the requirements from the experiments and the WLCG office. The next meeting will be on Friday 31st of Jan. I would also like to discuss the VO feed topic further, to make sure that we have what we need and only that.