PPS Pilot Follow-up Meeting Minutes Tue 22 Jul 2008

  • Date: Tue 22 Jul 2008
  • Agenda: 37274
  • Description: pilot of Cream CE: check-point
  • Chair: Antonio Retico

Attendance

  • PPS: Antonio Retico
  • CNAF: Daniele Cesini, Danilo Dongiovanni
  • FZK: apologies
  • UPATRAS: -
  • PIC: -
  • Cream (Cluster of Competence): Massimo Sgaravatto, Alessio Gianelle, Sara Bertocco
  • SAM: represented by Antonio
  • Nagios: -
  • CMS: Enzo Miccio
  • Alice: Patricia Mendez
  • Atlas: -
  • LHCb: -

Status of the pilot service (by VOs and sites)

Progress on tasks

Status of the subtasks of TASK:7143.

Notes:

  • TASK:7139: The set-up of the VO SW area on the WNs is still in progress. At CNAF the SW area is on GPFS, and experts were called in to help. The work was also slowed down by the unscheduled downtime the site suffered. Patricia (Alice) said that they could nevertheless make progress on the other CE, installed at FZK.
  • TASK:7142: Everything is ready. The task is kept open and will be closed at the official start of phase2.
  • TASK:7153, TASK:7144: The SAM team is at work. We first needed to set up a new environment (SAM server DB, webservice, bdii2oracle script). Now the existing CREAM CEs in PPS are in our DB and CYFRONET is working on the submission. This is the same test as proposed by CMS, except that it is done using the ops test. This is also a message for CMS that we are now ready to receive their SAM tests. In parallel, and with an eye to avoiding overlap, the SAM team is studying new tests. The new version of the bdii2oracle script, which is able to gather CREAM CEs, is going to be released in the production instance of SAM as well. In addition, it turned out that FCR also needs a bit of configuration. A new TASK:7427 has been opened for that.
  • TASK:7157 no progress reported
  • TASK:7159: CREAM CEs are now available in the PPS information system, but gstat does not show them, so it probably needs extra configuration. The gstat developers will be contacted (after the meeting GGUS:38891 was opened for this purpose).
  • TASK:7274, TASK:7279: The service management tasks were updated as a consequence of the decision to extend the pilot (see later).

  • TASK:7278: Done. The management of the VOBOX at FZK should be included in TASK:7274. The node name should be mentioned in the pilot description.

Feedback Alice (Patricia)

Alice is working with the site at FZK in production mode and under real conditions. The WMS-based submission using the VOBOX is failing due to BUG:37563 (number of proxy delegations limited to 9).

Massimo explains that the use of the VOBOX, by adding extra levels of delegation, hits the limit imposed by the bug in VDT. This limit is not reached when the VOBOX submits to the lcg-CE (BLAH missing in the chain) or directly to CREAM (no WMS in the chain). There is a patch for that in certification (actually two patches: PATCH:1981, ready for certification, and PATCH:1979, in certification, for 32-bit and 64-bit architectures respectively).

Patricia: Direct submission to CREAM was successful. Thanks to Massimo Sgaravatto for his help. The submission worked very well. Changes to Alice JDLs were necessary. Now we are changing the LCG module in Alien to enable submission to cream CEs.

Massimo: What exactly is Alice's idea? Do you want to submit through the WMS in the future, with direct submission as a workaround for now, or is direct submission the option you prefer?

Patricia: Alice has always pushed to have the possibility of direct submission. Before CREAM this was impossible. So the idea is to submit directly to the CEs. At the same time, however, we want to keep the option of submitting through the WMS as a fallback solution.

Feedback CMS (Enzo)

Enzo: Not much to report due to vacations

Antonio: As I said before, we are now ready to receive SAM tests from CMS. We will do the submission with the OPS use cases, and it would be useful if CMS could verify theirs as well.

Update from JRA1

Status of ICE WMS

Massimo: We released two patches, PATCH:1755 and PATCH:1790, with CREAM and the CLI. These patches are now certified. We still don't have an official WMS+ICE. PATCH:1841 contains ICE+WMS plus several bug fixes. We are still testing PATCH:1841 internally and, for the time being, we are not able to foresee the delivery date. The WMS+ICE installed in PPS following Alessio's instructions is usable but has known issues. Functionality and interfaces are however unchanged.

Initial planning of phase2

Antonio: As already agreed two weeks ago, before moving to phase2 we will wait for this patch to be available. This will give extra value to phase2. Considering the status of PATCH:1841, I think we should stay in this configuration for a bit longer.

Massimo: This does not seem to be impacting Alice for the time being. As far as the WMS is concerned, what exactly is the point of moving the pilot to production?

Antonio: The current configuration is OK but is not suitable for scalability tests, because the batch resources behind it don't allow heavy submission. SAM tests can be done anyway. CMS also knows how to direct their tests to the PPS system.

Massimo: As a condition to move to phase 2 do you expect to have PATCH:1841 certified before or just something that in our opinion is ready for these tests?

Antonio: We expect to have this patch in a status that reflects the following conditions:

  • closed and delivered to certification
  • installable
  • not breaking the system
  • sufficiently documented

This status is somewhat intermediate between "Ready for Certification" and "Certified". This status does not exist in Savannah, but most probably will be introduced. The conditions in which this pilot was started were very particular, with an earlier version of the software in certification (according to the old process) and a newer one in PPS (following the new process).
Basically, as a pre-condition to start a pilot we would expect two assertions to be made:

  • JRA1: "the software is in our opinion ready to be given to the users" --> Patch in "Ready for Certification"
  • SA3: "the software is in our opinion safe to install and suitable for beta testing" --> patch in "Ready for Pilot" (tentative status name)
In this pilot I played the role of SA3 and, by evaluating the status of the software and the documentation, I judged the proposed software to be stable enough to start working on it with no major pain.

The conclusion is that at this very moment we don't need to move to phase2, because everybody seems to be able to work in this configuration. All activities (SAM, gstat, Nagios, experiments) are progressing. So there is no point in complicating the system.

Recommendations for release and deployment

Summary of technical issues which we want fixed before moving to phase2:

Decision about termination/extension of the pilot

In consideration of what was discussed above, the decision is made to extend this phase of the pilot within the PPS infrastructure and to delay the deployment in production by five weeks from now.

The long delay is due to the fact that many of the actors will be on vacation by mid-August. On the other hand, the existing services have to be kept up and running in the meantime because they are now part of Alice's production pool.

The next check-point meeting will be on the 26th of August

AOB

Antonio: What do we do with the version of CREAM now coming out of certification? It is now arriving in PPS, so, following the old process, it will be installed at the PPS sites and then moved to production. Between the PPS and the production phase an activity from the experiments is normally expected, but in this case it does not make sense, because the experiments are already working on the newer version. Normally in these cases a pre-deployment test is done and then the software is made available immediately to production without further staging in PPS. This is what we intend to do in this case as well. Are there any objections to this idea? We also have to consider that the production SAM, following the modifications done in PPS, is being re-configured to see the CREAM CEs (extracting them from the BDII). Tests are not critical for the time being. We are trying, with this pilot, to minimise secondary effects, but we must be prepared to have something failing.

Massimo: It has to be clear that PATCH:2001, the new version of CREAM, has more bugs fixed, but all the critical bugs found by Di in the now certified version of CREAM (PATCH:1755) were fixed. So the "old" version has nothing critical and it is perfectly safe to install. Of course all the fixes in PATCH:1755 were carried over to PATCH:2001.

Daniele: It is however important to do the pre-deployment of PATCH:1755 and we'll do it. So, wouldn't it be a good idea to attach these machines to the pilot and let the same tests flow in?

Massimo: It is also important to notice that the installation instructions in PATCH:1755 were slightly improved in the meantime with respect to the ones originally used in the pilot so we confirm that a pre-deployment test is necessary

Antonio: Yes, this sounds like a good idea. We'll do it and distinguish these services on the web page.

Actions

See child tasks of https://savannah.cern.ch/task/?7143



Topic revision: r4 - 2008-07-23 - AntonioRetico
 