PPS Pilot Follow-up Meeting Minutes Tue 11 Nov 2008

  • Date: Tue 11 Nov 2008
  • Agenda: 42943
  • Description: pilot of Cream CE: check-point
  • Chair: Antonio Retico

Attendance

  • PPS: Antonio Retico
  • CMS: Enzo Miccio
  • Alice: Apologies
  • BARI: Absent
  • CERN: Ulrich Schwickerath
  • CNAF: Absent
  • PADOVA: Sara Bertocco
  • FZK: Angela Poschlad
  • JRA1/Cream/WMS: Massimo Sgaravatto
  • SA3: Alessio Gianelle

Review of action items (tasks)

Status of the subtasks of TASK:7981: "Set-up and run Cream CE Pilot (Phase2)" (see them in the PPS tracker)

Notes:

TASK:7139 : Install a Cream CE at CNAF-PPS - In progress - 28d - 90% - last update: 15/7


TASK:7157 : Verify behaviour of CreamCE in nagios - In progress - 7d - 10% - last update: 7/11

Antonio:

  1. Karolis at CERN has started working on dedicated tests for Cream in Nagios.
  2. Konstantin is working on a "centralised" instance of Nagios, which shows, among others, the tests available from SAM. This system uses the topology from the SAM DB, which in turn extracts it from GOCDB and the BDII (via the bdii2oracle script).

The bdii2oracle script assumes that a site (not necessarily its services) has to be registered in GOCDB; otherwise the information published is discarded. That is why the CREAM CEs at Padova and Bari are not visible.

In theory Nagios could extract these sites from the information system, but this would be an exception not relevant for production, and the diversity of topology sources could make debugging quite difficult.

A solution would be to create one or more dummy sites in GOCDB for this purpose. To be discussed.
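The registration rule described above can be illustrated with a minimal sketch. This is not the bdii2oracle code; the function name, the data layout and the site names are hypothetical and only model the behaviour reported in the meeting: services published in the BDII are dropped unless their site is registered in GOCDB.

```python
# Illustrative only: not the bdii2oracle script. All names and data are hypothetical.

def visible_services(bdii_services, gocdb_sites):
    """Keep only the services whose site is registered in GOCDB."""
    return [s for s in bdii_services if s["site"] in gocdb_sites]

# Hypothetical topology: two sites publish a CREAM CE in the BDII,
# but only one of them is registered in GOCDB.
bdii_services = [
    {"site": "SITE-A", "service": "CREAM-CE"},
    {"site": "SITE-B", "service": "CREAM-CE"},
]
gocdb_sites = {"SITE-A"}

before = visible_services(bdii_services, gocdb_sites)

# Registering a dummy GOCDB entry for SITE-B makes its service visible too.
after = visible_services(bdii_services, gocdb_sites | {"SITE-B"})

print(len(before), len(after))
```

In this model a dummy site entry is enough to make the service visible, since the check is on the site and not on the individual services.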


TASK:7427 : Configure FCR in PPS to handle CREAM-CE - 0% - last update: -

Antonio: We have not started because this task depends on SAM working well.


TASK:7986 : Install a Cream CE at FZK-PPS - In progress - 0% - 28 d - last update: 8/10

Angela: We had issues installing the service, and we couldn't find the time to report them to Massimo.

Antonio invites the site to report the issues to the developers as soon as possible, because one of the goals of the pilot is to have installation issues sorted out.

Status and results of the pilot service (by VOs and sites)

CMS:

CMS submitted production jobs, gradually increasing the submission rate. They started seeing problems at a rate of 20 jobs/min (~14000 jobs/day).

Massimo: The performance rates can be improved. One cause of the performance issue has been identified in the proxy delegation mechanism. This is confirmed by tests done by Alessio where the automatic delegation was disabled. This issue is being addressed both as a quick fix and within the design of the new proxy renewal mechanism (the proper one). The high failure rate is still a concern.

Alessio's test results (received after the meeting):

  • test duration: 161 h 16 min (~7 days)
  • jobs submitted: 319,480 (7987 collections of 40)
  • rate: 33 jobs/min (~47500 jobs/day) (*)
  • job success: ~94%

(*) The queue in ICE was often empty, so the rate could be increased, but the high failure rate has to be addressed first.

Enzo pointed out that during a stress test of the WMS a submission rate of 50000 jobs/day was reached and sustained, so CMS expects to see similar performance using the ICE-CREAM submission chain.
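As a cross-check of the figures in Alessio's results, the submission rate can be recomputed from the test duration and the number of collections. This is only a sketch of the arithmetic; the variable names are chosen for illustration.

```python
# Recompute the submission rate from the figures reported in the test results.
duration_min = 161 * 60 + 16        # test duration: 161 h 16 min, in minutes
jobs = 7987 * 40                    # 7987 collections of 40 jobs each

rate_per_min = jobs / duration_min  # average jobs submitted per minute
rate_per_day = rate_per_min * 24 * 60

print(f"jobs submitted: {jobs}")
print(f"rate: {rate_per_min:.1f} jobs/min, {rate_per_day:.0f} jobs/day")
```

The result is about 33 jobs/min, i.e. roughly 47500 jobs/day, consistent with the rate quoted in the list.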


Alice (some news reported by Antonio):

At the last ops meeting sites were invited again to install Cream. So far no feedback on installations has been received from the sites.

Status and results of the development (by developers)

PATCH:2147 is now in PPS. Massimo confirms that this fix is already deployed in the pilot.

Massimo: A new tag for the pilot is in preparation, which should contain the fixes developed so far, specifically the workaround for the proxy renewal (PATCH:2552). The tag should be available in the next few days for the sites to install.

Open Issues (by VOs, sites, deployment teams)

Declaration of sites in GOCDB

Antonio will register and maintain the dummy entries in GOCDB. The same security contacts as for the corresponding production sites will be used.

Recommendations for release and deployment

None

Decision about termination/extension of the pilot

Antonio proposes suspending the pilot for some time to allow the developers to produce a newer version of Cream that addresses the performance issues currently being investigated.

Massimo objected that the pilot is still producing interesting results and should be continued.

Enzo (CMS) has no objection to continuing.

The decision is made to extend the pilot by one month (conventional end date set to 16 December).

The next check-point meeting will be on 25 November. By that date the new tag of Cream is expected to have been released and deployed in the pilot.

AOB

The due date of "service" tasks (run service XX at YY) will be set to 1 day after the check-point date, so that they do not expire.


Topic revision: r5 - 2008-11-13 - AntonioRetico
 