9–11 May 2007
Manchester, United Kingdom

A GSI-secured jobmanager for connecting PBS servers in independent administrative domains

11 May 2007, 09:20 (20 minutes)

Oral presentation, Workflow

Speaker

Mr John Walsh (EGEE SA1 & SA3/Grid-Ireland/Trinity College Dublin)

Report on the experience (or the proposed activity). It would be very important to mention key services which are essential for the success of your activity on the EGEE infrastructure.

RemotePBS is installed at four Grid-Ireland sites in EGEE. We have tested over 6000 jobs, including MPI codes, with good results. A stress-test run of 3000 small jobs completed with a 100% success rate. We have identified a number of key weaknesses in current jobmanagers. The "RemotePBS" jobmanager is still at an experimental stage, and a number of bug fixes and enhancements are likely. Given the lightweight requirements placed on the remote resources, it is hoped that this jobmanager will make it easier for such resources to connect to the Grid.

Describe the scientific/technical community and the scientific/technical activity using (planning to use) the EGEE infrastructure. A high-level description is needed (neither a detailed specialist report nor a list of references).

The "RemotePBS" jobmanager is aimed at security-conscious managers of PBS servers at major non-Grid computing facilities. It enables them to connect existing PBS servers securely to the Grid, even if the PBS server is in a separate administrative domain from the Grid servers. The PBS server administrators retain full control over the authorisation of grid-authenticated users accessing their resources. The software has been designed to support the execution of grid applications at large-scale computing centres.

With a forward look to future evolution, discuss the issues you have encountered (or that you expect) in using the EGEE infrastructure. Wherever possible, point out the experience limitations (both in terms of existing services or missing functionality)

Current Grid middleware does not support the Grid model we had envisaged deploying at a national level. Lack of MPI support, limited middleware portability, and the absence of a facility for connecting remote queue managers remain issues. The gLite CE assumes an implicit, unsecured trust relationship between the CE and the queue manager and does not accommodate independent administrative domains. In addition, only one remote queue manager is allowed per CE. We expect to implement a comparable version for the gLite CE shortly.

Describe the added value of the Grid for the scientific/technical activity you (plan to) do on the Grid. This should include the scale of the activity and of the potential user community and the relevance for other scientific or business applications

The EGEE middleware assumes that the gatekeeper and the PBS system enjoy an implicit trust relationship, which is anathema to security-conscious administrators of supercomputing centres. The EGEE model also assumes that PBS is configured to suit EGEE and is typically installed on the gatekeeper. Major sites already have PBS servers, often tuned over years. For the Grid to gain access to these sites, these issues must be addressed. Our jobmanager does this.

The jobmanager, together with modified information publishers, allows multiple PBS servers to be attached to the Grid via a single gatekeeper. Interactions between the CE and PBS are GSI-secured. The design makes it easy for PBS servers to manage existing nodes, which need only the standard worker node (WN) software installed. It is extensible, allowing some batch-related information to be passed into the job. In addition, the environment can be customised whenever queued Grid jobs are executed. For the PBS administrator, the burden and cost of running a full Grid site are reduced.
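
The idea of one gatekeeper fronting several PBS servers can be sketched as a routing step: find the attached server that provides the requested queue, merge any site-defined environment, and build a submission using the standard `qsub` options `-q queue@server` and `-v VAR=value,...`. This is a hypothetical sketch, not the RemotePBS implementation: `PBSServer`, `build_qsub_args` and `site_env` are invented names, the host names are placeholders, and the GSI-secured transport between CE and PBS server is not modelled at all.

```python
from dataclasses import dataclass, field


@dataclass
class PBSServer:
    host: str
    queues: set[str]
    # Extra environment the site administrator wants set for Grid jobs
    # (hypothetical; stands in for the "customised environment" above).
    site_env: dict[str, str] = field(default_factory=dict)


def build_qsub_args(servers: list[PBSServer], queue: str, script: str,
                    job_env: dict[str, str]) -> list[str]:
    """Pick the attached PBS server that owns `queue` and return a qsub
    argument list. Values containing commas are not handled here."""
    for server in servers:
        if queue in server.queues:
            # Site-defined settings take precedence over job-supplied ones.
            env = {**job_env, **server.site_env}
            args = ["qsub", "-q", f"{queue}@{server.host}"]
            if env:
                args += ["-v", ",".join(f"{k}={v}" for k, v in sorted(env.items()))]
            args.append(script)
            return args
    raise LookupError(f"no attached PBS server provides queue {queue!r}")


servers = [
    PBSServer("pbs1.example.org", {"short", "long"}),
    PBSServer("pbs2.example.org", {"mpi"}, {"MPI_HOME": "/opt/mpi"}),
]
```

Routing on the queue name keeps the gatekeeper stateless with respect to each site's configuration; each PBS server keeps its own queues and tuning, which matches the abstract's point that major sites already run long-tuned PBS installations.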

Author

Mr John Walsh (EGEE SA1 & SA3/Grid-Ireland/Trinity College Dublin)

Co-authors

Dr Brian Coghlan (EGEE/Grid-Ireland/Trinity College Dublin); Dr Eamonn Kenny (EGEE SA3/Trinity College Dublin/WebCom-G); Dr Stephen Childs (EGEE SA1/Trinity College Dublin)
