9–11 May 2007
Manchester, United Kingdom
Europe/Zurich timezone

Grid systems' capacity metrics from a user point of view; Performability exploration within the ATLAS virtual organization

9 May 2007, 17:30
2h 30m
Manchester, United Kingdom

Board: P-036

Speakers

Mr Fotis Georgatos (University of Cyprus) Mr Ioannis Kouvakis (University of the Aegean) Mr Ioannis Koyretis (National Technical University of Athens)

With a forward look to future evolution, discuss the issues you have encountered (or that you expect) in using the EGEE infrastructure. Wherever possible, point out the experience limitations (both in terms of existing services or missing functionality)

So far we have been able to run our first experiments with the technology, but not to deploy it as a regular service, for instance in the ATLAS VO, because we suspect that doing so might conflict with the VO's AUP. We nevertheless consider the service useful and think it should be extended across multiple VOs.
Note that resource characterization only makes sense as long as sysadmins keep the resources within a single queue homogeneous (i.e. similar systems across one queue).
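
The homogeneity condition above could be checked from the measurements themselves. The following is only a minimal sketch, assuming per-node benchmark scores for one queue are already available; the function name, the 10% threshold and the sample values are hypothetical and not part of the package described in this abstract.

```python
from statistics import mean, pstdev


def queue_is_homogeneous(scores, max_cv=0.10):
    """Return True if the relative spread of scores stays within max_cv.

    scores: benchmark results from different worker nodes of one queue.
    max_cv: assumed tolerance on the coefficient of variation (hypothetical).
    """
    if len(scores) < 2:
        return True  # a single sample cannot show heterogeneity
    avg = mean(scores)
    if avg == 0:
        return False  # degenerate measurements, treat as unusable
    cv = pstdev(scores) / avg  # coefficient of variation
    return cv <= max_cv


# Invented values for illustration only, not measured results.
uniform_queue = [1000.0, 1020.0, 990.0, 1010.0]
mixed_queue = [1000.0, 1020.0, 450.0, 470.0]
```

A queue that fails such a check would need to be characterized per node class rather than by a single figure.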

Report on the experience (or the proposed activity). It would be very important to mention key services which are essential for the success of your activity on the EGEE infrastructure.

As explained earlier, the metrics service can be deployed within the context of a VO, so that the users of that particular realm benefit directly and at their own discretion, or it could be provided as a service integrated into the RB/WMS mechanisms, which could then perform resource selection with more detailed and accurate algorithms. The basic framework for the first option is already available as a Python code package, which submits a self-compiling lmbench source together with some related scripts that gather other system information (software and hardware), and collects their reports. For the results to be technically sound, some statistical validation is necessary.
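
The abstract does not say which statistical validation is applied. As one possible illustration, repeated measurements of the same metric on the same queue could be summarized by their median and accepted only when enough samples exist and their interquartile range is small. The function name and both thresholds below are assumptions made for this sketch.

```python
from statistics import median, quantiles


def validate_samples(samples, min_samples=5, max_rel_iqr=0.2):
    """Return the median of samples if they look consistent, else None.

    min_samples and max_rel_iqr are hypothetical acceptance thresholds.
    """
    if len(samples) < min_samples:
        return None  # too few reports to say anything reliable
    q1, _, q3 = quantiles(samples, n=4)  # quartile cut points
    med = median(samples)
    if med <= 0:
        return None  # benchmark figures are expected to be positive
    rel_iqr = (q3 - q1) / med  # spread relative to the typical value
    return med if rel_iqr <= max_rel_iqr else None
```

Using the median rather than the mean keeps a single failed or throttled run from distorting the reported figure.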

It is important to stress that the benefits of applying benchmarking and resource characterization can greatly outweigh the overhead of the measurement system itself (typically less than 0.5% of a site's capacity).

Describe the added value of the Grid for the scientific/technical activity you (plan to) do on the Grid. This should include the scale of the activity and of the potential user community and the relevance for other scientific or business applications

The community that potentially benefits is in effect all grid users, since optimizing the system as a whole can bring direct and indirect advantages for everyone, in terms of both total and individual job throughput. What we specifically want to demonstrate is that, by skipping benchmarking and resource characterization, an enormous amount of grid resources can be wasted or sub-optimally exploited. For example, the systems that perform best on the floating point operations of a given algorithm, say 64-bit operations, are not the ones that are optimal on memory transfers, and vice versa. The results are conclusive in demonstrating that the current GIIS-based scheme is, at best, incomplete.
We started our activity within the SEE VO, then verified the situation within the ATLAS VO as well, and expect that if the operations teams (dteam, ops) align and perform similar measurements, the same will be proven for the system as a whole. Currently this is not possible from our side, because it requires operators' and/or VO approval.
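
The floating point versus memory transfer observation above can be illustrated with a small ranking sketch. The site names, metric names and numbers are invented purely for illustration and are not results from this work; the point is only that the ordering of sites depends on which metric a job cares about.

```python
def rank_sites(measurements, metric):
    """Return site names ordered from best to worst on the given metric."""
    return sorted(
        measurements,
        key=lambda site: measurements[site][metric],
        reverse=True,  # higher benchmark figure means better
    )


# Hypothetical figures, not measured data.
measurements = {
    "site-a": {"fp64_mflops": 900.0, "mem_bw_mbs": 2100.0},
    "site-b": {"fp64_mflops": 650.0, "mem_bw_mbs": 3400.0},
    "site-c": {"fp64_mflops": 700.0, "mem_bw_mbs": 2600.0},
}

by_fp = rank_sites(measurements, "fp64_mflops")
by_mem = rank_sites(measurements, "mem_bw_mbs")
```

With these made-up values the best site for floating point work ranks last for memory transfers, which is the kind of mismatch a single generic capacity figure cannot express.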

Describe the scientific/technical community and the scientific/technical activity using (planning to use) the EGEE infrastructure. A high-level description is needed (neither a detailed specialist report nor a list of references).

The activity we propose involves benchmarking grid resources as they are effectively available to all users. We do this in order to measure real grid characteristics, which serve as evidence that metrics-guided resource selection is nearly imperative: if not to select resources optimally, then at least to avoid those that are known a priori not to perform as well as necessary. Our results point in favour of a more intelligent matchmaking process that involves metrics.
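
A minimal sketch of the weaker goal stated above, namely avoiding resources known in advance to underperform, might look as follows. It is not the RB/WMS matchmaking logic; the function, queue names and metric names are hypothetical, and unmeasured queues are deliberately kept because nothing is known against them.

```python
def eligible_queues(queues, minimums):
    """Return queue names not known to fall below any required minimum.

    queues:   mapping of queue name to its measured metrics (may be partial).
    minimums: mapping of metric name to the lowest acceptable value.
    """
    eligible = []
    for name, metrics in queues.items():
        known_bad = any(
            metric in metrics and metrics[metric] < floor
            for metric, floor in minimums.items()
        )
        if not known_bad:
            eligible.append(name)
    return eligible


# Invented example values for illustration only.
queues = {
    "queue-fast": {"mem_bw_mbs": 3200.0},
    "queue-slow": {"mem_bw_mbs": 900.0},
    "queue-unmeasured": {},
}
selected = eligible_queues(queues, {"mem_bw_mbs": 2000.0})
```

Optimal selection would additionally rank the remaining queues, but even this exclusion step requires per-queue metrics to be available.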

Authors

Mr Fotis Georgatos (University of Cyprus) Mr Ioannis Kouvakis (University of the Aegean) Mr Ioannis Koyretis (National Technical University of Athens)
