CHEP 07

Name: CHEP 07
Start: 2007-09-02T08:00:00+02:00
End: 2007-09-09T12:00:00+02:00
Location: Victoria, Canada

2–9 Sept 2007

Please book accommodation as soon as possible.

Support

chep07-support@triumf.ca

Building a robust distributed system: some lessons from R-GMA

5 Sept 2007, 15:40

20m

Carson Hall C (Victoria, Canada)

oral presentation, Grid middleware and tools

Dr Steve Fisher (RAL)

R-GMA, as deployed by LCG, is a large distributed system. We are currently addressing some design issues to make it highly reliable and fault tolerant. In validating the new design, there were two classes of problems to consider: one related to the flow of data and the other to the loss of control messages. R-GMA streams data from one place to another, so we must consider its behaviour when data is inserted into the system more rapidly than it is taken out and, more generally, how to deal with bottlenecks. In the original R-GMA design the system tried hard to deliver all control messages; those that were not delivered quickly were queued for a later retry. With badly configured firewalls, network problems or very slow machines, this led to long queues of messages, some of which were superseded by later messages that were also queued. In the new design no individual control message is critical; the system just needs to know whether each message was received successfully. The system should also avoid single points of failure. However, this can require complex code, resulting in a system that is actually less reliable. We describe how we have dealt with bottlenecks in the flow of data, loss of control messages and the elimination of single points of failure to produce a robust R-GMA design. The work presented, though in the context of R-GMA, is applicable to any large distributed system.
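The two classes of problem named in the abstract can be illustrated with a small, generic sketch. It is not taken from R-GMA: the class names `BoundedBuffer` and `CoalescingOutbox`, the message keys, and the `send` callback are all hypothetical. It shows one common way to keep a fast producer from overrunning a slow consumer, and one way to stop a retry queue from filling up with superseded control messages.

```python
from collections import OrderedDict, deque


class BoundedBuffer:
    """Fixed-capacity FIFO (hypothetical): refuses new items when full,
    so a fast producer is forced to slow down or drop data explicitly."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = deque()

    def offer(self, item):
        """Return True if the item was accepted, False if the buffer is full."""
        if len(self._items) >= self.capacity:
            return False
        self._items.append(item)
        return True

    def take(self):
        """Remove and return the oldest item, or None if the buffer is empty."""
        return self._items.popleft() if self._items else None


class CoalescingOutbox:
    """Holds at most one pending control message per key (hypothetical).
    A newer message for the same key replaces the older one, so the number
    of pending messages is bounded by the number of distinct keys."""

    def __init__(self):
        self._pending = OrderedDict()

    def post(self, key, message):
        self._pending.pop(key, None)  # discard the superseded message, if any
        self._pending[key] = message

    def flush(self, send):
        """Call send(key, message) for each pending message and keep only
        those for which send reports failure (returns False)."""
        for key, message in list(self._pending.items()):
            if send(key, message):
                del self._pending[key]

    def __len__(self):
        return len(self._pending)
```

In this sketch the sender only needs to learn whether each message was received; a message that fails stays pending until it is either delivered or replaced by a newer one for the same key.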

Mr A Paventhan (RAL), Mr Adebiyi Kuseju (RAL), Mr Alastair Duncan (RAL), Dr Antony Wilson (RAL), Mr Ming Jiang (RAL), Ms Parminder Bhatti (RAL), Dr Steve Fisher (RAL)

Paper

chep07.pdf

Slides

CHEP07.pdf

CHEP07.ppt
