1–5 Nov 2010
The Statler Hotel

Lessons learnt from Large LSF scalability tests

4 Nov 2010, 09:00
30m
The Statler Hotel

Cornell University, Ithaca, NY, USA
Monitoring & Infrastructure tools
Datacenter and Monitoring

Speaker

Ulrich Schwickerath (CERN)

Description

During the summer of 2010, a large LSF test cluster was set up to allow scalability tests of the batch software (LSF) at a scale exceeding the production instance by up to a factor of 5. The response time of several central commands was measured as a function of the number of worker nodes and the number of jobs in the system. Several issues found during the tests were fixed on the fly by the vendor, which made it possible to scale up to 15,000 virtual worker nodes and more than 400,000 jobs in the system. Selected results from these scalability tests will be presented, and the lessons learned and their possible consequences for planning will be discussed.