5th EGEE User Forum

Name: 5th EGEE User Forum
Start: 2010-04-12T09:00:00+02:00
End: 2010-04-16T11:00:00+02:00
Location: Uppsala University

12–16 Apr 2010

Europe/Stockholm timezone

Support

Kristina.Ulrika.Gunne@cern.ch

An investigation of the effect of clustering-based initialization on Learning Classifier Systems' effectiveness: leveraging the Grid infrastructure

14 Apr 2010, 11:40

20m

Room IX (Uppsala University)

Oral; Scientific results obtained using distributed computing technologies; Computer Science

Speakers

Ms Fani Tzima (Aristotle University of Thessaloniki), Mr Fotis Psomopoulos (Aristotle University of Thessaloniki)

Description

Strength-based Learning Classifier Systems (LCS) are machine learning systems designed to tackle both sequential and single-step decision tasks by coupling a gradually evolving population of rules with a reinforcement component. ZCS-DM, a Zeroth-level Classifier System for Data Mining, is a novel algorithm in this field that has recently been shown to be very effective on several benchmark classification problems. In this paper, we evaluate the effect of clustering-based initialization on the algorithm’s performance, using the EGEE infrastructure as a robust framework for an efficient parameter sweep.
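A parameter sweep of this kind amounts to enumerating every combination of the settings under study and running each combination as an independent job. The sketch below is only an illustration of that idea: the parameter names (`num_clusters`, `population_size`, `seed`), their values, and the command-line format are hypothetical and are not taken from the ZCS-DM experiments.

```python
from itertools import product

# Hypothetical parameter grid; names and values are illustrative assumptions.
PARAM_GRID = {
    "num_clusters": [5, 10, 20],
    "population_size": [200, 400],
    "seed": [1, 2, 3],
}


def enumerate_jobs(grid):
    """Yield one parameter dict per point of the Cartesian product of the grid."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def job_arguments(params):
    """Render one parameter set as arguments for a (hypothetical) experiment runner."""
    return " ".join(f"--{name}={value}" for name, value in sorted(params.items()))


# Each entry is independent of the others, so each could be run as its own job.
jobs = list(enumerate_jobs(PARAM_GRID))
```

Because no job depends on another job's result, the sweep parallelizes trivially, which is what makes a distributed infrastructure a natural fit for it.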

Conclusions and Future Work

Our studies so far have shown ZCS-DM to be a robust and accurate data mining tool that outperforms its rival algorithms on most of the benchmark datasets used and achieves a prediction accuracy well above the baseline on all of them. However, given the evolutionary nature of the algorithm, further optimization in terms of time efficiency is necessary. In this direction, we have employed a clustering-based initialization phase and evaluated its effect on algorithm performance through an extensive set of experiments conducted on the Grid infrastructure.

Impact

Among the various methods used to tackle classification problems, rule-based (or tree-structured) classifiers are particularly popular, because they combine: i) an intuitive representation that allows for easy interpretation of the resulting classification model; ii) a nonparametric nature that is especially suited for exploring datasets where there is no prior knowledge of the attributes’ probability distributions; iii) fast, computationally inexpensive construction methods that produce models storable in a compact form; and iv) fast classification of new observations, once the model has been constructed. LCS share most of these advantages, with the exception of the third point, since genetic algorithm-based search is arguably a slow and computationally expensive search method. To this end, the optimization of the ZCS-DM algorithm using Grid resources may provide researchers with an invaluable tool for performing data mining tasks, and end-users with an efficient application for supporting decision-making tasks.

Detailed analysis

Clustering-based initialization rests on the idea that starting from a non-random set of rules may help the evolutionary process focus on the search-space optima (in our case, the optimal ruleset for the given classification task) more effectively and quickly. Intuitively, this non-random set of rules should be derived from the given dataset and provide an effective summary of the knowledge available in it. Our solution leverages the ability of clustering algorithms to provide a representative set of centroids for a given dataset, which we then transform into rules suitable for the initialization of ZCS-DM. The ultimate goal is to boost the algorithm’s performance, both in terms of predictive accuracy and in terms of training time, by reducing the execution time of the evolutionary process. In our current investigation, after detailing the proposed initialization process, we report the results of deploying the algorithm on the Grid infrastructure by means of a DAG workflow process. The conducted series of experiments evaluates alternative initialization parameter sets, aiming to optimize the algorithm in terms of both efficiency and accuracy.
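The general idea of turning cluster centroids into initial rules can be sketched as follows. This is a minimal illustration under stated assumptions, not the ZCS-DM procedure: it uses plain k-means with a deterministic choice of starting centroids, represents each rule as one interval per attribute centred on a centroid, and labels the rule with the majority class of its cluster. The `spread` parameter, the rule dictionary layout, and all function names are hypothetical.

```python
from collections import Counter


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; returns (centroids, assignment)."""
    # Assumption: evenly spaced data points serve as deterministic starting centroids.
    step = len(points) / k
    centroids = [list(points[int(i * step)]) for i in range(k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Move each centroid to the mean of its members; leave empty clusters as they are.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment


def centroids_to_rules(points, labels, centroids, assignment, spread=0.25):
    """Build one interval rule per non-empty cluster (hypothetical rule format)."""
    ranges = [max(col) - min(col) for col in zip(*points)]
    rules = []
    for c, centroid in enumerate(centroids):
        member_labels = [lab for lab, a in zip(labels, assignment) if a == c]
        if not member_labels:
            continue
        # Each attribute interval is centred on the centroid, scaled by the attribute range.
        condition = [(x - spread * r, x + spread * r) for x, r in zip(centroid, ranges)]
        action = Counter(member_labels).most_common(1)[0][0]
        rules.append({"condition": condition, "action": action})
    return rules


def matches(rule, point):
    """A rule matches a point when every attribute falls inside its interval."""
    return all(lo <= x <= hi for (lo, hi), x in zip(rule["condition"], point))


# Toy dataset with two well-separated classes.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
labels = ["a", "a", "a", "b", "b", "b"]
centroids, assignment = kmeans(points, k=2)
initial_rules = centroids_to_rules(points, labels, centroids, assignment)
```

In a setup like this, the number of clusters and the interval width would be natural candidates for the initialization parameters explored in a sweep.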

URL for further information	http://issel.ee.auth.gr/
Keywords	Classification, Learning Classifier Systems, Parameter Sweep, Algorithm Optimization

Primary authors

Ms Fani Tzima (Aristotle University of Thessaloniki), Mr Fotis Psomopoulos (Aristotle University of Thessaloniki)

Co-author

Prof. Pericles Mitkas (Aristotle University of Thessaloniki)

Slides

EGEE-Tzima-Fpsom.pdf

EGEE-Tzima-Fpsom.ppt
