Keywords such as data mining (DM) and knowledge discovery (KD) have appeared in several thousand articles in recent years. This popularity is driven mainly by demand from private companies, which need to analyze their data effectively to gain new, useful knowledge that can be capitalized on. This process is called knowledge discovery, and data mining is a crucial part of it. Although several methods and algorithms for data mining have been developed, there are still many gaps to fill. The problem is that real-world data are so diverse that no universal algorithm has been developed to mine all data effectively. In addition, the stages of the knowledge discovery process need the full-time assistance of an expert in data preprocessing, data mining and knowledge extraction.

These problems can be solved by a KD environment capable of automatically preprocessing data, generating regression and predictive models and classifiers, automatically identifying interesting relationships in data (even complex and high-dimensional data) and presenting the discovered knowledge in a comprehensible form. To develop such an environment, this thesis focuses on research into methods in the areas of data preprocessing, data mining and information visualization.

The Group of Adaptive Models Evolution (GAME) is a data mining engine able to adapt itself and perform optimally on a large (but still limited) group of real-world data sets. The Fully Automated Knowledge Extraction using GAME (FAKE GAME) framework is proposed to automate the KD process and to eliminate the need for the assistance of a data mining expert.

The GAME engine is the only GMDH-type algorithm capable of solving very complex problems (as demonstrated on the Spiral data benchmarking problem). It can handle irrelevant inputs as well as short and noisy data samples. It uses an evolutionary algorithm to find the optimal topology of models. Ensemble techniques are employed to estimate the quality and credibility of GAME models.
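The idea of using an ensemble to judge credibility can be illustrated with a minimal sketch. It is not the GAME implementation: plain least-squares lines fitted to bootstrap resamples stand in for GAME models, and all names (fit_line, build_ensemble, predict_with_spread) are hypothetical. The only assumption carried over is that disagreement among ensemble members signals lower credibility of the prediction.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b to a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0.0:
        return 0.0, mean_y  # degenerate resample: fall back to a constant model
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def build_ensemble(points, n_models=25, seed=0):
    """Fit one stand-in model per bootstrap resample of the training data."""
    rng = random.Random(seed)
    return [
        fit_line([rng.choice(points) for _ in points])
        for _ in range(n_models)
    ]


def predict_with_spread(ensemble, x):
    """Return (mean prediction, standard deviation across ensemble members)."""
    outputs = [a * x + b for a, b in ensemble]
    return statistics.fmean(outputs), statistics.pstdev(outputs)


# Noisy samples of y = 2x + 1 on the interval [0, 1] (synthetic toy data).
noise = random.Random(42)
data = [(i / 19, 2 * (i / 19) + 1 + noise.gauss(0, 0.1)) for i in range(20)]

ensemble = build_ensemble(data)
inside = predict_with_spread(ensemble, 0.5)    # within the training range
outside = predict_with_spread(ensemble, 10.0)  # far outside the training range
print("inside :", inside)
print("outside:", outside)
```

In this toy setting the members agree closely inside the region covered by training data and diverge far outside it, so the spread can serve as a rough indicator of where a prediction should not be trusted.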

Within the FAKE interface, we designed and implemented several modules for data preprocessing, knowledge extraction and visual knowledge discovery.

Goals of the FAKE GAME project

We are developing the open source software FAKE GAME. This software should be able to automatically preprocess various data, generate regression and predictive models and classifiers (by means of the GAME engine), automatically identify interesting relationships in data (even high-dimensional data) and present the discovered knowledge in a comprehensible form. The software should fill gaps not covered by the existing open source data mining environment WEKA and possibly integrate with the YALE environment.


Several presentations of the core FAKE GAME ideas are listed below:

  • GMDH and FAKE GAME: Evolving ensembles of inductive models (PPT, PDF)
  • FAKE GAME updates (PPT, PDF)
  • Optimization of Models: Looking for the Best Strategy (PPT, PDF)
  • Regularization of Evolving Polynomial Models (PPT, PDF)
  • Inductive Modeling: Detection of System States Validity (PPT, PDF)
  • Evolutionary Search for Interesting Behavior of Neural Network Ensembles (PPT, PDF)
  • Visualization Techniques Utilizing the Sensitivity Analysis of Models (PPT, PDF)
  • Data Mining EEG Data: When Models Can Be Trusted? (PPT, PDF)
  • The GAME Algorithm Applied to Complex Fractionated Atrial Electrograms Data Set (PPT, PDF)
  • Inductive Models & Visual Knowledge Discovery (PPT, PDF)
  • Automated FAKE GAME Reporting (PPT, PDF)
  • CIG Projects & Research Interests (ODP, PDF)
about.txt · Last modified: 2009/10/05 18:13 by tregoreg
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-Noncommercial-Share Alike 3.0 Unported