Project: Raising the power of ensemble techniques

Team: Prof. Dr. Christoph Engels, Prof. Dr. Christoph M. Friedrich

Research institution: University of Applied Sciences and Arts Dortmund

Abstract: Ensemble methods (such as Random Forests, Quantile Forests, Gradient Boosting Machines, and their variants) have proven highly effective in data mining. Their notable characteristics include (Breiman 2003, Friedman 2002):

  • They exhibit excellent accuracy
  • They scale up and are parallel by design
  • They are able to handle
    • thousands of variables
    • many-valued categorical variables
    • extensive missing values
    • badly unbalanced data sets
  • They give an internal, unbiased estimate of the test set error as primitives are added to the ensemble
  • They are highly resistant to overfitting
  • They provide variable importance measures
  • They offer a simple approach to outlier detection
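
The internal error estimate mentioned in the list can be illustrated with a small, self-contained sketch. It is not the project's HANA implementation: it assumes a plain bagging ensemble, uses decision stumps in place of full trees, and runs on synthetic data. All function names (fit_stump, predict_stump, bagged_oob_error, make_data) are hypothetical and chosen only for this illustration. Each ensemble member is trained on a bootstrap sample, and every row is scored only by the members that did not see it during training (the "out-of-bag" rows), so no separate test set is needed.

```python
import random


def fit_stump(rows):
    """Fit a one-split classifier (decision stump) on (features, label) rows."""
    best = None
    n_features = len(rows[0][0])
    for j in range(n_features):
        for x, _ in rows:
            t = x[j]  # candidate threshold taken from the data
            for left_label in (0, 1):
                errors = sum(
                    (left_label if xi[j] <= t else 1 - left_label) != yi
                    for xi, yi in rows
                )
                if best is None or errors < best[0]:
                    best = (errors, j, t, left_label)
    _, j, t, left_label = best
    return j, t, left_label


def predict_stump(stump, x):
    """Predict class 0 or 1 for a single feature tuple."""
    j, t, left_label = stump
    return left_label if x[j] <= t else 1 - left_label


def bagged_oob_error(data, n_members=25, seed=0):
    """Train a bagged ensemble and return (OOB error, number of rows scored)."""
    rng = random.Random(seed)
    n = len(data)
    votes = [[0, 0] for _ in range(n)]  # per-row out-of-bag votes for class 0 / 1
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        in_bag = set(idx)
        stump = fit_stump([data[i] for i in idx])
        for i in range(n):
            if i not in in_bag:  # only rows this member never saw may vote
                votes[i][predict_stump(stump, data[i][0])] += 1
    scored = [(i, v) for i, v in enumerate(votes) if v[0] + v[1] > 0]
    wrong = sum((1 if v[1] > v[0] else 0) != data[i][1] for i, v in scored)
    return wrong / len(scored), len(scored)


def make_data(n=200, seed=1):
    """Synthetic data: label depends on the first feature, with 10% label noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        label = 1 if x[0] > 0.5 else 0
        if rng.random() < 0.1:
            label = 1 - label
        data.append((x, label))
    return data


if __name__ == "__main__":
    err, n_scored = bagged_oob_error(make_data())
    print(f"OOB error estimate: {err:.3f} over {n_scored} rows")
```

Because the estimate is accumulated member by member, it can be recomputed each time a new primitive is added to the ensemble.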

This project aims to exploit this potential in the powerful HANA environment. In principle, there are two ways to reach this objective: using the function primitives of HANA PAL to build an ensemble, or transferring a subsample of the data to an R server.

Last modified on Apr 12, 2013 6:21:24 PM