Polynomial Cascades for Data Mining - Classification and Regression for large, high dimensional datasets

Polynomial Cascades were originally developed by Greg Grudic et.al. for high-dimensional regression problems. An extension for classification, Polynomial MPMC Cascades, were developed by Sander Bohte, Greg Grudic and myself. On this page I list some of the features of the regression and the classification method.

Features (Regression)

Fast - can handle large, high-dimensional problems.
Accurate models - minimizes mean-squared error
Can fit non-linear functions without the use of kernels.
No parameter tuning required - can be run by non-experts.
Interpretable model - the learned hypothesis is one high-degree, multi-variate polynomial. The hypothesis is not an ensemble of classifiers.

Features (Classification)

Fast - Runtime for using the features only is O(L x N x d) with L being the number of levels (data dependant, but usually <400), N being the number of training examples and d being the dimensionality of the problem. The method scales to very large problems with millions of examples (read: millions of rows in terms of SQL Databases).
No parameter tuning required - You can just run the algorithm and it will give you a good model. This means no cross validation is needed to determine a suitable kernel and kernel parameters.
Can classify non linearly separable classification problems without fine tuning of parameters
Competitive Performance on benchmarks with the Minimax Probability Machine for Classification (MPMC) and Support Vector Machines (SVM)
Incremental building of the model - You can stop the learning at any point in time and have a usable model. This can be useful if you have time constraints and a classifier must be learned in real time.
Bounded misclassification error - The PMC generates a worst case bound on the misclassification error for future, unseen data.
The underlying assumptions are minimal - No specific distributions are assumed.
Interpretable model - the learned hypothesis is one high-degree, multi-variate polynomial. The hypothesis is not an ensemble of classifiers.
Feature Selection - One feature per level is used, chosen by a theoretically well founded metric.
Kernels - the algorithm can be easily extended by kernels.

Future Work - Possible extensions

Handling multiple classes - currently the PMC only handles binary classification problems.
Handling missing values by not accounting for them during the computation of mean and covariance.

Code

You can download Matlab code for the classification cascade from here: Download Polynomial MPMC Cascades sourcecode.

Papers

Linear Time Nonparametric Classification and Feature Selection with Polynomial MPMC Cascades for large datasets - Available as Technical Report CU-CS-977-04.
Nonparametric Classification with Polynomial MPMC Cascades - ICML 2004
Is Nonparametric Learning Practical in Very High Dimensional Spaces? - IJCAI 1997

Blog

Consider checking out my blog

Email me