Polynomial Cascades for Data Mining - Classification and Regression for large, high dimensional datasets
Polynomial Cascades were originally developed by Greg Grudic et.al. for high-dimensional regression problems. An extension for classification, Polynomial MPMC Cascades, were developed by Sander Bohte, Greg Grudic and myself. On this page I list some of the features of the regression and the classification method.
Features (Regression)
- Fast - can handle large, high-dimensional problems.
- Accurate models - minimizes mean-squared error
- Can fit non-linear functions without the use of kernels.
- No parameter tuning required - can be run by non-experts.
- Interpretable model - the learned hypothesis is one high-degree, multi-variate polynomial. The hypothesis is not an ensemble of classifiers.
Features (Classification)
- Fast - Runtime for using the features only is O(L x N x d) with L being the number of levels (data dependant, but usually <400), N being the number of training examples and d being the dimensionality of the problem. The method scales to very large problems with millions of examples (read: millions of rows in terms of SQL Databases).
- No parameter tuning required - You can just run the algorithm and it will give you a good model. This means no cross validation is needed to determine a suitable kernel and kernel parameters.
- Can classify non linearly separable classification problems without fine tuning of parameters
- Competitive Performance on benchmarks with the Minimax Probability Machine for Classification (MPMC) and Support Vector Machines (SVM)
- Incremental building of the model - You can stop the learning at any point in time and have a usable model. This can be useful if you have time constraints and a classifier must be learned in real time.
- Bounded misclassification error - The PMC generates a worst case bound on the misclassification error for future, unseen data.
- The underlying assumptions are minimal - No specific distributions are assumed.
- Interpretable model - the learned hypothesis is one high-degree, multi-variate polynomial. The hypothesis is not an ensemble of classifiers.
- Feature Selection - One feature per level is used, chosen by a theoretically well founded metric.
- Kernels - the algorithm can be easily extended by kernels.
Future Work - Possible extensions
- Handling multiple classes - currently the PMC only handles binary classification problems.
- Handling missing values by not accounting for them during the computation of mean and covariance.
Code
You can download Matlab code for the classification cascade from here: Download Polynomial MPMC Cascades sourcecode.
Papers
- Linear Time Nonparametric Classification and Feature Selection with Polynomial MPMC Cascades for large datasets - Available as Technical Report CU-CS-977-04.
- Nonparametric Classification with Polynomial MPMC Cascades - ICML 2004
- Is Nonparametric Learning Practical in Very High Dimensional Spaces? - IJCAI 1997