. # Polynomial Cascades for Data Mining - Classification and Regression for large, high dimensional datasets

Polynomial Cascades were originally developed by Greg Grudic et.al. for high-dimensional regression problems. An extension for classification, Polynomial MPMC Cascades, were developed by Sander Bohte, Greg Grudic and myself. On this page I list some of the features of the regression and the classification method.

## Features (Regression)

• Fast - can handle large, high-dimensional problems.
• Accurate models - minimizes mean-squared error
• Can fit non-linear functions without the use of kernels.
• No parameter tuning required - can be run by non-experts.
• Interpretable model - the learned hypothesis is one high-degree, multi-variate polynomial. The hypothesis is not an ensemble of classifiers.

## Features (Classification)

• Fast - Runtime for using the features only is O(L x N x d) with L being the number of levels (data dependant, but usually <400), N being the number of training examples and d being the dimensionality of the problem. The method scales to very large problems with millions of examples (read: millions of rows in terms of SQL Databases).
• No parameter tuning required - You can just run the algorithm and it will give you a good model. This means no cross validation is needed to determine a suitable kernel and kernel parameters.
• Can classify non linearly separable classification problems without fine tuning of parameters
• Competitive Performance on benchmarks with the Minimax Probability Machine for Classification (MPMC) and Support Vector Machines (SVM)
• Incremental building of the model - You can stop the learning at any point in time and have a usable model. This can be useful if you have time constraints and a classifier must be learned in real time.
• Bounded misclassification error - The PMC generates a worst case bound on the misclassification error for future, unseen data.
• The underlying assumptions are minimal - No specific distributions are assumed.
• Interpretable model - the learned hypothesis is one high-degree, multi-variate polynomial. The hypothesis is not an ensemble of classifiers.
• Feature Selection - One feature per level is used, chosen by a theoretically well founded metric.
• Kernels - the algorithm can be easily extended by kernels.

## Future Work - Possible extensions

• Handling multiple classes - currently the PMC only handles binary classification problems.
• Handling missing values by not accounting for them during the computation of mean and covariance.