Machine Learning Links
Support Vector Machines
Currently the most hyped new machine learning technique. It generalises well and can be applied to many practical problems, including classification, regression, novelty detection and page ranking. Initially designed only for linearly separable data, it has been extended to (almost) arbitrary models by using the kernel trick. Finding the best kernel requires prior knowledge or cross-validation. SVMs scale reasonably well with the number of examples and dimensions, but have difficulties with very high-dimensional (non-sparse) problems or with hundreds of thousands of examples. Nevertheless, there are very impressive results from SVMs out there.
- [Kernel Machines - General Kernel related research]
- [SVMlight - A support vector machine implementation for classification, regression and ranking]
- [mySVM - A support vector machine implementation for classification, regression and distribution estimates]
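The two ideas mentioned above, the kernel trick and picking a kernel by cross-validation, can be illustrated with a small sketch. It is not an SVM: to stay dependency-free it uses a kernel nearest-centroid classifier as a stand-in, and the toy XOR-style data, the two candidate kernels and all function names are assumptions made up for this example. The first check shows that a degree-2 polynomial kernel gives the same value as a dot product in an explicit feature space; the second part scores each kernel by leave-one-out cross-validation.

```python
import math

def linear(x, z):
    """Plain dot product: the kernel of an ordinary linear model."""
    return sum(a * b for a, b in zip(x, z))

def poly2(x, z):
    """Homogeneous degree-2 polynomial kernel, k(x, z) = (x . z)^2."""
    return linear(x, z) ** 2

def phi(x):
    """Explicit feature map matching poly2 (2-D inputs only)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

def centroid_distance(kernel, x, members):
    """Squared feature-space distance from x to the mean of `members`,
    computed from kernel evaluations only (no explicit features)."""
    n = len(members)
    cross = sum(kernel(x, m) for m in members) / n
    within = sum(kernel(a, b) for a in members for b in members) / (n * n)
    return kernel(x, x) - 2.0 * cross + within

def predict(kernel, x, train):
    """Assign x to the class whose feature-space centroid is closest."""
    by_label = {}
    for point, label in train:
        by_label.setdefault(label, []).append(point)
    return min(by_label,
               key=lambda lab: centroid_distance(kernel, x, by_label[lab]))

def loo_accuracy(kernel, data):
    """Leave-one-out cross-validation accuracy for one kernel."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out example i
        if predict(kernel, x, train) == y:
            hits += 1
    return hits / len(data)

# Hypothetical XOR-style toy data: not linearly separable.
DATA = [
    ((1.0, 1.0), +1), ((1.2, 0.8), +1), ((-1.0, -1.0), +1), ((-0.8, -1.2), +1),
    ((1.0, -1.0), -1), ((0.8, -1.2), -1), ((-1.0, 1.0), -1), ((-1.2, 0.8), -1),
]
KERNELS = {"linear": linear, "poly2": poly2}

# Kernel trick: same number with and without the explicit feature map.
x, z = (1.0, 2.0), (3.0, -1.0)
implicit = poly2(x, z)
explicit = linear(phi(x), phi(z))

# Kernel selection: keep the kernel with the best cross-validated accuracy.
scores = {name: loo_accuracy(k, DATA) for name, k in KERNELS.items()}
best = max(scores, key=scores.get)

print("implicit vs explicit:", implicit, explicit)
print("leave-one-out accuracy:", scores)
print("selected kernel:", best)
```

On this toy data the linear kernel cannot separate the classes, while the polynomial kernel can, so cross-validation picks the latter. With a real SVM package the same loop applies: train with each candidate kernel on the training folds and compare held-out accuracy.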
Minimax Probability Machine
A very promising new approach that can deal with both classification and regression problems. Like SVMs, it uses the kernel trick to generalise beyond linear models.
- [Minimax Probability Machine Paper (NIPS 2001)]
- [Minimax Probability Machine Regression]
- [Minimax Probability Machine Regression Matlab code]
Polynomial cascades are a promising new approach for regression problems. The really cool thing about this one is that it copes really well with very high-dimensional problems. It is also nonparametric. An extended version for classification is on its way: the nonparametric MPMC cascades (PMC). The PMCs are motivated by cascade-correlation (see Fahlman & Lebiere, 1990), Tower (Gallant, 1990) and others (Nadal, 1989), but work on the same principles as the polynomial regression cascade. At each level, the MPMC (Minimax Probability Machine Classification) is used. Nonparametric machine learning is (in my opinion) a promising direction for future research, because it saves you so much time tinkering with parameters, kernel functions, etc. The algorithm also has linear complexity in the number of examples. You can also find more information about polynomial classification and regression cascades here.
- [Polynomial Cascade Paper]
- [High Dimensional Nonparametric Regression Using Two-Dimensional Polynomial Cascades]
Datasets to let loose on your learning algorithm.
- [ftp.ics.uci.edu/pub/machine-learning-databases/] UCI collection
- [yann.lecun.com/exdb/] Yann LeCun's collection
- [www.kernel-machines.org/data.html] Link collection for evaluating kernels - includes MNIST, USPS and Reuters
- Delve - Data for Evaluating Learning Valid Experiments
- 20 Newsgroups dataset
Conferences related to machine learning.
Other Link collections
Link collections that I found useful.
- [http://www.research.att.com/~schapire/boost.html] Link collection about boosting
- MLnet Online Information Service
- Kernel Machines