Clustering high-dimensional directional data with the Watson EM algorithm
Machine learning applications often involve data that can be represented as unit vectors on a d-dimensional hypersphere, that is, directional data. Spectral clustering is one source of such data: the embeddings it produces lie on a hypersphere and can form different shapes there, depending on the structure of the original data. Other examples of directional data arise in text analysis and in some sub-domains of bioinformatics.
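As a minimal sketch of how spectral embeddings become directional data (this is illustrative code, not the paper's implementation; the similarity matrix and point cloud are made up): build a graph Laplacian from pairwise similarities, take the leading eigenvectors as the embedding, and normalize each row to unit length so the points live on a hypersphere.

```python
import numpy as np

# Toy point cloud and Gaussian similarity matrix (hypothetical example data)
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))

# Unnormalized graph Laplacian L = D - W
deg = W.sum(axis=1)
L = np.diag(deg) - W

# Spectral embedding: eigenvectors for the k smallest eigenvalues
vals, vecs = np.linalg.eigh(L)
k = 2
emb = vecs[:, :k]

# Normalizing the rows puts every embedded point on the unit sphere S^{k-1},
# which is exactly the kind of directional data a Watson mixture can model.
emb_unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
```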
The Watson distribution for directional data has a tractable form and more modeling capability than the simpler von Mises-Fisher distribution. In statistical practice, however, it has mostly been applied to low-dimensional problems (up to about four dimensions). To use a generative model based on mixtures of Watson distributions on a hypersphere, one must be able to estimate the parameters efficiently, which is nontrivial because the normalizing constant involves the Kummer confluent hypergeometric function and requires careful numerical approximation. The model also suggests a principled way to choose the embedding dimension for spectral clustering. We analyze the algorithm on a synthetic example and demonstrate its superiority over existing algorithms on real datasets.
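To make the role of the Kummer function concrete, here is a small sketch (our own illustration, not the paper's code) of the Watson log-density on the unit sphere in R^d, f(x; mu, kappa) = c_d(kappa) exp(kappa (mu^T x)^2), where the normalizing constant c_d(kappa) = Gamma(d/2) / (2 pi^{d/2} M(1/2, d/2, kappa)) is evaluated via SciPy's `hyp1f1`:

```python
import numpy as np
from scipy.special import hyp1f1, gammaln

def watson_log_pdf(x, mu, kappa):
    """Log-density of the Watson distribution on the unit sphere in R^d.

    The normalizing constant involves Kummer's confluent hypergeometric
    function M(1/2, d/2, kappa), computed here with scipy.special.hyp1f1.
    """
    d = len(mu)
    log_c = (gammaln(d / 2.0) - np.log(2.0) - (d / 2.0) * np.log(np.pi)
             - np.log(hyp1f1(0.5, d / 2.0, kappa)))
    return log_c + kappa * np.dot(mu, x) ** 2

# Sanity check on the circle (d = 2): the density should integrate to 1.
thetas = np.linspace(0.0, 2.0 * np.pi, 20000, endpoint=False)
pts = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
mu = np.array([1.0, 0.0])
vals = np.exp([watson_log_pdf(p, mu, 3.0) for p in pts])
integral = vals.mean() * 2.0 * np.pi  # Riemann sum over the circle
```

In an EM algorithm for a mixture, this log-density is what the E-step evaluates for each point and component; the M-step must then invert a ratio of Kummer functions to update kappa, which is where the numerical approximations mentioned above come in.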
The flexibility of Watson distributions can lead to better clustering performance, as this toy example illustrates.
The algorithm's strong point is its ability to handle unit-length data distributed on a hypersphere. The current implementation is not particularly fast, despite the numerical tricks it uses.
The MATLAB code from our paper will soon be available here.