Clustering high-dimensional directional data with the Watson EM algorithm

Features

Machine learning applications often involve data that are directional in nature, i.e., that can be represented as unit vectors on a d-dimensional hypersphere. Spectral clustering embeddings are one example: depending on the structure of the original data, they form different shapes on the hypersphere. Text data and several subdomains of bioinformatics provide further examples. The Watson distribution for directional data has a tractable form and greater modeling capability than the simpler von Mises-Fisher distribution, but in statistical practice it has mostly been applied in low dimensions (up to about 4). To use a generative model based on mixtures of Watson distributions on a hypersphere, one must estimate the parameters efficiently; this is nontrivial because the density involves the Kummer (confluent hypergeometric) function, which must be approximated numerically. The model also yields a principled way to choose the embedding dimension for spectral clustering. We analyze the algorithm on a synthetic example, where the flexibility of the Watson distribution leads to better performance, and demonstrate its advantage over existing algorithms on real datasets. The algorithm's strength is its ability to handle unit-length data distributed on a hypersphere; its main limitation is speed, despite the numerical shortcuts used.
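Since the paper's Matlab code is not yet posted, the following is a minimal NumPy/SciPy sketch (ours, not the authors') of the ingredients of an EM algorithm for a mixture of Watson distributions. The Kummer function M(a, c, kappa) is available as scipy.special.hyp1f1; instead of the paper's numerical approximations, this sketch inverts the ratio g(kappa) = M'(1/2, d/2, kappa) / M(1/2, d/2, kappa) by bracketed root finding, and it handles only the bipolar case (kappa > 0), where the mean direction is the leading eigenvector of the weighted scatter matrix.

```python
import numpy as np
from scipy.special import gammaln, hyp1f1
from scipy.optimize import brentq

def kummer_ratio(kappa, d):
    """g(kappa) = M'(1/2, d/2, kappa) / M(1/2, d/2, kappa), where M is
    Kummer's confluent hypergeometric function (scipy.special.hyp1f1)."""
    a, c = 0.5, d / 2.0
    return (a / c) * hyp1f1(a + 1.0, c + 1.0, kappa) / hyp1f1(a, c, kappa)

def solve_kappa(r, d, lo=-500.0, hi=500.0):
    """Invert g(kappa) = r for the concentration; g is increasing in kappa.
    r is clipped into the attainable range so the bracket is valid."""
    r = min(max(r, kummer_ratio(lo, d) + 1e-9), kummer_ratio(hi, d) - 1e-9)
    return brentq(lambda k: kummer_ratio(k, d) - r, lo, hi)

def watson_logpdf(X, mu, kappa):
    """Watson log density on S^{d-1}: f(x) = c_d(kappa) exp(kappa (mu^T x)^2),
    with c_d(kappa) = Gamma(d/2) / (2 pi^{d/2} M(1/2, d/2, kappa))."""
    d = X.shape[1]
    log_c = (gammaln(d / 2.0) - np.log(2.0) - (d / 2.0) * np.log(np.pi)
             - np.log(hyp1f1(0.5, d / 2.0, kappa)))
    return log_c + kappa * (X @ mu) ** 2

def watson_mixture_em(X, K, n_iter=50):
    n, d = X.shape
    # Deterministic farthest-point init: greedily pick data points with the
    # smallest squared cosine to the centers chosen so far (axial distance).
    idx = [0]
    for _ in range(1, K):
        sims = np.max((X @ X[idx].T) ** 2, axis=1)
        idx.append(int(np.argmin(sims)))
    mus = X[idx].copy()
    kappas = np.full(K, 10.0)
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posterior responsibilities (shifted for stability).
        logp = np.column_stack([np.log(pis[j]) + watson_logpdf(X, mus[j], kappas[j])
                                for j in range(K)])
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
        # M-step: mu_j is the leading eigenvector of the weighted scatter
        # matrix S_j (bipolar case; the girdle case kappa < 0 would take the
        # smallest eigenvector); kappa_j solves g(kappa) = mu_j^T S_j mu_j.
        Nj = R.sum(axis=0)
        pis = Nj / n
        for j in range(K):
            S = (X * R[:, j:j + 1]).T @ X / Nj[j]
            vals, vecs = np.linalg.eigh(S)
            mus[j] = vecs[:, -1]
            kappas[j] = solve_kappa(vals[-1], d)
    return pis, mus, kappas, R
```

Note that the Watson density is axially symmetric (x and -x have equal density), so recovered means are only identified up to sign, and responsibilities depend on (mu^T x)^2 rather than mu^T x.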
Papers
Code
The Matlab code from our paper will soon be available here.