Visualizing High Dimensional Data: On affinity embedding

We have just released VisuMap version 4.2.902. A new mapping algorithm called Affinity Embedding (AE) has been introduced in this release. I'll briefly describe AE as a multidimensional scaling method(MDS); or, in more recent terminology, as a dimensionality reduction method.

Like many MDS methods, AE tries to create a low dimensional representation of high dimensional data points by minimizing a stress function. AE's stress function is as following:

where a_ij is a positive number representing the affinity between data points i and j; d_ij is the Euclidean distance between that two points in the low dimensional space. Affinity as a generic concept quantifies certain kind of similarity between data entities. We notice that most MDS methods try to minimize difference (e.g. squared error) between a dissimilarity matrix and d_ij; whereas AE tries to approximate the similarity matrix.

Since many data sets from practices have been characterized by a distance matrix δ_ij, VisuMap uses the following exponential affinity function to convert a distance matrix to an affinity matrix:

where c_ij are constants estimated based on densities around data points in the high dimensional space. This affinity function is just one simple way to convert dissimilarity to similarity; in fact any monotonous decreasing function may be used as affinity function. For more general applications, VisuMap offers a plugin interface to allow users to implement custom affinities.

AE uses a simple version of gradient descent method to minimize the stress function. The computational complexity for each iterative step is O(kN) where N is the number of data points and k is constant that is typically around value 100. Moreover, the memory complexity of AE for each step is O(N). Thus, AE can be applied on relative large data sets, i.e. with a million and more data points.

The main characteristics of AE is its focus on local information (i.e. data with high affinity or low dissimilarity). The reason for this is that lower affinity are represented by close-to zero values and the optimization algorithm can more effectively ignore zero values; whereas other algorithms like CCA or Sammon map need to reduce the effect of large values (e.g. large distance or large dissimilarities.) which is normally harder to do.

AE has been largely inspired by the algorithm Stochastic Neighbor Embedding(SNE) and the more recent version t-SNE. Both SNE algorithms work pretty well in focusing on local information and both of them use a kind of exponential affinity functions in their stress function. However, because of their probabilistic nature, they require in general quadratic computational and memory complexity, which makes it difficult to attack large data sets.

Like SNE, AE works pretty well in unfolding closed manifolds. The following pictures compare AE and t-SNE in mapping three spheres of different sizes in the 3D space to 2D plane. Notice that AE somehow preserves the size the three sphers; whereas t-SNE basically ignores the size information: