In a recent blog post Terence Tao has put mathematical approaches to study a space X (which can be a space, a manifold etc.) into two fundamental camps:

1. By looking into a map f: Y->X that maps certain (better known) space Y into X.

2. By looking into a map f: X->Y that maps X into certain (better known) space Y.

I think that these statements not only apply to mathematical studies, they also provide useful framework for the exploration of high dimensional data. In our case, the unknown space X becomes a high dimensional dataset that we wish to understand. The space Y is usually a lower dimensional space that we know better and is normally easy to visualize.

For the sake of simplicity I would like to refer to methods from the above two camps as injective and projective methods respectively. That means, injective methods inject known space into the target space, whereas the projective methods project the target space into a better known space.

Following this point of view, algorithms to visualize high dimensional data can be grouped as injective or projective. Most MDS (multidiemsnional scaling) based visualization algorithms are projective. Algorithm like Sammon map and PCA (principal component analysis) map the high dimensional dataset into a low dimensional Euclidean space. The RPM algorithm (relational perspective map) maps high dimensional to flat torus.

On the other side, most clustering algorithms can be considered as injective algorithm. For instance, the method of vector quantization maps a finite discrete set into the target space, whereas the Kohonen net (or Self-organizing map) maps a discrete set with a mesh topology into the target space. The hierarchical clustering algorithms (like the agglomerative clustering algorithm) basically map binary tree structures into the target space.

One major benefit of this classification of algorithms is that it can guide us to apply existing algorithms to new data, in the sense that algorithms from the same camp might be able to share some methods/concepts/tricks. For instance, we might be apply an optimization method used by one clustering algorithm to another injective algorithm.

Furthermore, as be pointed out in Tao's blog post, there is certain "duality" between injective and projective methods. That means, for a given method in one camp you can sometimes find its corresponding counter part in the other camp. This duality can then help us to transfer a method from camp to anther one. As an example for this "duality", the RPM algorithm is basically a dual algorithm of the self-organizing map: they are in different camp, but are dual to each there as they share the flat torus as the image space (i.e. the space Y mentioned at the beginning of this post).

Similar to the duality concept I have blogged before about the "symmetry" between these two camps in more restricted situations, namely when the datasets are multidimensional vector spaces (e.g. data tables). In these cases, we can often transfer a method from one camp to the another one in a straightforward way (e.g. by transposing the data matrix).

The injective and projective algorithm are realized in VisuMap by clustering and mapping services, which form the two main groups of services in the VisuMap application.

## Monday, March 9, 2009

Subscribe to:
Post Comments (Atom)

## No comments:

Post a Comment