Thursday, January 15, 2009

Experiments with kernel PCA

The newly released VisuMap version 2.7 adds a kernel PCA (kPCA) service. From the user's perspective, kPCA provides basically the same service as MDS methods: it maps an abstract set of data points with a similarity distance into a multidimensional vector space, so that the Euclidean distance in that vector space in some way reflects the original similarity distance. Thus, the input for kPCA is a dataset with a similarity distance (or distance matrix), and the output is a table of numerical values with one row vector for each data point.

Because many effective analysis methods operate on vector spaces (such as k-means clustering and support vector machines), kPCA, like MDS, lets us apply those methods to a broader range of data.
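To make the input and output described above concrete, here is a minimal NumPy sketch of kPCA starting from a distance matrix. It is not VisuMap's implementation; the function name kpca_from_distances, the Gaussian kernel form exp(-d^2 / (2 sigma^2)), and the default parameter values are my own assumptions for illustration.

```python
import numpy as np

def kpca_from_distances(dist, sigma=1.0, n_components=2):
    """Hypothetical helper: kernel PCA on an (n, n) distance matrix.

    Returns an (n, n_components) table with one row vector per data point.
    """
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Turn distances into similarities with a Gaussian kernel (assumed form).
    kernel = np.exp(-dist**2 / (2.0 * sigma**2))
    # Double-center the kernel matrix, i.e. center the data in feature space.
    centering = np.eye(n) - np.full((n, n), 1.0 / n)
    kernel_c = centering @ kernel @ centering
    # eigh returns eigenvalues in ascending order, so pick the largest ones.
    eigvals, eigvecs = np.linalg.eigh(kernel_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    top_vals = np.clip(eigvals[order], 0.0, None)  # guard against tiny negatives
    # Scale each eigenvector by sqrt(eigenvalue) to get the coordinates.
    return eigvecs[:, order] * np.sqrt(top_vals)

# Usage: six random points, described only by their pairwise distances.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 4))
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = kpca_from_distances(dist, sigma=2.0, n_components=3)
print(coords.shape)  # (6, 3)
```

The resulting table can be handed to any method that expects plain vectors, which is the point made above about k-means and support vector machines.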

To see how kPCA works, I created a dataset of about 3000 data points that together form a sphere in 3D space. With the Gaussian kernel (one way to measure similarity between data points), VisuMap's kPCA created a 50-dimensional dataset in about 100 seconds. In other words, kPCA mapped a 3D dataset into a 50D space. The first 3 dimensions of the 50D space basically mirror the original 3 dimensions. To see what the other dimensions look like, I fixed the x- and y-axes of a 3D view window to the first two dimensions, then assigned the z-axis to the other dimensions one after another while rotating the 3D view window. The following short video clip shows what the spherical dataset looks like with those extra dimensions:

The first few seconds of the video show how the first 3 dimensions reconstruct the original sphere. Then, each time the z-axis is switched to another dimension, the sphere turns into a different geometric shape. I am not sure how those geometric shapes are formed or how to take advantage of the new dimensions, but they certainly look interesting and a little mysterious.
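For readers without VisuMap, the sphere experiment can be roughly approximated in NumPy. This is only a sketch under assumptions of my own: the points are sampled uniformly on the surface of a unit sphere, the kernel width sigma is 1.0, and I use 400 points instead of about 3000 so that it runs quickly. The last lines check how well the first 3 kPCA dimensions reproduce the original coordinates, up to a linear map.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points = 400  # assumption: far fewer than the ~3000 points in the post

# Sample points on the unit sphere by normalizing Gaussian vectors (assumed layout).
sphere = rng.normal(size=(n_points, 3))
sphere /= np.linalg.norm(sphere, axis=1, keepdims=True)

# Gaussian kernel built from pairwise squared Euclidean distances.
sq_dist = np.sum((sphere[:, None, :] - sphere[None, :, :]) ** 2, axis=-1)
sigma = 1.0  # assumed kernel width; the post does not state one
kernel = np.exp(-sq_dist / (2.0 * sigma**2))

# Double-center the (symmetric) kernel matrix.
col_mean = kernel.mean(axis=0, keepdims=True)
kernel_c = kernel - col_mean - col_mean.T + kernel.mean()

# Keep the 50 leading components, scaled by sqrt(eigenvalue).
eigvals, eigvecs = np.linalg.eigh(kernel_c)
order = np.argsort(eigvals)[::-1][:50]
embedding = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0.0, None))

# Fit the original coordinates linearly from the first 3 kPCA dimensions.
centered = sphere - sphere.mean(axis=0)
coef, *_ = np.linalg.lstsq(embedding[:, :3], centered, rcond=None)
residual = centered - embedding[:, :3] @ coef
r_squared = 1.0 - residual.var() / centered.var()
print(embedding.shape, r_squared)
```

An r_squared close to 1 corresponds to the first 3 dimensions mirroring the original sphere. Plotting embedding[:, 0] and embedding[:, 1] against any later column gives views comparable to those in the video.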