Visualizing High Dimensional Data: August 2012

Wednesday, August 29, 2012

On Gauge Principal Component Analysis

Principal Component Analysis (PCA) is a widely used method for investigating high dimensional data. As a dimensionality reduction method, PCA rotates a data set in the high dimensional space so that it shows the most information (i.e. variance) from a certain viewing direction. In this note, I describe a very simple yet effective extension of PCA. The method, which I call Gauge PCA (GPCA), works as follows: instead of using a single global rotation as in normal PCA, GPCA first decomposes the dataset into multiple clusters, then finds a rotation for each cluster, and finally composes a global map from the rotated clusters. The following picture illustrates the steps of GPCA:

We can imagine that the above maps illustrate taking a snapshot of a fish tank with 4 different fishes. The top map is a random snapshot in which the fishes face different directions. The middle section shows the 4 fishes, each rotated to show maximal information to the viewer. The last map is a snapshot of the fish tank with all fishes magically rotated, so that each fish shows the maximal information to the viewer. This scenario is an extension of the one described in the video "A layman's introduction to principal component analysis".

In a certain sense, the relationship of PCA to GPCA is analogous to the relationship between linear regression and linear spline: the former uses a single straight line to approximate a curve, whereas the latter uses multiple line segments. As an approximation method, a linear spline is clearly much more flexible than linear regression, since a single straight line is just the special case of a spline with one segment. The following picture illustrates how linear regression and linear spline approximate a set of points:
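As a rough numerical companion to that picture, the comparison can be sketched in a few lines of Python. This is only an illustration: the sample curve, the number of knots, and the function names `fit_line` and `fit_linear_spline` are my own assumptions and are not taken from VisuMap.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares straight line (linear regression) evaluated at x."""
    slope, intercept = np.polyfit(x, y, 1)
    return slope * x + intercept

def fit_linear_spline(x, y, knots):
    """Least-squares continuous piecewise-linear fit with fixed knots.

    Column j of B is the "hat" function that equals 1 at knot j and 0 at
    every other knot, so any combination of the columns is a polyline
    whose segments are joined at the knots.
    """
    identity = np.eye(len(knots))
    B = np.column_stack([np.interp(x, knots, identity[j])
                         for j in range(len(knots))])
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B @ coef

# Hypothetical sample data: noisy points along a sine curve.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 200)
y = np.sin(x) + 0.05 * rng.normal(size=x.size)

knots = np.linspace(x[0], x[-1], 5)   # 4 line segments (assumed)
line_err = np.sum((y - fit_line(x, y)) ** 2)
spline_err = np.sum((y - fit_linear_spline(x, y, knots)) ** 2)
print("squared error, single line:  ", line_err)
print("squared error, linear spline:", spline_err)
```

Because a straight line over the same range is itself a polyline with these knots, the least-squares spline can never fit worse than the single line on the same points.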

A key requirement for a linear spline is that the line segments have to be joined together to form a single polyline. Similarly, for GPCA we require that the composed map preserves the variance of the original map in its major directions. It should be noted that there are many PCA-related approaches under the term localized PCA. Those approaches mostly focus on how to segment the data, but ignore the step of composing a single global map for visualization purposes. In contrast, the composition step is the key step in GPCA. The creation of the initial map and the segmentation of the data are actually not part of the algorithm; they are just initial conditions.
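The composition idea can be sketched as follows. This is a minimal illustration under stated assumptions, not VisuMap's implementation: the initial map is a plain global PCA map (standing in for an MDS map), the cluster labels are given as input, and each locally rotated cluster is simply placed at the centroid that the cluster occupies in the initial map. The names `pca_2d` and `gauge_pca` are hypothetical.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X onto their first two principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

def gauge_pca(X, labels, base_map):
    """Replace each cluster in base_map by its own local PCA view.

    Assumption: the local view is centered on the cluster's centroid in
    base_map, so the overall layout of the clusters is kept.
    """
    labels = np.asarray(labels)
    out = np.array(base_map, dtype=float, copy=True)
    for c in np.unique(labels):
        idx = labels == c
        if idx.sum() < 2:          # a single point has no orientation
            continue
        local = pca_2d(X[idx])     # centered local rotation of the cluster
        out[idx] = base_map[idx].mean(axis=0) + local
    return out

# Hypothetical data: two elongated clusters oriented along different axes.
rng = np.random.default_rng(1)
a = rng.normal(size=(100, 5)) * np.array([5.0, 3.0, 0.3, 0.3, 0.3])
b = rng.normal(size=(100, 5)) * np.array([0.3, 0.3, 5.0, 3.0, 0.3]) + 10.0
X = np.vstack([a, b])
labels = np.repeat([0, 1], 100)

base_map = pca_2d(X)                       # stand-in for the initial map
gpca_map = gauge_pca(X, labels, base_map)  # composed map
```

In this sketch each cluster shows at least as much of its own variance in `gpca_map` as it does in `base_map`, because the local rotation is chosen per cluster instead of once for the whole dataset.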

VisuMap has supported GPCA for a while now. In order to use GPCA, we first create an MDS map and cluster the data with any of the clustering algorithms available in VisuMap; we then open the PCA view for a selected cluster and click on the capture button to embed the local PCA map back into the original MDS map. The following video shows the process of creating a GPCA map for a sample dataset from pharmaceutics:

I have borrowed the term gauge from modern physics, in which the gauge principle plays a fundamental role. The gauge principle states that the global behavior of a system is invariant under local gauge rotations. So for instance, when we calculate the orbits of planets in our solar system we don't have to care about the orientation of individual planets. The orientation of the planets is an additional freedom that has no impact on the structure of the orbits. This kind of extra degree of freedom has turned out to be the core structure underlying many laws in modern physics.

Thursday, August 23, 2012

Tracking Attributes in a MDS map.

When using MDS (multidimensional scaling) maps in practice, a frequently asked question is how a particular attribute, i.e. a data column in the input table, impacts the resulting MDS map. One simple way to visualize the effect of an attribute is to create another MDS map without the attribute under investigation. The difference between the two maps can then be ascribed to that attribute. For instance, the following two maps are MDS maps (created with the CCA algorithm) of the VisuMap sample dataset yeast.xvm: the first is created with the first 5 attributes; the second with just the first 4 attributes.

The difference between the two maps above is thus caused by leaving out the fifth attribute (i.e. the attribute spo2). We notice that the two major data point clusters (colored as yellow and cyan) are more separated in the first map. Thus, we can see that the attribute spo2 provides significant separation for these two clusters.

With the VisuMap plugin module ClipRecorder we can do a much better job of visualizing the effect of an attribute. The newly released ClipRecorder version 1.2 includes scripts that help users create a sequence of MDS maps with gradually decreasing weight for a selected attribute. The difference between two successive maps provides direction vectors which indicate how data points move when the weight of the selected attribute decreases. The following map is one such map; it visualizes the movement direction of all data points at a certain moment while the script is running:

In the above map, each bi-colored bar represents a data point; the red side of the bar points in the moving direction of the data point, and the length of the bar indicates the speed of the movement.
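The idea behind these direction vectors can be sketched outside of VisuMap as follows. This is only an illustration under assumptions of my own: a plain PCA projection stands in for the MDS algorithm, each map is rotated onto its predecessor with an orthogonal Procrustes step so that successive maps are comparable, and the names `pca_map`, `align` and `attribute_motion` are hypothetical rather than part of ClipRecorder.

```python
import numpy as np

def pca_map(X, dim=2):
    """2D stand-in for an MDS map: projection on the top principal axes."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:dim].T

def align(Y, ref):
    """Rotate/reflect Y to best match ref (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(Y.T @ ref)
    return Y @ (U @ Vt)

def attribute_motion(X, col, weights):
    """Maps for decreasing weights of column `col`, plus their differences."""
    maps = []
    for w in weights:
        Xw = np.array(X, dtype=float, copy=True)
        Xw[:, col] *= w                 # down-weight the selected attribute
        Y = pca_map(Xw)
        if maps:
            Y = align(Y, maps[-1])      # keep orientation comparable
        maps.append(Y)
    # Direction vector of every data point between successive maps.
    moves = [b - a for a, b in zip(maps, maps[1:])]
    return maps, moves

# Hypothetical input table: 50 rows, 5 attributes.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 5))

weights = np.linspace(1.0, 0.0, 5)      # full weight down to "left out"
maps, moves = attribute_motion(X, col=4, weights=weights)
speed = np.linalg.norm(moves[0], axis=1)  # bar length for the first step
```

Each row of `moves[k]` corresponds to one bar in such a map: its direction is where the point is heading, and its norm is the speed.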

The ClipRecorder plugin also records all the map sequences, so that we can replay them at any later time as a simple map animation. The following short video clip, for instance, shows such an animation:

Thursday, August 2, 2012

New VisuMap Release: More fun with scripting

We have just released VisuMap version 3.5.882. Worth mentioning in this release are two enhancements to the scripting interface. The first enhancement is auto-complete editing. We have added a script analyzer to the editor, so that it automatically suggests class members together with their documentation when it notices that the user is about to add code that accesses class methods or properties. The following shows a screenshot of the auto-complete editing:

The second enhancement to the scripting interface is the so-called property spinor: within an editor window the user can now double click a property value to link the property to the mouse wheel. Once the mouse wheel is linked to a particular value in a script, spinning the wheel makes the property spinor automatically change the linked value and execute the whole script. With this feature VisuMap provides a very generic control mechanism for probing different settings, so that the user can immediately see the consequences of changed settings.

The following is a short video demonstrating the usage of the property spinor:
