Visualizing High Dimensional Data: July 2007

Tuesday, July 24, 2007

The symmetry between repulsive and attractive force

The relational perspective map (RPM) algorithm at its core simulates a multi-particle system on a torus surface in which the particles exert repulsive forces on each other. Some people have asked: why only repulsive forces? Why not use attractive forces as well, like other force-directed mapping methods?

A simple answer to this question is that we prefer to use a simpler dynamical system if it can solve the problem. Just like physicists who try to reduce the number of fundamental interactions, I would prefer to avoid using attractive forces if they are not absolutely needed.

Classical multidimensional scaling (MDS) methods have to use both types of forces (as can be derived from their stress function) because their base information space is the infinite open Euclidean space: without attractive forces their configuration would quickly expand without bound, and without repulsive forces their configuration would shrink to a single point. With the RPM method, the closed manifold (i.e. the torus surface) confines the configuration to a limited size.

Another, not-so-obvious answer to the above question is that on a closed manifold the repulsive and attractive forces are manifestations of the same thing, at least from a certain point of view. To see this, let us consider the simplest case, the 1-dimensional curled space (i.e. the circle). As shown in the following picture, imagine that we have two ants living on the circle, and that there are two positively charged particles which can move freely along the circle but are confined to it.

From the point of view of the ant on the left side, the two charges exert a repulsive force on each other according to Coulomb's law.

However, from the point of view of the ant on the right side, the two charges attract each other. This is a little counterintuitive for us as observers from the 3-dimensional space. But we need to remind ourselves that these ants are living in the 1-dimensional space. The ant on the right side has no way to see the two charges moving apart from each other. This ant can only move on the circle, in particular on the right arc of the circle, to measure the distance between the two charges. What this ant would find is that the same force as indicated in the picture seemingly presses the two charges closer to each other; and the larger the charges, the stronger the force. Thus, the ant on the right side would claim that the two charges attract each other according to a law similar to Coulomb's law (the anti-Coulomb's law?).
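The two viewpoints can be checked with a small numerical sketch. It is only a toy model under my own assumptions, not part of the RPM algorithm: the two charges repel along both arcs of the circle with an inverse-square law, and each position simply moves in proportion to the net force (overdamped motion). The function name `simulate_two_charges` and all of its parameters are hypothetical.

```python
def simulate_two_charges(a=0.4, b=0.6, w=1.0, charge=1.0, step=1e-3, n_steps=500):
    """Return a list of (arc_ab, arc_ba) pairs, one per time step.

    arc_ab is the arc from a to b in the positive direction (the distance
    one ant measures); arc_ba = w - arc_ab is the complementary arc (the
    distance the other ant measures). Assumed toy model, not RPM itself.
    """
    history = []
    for _ in range(n_steps):
        arc_ab = (b - a) % w
        arc_ba = w - arc_ab
        history.append((arc_ab, arc_ba))
        if arc_ab == 0.0:
            break  # coincident charges: the force is undefined
        # Net push on b: repulsion acting along arc_ab minus repulsion along arc_ba.
        force = charge * charge * (1.0 / arc_ab**2 - 1.0 / arc_ba**2)
        # Overdamped update: the charges move in opposite directions.
        b = (b + step * force) % w
        a = (a - step * force) % w
    return history


history = simulate_two_charges()
first, last = history[0], history[-1]
print("arc seen by one ant:   %.3f -> %.3f" % (first[0], last[0]))
print("arc seen by other ant: %.3f -> %.3f" % (first[1], last[1]))
```

With the default starting positions the shorter arc grows while the complementary arc shrinks by the same amount, so the very same motion reads as repulsion on one side and as attraction on the other.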

Friday, July 20, 2007

RPM, Curled Space and Dimensionality Reduction

The Relational Perspective Map (RPM) uses finite curled spaces as its information space to map high-dimensional data. The curled space has caused some uneasiness among users without the related background. In this post I will explain a little about curled spaces and describe an advantage they have over the normal (infinite) Euclidean space.

First, in order to understand the curled space, let us compare the 1-dimensional curled space with the 1-dimensional Euclidean space. The 1-dimensional Euclidean space (denoted as R) comprises all real values from negative infinity to positive infinity. Graphically, the 1-dimensional Euclidean space can be depicted as a straight line that extends to infinity in both directions, as shown by the first map in the following picture.

The curled 1-dimensional space, denoted as T, is graphically equivalent to a circle of finite size, as shown in the middle map in the above picture. To illustrate the difference between R and T, it is helpful to imagine an ant living in these spaces. An ant living in R can walk in either direction indefinitely without ever coming back to the same spot. An ant living in T can also walk in either direction indefinitely, but it will come back to the same spot after a finite time of walking.

The curled space T can also be represented as a segment of R of finite length, say w, as shown in the right map in the above picture. In this representation, the beginning and the end of the segment have to be considered as glued together. That means an imaginary ant walking on the segment can somehow be beamed from one end to the other in no time. Analytically, T is represented by the interval [0, w]. The distance between two points a and b in T has to be defined differently than in R, though. One simple definition of the distance between two points a and b in T is as follows:

d(a, b) := min{|a-b|, w-|a-b|}

The analytical representation ([0,w], d) of T has the advantage over the graphical representation that it can easily be extended to higher-dimensional cases, as we will see below.
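The distance d on T translates directly into code. The following is a minimal sketch; the function name `circle_distance` is my own, and the modulo step is an added convenience so that inputs outside [0, w] are wrapped back onto the circle.

```python
def circle_distance(a: float, b: float, w: float) -> float:
    """Distance between points a and b on the curled space T of size w.

    Implements d(a, b) = min(|a - b|, w - |a - b|). The modulo is an
    extra safeguard (an assumption, not in the formula) for inputs that
    lie outside the interval [0, w].
    """
    direct = abs(a - b) % w
    return min(direct, w - direct)


# On a circle of size 10, the points 1 and 9 are only 2 apart,
# because the ant can walk (or be beamed) across the glued ends.
print(circle_distance(1.0, 9.0, 10.0))  # 2.0
print(circle_distance(2.0, 5.0, 10.0))  # 3.0
```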

The 2-dimensional curled space T² is graphically equivalent to the surface of a torus, as depicted in the left map of the following picture. The torus surface can also be understood as the trace of a circle (i.e. T) moving along another circle. Analogously to the 1-dimensional case, we can represent T² as a rectangle of width w and height h embedded in the 2-dimensional Euclidean space R², as depicted in the right map of the following picture.

The distance calculation on T² becomes somewhat more complicated. If we defined the distance based on the first, graphical representation, we would have to use a path integral in R³, which is certainly not trivial. Based on the second representation (as shown in the right map of the picture above) we can use a much simpler distance function:

d((x₁,y₁), (x₂,y₂)) := min{|x₂-x₁|, w-|x₂-x₁|} + min{|y₂-y₁|, h-|y₂-y₁|}

This distance function is equivalent to any other valid distance function on T² in the sense that two points that are close to each other with respect to this distance function will also be close to each other with respect to any other valid distance function. The distance can be explained with an imaginary ant on the rectangle as follows: if we assume that the ant can only walk horizontally or vertically, and that it can be beamed from any edge to the opposite edge in no time, then the distance between two points a and b is the shortest walking distance for our ant from a to b.
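The ant's walking distance can be sketched in a few lines of code. The function below sums the wrapped distance over each coordinate; its name `torus_distance` and its `sizes` parameter are my own choices, and writing it for any number of dimensions is an assumption that simply repeats the same per-coordinate rule.

```python
from typing import Sequence


def torus_distance(p: Sequence[float], q: Sequence[float],
                   sizes: Sequence[float]) -> float:
    """Walking distance between p and q on a flat torus.

    sizes holds the length of each curled dimension, e.g. (w, h) for T².
    Each coordinate contributes min(|delta|, size - |delta|).
    """
    if not (len(p) == len(q) == len(sizes)):
        raise ValueError("p, q and sizes must have the same length")
    total = 0.0
    for pi, qi, size in zip(p, q, sizes):
        direct = abs(pi - qi) % size
        total += min(direct, size - direct)
    return total


# Rectangle of width 10 and height 5: the ant is beamed across both edges.
print(torus_distance((1.0, 1.0), (9.0, 4.0), (10.0, 5.0)))  # 4.0
```

In this example the horizontal part is min(8, 2) = 2 and the vertical part is min(3, 2) = 2, which gives the total of 4.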

The 3-dimensional curled space T³ cannot be represented as a subspace of R³, but it can be represented as a cube whose opposite faces are identified with each other, as depicted in the following picture:

The distance function on T³ can be defined analogously to the one on T², so we omit it here.

Having explained the curled spaces, we can now talk about dimensionality reduction (DR). In general, DR means squeezing data from a high-dimensional space into a lower-dimensional space. The lower-dimensional representation of high-dimensional data allows us to study the data by means of visualization, but the cost of DR is that we lose some information that is present in the original high-dimensional space.

One main challenge for DR algorithms is to preserve as much relevant information as possible. In order to do so, most DR methods define some kind of stress function that measures the information loss of the lower-dimensional representation. In this way, a DR algorithm is converted to an optimization algorithm that minimizes the stress function.
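To make the idea of a stress function concrete, here is a sketch of one common choice, the raw stress used in MDS-style methods: the sum of squared differences between pairwise distances in the original space and in the map. This is an illustration only and not the function RPM minimizes; the name `raw_stress` is my own.

```python
import math
from itertools import combinations


def raw_stress(high_points, low_points):
    """Sum of squared differences between pairwise Euclidean distances.

    high_points[i] and low_points[i] are the coordinates of the same data
    point in the original space and in the lower-dimensional map.
    """
    if len(high_points) != len(low_points):
        raise ValueError("both representations need the same number of points")
    total = 0.0
    for i, j in combinations(range(len(high_points)), 2):
        d_high = math.dist(high_points[i], high_points[j])
        d_low = math.dist(low_points[i], low_points[j])
        total += (d_high - d_low) ** 2
    return total


# A flat triangle loses nothing when its constant z coordinate is dropped.
flat = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
print(raw_stress(flat, [(x, y) for x, y, _ in flat]))  # 0.0
```

A stress of zero means that all pairwise distances survived the mapping; the more the map distorts the distances, the larger the value.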

For any non-trivial minimization problem we all know that local minima are a problem, and the more we squeeze the data, the worse the problem becomes. One typical strategy to avoid local minima is to introduce some kind of global perturbation or variation that doesn't always reduce the stress function, but might lead us to a global minimum in the end. Genetic algorithms and simulated annealing, for instance, employ such a strategy. The RPM algorithm also implicitly employs a global strategy by using a very large learning speed that vanishes gradually, like the temperature in the simulated annealing algorithm.

The RPM algorithm also offers another strategy to avoid local minima, thanks to its use of curled finite dimensions. As mentioned above, the more we squeeze our data, the worse the local minimum problem becomes. This also means that we can alleviate the local minimum problem by gradually squeezing the data from a high-dimensional space into a lower-dimensional space.

More specifically, to create a 2-dimensional map for a dataset we can first map it to the 3-dimensional torus T³, then gradually reduce its height and ultimately turn it into a 2-dimensional torus T², as depicted in the following picture:

The gradual dimensionality reduction is possible because the dimensions have finite size, so that we can shrink a dimension's size gradually to zero to effectively remove the dimension. In other words, we change our information space from dimension 3 to dimension 2.9, 2.8, 2.7, ... until 2.0. The fractional dimension reflects the reduced size of the selected curled dimension. More generally, we can use the same method to squeeze a map from any high-dimensional space, dimension by dimension, into a lower-dimensional space.

With the VisuMap software we can automate the gradual DR with a script. The following picture shows how the RPM algorithm gradually squeezes a spherical dataset from T³ to T². In this way, we can expect the 2-dimensional map to be much closer to the global minimum.
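The shrinking schedule itself can be sketched in a few lines. The code below is not the VisuMap script and does not use the VisuMap scripting API; `height_schedule` and `squeeze_z` are hypothetical helpers, the linear schedule is my assumption, and so is reading the fractional dimension as 2 plus the remaining fraction of the original height.

```python
def height_schedule(h0, n_steps):
    """Yield (height, fractional_dimension) pairs, shrinking linearly from h0 to 0.

    Assumption: the fractional dimension is 2 + height / h0, so it runs
    from 3.0 down to 2.0.
    """
    for k in range(n_steps + 1):
        height = h0 * (1 - k / n_steps)
        yield height, 2.0 + height / h0


def squeeze_z(points, old_height, new_height):
    """Rescale the third coordinate so the points fit the reduced height."""
    scale = new_height / old_height if old_height > 0 else 0.0
    return [(x, y, z * scale) for x, y, z in points]


points = [(1.0, 2.0, 4.0), (3.0, 1.0, 8.0)]
previous = 10.0
for height, dimension in height_schedule(h0=10.0, n_steps=10):
    points = squeeze_z(points, previous, height)
    previous = height
    # In a real run, some RPM relaxation iterations on the torus with the
    # current height would go here before the next squeeze.
    print("dimension %.1f, height %.1f" % (dimension, height))
```

At the end of the loop all third coordinates are zero and the points lie on T². The relaxation between two squeezes is what gives the map a chance to adapt gradually instead of being flattened in one step.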

Sunday, July 1, 2007

VisuMap and Google Earth

I have been a frequent user of Google Earth. It is really amazing to visit interesting places all over the world. However, the way Google Earth implemented the mouse navigation has caused me a little concern for some time.

With Google Earth (with the default settings of version 4.0.2091 or older) you zoom into a location by scrolling the mouse wheel forward, which gives you the feeling of moving yourself closer to the earth. I could not get used to this behavior. When I wanted to zoom in, I often made the mistake of scrolling the mouse wheel in the opposite direction. In the VisuMap software we have several views that simulate 3D space navigation and allow the user to explore data as if flying through it. After some consideration we decided to implement the zoom-in navigation contrary to Google Earth's default method: we zoom in closer to the data by scrolling the mouse wheel backwards.

For quite a long time we considered changing our navigation method to match Google Earth's, since we didn't want to spoil our users' experience. Google Earth is such a popular piece of software that there must be a reason why they implemented the navigation that way, we thought.

To my great relief, when I upgraded Google Earth to version 4.0.2737 today, the first thing I noticed was that Google Earth had changed its default zoom-in method to the way we always thought was better and had implemented in our software.

In retrospect, I guess Google Earth first wanted to simulate flying an airplane around. Most people would probably have thought so too. If you were flying an airplane with the mouse, you would probably tend to scroll back to pull the airplane up (i.e. to zoom out from the image). However, most people are not used to the experience of flying an airplane. We tend rather to use the mouse to control the object (i.e. the earth) in front of us. Thus, the new Google Earth version follows the common experience. I have great respect for Google's engineers for making such a small yet relevant change. This reminds me of some countries that still drive on the left side of the road; I would be scared to drive on the left.
