Visualizing High Dimensional Data: Dimension analysis on mulitdimensional scaling

This post brings forward an inconsistency problem in the study of multidimensional scaling from the perspective of dimensional analysis. This post is rather of philosophical nature. Nevertheless, I find it is worthwhile being aware of this issue
for further theoretical investigations.

Dimensional analysis is a simple and useful rule to check the consistence of a mathematical formula or equation. Roughly speaking, this rule demands that both sides of an equation (that has any physical or chemical bearings) must have the same dimension, or unit of measurement. So for instance the equation E = mC² is consistent with the dimensional analysis, since both sides of the equation have Energy as dimension. The equation E=mC is not consistent, since the right side would have momentum as dimension.

Similarly, if a formula is composed of multiple terms by addition, then all terms must have the same dimension. So for instance, the formula: S:=π·r² + r²/2 makes sense, but not S:=π·r² + r/2.

Now, let's turn to a problem of multidimensional scaling (MDS) from the perspective of the dimensional analysis. Roughly speaking, MDS maps data points from an abstract space with a distance function to a low dimensional, normally, Euclidean space like R². The goal of a MDS algorithm is to construct the map in a way, so that the distance in the low dimensional space approximates the distance in the original abstract space.

More formally, let D_ij be the distance between two data points in the original space, d_ij be the distance low dimensional space after MDS mapping; Then a MDS algorithm tries to minimize the following quantity (often called stress):

S := Σ ( D_ij - d_ij)²

Now, from the view point of dimensional analysis the stress function in its above form is not consistent, since D_ij and d_ij have arguably different dimensions as they measure distance in different spaces.

One work-around to solve this inconsistency issue is to add a special constant or apply a monotonous function on d_ij like:

S := Σ ( D_ij - C·d_ij)² or S := Σ ( D_ij - f(d_ij))²

The constant C or the function f should magically convert the dimension of d_ij to that of D_ij. This kind of ad-hoc techniques look pretty artificial. Most scientists probably won't like this method.

It should be noticed that the RPM (relational perspecitve map) algorithm avoided this inconsistency by using the following stress function:

S := Σ D_ij/d_ij

The dimensional analysis rule does not prohibit composing terms of different dimensions by multiplication. In above formula, both sides of the equation have a completely new dimension, that is the dimension of D_ij divided by the dimension of d_ij.

Visualizing High Dimensional Data

Thursday, March 5, 2009

Dimension analysis on mulitdimensional scaling

No comments:

About Me

Blog Archive

Tweet