The RPM (relational perspective map) algorithm has a seemingly ad-hoc measure that gradually decreases the learning rate to force the convergence of the algorithm. I have tried to avoid such a measure with various methods (e.g. conjugate gradient algorithms that exploit second-order information), but was not able to find anything that works.
I have noticed a significant problem here: in the original Newton-Raphson algorithm, the algorithm should converge to a local minimum if the learning rate is sufficiently small. But this is not so for the optimization problem of RPM. The algorithm never converges, no matter how small the learning rate is (within what is doable with double-precision arithmetic). If I keep the learning rate constant at a certain small value, the simulated dynamic system settles into an equilibrium state with constant variance.
Thus, it looks like the NR method, as it is, does not work for such a massively complex dynamic system; it is not stable enough. Letting the learning rate gradually vanish seems to be the simplest way to enforce convergence.
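Here is a minimal sketch of this behavior in Python. It is not the actual RPM update; a noisy quadratic stands in for the never-quite-settling forces of a large many-particle system. A constant small learning rate plateaus at a fixed variance, while a vanishing rate keeps shrinking it.

```python
import numpy as np

rng = np.random.default_rng(1)

def noisy_grad(x):
    # gradient of a simple quadratic bowl plus noise, standing in for the
    # jittering forces of a large many-particle system (not RPM's real gradient)
    return x + rng.normal(size=x.shape)

def run(lr_schedule, steps=20000):
    # plain gradient descent; record the variance of the coordinates every
    # 5000 steps to see whether it plateaus or keeps shrinking
    x = rng.normal(size=50)
    checkpoints = []
    for t in range(steps):
        x = x - lr_schedule(t) * noisy_grad(x)
        if (t + 1) % 5000 == 0:
            checkpoints.append(round(float(np.var(x)), 5))
    return checkpoints

print("constant rate :", run(lambda t: 0.01))                   # plateaus at a finite variance
print("vanishing rate:", run(lambda t: 0.01 / (1 + 0.01 * t)))  # keeps decreasing toward zero
```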
Mathematically, a sufficiently fine-grained gradient descent algorithm should be able to find a local minimum. But I think this is not possible in practice. To understand this, notice that the learning rate effectively controls the average variance of the positions of the simulated particles. That means the learning rate is a kind of "temperature" of the system. In physics, we know that if we keep the temperature constant, the system will never converge to a frozen state; we can only reach that state by gradually reducing the temperature to zero.
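The "temperature" analogy can be seen in the same toy model (again only a sketch, not the RPM system itself): the variance the particles equilibrate at grows roughly in proportion to the fixed learning rate, just as the thermal spread of a physical system grows with temperature.

```python
import numpy as np

rng = np.random.default_rng(2)

# Same toy noisy quadratic as above, run at several fixed learning rates: the
# variance it equilibrates at is roughly proportional to the rate, which is
# why the learning rate plays the role of a temperature.
for lr in (0.04, 0.02, 0.01, 0.005):
    x = rng.normal(size=200)
    for _ in range(20000):
        x = x - lr * (x + rng.normal(size=x.shape))
    print(f"learning rate {lr:.3f} -> equilibrium variance {np.var(x):.4f}")
```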
It is interesting to notice that the cooling dynamic system simulated by RPM is theoretically equivalent to an n-body dynamic system that is ever expanding. To see this, notice that the learning rate is directly linked to the average variance of the particles' positions. Thus, instead of gradually lowering the learning rate, we can just reduce the scale of the unit length relative to the total size of the space (e.g. the size of the torus). The latter is in turn equivalent to a system in which we keep the scale of the unit length constant but gradually expand the size of the whole space.
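A minimal numerical sanity check of this equivalence, assuming a hypothetical homogeneous pair potential E = sum of 1/|x_i - x_j| (not necessarily RPM's actual energy, and ignoring the torus wrap-around for simplicity): one gradient step with a fixed rate eta taken in a space scaled up by a factor S, read back in normalized (unit-scale) coordinates, is exactly the same as a step taken directly in normalized coordinates with the shrunken rate eta/S^3. The exact power of S depends on the form of the potential, but the qualitative point stands: growing S is just another way of letting the effective learning rate vanish.

```python
import numpy as np

def grad(x):
    # analytic gradient of the hypothetical energy E = sum_{i<j} 1 / |x_i - x_j|
    diff = x[:, None] - x[None, :]
    dist = np.abs(diff)
    np.fill_diagonal(dist, np.inf)       # kill the self-interaction terms
    return np.sum(-diff / dist**3, axis=1)

rng = np.random.default_rng(0)
u = rng.random(8)        # normalized coordinates in a unit-scale space
S = 5.0                  # current size of the "expanding" space
eta = 1e-3               # fixed learning rate in physical coordinates

# one step in physical coordinates x = S * u, mapped back to normalized coordinates
x = S * u
u_physical = (x - eta * grad(x)) / S

# one step taken directly in normalized coordinates with effective rate eta / S**3
u_direct = u - (eta / S**3) * grad(u)

print(np.max(np.abs(u_physical - u_direct)))   # agrees to floating-point precision
```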
We see here a certain similarity between the RPM model and the inflationary big-bang model from cosmology. Whereas RPM keeps the total size of the system stationary and gradually reduces the energy, the big-bang model keeps the total energy constant but lets the universe expand. It is probably not too crazy an idea to consider a dual big-bang model that keeps the apparent universe stationary but magically reduces the total energy/mass/temperature.
Friday, April 4, 2008