We study the concept of observability in neural state-space models and the
Mamba architecture through the lens of ordinary differential equations and
control theory. We develop strategies to enforce observability that are
tailored to a learning context, specifically one in which the hidden states are
high-dimensional and learnable both at the initial time and over the time
continuum. We also highlight that our methods emphasize eigenvalues, roots of
unity, or both. Our methods enforce observability with computational
efficiency, sometimes at large scale. We formulate
observability conditions in machine learning based on classical control theory
and discuss their computational complexity. Our nontrivial results are
fivefold. We discuss observability through the use of permutations in neural
applications with learnable matrices, without requiring high precision. We
present two results built upon the Fourier transform that yield observability
with high probability, up to the randomness in the learning. These results work
through the interplay of representations in Fourier space and their
eigenstructure, nonlinear mappings, and the observability matrix. We present a
result for Mamba that resembles a Hautus-type condition but employs an argument
based on a Vandermonde matrix rather than eigenvectors. Our final result is a
shared-parameter construction of the Mamba system that is computationally
efficient under high exponentiation. We develop a training algorithm with this
coupling, showing that it satisfies a Robbins-Monro condition under certain
orthogonality assumptions, whereas a more classical training procedure fails to
be a contraction due to a high Lipschitz constant.
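For context, the classical notions invoked above can be recalled in their standard textbook form; this is a recap of the classical statements for a linear time-invariant system, not the paper's learning-specific formulations. For \(\dot{x}(t) = A x(t)\), \(y(t) = C x(t)\) with \(A \in \mathbb{R}^{n \times n}\) and \(C \in \mathbb{R}^{p \times n}\), the Kalman rank condition states that the system is observable if and only if
\[
\operatorname{rank}\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n-1} \end{bmatrix} = n,
\]
while the Hautus test gives the equivalent condition
\[
\operatorname{rank}\begin{bmatrix} \lambda I - A \\ C \end{bmatrix} = n \quad \text{for every eigenvalue } \lambda \text{ of } A.
\]
The Robbins-Monro condition on a stochastic-approximation step-size sequence \((\alpha_k)\) requires
\[
\sum_{k} \alpha_k = \infty, \qquad \sum_{k} \alpha_k^2 < \infty.
\]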