Observability is an important concept in classical control theory. Quite often it is motivated by abstract definitions that are not very intuitive. In this article, we will look at observability from a Bayesian perspective and find a natural interpretation of it.

Let’s begin by stating the definition of observability from classical control theory.

Formally, a system is said to be observable if, for any possible sequence of state and control vectors (the latter being variables whose values one can choose), the current state (the values of the underlying dynamically evolving variables) can be determined in finite time using only the outputs.

We can easily translate this definition into the language of Bayesian inference:

A system is said to be observable if, for any possible initial state and sequence of control vectors, the probability mass of the posterior over the current state collapses onto a single point in finite time.

Normally, we use the idea of observability in the setting of deterministic, time-invariant linear state space models, which are defined by

\[ x_{t+1} = A x_t + B u_t, \qquad y_t = C x_t, \]

with state \(x_t\), output \(y_t\), input \(u_t\), system matrix \(A\), input matrix \(B\), and output matrix \(C\).
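
To make this concrete, here is a minimal simulation sketch in Python/NumPy. The matrices \(A\), \(B\), \(C\) and the initial state are my own illustrative choices, not taken from the original post:

```python
import numpy as np

# Illustrative system: a damped rotation of which we only measure the first coordinate.
A = np.array([[0.9, -0.3],
              [0.3,  0.9]])   # system matrix
B = np.array([[1.0],
              [0.0]])         # input matrix
C = np.array([[1.0, 0.0]])    # output matrix

x = np.array([1.0, -1.0])     # initial state x_0
for t in range(5):
    u = np.array([0.0])       # control input u_t (kept at zero here)
    y = C @ x                 # output:  y_t     = C x_t
    print(f"t={t}, y_t={y}")
    x = A @ x + B @ u         # state:   x_{t+1} = A x_t + B u_t
```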

Based on the methods shown in the last post about linear algebra with Gauss and Bayes, we can reformulate these deterministic equations with Gaussian distributions

\[ p(x_{t+1} \mid x_t, u_t) = \mathcal{N}\!\left(x_{t+1};\, A x_t + B u_t,\, \delta I\right) \]

and

\[ p(y_t \mid x_t) = \mathcal{N}\!\left(y_t;\, C x_t,\, \delta I\right), \]

where \(\delta \to 0\).

Now that we have arrived at a probabilistic description, we can use Bayesian inference to infer the current state \(x_t\). In particular, we are interested in the uncertainty of our estimate of the current state: our system will be observable if the covariance of the estimate goes to zero.

Derivation

First of all, we define a Gaussian prior distribution over the initial state \(x_0\)

\[ p(x_0) = \mathcal{N}\!\left(x_0;\, \mu_0,\, P_0\right). \]

The choice of the mean and covariance is actually arbitrary, as long as the covariance is positive definite.

Now let’s plug our distributions into the equations of the Bayes filter, which are described by the prediction step

\[ p(x_t \mid y_{0:t-1}) = \int p(x_t \mid x_{t-1}, u_{t-1})\, p(x_{t-1} \mid y_{0:t-1})\, \mathrm{d}x_{t-1} \]

and the update step

\[ p(x_t \mid y_{0:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{0:t-1})}{p(y_t \mid y_{0:t-1})}. \]

(Conditioning on the known control inputs is suppressed in the notation for readability.)

Fortunately, we already know how to do inference in linear Gaussian state space models. We can simply use the equations of the Kalman filter and obtain the following equations for the prediction step

\[ \mu_{t \mid t-1} = A \mu_{t-1} + B u_{t-1}, \qquad P_{t \mid t-1} = A P_{t-1} A^T + \delta I \]

and the update step

\[ K_t = P_{t \mid t-1} C^T \left( C P_{t \mid t-1} C^T + \delta I \right)^{-1}, \qquad \mu_t = \mu_{t \mid t-1} + K_t \left( y_t - C \mu_{t \mid t-1} \right), \qquad P_t = \left( I - K_t C \right) P_{t \mid t-1}. \]

If this was too fast, please check out the earlier blog post on Kalman filtering.
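
For readers who prefer code, here is a minimal sketch of these two steps in Python/NumPy. The function names and the small constant `DELTA` (standing in for the vanishing noise \(\delta \to 0\)) are my own choices for illustration:

```python
import numpy as np

DELTA = 1e-9  # small stand-in for the vanishing noise delta -> 0

def predict(mu, P, A, B, u):
    """Kalman prediction step: propagate mean and covariance through the dynamics."""
    mu_pred = A @ mu + B @ u
    P_pred = A @ P @ A.T + DELTA * np.eye(P.shape[0])
    return mu_pred, P_pred

def update(mu_pred, P_pred, C, y):
    """Kalman update step: condition the prediction on the observed output y."""
    S = C @ P_pred @ C.T + DELTA * np.eye(C.shape[0])  # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)                # Kalman gain
    mu = mu_pred + K @ (y - C @ mu_pred)
    P = (np.eye(P_pred.shape[0]) - K @ C) @ P_pred
    return mu, P
```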

Observability depends only on the covariance of the estimates \(P\). Therefore, the question of observability of a linear state space model reduces to the question of whether the equations

\[ P_{t \mid t-1} = A P_{t-1} A^T, \qquad P_t = \left( I - K_t C \right) P_{t \mid t-1} \]

are going to transform an arbitrary positive definite initial covariance matrix \(P_0\) to \(0\) (in the limit \(\delta \to 0\)).
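
A small numerical experiment (with an illustrative system of my own choosing and a small \(\delta\) standing in for the limit) shows this collapse for an observable system:

```python
import numpy as np

DELTA = 1e-9  # small stand-in for delta -> 0

# Observable example: the rotation keeps mixing the hidden second state
# into the measured first one.
A = np.array([[0.9, -0.3],
              [0.3,  0.9]])
C = np.array([[1.0, 0.0]])

P = np.eye(2)  # arbitrary positive definite initial covariance P_0
for t in range(4):
    # update step: remove the uncertainty in the directions that y_t reveals
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + DELTA * np.eye(1))
    P = (np.eye(2) - K @ C) @ P
    # prediction step: propagate the remaining uncertainty through A
    P = A @ P @ A.T
    print(f"t={t}, trace(P)={np.trace(P):.2e}")
```

For this example the trace of \(P\) drops to (numerically) zero after two update/prediction cycles, so the state is pinned down exactly: the system is observable.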

When we combine the prediction and update steps into a single equation

\[ P_{t+1 \mid t} = A \left( I - K_t C \right) P_{t \mid t-1} A^T = A P_{t \mid t-1} A^T - A P_{t \mid t-1} C^T \left( C P_{t \mid t-1} C^T + \delta I \right)^{-1} C P_{t \mid t-1} A^T \]

and look very closely, we can identify the discrete-time algebraic Riccati equation, here with vanishing noise terms since \(\delta \to 0\):

\[ P = A P A^T - A P C^T \left( C P C^T + \delta I \right)^{-1} C P A^T. \]
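
For intuition, it helps to work this recursion out in the scalar case (a hand-picked example, not from the original post): with \(x_{t+1} = a x_t\), \(y_t = c x_t\) and variance \(P\), one combined step gives

\[ P_{t+1} = a^2 \left( P_t - \frac{c^2 P_t^2}{c^2 P_t + \delta} \right) = a^2 \frac{\delta P_t}{c^2 P_t + \delta}. \]

For \(c \neq 0\) this goes to zero as \(\delta \to 0\): a single measurement already collapses the variance, and the scalar system is observable. For \(c = 0\) it reduces to \(P_{t+1} = a^2 P_t\), i.e. the measurement never removes any uncertainty.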

Let’s try to interpret what our two equations are doing with the covariance estimate \(P\). As a mental model, it is helpful to imagine each covariance matrix as a subspace, namely its range: the directions in which we are still uncertain. We begin with our prior covariance \(P_0\). We have selected it in such a way that it describes the entire state space.

We start by taking an update step. The update step can be interpreted as calculating the intersection of the prior subspace and the subspace defined by all points \(x\) that map to the observed output \(y_0\). We will call this last subspace the inverse subspace. We took the intersection of the whole state space and the inverse subspace, so our posterior is simply the inverse subspace. Let’s see what happens if we take the prediction step. If we assume that \(A\) has full rank, the dimensionality of the subspace will remain the same. Depending on the matrix \(A\), two things can happen:

  1. The transformation won’t change the subspace, but only its representation. It is an invariant subspace with respect to the transformation \(A\).
  2. The transformation is changing the subspace.

Depending on which of these two cases occurs, there are two possibilities for the next update step:

The transformation didn’t change the subspace. In this case, the update step has no effect, because we are intersecting again with the same inverse subspace. Formally, after the prediction step our posterior covariance would still be the orthogonal projector onto the kernel of \(C\)

\[ P_{t \mid t-1} = I - C^{+} C. \]

We know that \(C(I - C^+C) = 0\) and \((I - C^+C)C^T = 0\); therefore, our update step simplifies to

\[ K_t = \left( I - C^{+} C \right) C^T \left( C \left( I - C^{+} C \right) C^T + \delta I \right)^{-1} = 0, \qquad P_t = \left( I - K_t C \right) P_{t \mid t-1} = P_{t \mid t-1}. \]

It seems that we can’t get rid of this unobservable subspace. Therefore, the system is not observable.
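
We can check this numerically for an illustrative output matrix (again with a small \(\delta\) standing in for the limit):

```python
import numpy as np

DELTA = 1e-9
C = np.array([[1.0, 0.0]])                 # illustrative output matrix
P = np.eye(2) - np.linalg.pinv(C) @ C      # orthogonal projector onto ker(C)

# Kalman update applied to this projector:
K = P @ C.T @ np.linalg.inv(C @ P @ C.T + DELTA * np.eye(1))
P_updated = (np.eye(2) - K @ C) @ P

print(np.allclose(K, 0))          # True: the gain vanishes, because (I - C^+ C) C^T = 0
print(np.allclose(P_updated, P))  # True: the update leaves the covariance unchanged
```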

The transformation did change the subspace. In this case, the intersection with the inverse subspace will again have an effect. The dimensionality of the posterior subspace will get smaller.

We have to repeat the process of prediction and updating until the subspace of our posterior either has dimension zero or the prediction step no longer changes our subspace. In the first case, we have no uncertainty left: the system is observable. In the second case, we have identified an invariant subspace, and therefore the system is not observable.
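
To close the loop, here is the same covariance iteration for an illustrative system of my own choosing that is not observable: only the first state is measured, and the second state never influences it, so \(\ker(C)\) is invariant under \(A\).

```python
import numpy as np

DELTA = 1e-9  # small stand-in for delta -> 0

# Unobservable example: ker(C) (the second axis) is an invariant subspace of A.
A = np.array([[0.9, 0.0],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0]])

P = np.eye(2)  # arbitrary positive definite initial covariance P_0
for t in range(20):
    K = P @ C.T @ np.linalg.inv(C @ P @ C.T + DELTA * np.eye(1))
    P = (np.eye(2) - K @ C) @ P  # update step
    P = A @ P @ A.T              # prediction step

print(np.round(P, 6))
# [[0. 0.]
#  [0. 1.]]  -- the covariance converges to the projector onto ker(C):
# the second state stays completely uncertain, so the system is not observable.
```

This matches the case analysis above: the prediction step never rotates uncertainty out of \(\ker(C)\), so the update step never gets a chance to remove it.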

Summary

In this post, we looked at the concept of observability from a Bayesian standpoint. We found an intuitive way to reason about the effects of the update and prediction steps in terms of subspaces described by covariance matrices.