Nonlinear filtering: Introduction
This post will start a series of articles that will treat common nonlinear filtering methods that are based on the Bayes filter. The motivation is to provide an intuitive understanding of these methods by deriving them directly from the general Bayes filter. This derivation is done in steps, that are supposed to be as atomic as possible. Furthermore, each nonlinear filtering method will be shown in action by providing an interactive example to play around with. This series will require some basic knowledge in math. Especially in linear algebra and probability theory.
Introduction
Recently, I wrote an article about the derivation of the Kalman filter. When we are using the Kalman filter we assume that our models are linear Gaussian state space models. In this scenario, we are lucky because we can compute everything analytically in closed form. But at some point in time, we have to face the cruel truth that the real world is normally anything but linear. With nonlinear models, it is not possible to compute the equation of the Bayes filter in closed form, anymore. Nonetheless, we are still lucky, because there are many methods that provide approximations of the Bayes filter. Unfortunately, learning about those methods can be quite frustrating, many resources simply state the corresponding equations and are not providing any further explanations and intuitions. This series of blog posts is an attempt to fix this and provide derivations of these methods directly from the equations of the Bayes filter. Furthermore, presenting merely concatenations of equations can be very frustrating and tiring as well. Thus, the derivations are enriched with figures and examples to obtain a better understanding of what we are actually doing. The current plan is to write articles about the following methods:
In this first post, I will take the chance to introduce the concept of the Bayes filter and to present the running example for the rest of this series. Let’s get started!
Bayes filter
Let’s try to state the main idea of the Bayes filter in a compact manner.
The Bayes filter is used to infer the current state of a probabilistic state space model given all observations and inputs up to the current timestep and a prior distribution of the initial state.
We define our probabilistic state space model by
and an initial state distribution \(p(x_0)\). In the figure below, our model is visualized as a probabilistic graphical model.
Knowledge about probabilistic graphical models is not required to be able to follow this series, but it can be quite helpful for understanding the dependencies between the state, input and output.
Now that we have briefly introduced the state space model, we can formulate the objective of Bayes filters in a more rigorous way: We want to calculate \(p(x_t|y_{0:t},u_{0:t-1})\) given the prior initial distribution \(p(x_0)\), where the shorthand \(y_{n:m}\) stands for the set \(y_n,…,y_m\). With help of the basic rules of probability, we can find a way to calculate this expression in a recursive fashion. This recursion is composed of two distinct steps: the update and predict step.
Recursive formula of the Bayes filter
The recursive formula for the Bayes filter in state space models consists of the prediction step
and the update step
The recursion is initialized with a prior distribution over the initial state \(p(x_0)\).
We will use the definition above as our starting point for the derivations of the particular nonlinear filtering methods. But first, it’s time to introduce the example, that will stay with us for the rest of this series.
Interactive example: Race track
The example that will accompany us for the rest of this series, consists of a race car that can drive around a race track. The race car can be controlled by the discrete actions of forward, backward and stop. After the action is taken some process noise is added. Therefore, we won’t know exactly where the race car will be after we have taken this action. Inside the race course is a tree, which serves as a natural landmark. The race car has a distance meter on board, which obtains noisy measurements of the distance to this tree. The following graphic shows the race car in action. Please use your keyboard to set the input of the race car: A (Backward), S (Stop), D (Forward). Alternatively, you can use the buttons below.
Now that we have a good idea how the model will behave, you have the choice of either going directly to the first post about the grid-based filter or to learn more about the model of the dynamic \(p(x_{t+1}|x_{t}, u_{t})\) and observation \(p(y_t|x_t)\) of the example in the following section.
Simulation model
If you want to apply Bayes filters to real-life scenarios, a mathematical model of your system is needed. In general, you will have to learn it from data or derive it directly from the laws of physics. In our case we are lucky, because we are the designer of the real model and simply will use this exact model. Let’s start by taking a closer look at the system dynamic.
System dynamic
The state \(x_t\) of the system is the current position on the race track. It is defined as the path length from the start to the current position following the race track. Having only the position without the velocity as a state representation is not very realistic, because it is not possible to model momentum. But for the sake of simplicity and visualisation, we will go without it. The input \(u_t\) is either \(-1\), \(0\) or \(1\) for backward, stop and forward.
Our system dynamics are defined by a Gaussian distribution
with nonlinear mean \(\mu_s(x_t, u_t)\) and variance \(\sigma_s^2(x_t)\). To obtain the mean of the next state \(x_{t+1}\) the input \(u_t\), which is weighted by \(v\) and \(b(\kappa)\), is simply added to the current state \(x_t\):
The factor \(v\) is the velocity of the car, we will call it the step size. Intuitively, by multiplying \(u_t\) with the step size \(v\) you map the input from action space to the race track space. The weighting factor \(b(\kappa) = e^{- c\kappa}\), with the hand-tweaked parameter \(c\), is used to model a more realistic driving behavior that depends on the curvature
of the track at the current position \(x_t\). The function \(L(x_t)\) maps the current position of the car \(x_t\) to \((x,y)\) coordinates
Intuitively, if we are in a sharp curve the curvature is low and we drive faster. If we are on a more straight part of the track, the curvature is high and we drive faster.
The variance of the next state \(x_{t+1}\) is defined by
The variance depends mainly on the curvature \(\kappa\) at the current position. If we have a low curvature, the mean of the next step will be larger. But if we take a larger step it is natural to assume, that we have a higher variance. The hand-tweaked parameter \(d\) and the step size \(v\) are fixed for the whole environment. Please be aware, that the variance is not depending on the input \(u_t\) itself, but only on the step size \(v\).
If you move your mouse or your finger over the race track below, you will notice a small blue strip outside the race track. This is the probability density of our dynamic model \(p(x_{t+1}|x_t,u_t)\). Therefore, it shows the distribution over the next state \(x_{t+1}\) given the current state \(x_t\) and action \(u_t\). If you want to check out the distribution for other inputs, you can use again your keyboard or the buttons below the race track.
Observation model
The race car has a distance meter on board, which will provide us with noisy measurements of the distance to the tree inside the race track, where the position of the tree in \((x,y)\) coordinates is defined as \(T = (T_x, T_y)\). As in the model of the system dynamics, we will model the uncertainty of the measurement with a Gaussian distribution
with nonlinear mean \(\mu_o(x_t)\) and variance \(\sigma_o^2(x_t)\).
The mean of our observation is the exact distance to the tree
where \(d(\cdot,\cdot)\) is defined as the Euclidean distance.
The variance of our measuring device
depends on the distance as well. The farther we are away from the tree, the more noise will be present in the signal. The parameter \(a\) is again hand-tweaked and constant.
By moving with your mouse or finger over the race track below you will notice two things. Again, there appears a strip, this time inside the race track and in a brownish color. Furthermore, you will notice another light green strip at the trunk of the tree. Both strips are showing parts of our observation model \(p(y_t|x_t)\). The light green strip shows the probability of observing a distance measurement at our current position. Not surprisingly, we find the maximum probability density at the true distance of tree. Like stated above, the variance is varying depending on the distance to the tree. The brownish strip answers another question: Given the true distance \(d\) at our current position, what is the probability of obtaining this measurement \(d\) from another position on the race track. It represents \(p(y_t=d|x_t)\) and is, therefore, a function of the race track \(x\). Please be aware, that this is not a proper probability density, because it is not integrating to 1.
Now that we know the inner workings of our model, we are well prepared to start the series with the grid-based filter.