Nonlinear filtering: Extended Kalman filter
This article is the second part of the nonlinear filtering series, where we will derive the extended Kalman filter with non-additive and additive noise directly from the recursive equations of the Bayes filter. If you haven’t read the introduction, I would recommend reading it first. Before we dive into the derivation, let’s try to state the main idea behind the extended Kalman filter.
The extended Kalman filter approximates the Bayes filter by linearizing the system and observation equations.
Derivation
We will start the derivation directly from the recursive equations of the Bayes filter with the prediction step
\[ p(x_{t+1} \mid y_{1:t}) = \int p(x_{t+1} \mid x_t, u_t)\, p(x_t \mid y_{1:t})\, dx_t \]
and the update step
\[ p(x_t \mid y_{1:t}) = \frac{p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})}{\int p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})\, dx_t}. \]
The extended Kalman filter is normally formulated with nonlinear functions with additive noise. In this article, we directly derive the general case for non-additive noise and obtain the extended Kalman filter as a special case. Therefore, our equations of the system are
\[ x_{t+1} = f(x_t, u_t, w_t) \]
\[ y_t = h(x_t, v_t) \]
with Gaussian process noise \( w_t \sim \mathcal{N}(w_t|0, Q_t) \) and Gaussian observation noise \( v_t \sim \mathcal{N}(v_t|0, R_t) \).
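To make this setup concrete, here is a small simulation sketch in Python. The functions `f` and `h`, the input, and the noise covariances below are purely illustrative choices, not fixed by anything above; note how the noise enters both functions non-additively.

```python
import numpy as np

# Hypothetical example system with NON-additive noise: w_t and v_t enter
# inside f and h instead of being added at the end.

def f(x, u, w):
    # state transition x_{t+1} = f(x_t, u_t, w_t); noise scales with the input
    return np.array([x[0] + (u + w[0]) * np.cos(x[1]),
                     x[1] + 0.1 * (u + w[1])])

def h(x, v):
    # observation y_t = h(x_t, v_t); noise enters multiplicatively
    return np.array([np.hypot(x[0], x[1]) * (1.0 + v[0])])

rng = np.random.default_rng(0)
Q = 0.01 * np.eye(2)   # process noise covariance, w_t ~ N(0, Q)
R = 0.001 * np.eye(1)  # observation noise covariance, v_t ~ N(0, R)

x = np.array([1.0, 0.0])
trajectory, observations = [x], []
for t in range(50):
    w = rng.multivariate_normal(np.zeros(2), Q)
    v = rng.multivariate_normal(np.zeros(1), R)
    x = f(x, u=0.5, w=w)
    trajectory.append(x)
    observations.append(h(x, v))
```

Because the noise is transformed by `f` and `h`, simply adding \(Q_t\) and \(R_t\) to the covariances would be wrong; this is exactly why the matrices \(L_t\) and \(M_t\) appear later in the derivation.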
In our formula of the Bayes filter, we can’t find any functions \(f(x_t, u_t, w_t)\) or \(h(x_t, v_t)\). Therefore, our first step will be to express these functions as probability distributions \(p(x_{t+1}|x_{t}, u_{t})\) and \(p(y_t|x_t)\). The next box will show how to achieve this in general. Warning: Distributions are very weird and the following treatment is not rigorous.
We want to calculate \(p(y|x)\) given the function \(y=f(x,z)\) and \(p(z)\). We can express the deterministic function \(y=f(x,z)\) as probability distribution
\[ p(y|x,z) = \delta\left(y - f(x,z)\right) \]
with the Dirac delta function \(\delta(x)\).
In order to calculate \(p(y|x)\) we can simply marginalize out \(z\):
\[ p(y|x) = \int p(y|x,z)\, p(z)\, dz = \int \delta\left(y - f(x,z)\right) p(z)\, dz \]
By using the composition rule of the Dirac delta function, we can express this as
\[ p(y|x) = \sum_i \frac{p(z_i)}{\left|\det \nabla_{z} f(x,z)^T \big|_{z=z_i} \right|} \]
where the sum goes over all \(z_i\) which satisfy the equation \(y = f(x,z)\).
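A quick numerical sanity check of this formula, using the hypothetical scalar function \(y = f(x,z) = x + 2z\) with \(z \sim \mathcal{N}(0,1)\): there is exactly one root \(z_1 = (y-x)/2\), and the sum-over-roots formula should reproduce the known closed form \(\mathcal{N}(y \mid x, 4)\).

```python
import numpy as np

def gauss_pdf(x, mean, var):
    # 1D Gaussian density N(x | mean, var)
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# y = f(x, z) = x + 2z with z ~ N(0, 1): the derivative w.r.t. z is 2 and
# there is exactly one root z_1 = (y - x) / 2, so the formula predicts
#   p(y|x) = N(z_1 | 0, 1) / |2|
# which must match the known closed form N(y | x, 4).
x, y = 1.0, 2.5
z_root = (y - x) / 2.0
p_formula = gauss_pdf(z_root, 0.0, 1.0) / abs(2.0)
p_direct = gauss_pdf(y, x, 4.0)
print(p_formula, p_direct)  # the two values agree
```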
In our case, we can express our system function \(x_{t+1} = f(x_t,u_t,w_t)\) as a probability distribution
\[ p(x_{t+1}|x_t, u_t) = \sum_i \frac{\mathcal{N}(w_i|0, Q_t)}{\left|\det \nabla_{w_t} f(x_t,u_t,w_t)^T \big|_{w_t=w_i} \right|} \]
where the sum goes over all \(w_i\) which satisfy \(x_{t+1} = f(x_t,u_t,w_t)\).
Similarly, we can express our observation function \(y_t = h(x_t, v_t)\) as a probability distribution
\[ p(y_t|x_t) = \sum_i \frac{\mathcal{N}(v_i|0, R_t)}{\left|\det \nabla_{v_t} h(x_t,v_t)^T \big|_{v_t=v_i} \right|} \]
where the sum goes over all \(v_i\) which satisfy \(y_t = h(x_t,v_t)\).
Even though we assume Gaussian process and measurement noise, the resulting distributions of our model will, in general, be non-Gaussian. This is where the extended Kalman filter comes into play: it approximates our probability distributions by using linearization. But taken literally, linearizing a probability distribution makes no sense. So what is linearized instead?
Instead of linearizing our probability distributions we will do this with our deterministic functions before we express them as probability distributions.
In order to linearize, we are performing a first-order Taylor expansion
\[ f(x_t, u_t, w_t) \approx \hat{f}(x_t, u_t, w_t) = f(\hat x_{t|t}, u, 0) + \nabla_{x_t} f^T (x_t - \hat x_{t|t}) + \nabla_{u_t} f^T (u_t - u) + \nabla_{w_t} f^T w_t \]
around the current state estimate \(x_t = \hat x_{t|t}\), current input \(u_t=u\) and zero process noise \(w_t=0\).
To unclutter the notation, we define \(A_t = \nabla_{x_t} f(x_t, u_t, w_t)^T|_{x_t=\hat x_{t|t},u_t=u,w_t=0}\), \(B_t = \nabla_{u_t} f(x_t, u_t, w_t)^T|_{x_t=\hat x_{t|t},u_t=u,w_t=0}\) and \(L_t = \nabla_{w_t} f(x_t, u_t, w_t)^T|_{x_t=\hat x_{t|t},u_t=u,w_t=0}\) to finally obtain the much cleaner representation
\[ \hat{f}(x_t, u_t, w_t) = f(\hat x_{t|t}, u, 0) + A_t (x_t - \hat x_{t|t}) + B_t (u_t - u) + L_t w_t \]
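In practice, the Jacobians \(A_t\), \(B_t\) and \(L_t\) are often computed numerically. Below is a minimal central-difference sketch; the transition function `f` is a made-up example, not anything from the derivation above.

```python
import numpy as np

def jacobian(fun, arg_index, args, eps=1e-6):
    """Central-difference Jacobian of fun w.r.t. args[arg_index]."""
    args = [np.asarray(a, dtype=float) for a in args]
    y0 = np.asarray(fun(*args))
    x = args[arg_index]
    J = np.zeros((y0.size, x.size))
    for j in range(x.size):
        hi, lo = [a.copy() for a in args], [a.copy() for a in args]
        hi[arg_index][j] += eps
        lo[arg_index][j] -= eps
        J[:, j] = (np.asarray(fun(*hi)) - np.asarray(fun(*lo))) / (2 * eps)
    return J

# Hypothetical transition function used only to demo the Jacobians.
def f(x, u, w):
    return np.array([x[0] + u[0] * np.cos(x[1]) + w[0],
                     x[1] + 0.1 * u[0] + w[1]])

x_hat = np.array([1.0, 0.5])   # current state estimate \hat x_{t|t}
u = np.array([0.3])            # current input
w0 = np.zeros(2)               # zero process noise

A_t = jacobian(f, 0, (x_hat, u, w0))  # df/dx_t at the expansion point
B_t = jacobian(f, 1, (x_hat, u, w0))  # df/du_t at the expansion point
L_t = jacobian(f, 2, (x_hat, u, w0))  # df/dw_t at the expansion point
```

For this `f` the analytic Jacobians are easy to derive by hand, which makes the sketch straightforward to verify.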
Now we are ready to transform our linearized deterministic system function into a probability distribution. We will use the same formula as above, but replace the true function \(f(x_t, u_t, w_t)\) with \(\hat{f}(x_t, u_t, w_t)\), and arrive at
\[ p(x_{t+1}|x_t, u_t) = \sum_i \frac{\mathcal{N}(w_i|0, Q_t)}{\left|\det \nabla_{w_t} \hat{f}(x_t,u_t,w_t)^T \big|_{w_t=w_i} \right|} \]
where the sum is over all \(w_i\) which satisfy \(x_{t+1} = \hat{f}(x_t, u_t, w_t)\). Let’s try to simplify this expression! First, we notice that if the matrix \(L_t\) is invertible, then there is exactly one \(w\) that satisfies \(x_{t+1} = \hat{f}(x_t, u_t, w_t)\). We can express it as
\[ w = L_t^{-1}\left(x_{t+1} - f(\hat x_{t|t}, u, 0) - A_t (x_t - \hat x_{t|t}) - B_t (u_t - u)\right) \]
Note that if the matrix \(L_t\) is not invertible, we would have to integrate over the whole null space and the sum would become an integral.
Next, let’s look at the denominator. We can express the derivative of \(\hat{f}(x_t, u_t, w_t)\) with respect to \(w_t\) as
\[ \nabla_{w_t} \hat{f}(x_t, u_t, w_t)^T = L_t \]
Let’s plug our new information about \(w\) and the derivative into the distribution:
\[ p(x_{t+1}|x_t, u_t) = \frac{1}{\left|\det L_t\right|}\, \mathcal{N}\left( L_t^{-1}\left(x_{t+1} - f(\hat x_{t|t}, u, 0) - A_t (x_t - \hat x_{t|t}) - B_t (u_t - u)\right) \mid 0, Q_t \right) \]
We apply the transformation identity (Formula 35, Toussaint)
\[ \mathcal{N}(F x + f \mid a, A) = \frac{1}{\left|\det F\right|}\, \mathcal{N}\left(x \mid F^{-1}(a - f),\; F^{-1} A F^{-T}\right) \]
with \(F = L_t^{-1}\), which makes the determinant factors cancel.
And apply it once more, to obtain our final result:
\[ p(x_{t+1}|x_t, u_t) = \mathcal{N}\left(x_{t+1} \mid f(\hat x_{t|t}, u, 0) + A_t (x_t - \hat x_{t|t}) + B_t (u_t - u),\; L_t Q_t L_t^T\right) \]
We are done! The linearization of our function has led us back to Gaussianity!
With the same strategy, we obtain for our observation model
\[ p(y_t|x_t) = \mathcal{N}\left(y_t \mid h(\hat x_{t|t-1}, 0) + C_t (x_t - \hat x_{t|t-1}),\; M_t R_t M_t^T\right) \]
with \(C_t = \nabla_{x_t} h(x_t, v_t)^T|_{x_t=\hat x_{t|t-1},v_t=0}\) and \(M_t = \nabla_{v_t} h(x_t, v_t)^T|_{x_t=\hat x_{t|t-1},v_t=0}\).
Our linearization led to a Gaussian system and observation model. Therefore, the distribution of the updated state estimate \(p(x_t|y_{1:t})\) and the distribution of the predicted state estimate \(p(x_{t+1}|y_{1:t})\) will be Gaussian as well.
Now we are ready to plug our surrogate into the equations of the Bayes filter:
Prediction step
\[ p(x_{t+1}|y_{1:t}) = \int \mathcal{N}\left(x_{t+1} \mid f(\hat x_{t|t}, u, 0) + A_t (x_t - \hat x_{t|t}) + B_t (u_t - u),\; L_t Q_t L_t^T\right) \mathcal{N}\left(x_t \mid \hat x_{t|t}, P_{t|t}\right) dx_t \]
Update step
\[ p(x_t|y_{1:t}) = \frac{\mathcal{N}\left(y_t \mid h(\hat x_{t|t-1}, 0) + C_t (x_t - \hat x_{t|t-1}),\; M_t R_t M_t^T\right) \mathcal{N}\left(x_t \mid \hat x_{t|t-1}, P_{t|t-1}\right)}{\int \mathcal{N}\left(y_t \mid h(\hat x_{t|t-1}, 0) + C_t (x_t - \hat x_{t|t-1}),\; M_t R_t M_t^T\right) \mathcal{N}\left(x_t \mid \hat x_{t|t-1}, P_{t|t-1}\right) dx_t} \]
Let’s try to simplify these equations!
Prediction step
We will start with the prediction step
To find an expression for our prediction step we can simply use the propagation formula (Formula 37, Toussaint)
\[ \int \mathcal{N}\left(x \mid a + F y,\; A\right) \mathcal{N}\left(y \mid b,\; B\right) dy = \mathcal{N}\left(x \mid a + F b,\; A + F B F^T\right) \]
By comparison with our expression, we see that
\[ p(x_{t+1}|y_{1:t}) = \mathcal{N}\left(x_{t+1} \mid \hat x_{t+1|t},\; P_{t+1|t}\right) \]
with \(\hat x_{t+1|t} = f(\hat x_{t|t}, u, 0) + B_t (u_t - u) = f(\hat x_{t|t}, u_t, 0)\), since the \(B_t\) term vanishes at the expansion point \(u_t = u\), and \(P_{t+1|t} = A_t P_{t|t} A_t^T + L_t Q_t L_t^T\).
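We can sanity-check the propagation formula with a quick Monte Carlo experiment; all matrices below are arbitrary illustrative values.

```python
import numpy as np

# Monte Carlo check of the propagation formula: if y ~ N(b, B) and
# x | y ~ N(a + F y, A), then marginally x ~ N(a + F b, A + F B F^T).
rng = np.random.default_rng(42)
F = np.array([[1.0, 0.5], [0.0, 1.0]])
a = np.array([0.2, -0.1])
A = np.array([[0.3, 0.1], [0.1, 0.2]])
b = np.array([1.0, 2.0])
B = np.array([[0.5, 0.0], [0.0, 0.4]])

n = 200_000
y = rng.multivariate_normal(b, B, size=n)
e = rng.multivariate_normal(np.zeros(2), A, size=n)
x = y @ F.T + a + e  # one draw of x | y per sample of y

mean_pred = F @ b + a              # predicted marginal mean
cov_pred = A + F @ B @ F.T         # predicted marginal covariance
mean_mc = x.mean(axis=0)
cov_mc = np.cov(x, rowvar=False)
```

The sample mean and covariance should agree with the closed-form prediction up to Monte Carlo error.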
Update step
We will start to simplify the update step
by focusing on the numerator first. We notice that we can rewrite it as a joint distribution (Formula 39, Toussaint)
\[ \mathcal{N}\left(x \mid a + F y,\; A\right) \mathcal{N}\left(y \mid b,\; B\right) = \mathcal{N}\left(\begin{pmatrix} x \\ y \end{pmatrix} \mid \begin{pmatrix} a + F b \\ b \end{pmatrix},\; \begin{pmatrix} A + F B F^T & F B \\ B F^T & B \end{pmatrix}\right) \]
Then again, this joint distribution can be rewritten as
\[ \mathcal{N}\left(x \mid a + F b,\; A + F B F^T\right) \mathcal{N}\left(y \mid b + B F^T \left(A + F B F^T\right)^{-1} (x - a - F b),\; B - B F^T \left(A + F B F^T\right)^{-1} F B\right) \]
We can combine the two previous equations to the following expression
By comparison with the numerator of our update step, we obtain
which simplifies to
We applied the same trick as in the derivation of the Kalman filter: Conceptually, we only transformed
\[ p(y_t|x_t)\, p(x_t|y_{1:t-1}) = p(x_t|y_t, y_{1:t-1})\, p(y_t|y_{1:t-1}). \]
If we look closely at the final expression, we see that \(p(y_t)\) is canceling out. Therefore, the result is simply the remaining part
\[ p(x_t|y_{1:t}) = \mathcal{N}\left(x_t \mid \hat x_{t|t-1} + K_t\left(y_t - h(\hat x_{t|t-1}, 0)\right),\; P_{t|t-1} - K_t C_t P_{t|t-1}\right) \]
with the Kalman gain \(K_t = P_{t|t-1} C_t^T \left(C_t P_{t|t-1} C_t^T + M_t R_t M_t^T\right)^{-1}\).
If our reasoning is correct, the denominator should be equal to \(\mathcal{N}(y_{t}|h(\hat x_{t|t-1}, 0), M_tR_tM_t^T + C_tP_{t|t-1}C_t^T)\), which was canceled out. The denominator can be simplified with the propagation formula (Formula 37, Toussaint)
\[ \int \mathcal{N}\left(y_t \mid h(\hat x_{t|t-1}, 0) + C_t (x_t - \hat x_{t|t-1}),\; M_t R_t M_t^T\right) \mathcal{N}\left(x_t \mid \hat x_{t|t-1}, P_{t|t-1}\right) dx_t = \mathcal{N}\left(y_t \mid h(\hat x_{t|t-1}, 0),\; C_t P_{t|t-1} C_t^T + M_t R_t M_t^T\right) \]
Yay! We see that the denominator is exactly the same as the canceled factor in the numerator.
Let’s summarize our results:
Extended Kalman filter with non-additive noise
The recursive formula for the extended Kalman filter with non-additive noise consists of the prediction step
\[ p(x_{t+1}|y_{1:t}) = \mathcal{N}\left(x_{t+1} \mid f(\hat x_{t|t}, u_t, 0),\; A_t P_{t|t} A_t^T + L_t Q_t L_t^T\right) \]
and the update step
\[ p(x_t|y_{1:t}) = \mathcal{N}\left(x_t \mid \hat x_{t|t-1} + K_t\left(y_t - h(\hat x_{t|t-1}, 0)\right),\; P_{t|t-1} - K_t C_t P_{t|t-1}\right) \]
with
\[ K_t = P_{t|t-1} C_t^T \left(C_t P_{t|t-1} C_t^T + M_t R_t M_t^T\right)^{-1} \]
and the Jacobians \(A_t\), \(L_t\), \(C_t\), \(M_t\) as defined above.
That’s it! We derived the equations of the extended Kalman filter. To bring the equations into a more implementation-friendly form, we restate the extended Kalman filter as:
Extended Kalman filter with non-additive noise
The recursive formula for the extended Kalman filter with non-additive noise consists of the prediction step
\[ \hat x_{t+1|t} = f(\hat x_{t|t}, u_t, 0) \]
\[ P_{t+1|t} = A_t P_{t|t} A_t^T + L_t Q_t L_t^T \]
and the update step
\[ K_t = P_{t|t-1} C_t^T \left(C_t P_{t|t-1} C_t^T + M_t R_t M_t^T\right)^{-1} \]
\[ \hat x_{t|t} = \hat x_{t|t-1} + K_t \left(y_t - h(\hat x_{t|t-1}, 0)\right) \]
\[ P_{t|t} = \left(I - K_t C_t\right) P_{t|t-1} \]
with
\[ A_t = \nabla_{x_t} f(x_t, u_t, w_t)^T|_{x_t=\hat x_{t|t},u_t=u,w_t=0}, \qquad L_t = \nabla_{w_t} f(x_t, u_t, w_t)^T|_{x_t=\hat x_{t|t},u_t=u,w_t=0} \]
\[ C_t = \nabla_{x_t} h(x_t, v_t)^T|_{x_t=\hat x_{t|t-1},v_t=0}, \qquad M_t = \nabla_{v_t} h(x_t, v_t)^T|_{x_t=\hat x_{t|t-1},v_t=0} \]
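Here is a minimal sketch of this recursion in Python. The system `f`, `h`, the hand-derived Jacobians, and all numbers are illustrative assumptions; a real implementation would recompute the Jacobians at every step and might prefer a numerically safer covariance update (e.g. the Joseph form).

```python
import numpy as np

def ekf_predict(x_hat, P, u, f, A, L, Q):
    # x_{t+1|t} = f(x_{t|t}, u_t, 0),  P_{t+1|t} = A P A^T + L Q L^T
    x_pred = f(x_hat, u, np.zeros(Q.shape[0]))
    P_pred = A @ P @ A.T + L @ Q @ L.T
    return x_pred, P_pred

def ekf_update(x_pred, P_pred, y, h, C, M, R):
    S = C @ P_pred @ C.T + M @ R @ M.T      # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)     # Kalman gain
    x_upd = x_pred + K @ (y - h(x_pred, np.zeros(R.shape[0])))
    P_upd = (np.eye(P_pred.shape[0]) - K @ C) @ P_pred
    return x_upd, P_upd

# Illustrative 2D system: noise enters both f and h non-additively.
f = lambda x, u, w: np.array([x[0] + (u + w[0]) * np.cos(x[1]), x[1] + u + w[1]])
h = lambda x, v: np.array([x[0] * (1.0 + v[0])])

x_hat, P = np.array([1.0, 0.2]), 0.5 * np.eye(2)
u, Q, R = 0.1, 0.01 * np.eye(2), np.array([[0.001]])

# Jacobians of f at the expansion point, derived by hand for this f
A = np.array([[1.0, -u * np.sin(x_hat[1])], [0.0, 1.0]])
L = np.array([[np.cos(x_hat[1]), 0.0], [0.0, 1.0]])

x_pred, P_pred = ekf_predict(x_hat, P, u, f, A, L, Q)

# Jacobians of h at the predicted mean, with v = 0
C = np.array([[1.0, 0.0]])
M = np.array([[x_pred[0]]])
x_upd, P_upd = ekf_update(x_pred, P_pred, np.array([1.05]), h, C, M, R)
```

Note how \(L_t\) and \(M_t\) reshape the noise covariances (\(L_t Q_t L_t^T\), \(M_t R_t M_t^T\)) instead of \(Q_t\) and \(R_t\) entering directly.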
As promised, we will also look at the special case with additive noise. Our functions then take the form
\[ x_{t+1} = f(x_t, u_t) + w_t \]
\[ y_t = h(x_t) + v_t \]
In this case, the matrices \(L_t\) and \(M_t\) will be identity matrices:
\[ L_t = I, \qquad M_t = I \]
Finally, we can state the equations of the extended Kalman filter with additive noise.
Extended Kalman filter with additive noise
The recursive formula for the extended Kalman filter with additive noise consists of the prediction step
\[ \hat x_{t+1|t} = f(\hat x_{t|t}, u_t) \]
\[ P_{t+1|t} = A_t P_{t|t} A_t^T + Q_t \]
and the update step
\[ K_t = P_{t|t-1} C_t^T \left(C_t P_{t|t-1} C_t^T + R_t\right)^{-1} \]
\[ \hat x_{t|t} = \hat x_{t|t-1} + K_t \left(y_t - h(\hat x_{t|t-1})\right) \]
\[ P_{t|t} = \left(I - K_t C_t\right) P_{t|t-1} \]
with
\[ A_t = \nabla_{x_t} f(x_t, u_t)^T|_{x_t=\hat x_{t|t},u_t=u}, \qquad C_t = \nabla_{x_t} h(x_t)^T|_{x_t=\hat x_{t|t-1}} \]
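As a minimal sketch, here is one predict/update cycle for a hypothetical scalar system \(x_{t+1} = 0.9 x_t + u_t + w_t\), \(y_t = \sin(x_t) + v_t\); all numbers are made up for illustration. Since \(L_t = M_t = I\), the covariances \(Q_t\) and \(R_t\) enter the equations directly.

```python
import numpy as np

# Additive-noise special case on a hypothetical scalar system:
#   x_{t+1} = 0.9 x_t + u_t + w_t,   y_t = sin(x_t) + v_t
f = lambda x, u: 0.9 * x + u
h = lambda x: np.sin(x)
Q, R = 0.01, 0.05

x_hat, P = 0.5, 1.0       # prior mean and variance
u, y = 0.1, 0.62          # input and a (made-up) measurement

# Prediction step
x_pred = f(x_hat, u)
A = 0.9                   # df/dx (constant for this linear-in-x example)
P_pred = A * P * A + Q    # A P A^T + Q

# Update step
C = np.cos(x_pred)        # dh/dx evaluated at the predicted mean
S = C * P_pred * C + R    # innovation covariance C P C^T + R
K = P_pred * C / S        # Kalman gain
x_upd = x_pred + K * (y - h(x_pred))
P_upd = (1 - K * C) * P_pred
```

The measurement shrinks the variance, as expected from the update step.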
Example
Enough of the dry theory! Let’s play around with the extended Kalman filter in our race track example.
On the outside of the race track, you will notice a blue colored strip. This strip represents our current posterior over the position of the race car. We will start with a Gaussian prior distribution around the true mean. By pressing the OBSERVE button, two things will happen: first, we will take a measurement of the distance to the tree, and second, we will display the likelihood for this observed distance on the brown strip inside the race track. By pressing the UPDATE STEP button, we will perform our update step and show the resulting posterior on the outer strip. Now we are ready for the next time step. Take an action by pressing the corresponding button below the race track. After the step is performed, you have to update your posterior by pressing the PREDICT STEP button. You will see that the outer strip changes accordingly. With that, we have finished one full cycle of the filtering process and are ready to start a new cycle by taking a measurement.
What if our distance meter is not working anymore? By either clicking on the tree or pressing the W button on your keyboard, you can turn off your measurement device. The observation and update step will then be skipped. The tree will become opaque if your measurement device is turned off.
If you want to reset the environment, just press the reset button in the bottom left corner or press the R button on your keyboard. As before you can control the car by using your keyboard: A (Backward), S (Stop), D (Forward) or the buttons below the race track.
We see that it actually works pretty well. But one thing seems particularly weird: at certain positions on the race track, the brownish inner strip (our likelihood) seems to be uniformly distributed. This is not a bug, but a shortcoming of the linearization. We will always experience this behavior if the road at our current posterior mean points purely in the tangential direction, with the tree as the center. In other words: a small change in position wouldn’t change the distance. When we linearize, we assume this local behavior applies to the whole system, and we won’t get any new information out of our measurement. To get an intuitive understanding of this, you can imagine two parallel lines. Our car is driving along one of the lines and we take the nearest distance to the other line as a measurement. This measurement would be uninformative, because the distance between two parallel lines is always the same.
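This degeneracy is easy to verify numerically for a distance measurement \(h(x) = \lVert x - x_{\text{tree}} \rVert\): the gradient of \(h\) is the radial unit vector, so its projection onto the tangential direction is exactly zero. The positions below are made-up values.

```python
import numpy as np

# Why the likelihood looks uniform at "tangential" road segments: for a
# distance measurement h(x) = ||x - tree||, the gradient is the radial unit
# vector, so a step tangential to the circle around the tree produces zero
# linearized measurement change (the Jacobian C_t is blind in that direction).
tree = np.array([0.0, 0.0])
x = np.array([3.0, 4.0])                         # car position, 5 m from the tree

grad_h = (x - tree) / np.linalg.norm(x - tree)   # dh/dx, the radial direction
tangent = np.array([-grad_h[1], grad_h[0]])      # direction along the circle

print(grad_h @ tangent)  # 0.0: a tangential step does not change h
```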
Are you curious about the derivation of the unscented Kalman filter? Then you should definitely check out the next article of the nonlinear filtering series covering the unscented Kalman filter. See you there!