This article is the second part of the nonlinear filtering series, where we derive the extended Kalman filter with non-additive and additive noise directly from the recursive equations of the Bayes filter. If you haven’t read the introduction, I would recommend reading it first. Before we dive into the derivation, let’s state the main idea behind the extended Kalman filter.

The extended Kalman filter approximates the Bayes filter by linearizing the system and observation equations.

Derivation

We will start the derivation directly from the recursive equations of the Bayes filter with the prediction step
\[ p(x_{t+1}|y_{1:t}) = \int p(x_{t+1}|x_t, u_t)\, p(x_t|y_{1:t})\, dx_t \]
and the update step
\[ p(x_t|y_{1:t}) = \frac{p(y_t|x_t)\, p(x_t|y_{1:t-1})}{p(y_t|y_{1:t-1})} \]
The extended Kalman filter is normally formulated with nonlinear functions with additive noise. In this article, we directly derive the general case for non-additive noise and obtain the extended Kalman filter as a special case. Therefore, our equations of the system are
\[ \begin{aligned} x_{t+1} &= f(x_t, u_t, w_t) \\ y_t &= h(x_t, v_t) \end{aligned} \]
with Gaussian process noise \( w_t \sim \mathcal{N}(w_t|0, Q_t) \) and Gaussian observation noise \( v_t \sim \mathcal{N}(v_t|0, R_t) \).

In our formula of the Bayes filter, we can’t find any functions \(f(x_t, u_t, w_t)\) or \(h(x_t, v_t)\). Therefore, our first step will be to express these functions as probability distributions \(p(x_{t+1}|x_{t}, u_{t})\) and \(p(y_t|x_t)\). The next box will show how to achieve this in general. Warning: Distributions are very weird and the following treatment is not rigorous.

We want to calculate \(p(y|x)\) given the function \(y=f(x,z)\) and \(p(z)\). We can express the deterministic function \(y=f(x,z)\) as probability distribution
\[ p(y|x, z) = \delta\left(y - f(x, z)\right) \]
with the Dirac delta function \(\delta(x)\).

In order to calculate \(p(y|x)\) we can simply marginalize out \(z\):
\[ p(y|x) = \int p(y|x,z)\, p(z)\, dz = \int \delta\left(y - f(x,z)\right) p(z)\, dz \]
By using the composition rule of the Dirac delta function, we can express this as
\[ p(y|x) = \sum_i \frac{p(z_i)}{\left|\det \nabla_{z} f(x, z_i)\right|} \]
where the sum goes over all \(z_i\) which satisfy the equation \(y = f(x,z)\).
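As a quick numerical sanity check of this composition rule, the sketch below uses a made-up scalar example \(y = x + z^2\) with \(z \sim \mathcal{N}(0,1)\) (not from the article): for \(y > x\) there are two roots \(z_i = \pm\sqrt{y-x}\) with \(|df/dz| = 2|z_i|\), and the resulting density can be compared against a Monte Carlo estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Composition rule for y = f(x, z) = x + z**2 with z ~ N(0, 1):
# for y > x the roots are z_i = +/- sqrt(y - x) and |df/dz| = 2|z_i|, so
#   p(y|x) = sum_i N(z_i|0,1) / |2 z_i| = N(sqrt(y-x)|0,1) / sqrt(y-x).
def p_y_given_x(y, x):
    root = np.sqrt(y - x)
    pdf = np.exp(-0.5 * root**2) / np.sqrt(2 * np.pi)
    return 2 * pdf / (2 * root)

# Monte Carlo check: empirical density of y = x + z**2 in a small bin.
x = 1.0
z = rng.standard_normal(2_000_000)
y_samples = x + z**2
y0, half = 1.5, 0.01
mc_density = np.mean(np.abs(y_samples - y0) < half) / (2 * half)
print(mc_density, p_y_given_x(y0, x))  # the two values should roughly agree
```

The two printed values agree up to Monte Carlo error, which is the claim of the composition rule for this example.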

In our case, we can express our system function \(x_{t+1} = f(x_t,u_t,w_t)\) as a probability distribution
\[ p(x_{t+1}|x_t, u_t) = \sum_i \frac{\mathcal{N}(w_i|0, Q_t)}{\left|\det \nabla_{w_t} f(x_t, u_t, w_i)\right|} \]
where the sum goes over all \(w_i\) which satisfy \(x_{t+1} = f(x_t,u_t,w_t)\).

Similarly, we can express our observation function \(y_t = h(x_t, v_t)\) as a probability distribution

\[ p(y_t|x_t) = \sum_i \frac{\mathcal{N}(v_i|0, R_t)}{\left|\det \nabla_{v_t} h(x_t, v_i)\right|} \]

where the sum goes over all \(v_i\) which satisfy \(y_t = h(x_t,v_i)\).

Even if we assume Gaussian process and measurement noise, the resulting distributions of our model will, in general, be non-Gaussian. This is where the extended Kalman filter comes into play: it approximates our probability distributions by linearization. Taken literally, linearizing a probability distribution makes no sense. So what is linearized instead?

Instead of linearizing our probability distributions, we linearize our deterministic functions before we express them as probability distributions.

In order to linearize, we are performing a first-order Taylor expansion
\[ f(x_t, u_t, w_t) \approx \hat{f}(x_t, u_t, w_t) = f(\hat x_{t|t}, u, 0) + \nabla_{x_t} f(x_t, u_t, w_t)^T (x_t - \hat x_{t|t}) + \nabla_{u_t} f(x_t, u_t, w_t)^T (u_t - u) + \nabla_{w_t} f(x_t, u_t, w_t)^T w_t \]
around the current state estimate \(x_t = \hat x_{t|t}\), current input \(u_t=u\) and zero process noise \(w_t=0\).

To unclutter the notation, we define \(A_t = \nabla_{x_t} f(x_t, u_t, w_t)^T|_ {x_t=\hat x_{t|t},u_t=u,w_t=0}\) , \(B_t = \nabla_{u_t} f(x_t, u_t, w_t)^T|_ {x_t=\hat x_{t|t},u_t=u,w_t=0}\) and \(L_t = \nabla_{w_t} f(x_t, u_t, w_t)^T|_ {x_t=\hat x_{t|t},u_t=u,w_t=0}\) to finally obtain the much cleaner representation
\[ \hat{f}(x_t, u_t, w_t) = f(\hat x_{t|t}, u, 0) + A_t (x_t - \hat x_{t|t}) + B_t (u_t - u) + L_t w_t \]
Now we are ready to transform our linearized deterministic system function into a probability distribution. We will use the same formula as above, but replace the true function \(f(x_t, u_t, w_t)\) with \(\hat{f}(x_t, u_t, w_t)\), and arrive at
\[ p(x_{t+1}|x_t, u_t) \approx \sum_i \frac{\mathcal{N}(w_i|0, Q_t)}{\left|\det \nabla_{w_t} \hat{f}(x_t, u_t, w_i)\right|} \]
where the sum is over all \(w_i\) which satisfy \(x_{t+1} = \hat{f}(x_t, u_t, w_t)\). Let’s try to simplify this expression! First, we notice that if the matrix \(L_t\) is invertible, then there is exactly one \(w\) that satisfies \(x_{t+1} = \hat{f}(x_t, u_t, w_t)\). We can express it as
\[ w = L_t^{-1}\left(x_{t+1} - f(\hat x_{t|t}, u, 0) - A_t(x_t - \hat x_{t|t}) - B_t(u_t - u)\right) \]
Note that if the matrix \(L_t\) is not invertible, we would have to integrate over its null space and the sum would become an integral.

Next, let’s look at the denominator. We can express the derivative of \(\hat{f}(x_t, u_t, w_t)\) with respect to \(w_t\) as
\[ \nabla_{w_t} \hat{f}(x_t, u_t, w_t)^T = L_t \]
Let’s plug our new information about \(w\) and the derivative into the expression:
\[ p(x_{t+1}|x_t, u_t) \approx \frac{\mathcal{N}\!\left(L_t^{-1}\left(x_{t+1} - f(\hat x_{t|t}, u, 0) - A_t(x_t - \hat x_{t|t}) - B_t(u_t - u)\right)\middle|\,0,\ Q_t\right)}{\left|\det L_t\right|} \]
We apply the transformation identity (Formula 35, Toussaint):
\[ p(x_{t+1}|x_t, u_t) \approx \mathcal{N}\!\left(x_{t+1} - f(\hat x_{t|t}, u, 0) - A_t(x_t - \hat x_{t|t}) - B_t(u_t - u)\,\middle|\,0,\ L_t Q_t L_t^T\right) \]
And apply it once more, to obtain our final result:
\[ p(x_{t+1}|x_t, u_t) \approx \mathcal{N}\!\left(x_{t+1}\,\middle|\,f(\hat x_{t|t}, u, 0) + A_t(x_t - \hat x_{t|t}) + B_t(u_t - u),\ L_t Q_t L_t^T\right) \]
We are done! The linearization of our function has led us back to Gaussianity!
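If the analytic Jacobians \(A_t\), \(B_t\) and \(L_t\) are tedious to derive by hand, they can also be approximated numerically. Below is a minimal sketch with a made-up system function (not from the article), using central finite differences:

```python
import numpy as np

# Made-up 2-D system: position/velocity with the process noise entering
# through a nonlinearity, so that L_t is not trivially the identity.
def f(x, u, w, dt=0.1):
    px, v = x
    return np.array([px + dt * v + w[0],
                     v + dt * u[0] + np.sin(w[1])])

def jacobian(fun, arg0, eps=1e-6):
    """Jacobian of fun w.r.t. its vector argument, by central differences."""
    n = len(arg0)
    y0 = fun(arg0)
    J = np.zeros((len(y0), n))
    for i in range(n):
        d = np.zeros(n)
        d[i] = eps
        J[:, i] = (fun(arg0 + d) - fun(arg0 - d)) / (2 * eps)
    return J

# Evaluate all three Jacobians at the linearization point
# (current estimate, current input, zero noise).
x_hat = np.array([1.0, 2.0])
u = np.array([0.5])
w0 = np.zeros(2)

A = jacobian(lambda x: f(x, u, w0), x_hat)    # d f / d x
B = jacobian(lambda u_: f(x_hat, u_, w0), u)  # d f / d u
L = jacobian(lambda w: f(x_hat, u, w), w0)    # d f / d w
print(A, B, L)
```

For this toy model the finite-difference result matches the analytic Jacobians: \(A = \begin{pmatrix}1 & 0.1\\ 0 & 1\end{pmatrix}\), \(B = \begin{pmatrix}0\\ 0.1\end{pmatrix}\), and \(L\) is the identity because \(\cos(0) = 1\).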

With the same strategy, we obtain for our observation model
\[ p(y_t|x_t) \approx \mathcal{N}\!\left(y_t\,\middle|\,h(\hat x_{t|t-1}, 0) + C_t(x_t - \hat x_{t|t-1}),\ M_t R_t M_t^T\right) \]
with \(C_t = \nabla_{x_t} h(x_t, v_t)^T|_ {x_t=\hat x_{t|t-1},v_t=0}\) and \(M_t = \nabla_{v_t} h(x_t, v_t)^T|_ {x_t=\hat x_{t|t-1},v_t=0}\).

Our linearization led to a Gaussian system and observation model. Therefore, the distribution of the updated state estimate
\[ p(x_t|y_{1:t}) = \mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t},\ P_{t|t}\right) \]
and the distribution of the predicted state estimate
\[ p(x_{t+1}|y_{1:t}) = \mathcal{N}\!\left(x_{t+1}\,\middle|\,\hat x_{t+1|t},\ P_{t+1|t}\right) \]
will be Gaussians as well.

Now we are ready to plug our surrogate into the equations of the Bayes filter:

Prediction step
\[ p(x_{t+1}|y_{1:t}) = \int \mathcal{N}\!\left(x_{t+1}\,\middle|\,f(\hat x_{t|t}, u, 0) + A_t(x_t - \hat x_{t|t}) + B_t(u_t - u),\ L_t Q_t L_t^T\right) \mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t},\ P_{t|t}\right) dx_t \]
Update step
\[ p(x_t|y_{1:t}) = \frac{\mathcal{N}\!\left(y_t\,\middle|\,h(\hat x_{t|t-1}, 0) + C_t(x_t - \hat x_{t|t-1}),\ M_t R_t M_t^T\right)\mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t-1},\ P_{t|t-1}\right)}{p(y_t|y_{1:t-1})} \]
Let’s try to simplify these equations!

Prediction step

We will start with the prediction step
\[ p(x_{t+1}|y_{1:t}) = \int \mathcal{N}\!\left(x_{t+1}\,\middle|\,f(\hat x_{t|t}, u, 0) + A_t(x_t - \hat x_{t|t}) + B_t(u_t - u),\ L_t Q_t L_t^T\right) \mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t},\ P_{t|t}\right) dx_t \]
To find an expression for our prediction step we can simply use the propagation formula (Formula 37, Toussaint)
\[ \int \mathcal{N}\!\left(x\,\middle|\,a + Fz,\ A\right) \mathcal{N}\!\left(z\,\middle|\,b,\ B\right) dz = \mathcal{N}\!\left(x\,\middle|\,a + Fb,\ A + FBF^T\right) \]
By comparison with our expression, we see that
\[ \begin{aligned} \hat x_{t+1|t} &= f(\hat x_{t|t}, u_t, 0) \\ P_{t+1|t} &= A_t P_{t|t} A_t^T + L_t Q_t L_t^T \end{aligned} \]
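The prediction step translates almost line by line into code. Below is a minimal sketch (function and variable names are mine, and the toy system used for the check is made up): the mean is propagated through the nonlinear \(f\), the covariance through the Jacobians.

```python
import numpy as np

def ekf_predict(x_hat, P, u, f, A_t, L_t, Q):
    """Non-additive EKF prediction: mean through f, covariance through Jacobians."""
    x_pred = f(x_hat, u, np.zeros(Q.shape[0]))    # f(x_hat, u_t, 0)
    P_pred = A_t @ P @ A_t.T + L_t @ Q @ L_t.T    # A P A^T + L Q L^T
    return x_pred, P_pred

# Sanity check on a toy *linear* system, where the EKF prediction must
# coincide with the ordinary Kalman filter prediction.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
f = lambda x, u, w: A @ x + w                      # linear f, additive noise
x_pred, P_pred = ekf_predict(np.array([1.0, 2.0]), np.eye(2), None,
                             f, A, np.eye(2), 0.1 * np.eye(2))
print(x_pred)  # [1.2, 2.0]
```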
Update step

We will start to simplify the update step
\[ p(x_t|y_{1:t}) = \frac{\mathcal{N}\!\left(y_t\,\middle|\,h(\hat x_{t|t-1}, 0) + C_t(x_t - \hat x_{t|t-1}),\ M_t R_t M_t^T\right)\mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t-1},\ P_{t|t-1}\right)}{p(y_t|y_{1:t-1})} \]
by focusing on the numerator first. We notice that we can rewrite it as a joint distribution (Formula 39, Toussaint)
\[ \mathcal{N}\!\left(x\,\middle|\,a,\ A\right)\mathcal{N}\!\left(y\,\middle|\,b + Fx,\ B\right) = \mathcal{N}\!\left(\begin{pmatrix} x \\ y \end{pmatrix}\middle|\begin{pmatrix} a \\ b + Fa \end{pmatrix},\ \begin{pmatrix} A & AF^T \\ FA & B + FAF^T \end{pmatrix}\right) \]
Then again, this joint distribution can be rewritten as
\[ \mathcal{N}\!\left(\begin{pmatrix} x \\ y \end{pmatrix}\middle|\begin{pmatrix} a \\ b + Fa \end{pmatrix},\ \begin{pmatrix} A & AF^T \\ FA & B + FAF^T \end{pmatrix}\right) = \mathcal{N}\!\left(x\,\middle|\,a + AF^T\!\left(B + FAF^T\right)^{-1}\!(y - b - Fa),\ A - AF^T\!\left(B + FAF^T\right)^{-1}\!FA\right)\mathcal{N}\!\left(y\,\middle|\,b + Fa,\ B + FAF^T\right) \]
We can combine the two previous equations to the following expression
\[ \mathcal{N}\!\left(x\,\middle|\,a,\ A\right)\mathcal{N}\!\left(y\,\middle|\,b + Fx,\ B\right) = \mathcal{N}\!\left(x\,\middle|\,a + AF^T\!\left(B + FAF^T\right)^{-1}\!(y - b - Fa),\ A - AF^T\!\left(B + FAF^T\right)^{-1}\!FA\right)\mathcal{N}\!\left(y\,\middle|\,b + Fa,\ B + FAF^T\right) \]
By comparison with the numerator of our update step, we obtain
\[ p(x_t|y_{1:t}) = \frac{\mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t-1} + P_{t|t-1}C_t^T\!\left(M_t R_t M_t^T + C_t P_{t|t-1} C_t^T\right)^{-1}\!\left(y_t - h(\hat x_{t|t-1}, 0)\right),\ P_{t|t-1} - P_{t|t-1}C_t^T\!\left(M_t R_t M_t^T + C_t P_{t|t-1} C_t^T\right)^{-1}\!C_t P_{t|t-1}\right)\mathcal{N}\!\left(y_t\,\middle|\,h(\hat x_{t|t-1}, 0),\ M_t R_t M_t^T + C_t P_{t|t-1} C_t^T\right)}{p(y_t|y_{1:t-1})} \]
which simplifies to
\[ p(x_t|y_{1:t}) = \frac{\mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t-1} + K_t\left(y_t - h(\hat x_{t|t-1}, 0)\right),\ \left(I - K_t C_t\right)P_{t|t-1}\right)\mathcal{N}\!\left(y_t\,\middle|\,h(\hat x_{t|t-1}, 0),\ M_t R_t M_t^T + C_t P_{t|t-1} C_t^T\right)}{p(y_t|y_{1:t-1})} \]

with the Kalman gain \(K_t = P_{t|t-1}C_t^T\left(M_t R_t M_t^T + C_t P_{t|t-1} C_t^T\right)^{-1}\).
We applied the same trick as in the derivation of the Kalman filter: Conceptually, we only transformed
\[ p(y_t|x_t)\, p(x_t|y_{1:t-1}) = p(x_t, y_t|y_{1:t-1}) = p(x_t|y_{1:t})\, p(y_t|y_{1:t-1}) \]
If we look closely at the final expression, we see that the second factor in the numerator should cancel against the denominator \(p(y_t|y_{1:t-1})\). Therefore, the result is simply the remaining part
\[ p(x_t|y_{1:t}) = \mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t-1} + K_t\left(y_t - h(\hat x_{t|t-1}, 0)\right),\ \left(I - K_t C_t\right)P_{t|t-1}\right) \]
If our reasoning is correct, the denominator should be equal to \(\mathcal{N}(y_{t}|h(\hat x_{t|t-1}, 0), M_tR_tM_t^T + C_tP_{t|t-1}C_t^T)\), which was canceled out. The denominator can be simplified with the propagation formula (Formula 37, Toussaint)
\[ p(y_t|y_{1:t-1}) = \int \mathcal{N}\!\left(y_t\,\middle|\,h(\hat x_{t|t-1}, 0) + C_t(x_t - \hat x_{t|t-1}),\ M_t R_t M_t^T\right)\mathcal{N}\!\left(x_t\,\middle|\,\hat x_{t|t-1},\ P_{t|t-1}\right)dx_t = \mathcal{N}\!\left(y_t\,\middle|\,h(\hat x_{t|t-1}, 0),\ M_t R_t M_t^T + C_t P_{t|t-1} C_t^T\right) \]
Yay! We see that the denominator is exactly the same as the canceled factor in the numerator.
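The update step likewise maps directly to code. Here is a minimal sketch (again with my own naming and a made-up toy measurement for the check): innovation covariance, Kalman gain, and the Gaussian posterior parameters.

```python
import numpy as np

def ekf_update(x_pred, P_pred, y, h, C_t, M_t, R):
    """Non-additive EKF update: gain from the innovation covariance."""
    S = C_t @ P_pred @ C_t.T + M_t @ R @ M_t.T     # C P C^T + M R M^T
    K = P_pred @ C_t.T @ np.linalg.inv(S)          # Kalman gain
    x_upd = x_pred + K @ (y - h(x_pred, np.zeros(R.shape[0])))
    P_upd = (np.eye(len(x_pred)) - K @ C_t) @ P_pred
    return x_upd, P_upd

# Toy usage: directly observe the first state component with unit noise.
C = np.array([[1.0, 0.0]])
h = lambda x, v: C @ x + v
x_upd, P_upd = ekf_update(np.array([1.0, 2.0]), np.eye(2), np.array([1.5]),
                          h, C, np.eye(1), np.array([[1.0]]))
print(x_upd)  # gain of 0.5 on the first component -> [1.25, 2.0]
```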

Let’s summarize our results:

Extended Kalman filter with non-additive noise

The recursive formula for the extended Kalman filter with non-additive noise consists of the prediction step
\[ \begin{aligned} \hat x_{t+1|t} &= f(\hat x_{t|t}, u_t, 0) \\ P_{t+1|t} &= A_t P_{t|t} A_t^T + L_t Q_t L_t^T \end{aligned} \]
and the update step
\[ \begin{aligned} \hat x_{t|t} &= \hat x_{t|t-1} + K_t\left(y_t - h(\hat x_{t|t-1}, 0)\right) \\ P_{t|t} &= \left(I - K_t C_t\right)P_{t|t-1} \end{aligned} \]
with
\[ K_t = P_{t|t-1}C_t^T\left(C_t P_{t|t-1} C_t^T + M_t R_t M_t^T\right)^{-1} \]
That’s it! We derived the equations of the extended Kalman filter. To bring the equations into a more implementation-friendly form, we restate the extended Kalman filter as:

Extended Kalman filter with non-additive noise

The recursive formula for the extended Kalman filter with non-additive noise consists of the prediction step
\[ \begin{aligned} \hat x_{t|t-1} &= f(\hat x_{t-1|t-1}, u_{t-1}, 0) \\ P_{t|t-1} &= A_{t-1} P_{t-1|t-1} A_{t-1}^T + L_{t-1} Q_{t-1} L_{t-1}^T \end{aligned} \]
and the update step
\[ \begin{aligned} \hat x_{t|t} &= \hat x_{t|t-1} + K_t\left(y_t - h(\hat x_{t|t-1}, 0)\right) \\ P_{t|t} &= \left(I - K_t C_t\right)P_{t|t-1} \end{aligned} \]
with
\[ K_t = P_{t|t-1}C_t^T\left(C_t P_{t|t-1} C_t^T + M_t R_t M_t^T\right)^{-1} \]
As promised, we will also look at the special case of additive noise. Our functions then take the form
\[ \begin{aligned} x_{t+1} &= f(x_t, u_t) + w_t \\ y_t &= h(x_t) + v_t \end{aligned} \]
In this case, the matrices \(L_t\) and \(M_t\) are identity matrices:
\[ L_t = \nabla_{w_t}\left(f(x_t, u_t) + w_t\right)^T = I, \qquad M_t = \nabla_{v_t}\left(h(x_t) + v_t\right)^T = I \]
Finally, we can state the equations of the extended Kalman filter with additive noise.

Extended Kalman filter with additive noise

The recursive formula for the extended Kalman filter with additive noise consists of the prediction step
\[ \begin{aligned} \hat x_{t|t-1} &= f(\hat x_{t-1|t-1}, u_{t-1}) \\ P_{t|t-1} &= A_{t-1} P_{t-1|t-1} A_{t-1}^T + Q_{t-1} \end{aligned} \]
and the update step
\[ \begin{aligned} \hat x_{t|t} &= \hat x_{t|t-1} + K_t\left(y_t - h(\hat x_{t|t-1})\right) \\ P_{t|t} &= \left(I - K_t C_t\right)P_{t|t-1} \end{aligned} \]
with
\[ K_t = P_{t|t-1}C_t^T\left(C_t P_{t|t-1} C_t^T + R_t\right)^{-1} \]
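To see the additive-noise filter in action, here is a self-contained numerical sketch. The system (a 1-D car measuring its distance to an off-track landmark) and all numbers are made up for illustration, loosely inspired by the race track example below:

```python
import numpy as np

rng = np.random.default_rng(1)
dt, d = 0.1, 5.0           # time step; lateral offset of the landmark
Q = np.diag([1e-4, 1e-3])  # process noise covariance (additive, so L_t = I)
R = np.array([[0.01]])     # measurement noise covariance (M_t = I)

f = lambda x, u: np.array([x[0] + dt * x[1], x[1] + dt * u])  # pos/vel model
h = lambda x: np.array([np.hypot(x[0], d)])                   # distance to landmark

x_true = np.array([0.0, 1.0])
x_hat, P = np.array([0.2, 0.8]), np.eye(2)

for _ in range(200):
    u = 0.1
    # --- simulate the true system ---
    x_true = f(x_true, u) + rng.multivariate_normal(np.zeros(2), Q)
    y = h(x_true) + rng.multivariate_normal(np.zeros(1), R)
    # --- predict ---
    A = np.array([[1.0, dt], [0.0, 1.0]])   # Jacobian of f (exact here)
    x_hat, P = f(x_hat, u), A @ P @ A.T + Q
    # --- update ---
    C = np.array([[x_hat[0] / np.hypot(x_hat[0], d), 0.0]])  # Jacobian of h
    S = C @ P @ C.T + R
    K = P @ C.T @ np.linalg.inv(S)
    x_hat = x_hat + K @ (y - h(x_hat))
    P = (np.eye(2) - K @ C) @ P

print(x_true, x_hat)  # the estimate should track the true state
```

Note that near \(x_0 = 0\) the measurement Jacobian is almost zero, so the early updates contribute little; this is the same tangential blind spot discussed in the example below.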
Example

Enough of the dry theory! Let’s play around with the extended Kalman filter in our race track example.

Backward
No action
Forward
Predict step
Observe
Update step

On the outside of the race track, you will notice a blue-colored strip. This strip represents our current posterior over the position of the race car. We start with a Gaussian prior distribution around the true mean. By pressing the OBSERVE button, two things will happen: first, we take a measurement of the distance to the tree, and second, we display the likelihood for this observed distance on the brown strip inside the race track. By pressing the UPDATE STEP button, we perform our update step and show the resulting posterior on the outer strip. Now we are ready for the next time step. Take an action by pressing the corresponding button below the race track. After the step is performed, you have to update your posterior by pressing the PREDICT STEP button. You will see that the outer strip changes accordingly. With that, we have finished one full cycle of the filtering process and are ready to start a new cycle by taking a measurement.

What if our distance meter stops working? By either clicking on the tree or pressing the W key on your keyboard, you can turn off your measurement device. The observation and update steps will then be skipped. The tree becomes translucent if your measurement device is turned off.

If you want to reset the environment, just press the reset button in the bottom left corner or press the R key on your keyboard. As before, you can control the car using your keyboard: A (Backward), S (Stop), D (Forward) or the buttons below the race track.

We see that it is actually working pretty well. But one thing seems particularly weird: at certain positions on the race track, the brownish inner strip (our likelihood) seems to be uniformly distributed. This is not a bug, but a shortcoming of the linearization. We will always experience this behavior if the road at our current posterior mean points purely in the tangential direction, with the tree as the center. In other words: a small change in position wouldn’t change the distance. When we linearize, we assume this local behavior applies to the whole system, and we won’t get any new information out of our measurement. To get an intuitive understanding of this, imagine two parallel lines. Our car drives along one of the lines, and we take the nearest distance to the other line as a measurement. This measurement is uninformative because the distance between parallel lines is always the same.
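This tangential blind spot is easy to verify numerically. The sketch below (with a made-up car and tree position) computes the gradient of the distance measurement \(h(p) = \lVert p - \text{tree}\rVert\) and its directional derivative along the tangential motion direction:

```python
import numpy as np

# Gradient of h(p) = ||p - tree|| w.r.t. the position p is the radial unit
# vector pointing from the tree to the car.  If the car can only move
# tangentially (perpendicular to that vector), the directional derivative of
# the measurement along the motion is zero: the linearized measurement
# carries no information about the position along the road.
tree = np.array([0.0, 0.0])
car = np.array([3.0, 4.0])                # hypothetical position, 5 m from the tree

grad_h = (car - tree) / np.linalg.norm(car - tree)  # radial unit vector
tangent = np.array([-4.0, 3.0]) / 5.0               # unit vector along a circular road

print(grad_h @ tangent)  # 0.0: a small tangential step does not change the distance
```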

Are you curious about the derivation of the unscented Kalman filter? Then you should definitely check out the next article of the nonlinear filtering series, covering the unscented Kalman filter. See you there!

Acknowledgement

The vector graphics of the car were created by Freepik.