This article on the derivation of the particle filter marks the last part of the nonlinear filtering series. We will derive the particle filter algorithm directly from the equations of the Bayes filter. At the end, we will have the opportunity to play around with the particle filter in our toy example. If you haven’t read the introduction, I would recommend reading it first. Before we dive into the derivation, let’s state the main idea behind the particle filter.

The particle filter is a sample-based approximation of the Bayes filter. It is used to infer the current state of an arbitrary probabilistic state space model given all observations and inputs up to the current timestep and a prior distribution of the initial state.

Derivation

In this section, we will develop the method of particle filtering directly from the Bayes filter and mean field particle methods.

We learned that the recursive equations of the Bayes filter consist of the prediction step

\[
p(x_{t+1}|y_{0:t},u_{0:t}) = \int p(x_{t+1}|x_t,u_t)\,p(x_t|y_{0:t},u_{0:t-1})\,\mathrm{d}x_t
\]

and the update step

\[
p(x_t|y_{0:t},u_{0:t-1}) = \frac{p(y_t|x_t)\,p(x_t|y_{0:t-1},u_{0:t-1})}{\int p(y_t|x_t)\,p(x_t|y_{0:t-1},u_{0:t-1})\,\mathrm{d}x_t}
\]

and that the integrals contained in these equations are in general intractable. The main idea of the particle filter is to approximate the Bayes filter by approximating the posteriors \(p(x_{t+1}|y_{0:t},u_{0:t})\) and \(p(x_t|y_{0:t},u_{0:t-1})\) with empirical measures \(\hat{p}(x_{t+1}|y_{0:t},u_{0:t})\) and \(\hat{p}(x_t|y_{0:t},u_{0:t-1})\). Let’s take a brief look at empirical measures.

Empirical measure

The empirical measure of \(p(x)\) is defined as

\[
p_N(x) = \frac{1}{N}\sum_{i=1}^{N}\delta_{x_i}(x),
\]

where \(\delta_{x_i}(x)\) is an abbreviation for the shifted Dirac delta function \(\delta(x-x_i)\) and \(x_{1:N}\) are \(N\) samples from \(p(x)\). If the number of samples goes to infinity, the empirical measure will almost surely converge to the distribution \(p(x)\). The following figure shows the distribution \(p(x)\) in red and the empirical measure \(p_N(x)\) in black.

Please be aware that the lines representing the Dirac delta functions would actually be infinitely high. In order to visualize them, we set the length of each line to the area under the corresponding Dirac delta function. In our case, this area is \(\frac{1}{N}\).

Up until now, we assumed that each Dirac delta function is weighted uniformly. We can also define a weighted empirical measure by

\[
p_N(x) = \sum_{i=1}^{N} w_i\,\delta_{x_i}(x)
\]

with \(\sum_{i=1}^{N}w_i = 1\).
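To make this concrete, here is a minimal NumPy sketch (not from the original article) that treats a set of samples together with their weights as an empirical measure and uses it to approximate an expectation; the standard normal merely stands in for an arbitrary \(p(x)\):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Draw N samples from p(x); a standard normal stands in for an arbitrary p(x).
N = 10_000
x = rng.standard_normal(N)

# Unweighted empirical measure: every sample carries the weight 1/N.
w = np.full(N, 1.0 / N)

# Expectations under p(x) become weighted sums under the empirical measure,
# e.g. E[x^2] = sum_i w_i * x_i^2 (the exact value for a standard normal is 1).
print("E[x^2] ≈", np.sum(w * x**2))
```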

In the following, we will call the tuple \((x_i, w_i)\), corresponding to the position and weight of the \(i\)th Dirac delta function, a particle. Now that we have an idea about empirical measures, we can directly start our derivation with the update step.

Update step

The first step is to replace the predicted distribution \(p(x_t|y_{0:t-1},u_{0:t-1})\) in the update equation with the empirical measure \(\hat{p}(x_t|y_{0:t-1},u_{0:t-1}) = \frac{1}{N}\sum_{i=1}^N \delta_{\xi_t^i}(x_t)\). This empirical measure could look like this:

We plug our empirical measure into the update step and obtain

\[
\hat{p}(x_t|y_{0:t},u_{0:t-1}) = \frac{p(y_t|x_t)\,\frac{1}{N}\sum_{i=1}^N \delta_{\xi_t^i}(x_t)}{\int p(y_t|x_t)\,\frac{1}{N}\sum_{i=1}^N \delta_{\xi_t^i}(x_t)\,\mathrm{d}x_t}.
\]

We are multiplying the empirical measure and the likelihood function pointwise in the numerator. This pointwise multiplication can be understood graphically:

If we look closely at the denominator, we see that we are integrating a function that is weighted by a Dirac delta function. Therefore, we can use the sifting property of the Dirac delta function to obtain

\[
\hat{p}(x_t|y_{0:t},u_{0:t-1}) = \frac{\sum_{i=1}^N p(y_t|x_t)\,\delta_{\xi_t^i}(x_t)}{\sum_{j=1}^N p(y_t|x_t=\xi_t^j)},
\]

where we canceled out the factor \(\frac{1}{N}\). As a result, we have a weighted empirical measure

\[
\hat{p}(x_t|y_{0:t},u_{0:t-1}) = \sum_{i=1}^N w_t^i\,\delta_{\xi_t^i}(x_t)
\]

with weights

\[
w_t^i = \frac{p(y_t|x_t=\xi_t^i)}{\sum_{j=1}^N p(y_t|x_t=\xi_t^j)}.
\]
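As a small illustration, the update step boils down to evaluating the likelihood at every particle position and normalizing. The sketch below assumes a one-dimensional state and a Gaussian measurement model, which is an assumption made purely for illustration:

```python
import numpy as np

def update(particles: np.ndarray, y: float, sigma: float = 0.5) -> np.ndarray:
    """Return normalized weights w_t^i ∝ p(y_t | x_t = ξ_t^i).

    Assumes a Gaussian measurement model y_t = x_t + noise for illustration;
    its normalization constant cancels when the weights are normalized.
    """
    likelihood = np.exp(-0.5 * ((y - particles) / sigma) ** 2)  # ∝ p(y_t | x_t = ξ_t^i)
    return likelihood / likelihood.sum()                        # weights sum to one
```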

This leads us directly to the important resampling step.

Resampling step

In the resampling step, we replace our weighted empirical measure with an (unweighted) empirical measure. This is equivalent to sampling from a categorical distribution in which our weights define the probability mass function. Therefore, particles with high weights are more likely to be drawn than particles with low weights. As a result, particles with low weights may not be drawn at all, while particles with high weights may be drawn multiple times. You may ask why we are doing this weird resampling step at all. The main idea is quite simple: we only have a limited number of particles, so we want to discard hypotheses that have low probability. Otherwise, the weights of some particles would get close to zero, and these particles would become useless.

It can also be interpreted as a selection step in the context of evolutionary algorithms. Only the fittest particles are able to produce the next generation of particles.

As a result of the resampling step, we will obtain \(N\) particles \((\hat{\xi}_t^{i},\frac{1}{N})_{1:N}\), which were drawn from

\[
\hat{p}(x_t|y_{0:t},u_{0:t-1}) = \sum_{i=1}^N w_t^i\,\delta_{\xi_t^i}(x_t).
\]
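In code, resampling is simply drawing particle indices from the categorical distribution defined by the weights. A minimal sketch of this (multinomial) scheme follows; systematic or stratified resampling are common lower-variance alternatives:

```python
import numpy as np

def resample(particles: np.ndarray, weights: np.ndarray,
             rng: np.random.Generator) -> np.ndarray:
    """Draw N indices from the categorical distribution defined by the weights
    and return the corresponding particles, each now carrying weight 1/N."""
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]
```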

Prediction step

Let’s look at the prediction step and replace the posterior with our (resampled) empirical measure from the update step:

\[
\hat{q}(x_{t+1}|y_{0:t},u_{0:t}) = \int p(x_{t+1}|x_t,u_t)\,\frac{1}{N}\sum_{i=1}^N \delta_{\hat{\xi}_t^i}(x_t)\,\mathrm{d}x_t
\]

By changing the order of the sum and integral, we get

\[
\hat{q}(x_{t+1}|y_{0:t},u_{0:t}) = \frac{1}{N}\sum_{i=1}^N \int p(x_{t+1}|x_t,u_t)\,\delta_{\hat{\xi}_t^i}(x_t)\,\mathrm{d}x_t.
\]

We note that we can use the sifting property again and finally obtain

\[
\hat{q}(x_{t+1}|y_{0:t},u_{0:t}) = \frac{1}{N}\sum_{i=1}^N p(x_{t+1}|x_t=\hat{\xi}_t^i,u_t).
\]

Therefore, the distribution \(\hat{q}(x_{t+1}|y_{0:t},u_{0:t})\) is a weighted sum of continuous distributions over \(x_{t+1}\), which is shown schematically in the following figure.

We note that even though our current belief distribution is an empirical measure, we obtain a continuous distribution for our new belief.

Fortunately, we already know how to obtain an empirical measure of a continuous distribution: we simply have to sample from it. But how can we sample from our new belief?

First, we take a sample from our current empirical posterior distribution. Based on this sample, we sample from the system distribution \(p(x_{t+1}|x_t,u_t)\) to obtain a sample from the new posterior. This procedure samples exactly from our new continuous posterior distribution.
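In code, this two-stage sampling amounts to drawing, for every resampled particle, a successor from the system distribution. The sketch below assumes a toy one-dimensional transition model \(x_{t+1} = x_t + u_t + \epsilon\) with Gaussian noise \(\epsilon\), which is an assumption for illustration only:

```python
import numpy as np

def predict(particles: np.ndarray, u: float, sigma: float = 0.1, rng=None) -> np.ndarray:
    """Sample ξ_{t+1}^i ~ p(x_{t+1} | x_t = ξ_t^i, u_t) for every particle.

    Assumes the toy transition model x_{t+1} = x_t + u_t + Gaussian noise,
    purely for illustration.
    """
    rng = np.random.default_rng() if rng is None else rng
    return particles + u + rng.normal(scale=sigma, size=particles.shape)
```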

Nice! We are essentially done: we have expressed the update and prediction steps in terms of empirical measures.

But one thing seems a little bit weird. In order to obtain the empirical measure of the new posterior in the prediction step, we have to sample from the current posterior, which was obtained from the resampling step. But in the resampling step, we already sampled from our posterior distribution. Therefore, we can take the particles of the current posterior directly as our samples.

Let’s summarize the algorithm of the particle filter.

Algorithm of the particle filter

The algorithm is initialized with a set of \(N\) particles \((\xi_0^{i},\frac{1}{N})_{1:N}\), where the positions \(\xi_0^i\) are sampled from the prior distribution of the initial state.

We are taking the update step and obtain the distribution

\[
\hat{p}(x_t|y_{0:t},u_{0:t-1}) = \sum_{i=1}^N \frac{p(y_t|x_t=\xi_t^i)}{\sum_{j=1}^N p(y_t|x_t=\xi_t^j)}\,\delta_{\xi_t^i}(x_t),
\]

which is equivalent to the particles \((\xi_t^{i},\frac{p(y_t|x_t=\xi_t^i)}{\sum_{j=1}^N p(y_t|x_t=\xi_t^j)})_{1:N}\).

In the resampling step, we are taking \(N\) samples

\[
\hat{\xi}_t^i \sim \hat{p}(x_t|y_{0:t},u_{0:t-1}), \quad i = 1,\dots,N,
\]

and obtain new particles \((\hat{\xi}_t^{i},\frac{1}{N})_{1:N}\).

Finally, we are taking the prediction step. We obtain the distribution

\[
\hat{q}(x_{t+1}|y_{0:t},u_{0:t}) = \frac{1}{N}\sum_{i=1}^N p(x_{t+1}|x_t=\hat{\xi}_t^i,u_t)
\]

and sample from it to get new particles \((\xi_{t+1}^{i},\frac{1}{N})_{1:N}\).

The process starts again with the update step.
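Putting the pieces together, one filtering cycle reads update, resample, predict. The sketch below reuses the hypothetical helpers update, resample, and predict from the earlier snippets; the measurement and transition models they assume are illustrative and are not those of the race-car example:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Initialization: N particles drawn from the prior over the initial state
# (a uniform prior on [0, 1) serves as a stand-in here).
N = 1000
particles = rng.uniform(0.0, 1.0, size=N)

def particle_filter_step(particles, y, u, rng):
    """One full filtering cycle: update, resample, predict."""
    weights = update(particles, y)                 # weight by the likelihood p(y_t | x_t)
    particles = resample(particles, weights, rng)  # draw from the weighted empirical measure
    return predict(particles, u, rng=rng)          # propagate through the system model

# Example: run the filter on a short, made-up sequence of (observation, input) pairs.
for y_t, u_t in [(0.2, 0.1), (0.35, 0.1), (0.5, 0.1)]:
    particles = particle_filter_step(particles, y_t, u_t, rng)
    print("state estimate:", particles.mean())
```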

Now that we are finally done with the derivation, let’s see how the particle filter performs in our toy example.

Example

[Interactive demo: buttons for Backward, No action, Forward, Predict step, Observe, Update step, and Resampling.]

On the outside of the race track, you will notice uniformly distributed blue dots. These dots are our particles and therefore represent a set of hypotheses about the position of the race car. By pressing the OBSERVE button, two things will happen: first, we take a measurement of the distance to the tree, and second, we display the likelihood of this observed distance on the brown strip inside the race track. By pressing the UPDATE STEP button, we perform our update step. The size of the blue dots will change according to their weights. By pressing the RESAMPLING button, we perform the resampling step. As a result, you will notice that only the dots with high weights remain. Now we are ready for the next time step. Take an action by pressing the corresponding button below the race track. After the step is performed, you have to update your posterior by pressing the PREDICT STEP button. You will see that the dots change according to your chosen action. Now we have finished one full cycle of the filtering process and are ready to start a new cycle by taking a measurement.

If you want to reset the environment, just press the reset button in the bottom left corner or press the R key on your keyboard. As before, you can control the car with your keyboard: A (Backward), S (Stop), D (Forward), or with the buttons below the race track.

Acknowledgement

The vector graphics of the car were created by Freepik.