This article is a brief summary of some relationships between the log-likelihood, score, Kullback-Leibler divergence and Fisher information. No explanations, just pure math.
Log-likelihood
The log-likelihood is defined as the logarithm of the likelihood
$$ \log p(x|\theta). $$
Let’s perform a Taylor approximation of the log-likelihood \(\log p(x|\theta')\) around the current estimate \(\theta\):
$$\begin{align*}
\log p(x|\theta') =& \log p(x|\theta')|_{\theta' = \theta} + \sum_i \left. \frac{\partial \log p(x|\theta')}{\partial \theta'_i} \right| _{\theta' = \theta} (\theta'_i - \theta_i) \\ &+ \frac{1}{2}\sum_i\sum_j \left. \frac{\partial^2 \log p(x|\theta')}{\partial \theta'_i\partial \theta'_j}\right| _{\theta' = \theta} (\theta'_i - \theta_i)(\theta'_j - \theta_j) + ...
\end{align*}$$
$$\begin{align*}
\log p(x|\theta') =& \log p(x|\theta')|_{\theta' = \theta} + \left. \nabla_{\theta'} \log p(x|\theta')^T\right| _{\theta' = \theta} (\theta' - \theta) \\ &+ \frac{1}{2}\left. (\theta' - \theta)^T\nabla^2_{\theta'} \log p(x|\theta')\right| _{\theta' = \theta}(\theta' - \theta) + ...
\end{align*}$$
The linear term \(\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\), or \(\nabla_{\theta'} \log p(x|\theta')\) in vector notation, can be written as
$$ \frac{\partial}{\partial \theta'_i} \log p(x|\theta') = \frac{1}{p(x|\theta')} \frac{\partial}{\partial \theta'_i} p(x|\theta') $$
$$ \nabla_{\theta'} \log p(x|\theta') = \frac{1}{p(x|\theta')}\nabla_{\theta'} p(x|\theta') $$
by using the log-derivative trick.
Evaluated at \( \theta' = \theta \), you obtain
$$ \left. \frac{\partial}{\partial \theta'_i} \log p(x|\theta') \right|_{\theta' = \theta} = \frac{1}{p(x|\theta)} \frac{\partial}{\partial \theta_i} p(x|\theta). $$
$$ \left. \nabla_{\theta'} \log p(x|\theta') \right|_{\theta' = \theta} = \frac{1}{p(x|\theta)}\nabla_{\theta} p(x|\theta). $$
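As a quick numerical sanity check, here is a minimal sketch of the log-derivative trick for a one-dimensional Gaussian with fixed \(\sigma\); the model and all variable names are illustrative assumptions, not part of the derivation.

```python
# Minimal sketch (assumed example): verify the log-derivative trick
# d/dmu log p(x|mu) = (1/p) d/dmu p(x|mu) for a Gaussian with fixed sigma,
# using central finite differences.
from scipy.stats import norm

x, mu, sigma, eps = 1.3, 0.5, 2.0, 1e-6

# finite-difference derivative of log p(x|mu) w.r.t. mu
dlogp = (norm.logpdf(x, mu + eps, sigma) - norm.logpdf(x, mu - eps, sigma)) / (2 * eps)

# (1/p) times the finite-difference derivative of p(x|mu) w.r.t. mu
dp = (norm.pdf(x, mu + eps, sigma) - norm.pdf(x, mu - eps, sigma)) / (2 * eps)

print(dlogp, dp / norm.pdf(x, mu, sigma))  # both approximate (x - mu) / sigma**2
```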
The quadratic term \(\frac{\partial^2 \log p(x|\theta')}{\partial \theta'_i\partial \theta'_j}\) of the decomposition, or \(\nabla^2_{\theta'} \log p(x|\theta')\) in vector notation, can be written as
$$
\begin{align*}
\frac{\partial^2 \log p(x|\theta')}{\partial \theta'_i\partial \theta'_j} =& \frac{\partial }{\partial \theta'_j} \left( \frac{\partial}{\partial \theta'_i} \log p(x|\theta')\right) \\
=& \frac{\partial }{\partial \theta'_j} \left( \frac{1}{p(x|\theta')} \frac{\partial}{\partial \theta'_i} p(x|\theta')\right) \\
=& \frac{\partial }{\partial \theta'_j} \left( \frac{1}{p(x|\theta')}\right) \frac{\partial}{\partial \theta'_i} p(x|\theta') + \frac{1}{p(x|\theta')} \frac{\partial }{\partial \theta'_j} \left(\frac{\partial}{\partial \theta'_i} p(x|\theta')\right)\\
=& \frac{\partial }{\partial \theta'_j} \left( \frac{1}{p(x|\theta')}\right) \frac{\partial}{\partial \theta'_i} p(x|\theta') + \frac{1}{p(x|\theta')} \frac{\partial^2 p(x|\theta')}{\partial \theta'_i\partial \theta'_j} \\
=& - \frac{1}{p(x|\theta')^2} \frac{\partial}{\partial \theta'_j} p(x|\theta') \frac{\partial}{\partial \theta'_i} p(x|\theta') + \frac{1}{p(x|\theta')} \frac{\partial^2 p(x|\theta')}{\partial \theta'_i\partial \theta'_j} \\
=& - \frac{\partial}{\partial \theta'_j} \log p(x|\theta') \frac{\partial}{\partial \theta'_i} \log p(x|\theta') + \frac{1}{p(x|\theta')} \frac{\partial^2 p(x|\theta')}{\partial \theta'_i\partial \theta'_j}
\end{align*}
$$
$$
\begin{align*}
\nabla^2_{\theta'} \log p(x|\theta') =& \nabla_{\theta'} \nabla_{\theta'}^T \log p(x|\theta') \\
=& \nabla_{\theta'} \left(\frac{1}{p(x|\theta')}\nabla_{\theta'}^T p(x|\theta')\right) \\
=& \nabla_{\theta'} \left(\frac{1}{p(x|\theta')}\right) \nabla_{\theta'}^T p(x|\theta') + \frac{1}{p(x|\theta')}\nabla_{\theta'}\nabla_{\theta'}^T p(x|\theta') \\
=&- \frac{1}{p(x|\theta')^2} \nabla_{\theta'} p(x|\theta') \nabla_{\theta'} p(x|\theta') ^T + \frac{1}{p(x|\theta')}\nabla_{\theta'}^2 p(x|\theta') \\
=&- \nabla_{\theta'} \log p(x|\theta') \nabla_{\theta'} \log p(x|\theta') ^T + \frac{1}{p(x|\theta')}\nabla_{\theta'}^2 p(x|\theta')
\end{align*}
$$
Evaluated at \( \theta' = \theta \), you obtain
$$
\begin{align*}
\left. \frac{\partial^2 \log p(x|\theta')}{\partial \theta'_i\partial \theta'_j}\right|_{\theta' = \theta} =& - \frac{\partial}{\partial \theta_j} \log p(x|\theta) \frac{\partial}{\partial \theta_i} \log p(x|\theta) + \frac{1}{p(x|\theta)} \frac{\partial^2 p(x|\theta)}{\partial \theta_i\partial \theta_j}.
\end{align*}
$$
$$
\begin{align*}
\left. \nabla^2_{\theta'} \log p(x|\theta')\right|_{\theta' = \theta} =&- \nabla_{\theta} \log p(x|\theta) \nabla_{\theta} \log p(x|\theta) ^T + \frac{1}{p(x|\theta)}\nabla_{\theta}^2 p(x|\theta).
\end{align*}
$$
Finally, you can express the Taylor approximation as
$$\begin{align*}
\log p(x|\theta') =& \log p(x|\theta) + \sum_i \frac{1}{p(x|\theta)} \frac{\partial}{\partial \theta_i} p(x|\theta) (\theta'_i - \theta_i) \\ &+ \frac{1}{2} \sum_i\sum_j \left(- \frac{\partial}{\partial \theta_i} \log p(x|\theta) \frac{\partial}{\partial \theta_j} \log p(x|\theta) + \frac{1}{p(x|\theta)}\frac{\partial^2 p(x|\theta)}{\partial \theta_i\partial \theta_j}\right) (\theta'_i - \theta_i)(\theta'_j - \theta_j) + ...
\end{align*}$$
$$\begin{align*}
\log p(x|\theta') =& \log p(x|\theta) + \frac{1}{p(x|\theta)}\nabla_{\theta} p(x|\theta)^T (\theta' - \theta) \\ &+ \frac{1}{2} (\theta' - \theta)^T\left(- \nabla_{\theta} \log p(x|\theta) \nabla_{\theta} \log p(x|\theta)^T + \frac{1}{p(x|\theta)}\nabla_{\theta}^2 p(x|\theta)\right)(\theta' - \theta) + ...
\end{align*}$$
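For a Gaussian \(p(x|\mu) = \mathcal{N}(x; \mu, \sigma^2)\) the log-likelihood is exactly quadratic in \(\mu\), so the second-order expansion is exact in this (assumed) example:

```python
# Sketch: for p(x|mu) = N(x; mu, sigma^2) the log-likelihood is quadratic in
# mu, so the second-order Taylor expansion matches it exactly.
from scipy.stats import norm

x, mu, mu_new, sigma = 1.3, 0.5, 2.1, 2.0

grad = (x - mu) / sigma**2        # first derivative of log p w.r.t. mu
hess = -1.0 / sigma**2            # second derivative of log p w.r.t. mu
taylor = norm.logpdf(x, mu, sigma) + grad * (mu_new - mu) + 0.5 * hess * (mu_new - mu) ** 2

print(taylor, norm.logpdf(x, mu_new, sigma))  # agree up to floating point error
```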
Score
The score is defined as the derivative of the log-likelihood with respect to the parameters
$$ V_i(x;\theta') = \frac{\partial}{\partial \theta'_i} \log p(x|\theta') $$
$$ V(x;\theta') = \nabla_{\theta'} \log p(x|\theta') $$
Mean
$$ \mathbb{E}_{p(x|\theta^*)}\left[\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\right] = \int \limits_{-\infty}^{\infty} p(x|\theta^*) \frac{\partial}{\partial \theta'_i} \log p(x|\theta')dx$$
$$ \mathbb{E}_{p(x|\theta^*)}\left[\nabla_{\theta'} \log p(x|\theta')\right] = \int \limits_{-\infty}^{\infty} p(x|\theta^*) \nabla_{\theta'} \log p(x|\theta')dx$$
Variance
$$\begin{align*} \text{Var}\left(\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\right) &= \mathbb{E}_{p(x|\theta^*)}\left[\left(\frac{\partial}{\partial \theta'_i} \log p(x|\theta')-\mathbb{E}_{p(x|\theta^*)}\left[\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\right]\right)^2\right] \\
&= \mathbb{E}_{p(x|\theta^*)}\left[\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\right] - \mathbb{E}_{p(x|\theta^*)}\left[\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\right]^2
\end{align*}$$
$$\begin{align*} \text{Var}\left(\nabla_{\theta'} \log p(x|\theta')\right) &= \mathbb{E}_{p(x|\theta^*)}\left[\left(\nabla_{\theta'}\log p(x|\theta')-\mathbb{E}_{p(x|\theta^*)}\left[\nabla_{\theta'} \log p(x|\theta')\right]\right)\left(\nabla_{\theta'}\log p(x|\theta')-\mathbb{E}_{p(x|\theta^*)}\left[\nabla_{\theta'} \log p(x|\theta')\right]\right)^T\right] \\
&= \mathbb{E}_{p(x|\theta^*)}\left[\nabla_{\theta'} \log p(x|\theta')\nabla_{\theta'} \log p(x|\theta')^T\right] - \mathbb{E}_{p(x|\theta^*)}\left[\nabla_{\theta'} \log p(x|\theta')\right]\mathbb{E}_{p(x|\theta^*)}\left[\nabla_{\theta'} \log p(x|\theta')\right]^T
\end{align*}$$
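A hedged Monte Carlo sketch for the same assumed Gaussian example: evaluated at \(\theta' = \theta^*\), the mean of the score is approximately zero and its variance matches \(1/\sigma^2\), foreshadowing the Fisher information below.

```python
# Monte Carlo sketch (assumed Gaussian example): at theta' = theta* the score
# (x - mu) / sigma^2 has mean ~0 and variance ~1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)
mu_star, sigma, n = 0.5, 2.0, 1_000_000

x = rng.normal(mu_star, sigma, size=n)
score = (x - mu_star) / sigma**2  # d/dmu log N(x; mu, sigma^2) at mu = mu_star

print(score.mean())               # approx 0
print(score.var(), 1 / sigma**2)  # both approx 0.25
```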
Kullback-Leibler divergence
The Kullback-Leibler divergence is defined as
$$ D_\textrm{KL}(\theta||\theta') = \int \limits_{-\infty}^{\infty} p(x|\theta) \log \frac{p(x|\theta)}{p(x|\theta')} dx. $$
Let’s perform a Taylor approximation of \(D_\textrm{KL}(\theta||\theta')\) in \(\theta'\) around the current estimate \(\theta\):
$$
\begin{align*}
D_\textrm{KL}(\theta||\theta') =& D_\textrm{KL}(\theta||\theta')|_{\theta' = \theta} + \sum_i \left.\frac{\partial D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i}\right|_{\theta' = \theta} (\theta'_i - \theta_i) \\ &+ \frac{1}{2}\sum_i\sum_j \left.\frac{\partial^2 D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i\partial \theta'_j}\right|_{\theta' = \theta} (\theta'_i - \theta_i) (\theta'_j - \theta_j) + ...
\end{align*}
$$
$$
\begin{align*}
D_\textrm{KL}(\theta||\theta') =& D_\textrm{KL}(\theta||\theta')|_{\theta' = \theta} + \left. \nabla_{\theta'}D_\textrm{KL}(\theta||\theta')^T\right|_{\theta' = \theta}(\theta' - \theta) \\
& + \frac{1}{2}\left. (\theta' - \theta)^T\nabla^2_{\theta'}D_\textrm{KL}(\theta||\theta')\right|_{\theta' = \theta}(\theta' - \theta) + ...
\end{align*}
$$
The constant term can be written as
$$
\begin{align*}
D_\textrm{KL}(\theta||\theta')|_{\theta' = \theta} =& 0.
\end{align*}
$$
The linear term \(\frac{\partial D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i}\), or \(\nabla_{\theta'}D_\textrm{KL}(\theta||\theta')\) in vector notation, can be written as
$$
\begin{align*}
\frac{\partial D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i} =& \frac{\partial}{\partial \theta'_i}\left(\int \limits_{-\infty}^{\infty} p(x|\theta) \log p(x|\theta) dx -\int \limits_{-\infty}^{\infty} p(x|\theta) \log p(x|\theta') dx\right) \\
=& -\int \limits_{-\infty}^{\infty} p(x|\theta) \frac{\partial}{\partial \theta'_i}\log p(x|\theta') dx\\
=& -\int \limits_{-\infty}^{\infty} p(x|\theta)\frac{1}{p(x|\theta')}\frac{\partial}{\partial \theta'_i} p(x|\theta') dx \\
=& - \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta'_i} \log p(x|\theta')\right]\\
=& - \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta')}\frac{\partial}{\partial \theta'_i} p(x|\theta')\right].
\end{align*}
$$
$$
\begin{align*}
\nabla_{\theta'}D_\textrm{KL}(\theta||\theta') =& \nabla_{\theta'}\left(\int \limits_{-\infty}^{\infty} p(x|\theta) \log p(x|\theta) dx -\int \limits_{-\infty}^{\infty} p(x|\theta) \log p(x|\theta') dx\right) \\
=& -\int \limits_{-\infty}^{\infty} p(x|\theta)\nabla_{\theta'} \log p(x|\theta') dx \\
=& -\int \limits_{-\infty}^{\infty} p(x|\theta)\frac{1}{p(x|\theta')}\nabla_{\theta'} p(x|\theta') dx \\
=& - \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta'} \log p(x|\theta')\right]\\
=& - \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta')}\nabla_{\theta'} p(x|\theta')\right].
\end{align*}
$$
Evaluated at \( \theta' = \theta \), you obtain
$$
\begin{align*}
\left. \frac{\partial D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i} \right|_{\theta' = \theta} =& - \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)}\frac{\partial}{\partial \theta_i} p(x|\theta)\right] \\
=& -\int \limits_{-\infty}^{\infty} p(x|\theta) \frac{1}{p(x|\theta)}\frac{\partial}{\partial \theta_i}p(x|\theta) dx \\
=& -\int \limits_{-\infty}^{\infty} \frac{\partial}{\partial \theta_i} p(x|\theta) dx \\
=& -\frac{\partial}{\partial \theta_i}\int \limits_{-\infty}^{\infty} p(x|\theta) dx \\
=& -\frac{\partial}{\partial \theta_i} 1 \\
=& 0
\end{align*}
$$
$$
\begin{align*}
\left. \nabla_{\theta'}D_\textrm{KL}(\theta||\theta') \right|_{\theta' = \theta} =& - \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)}\nabla_{\theta} p(x|\theta)\right] \\
=& -\int \limits_{-\infty}^{\infty} p(x|\theta) \frac{1}{p(x|\theta)}\nabla_{\theta} p(x|\theta) dx \\
=& -\int \limits_{-\infty}^{\infty} \nabla_{\theta} p(x|\theta) dx \\
=& -\nabla_{\theta}\int \limits_{-\infty}^{\infty} p(x|\theta) dx \\
=& -\nabla_{\theta} 1 \\
=& 0
\end{align*}
$$
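As a concrete check, the KL divergence between two Gaussians with equal \(\sigma\) has the closed form \((\mu' - \mu)^2 / (2\sigma^2)\); a minimal sketch, assuming that example, confirms the vanishing gradient at \(\mu' = \mu\):

```python
# Sketch: closed-form KL between N(mu, sigma^2) and N(mu', sigma^2); its
# finite-difference gradient at mu' = mu vanishes, matching the derivation.
mu, sigma, eps = 0.5, 2.0, 1e-6

kl = lambda mu_new: (mu_new - mu) ** 2 / (2 * sigma**2)
grad_at_mu = (kl(mu + eps) - kl(mu - eps)) / (2 * eps)

print(grad_at_mu)  # approx 0
```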
The quadratic term \(\frac{\partial^2 D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i\partial \theta'_j}\), or \(\nabla^2_{\theta'}D_\textrm{KL}(\theta||\theta')\) in vector notation, can be written as
$$
\begin{align*}
\frac{\partial^2 D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i\partial \theta'_j} =& \frac{\partial }{\partial \theta'_j} \left( \frac{\partial}{\partial \theta'_i} D_\textrm{KL}(\theta||\theta')\right) \\
=& -\frac{\partial }{\partial \theta'_j} \left( \int \limits_{-\infty}^{\infty} p(x|\theta)\frac{\partial}{\partial \theta'_i} \log p(x|\theta') dx \right) \\
=& - \int \limits_{-\infty}^{\infty} p(x|\theta)\frac{\partial^2 \log p(x|\theta')}{\partial \theta'_i \partial \theta'_j} dx \\
=& -\mathbb{E}_{p(x|\theta)}\left[\frac{\partial^2 \log p(x|\theta')}{\partial \theta'_i \partial \theta'_j}\right]
\end{align*}$$
$$
\begin{align*}
\nabla^2_{\theta'}D_\textrm{KL}(\theta||\theta') =& \nabla_{\theta'} \nabla_{\theta'}^T D_\textrm{KL}(\theta||\theta') \\
=& -\nabla_{\theta'} \left( \int \limits_{-\infty}^{\infty} p(x|\theta)\nabla_{\theta'}^T \log p(x|\theta') dx \right) \\
=& - \int \limits_{-\infty}^{\infty} p(x|\theta)\nabla_{\theta'}^2 \log p(x|\theta') dx \\
=& -\mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta'}^2 \log p(x|\theta')\right]\\
\end{align*}$$
Evaluated at \( \theta' = \theta \), you obtain
$$
\begin{align*}
\left. \frac{\partial^2 D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i\partial \theta'_j} \right|_{\theta' = \theta} =& -\mathbb{E}_{p(x|\theta)}\left[\frac{\partial^2 \log p(x|\theta)}{\partial \theta_i \partial \theta_j}\right]
\end{align*}$$
$$
\begin{align*}
\left. \nabla^2_{\theta'}D_\textrm{KL}(\theta||\theta') \right|_{\theta' = \theta} =& -\mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta}^2 \log p(x|\theta)\right]\\
\end{align*}$$
Finally, you can express the Taylor approximation as
$$
\begin{align*}
D_\textrm{KL}(\theta||\theta') =& -\frac{1}{2}\sum_i\sum_j \mathbb{E}_{p(x|\theta)}\left[\frac{\partial^2 \log p(x|\theta)}{\partial \theta_i \partial \theta_j} \right] (\theta'_i - \theta_i) (\theta'_j - \theta_j) + ...
\end{align*}
$$
$$
\begin{align*}
D_\textrm{KL}(\theta||\theta') =& -\frac{1}{2} (\theta' - \theta)^T\mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta}^2 \log p(x|\theta)\right](\theta' - \theta) + ...
\end{align*}
$$
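In the equal-variance Gaussian example the quadratic term alone already reproduces the KL divergence exactly, with curvature \(1/\sigma^2\) (the Fisher information, see below); a small sketch under that assumption:

```python
# Sketch: the second-order Taylor term 1/2 * I(mu) * (mu' - mu)^2 with
# I(mu) = 1/sigma^2 equals the closed-form Gaussian KL divergence here.
mu, mu_new, sigma = 0.5, 0.9, 2.0

kl_exact = (mu_new - mu) ** 2 / (2 * sigma**2)
kl_taylor = 0.5 * (1 / sigma**2) * (mu_new - mu) ** 2

print(kl_exact, kl_taylor)  # identical
```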
Cross entropy
The cross entropy is defined as
$$ H(\theta, \theta') = -\int \limits_{-\infty}^{\infty} p(x|\theta) \log p(x|\theta') dx. $$
The cross entropy and the Kullback-Leibler divergence differ only by the entropy \(H(\theta)\), which is constant in \(\theta'\). Therefore, the linear and quadratic terms of their Taylor expansions are the same.
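A quick numerical illustration of \(H(\theta, \theta') = H(\theta) + D_\textrm{KL}(\theta||\theta')\) for the equal-variance Gaussian example, using the standard closed-form expressions (assumed, not derived here):

```python
# Sketch: cross entropy = entropy + KL for two Gaussians with equal sigma,
# using the standard closed-form expressions.
import numpy as np

mu, mu_new, sigma = 0.5, 0.9, 2.0

entropy = 0.5 * np.log(2 * np.pi * np.e * sigma**2)   # H(theta)
kl = (mu_new - mu) ** 2 / (2 * sigma**2)              # D_KL(theta||theta')
cross = 0.5 * np.log(2 * np.pi * sigma**2) + (sigma**2 + (mu - mu_new) ** 2) / (2 * sigma**2)

print(cross, entropy + kl)  # equal
```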
Fisher information
The Fisher information can be defined as …
… expectation of the squared score
$$ \left[\mathcal{I(\theta)}\right]_{ij} = \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_j} \log p(x|\theta) \frac{\partial}{\partial \theta_i} \log p(x|\theta)\right] $$
$$ \mathcal{I(\theta)} = \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta) \nabla_{\theta} \log p(x|\theta)^T\right] $$
… variance of the score
$$
\begin{align*}
\left[\mathcal{I(\theta)}\right]_{ij} &= \text{Cov}\left(\frac{\partial}{\partial \theta_i} \log p(x|\theta), \frac{\partial}{\partial \theta_j} \log p(x|\theta)\right) \\
&= \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_i} \log p(x|\theta)\frac{\partial}{\partial \theta_j} \log p(x|\theta)\right] - \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_i} \log p(x|\theta)\right]\mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_j} \log p(x|\theta)\right] \\
\end{align*}
$$
$$
\begin{align*}
\mathcal{I(\theta)} &= \text{Var}\left(\nabla_{\theta} \log p(x|\theta)\right) \\
&= \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta)\nabla_{\theta} \log p(x|\theta)^T\right] - \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta)\right]\mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta)\right]^T \\
\end{align*}
$$
With
$$
\begin{align*} \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_i} \log p(x|\theta)\right] =& \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)}\frac{\partial}{\partial \theta_i} p(x|\theta)\right] \\
=& \int \limits_{-\infty}^{\infty} p(x|\theta) \frac{1}{p(x|\theta)}\frac{\partial}{\partial \theta_i}p(x|\theta) dx \\
=& \int \limits_{-\infty}^{\infty} \frac{\partial}{\partial \theta_i} p(x|\theta) dx \\
=& \frac{\partial}{\partial \theta_i}\int \limits_{-\infty}^{\infty} p(x|\theta) dx \\
=& \frac{\partial}{\partial \theta_i} 1 \\
=& 0
\end{align*}$$
$$
\begin{align*} \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta)\right] =& \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)}\nabla_{\theta} p(x|\theta)\right] \\
=& \int \limits_{-\infty}^{\infty} p(x|\theta) \frac{1}{p(x|\theta)}\nabla_{\theta} p(x|\theta) dx \\
=& \int \limits_{-\infty}^{\infty} \nabla_{\theta} p(x|\theta) dx \\
=& \nabla_{\theta}\int \limits_{-\infty}^{\infty} p(x|\theta) dx \\
=& \nabla_{\theta} 1 \\
=& 0
\end{align*}$$
it follows that
$$ \left[\mathcal{I(\theta)}\right]_{ij} = \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_j} \log p(x|\theta) \frac{\partial}{\partial \theta_i} \log p(x|\theta)\right] $$
$$ \mathcal{I(\theta)} = \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta) \nabla_{\theta} \log p(x|\theta)^T\right] $$
… curvature of the KL-divergence
$$
\begin{align*}
\left[\mathcal{I(\theta)}\right]_{ij} =&\left. \frac{\partial^2 D_\textrm{KL}(\theta||\theta')}{\partial \theta'_i\partial \theta'_j} \right|_{\theta' = \theta}\\
=& -\mathbb{E}_{p(x|\theta)}\left[\frac{\partial^2 \log p(x|\theta)}{\partial \theta_i \partial \theta_j}\right] \\
=& - \mathbb{E}_{p(x|\theta)}\left[-\frac{\partial}{\partial \theta_j} \log p(x|\theta) \frac{\partial}{\partial \theta_i} \log p(x|\theta) + \frac{1}{p(x|\theta)} \frac{\partial^2 p(x|\theta)}{\partial \theta_i\partial \theta_j}\right] \\
=& \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_j} \log p(x|\theta) \frac{\partial}{\partial \theta_i} \log p(x|\theta)\right] - \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)} \frac{\partial^2 p(x|\theta)}{\partial \theta_i\partial \theta_j}\right]\\
\end{align*}$$
$$
\begin{align*}
\mathcal{I(\theta)} =&\left. \nabla^2_{\theta'}D_\textrm{KL}(\theta||\theta') \right|_{\theta' = \theta}\\
=& -\mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta}^2 \log p(x|\theta)\right]\\
=& - \mathbb{E}_{p(x|\theta)}\left[-\nabla_{\theta} \log p(x|\theta) \nabla_{\theta} \log p(x|\theta)^T + \frac{1}{p(x|\theta)} \nabla^2_{\theta} p(x|\theta)\right] \\
=& \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta) \nabla_{\theta} \log p(x|\theta)^T\right] - \mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)} \nabla^2_{\theta} p(x|\theta)\right]\\
\end{align*}$$
With
$$
\begin{align*}
\mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)} \frac{\partial^2 p(x|\theta)}{\partial \theta_i\partial \theta_j}\right]=& \int \limits_{-\infty}^{\infty} p(x|\theta) \frac{1}{p(x|\theta)} \frac{\partial^2 p(x|\theta)}{\partial \theta_i\partial \theta_j} dx \\
=& \int \limits_{-\infty}^{\infty} \frac{\partial^2 p(x|\theta)}{\partial \theta_i\partial \theta_j} dx \\
=& \frac{\partial^2 }{\partial \theta_i\partial \theta_j}\int \limits_{-\infty}^{\infty} p(x|\theta) dx \\
=& \frac{\partial^2 }{\partial \theta_i\partial \theta_j}1 \\
=& 0 .
\end{align*}
$$
$$
\begin{align*}
\mathbb{E}_{p(x|\theta)}\left[\frac{1}{p(x|\theta)} \nabla^2_{\theta} p(x|\theta)\right]=& \int \limits_{-\infty}^{\infty} p(x|\theta) \frac{1}{p(x|\theta)} \nabla^2_{\theta} p(x|\theta) dx \\
=& \int \limits_{-\infty}^{\infty} \nabla^2_{\theta} p(x|\theta) dx \\
=& \nabla^2_{\theta}\int \limits_{-\infty}^{\infty} p(x|\theta) dx \\
=& \nabla^2_{\theta}1 \\
=&0
\end{align*}$$
it follows that
$$ \left[\mathcal{I(\theta)}\right]_{ij} = \mathbb{E}_{p(x|\theta)}\left[\frac{\partial}{\partial \theta_j} \log p(x|\theta) \frac{\partial}{\partial \theta_i} \log p(x|\theta)\right] $$
$$ \mathcal{I(\theta)} = \mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta} \log p(x|\theta) \nabla_{\theta} \log p(x|\theta)^T\right] $$
… curvature of the cross entropy
Cross entropy and Kullback-Leibler divergence differ only by a constant additive term. Therefore, their curvature has to be identical.
… expected negative curvature of the log-likelihood
$$
\begin{align*}
\left[\mathcal{I(\theta)}\right]_{ij} =& -\mathbb{E}_{p(x|\theta)}\left[\frac{\partial^2 \log p(x|\theta)}{\partial \theta_i \partial \theta_j}\right] \\
\end{align*}$$
$$
\begin{align*}
\mathcal{I(\theta)} =& -\mathbb{E}_{p(x|\theta)}\left[\nabla_{\theta}^2 \log p(x|\theta)\right]\\
\end{align*}$$
See curvature of the KL-divergence.
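A Monte Carlo sketch for the assumed Gaussian-mean example, comparing three of the characterizations above (expected squared score, variance of the score, negative expected curvature) against the known value \(1/\sigma^2\):

```python
# Monte Carlo sketch: for the Gaussian mean, the expected squared score, the
# variance of the score and the negative expected curvature -E[d^2/dmu^2 log p]
# all estimate the Fisher information 1/sigma^2.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n = 0.5, 2.0, 1_000_000

x = rng.normal(mu, sigma, size=n)
score = (x - mu) / sigma**2
curvature = np.full(n, -1.0 / sigma**2)  # d^2/dmu^2 log p(x|mu), constant here

print((score**2).mean(), score.var(), -curvature.mean(), 1 / sigma**2)
```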
… expectation of the observed information
$$ \left[\mathcal{I(\theta)}\right]_{ij} = \mathbb{E}_{p(x|\theta)}\left[\left[\mathcal{J(\theta)}\right]_{ij}\right] $$
$$ \mathcal{I(\theta)} = \mathbb{E}_{p(x|\theta)}\left[\mathcal{J(\theta)}\right] $$
where the observed information \(\mathcal{J(\theta)}\) is defined as
$$ \left[\mathcal{J(\theta)}\right]_{ij} = -\frac{\partial^2 \log p(x|\theta)}{\partial \theta_i \partial \theta_j} $$
$$ \mathcal{J(\theta)} = -\nabla_{\theta}^2 \log p(x|\theta) $$
We note that this expression has the same form as the curvature of the KL-divergence.
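For the Gaussian mean the observed information happens to be constant in \(x\), so a Bernoulli model (an assumed example with \(p(x|\theta) = \theta^x(1-\theta)^{1-x}\)) shows the distinction better: \(\mathcal{J}(\theta)\) varies with \(x\), while its expectation recovers \(\mathcal{I}(\theta) = 1/(\theta(1-\theta))\).

```python
# Sketch: Bernoulli observed information J(theta) = x/theta^2 + (1-x)/(1-theta)^2
# depends on x; its expectation recovers I(theta) = 1 / (theta * (1 - theta)).
import numpy as np

rng = np.random.default_rng(0)
theta, n = 0.3, 1_000_000

x = rng.binomial(1, theta, size=n)
observed_info = x / theta**2 + (1 - x) / (1 - theta) ** 2

print(observed_info.mean(), 1 / (theta * (1 - theta)))  # approx equal
```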