# Jensen-Shannon Divergence

The Jensen-Shannon divergence is a principled divergence measure which is always finite for finite random variables. It quantifies how “distinguishable” two or more distributions are from each other. In its basic form it is:

$\JSD{X || Y} = \H{\frac{X + Y}{2}} - \frac{\H{X} + \H{Y}}{2}$

That is, it is the entropy of the mixture minus the mixture of the entropies. This can be generalized to an arbitrary number of random variables with arbitrary weights:

$\JSD{X_{0:n}} = \H{\sum_i w_i X_i} - \sum_i \left( w_i \H{X_i} \right)$

In [1]: from dit.divergences import jensen_shannon_divergence
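The two formulas above can be sketched in plain Python, independently of `dit`. This is a minimal illustration, not the library's implementation: distributions are assumed to be probability lists over a shared outcome ordering, entropy is measured in bits, and the helper names `entropy` and `jsd` are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jsd(dists, weights=None):
    """Entropy of the weighted mixture minus the weighted mean entropy."""
    n = len(dists)
    if weights is None:
        # Assumption: missing weights mean a uniform mixture.
        weights = [1.0 / n] * n
    # Mixture distribution: outcome-wise weighted average.
    mixture = [
        sum(w * d[i] for w, d in zip(weights, dists))
        for i in range(len(dists[0]))
    ]
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))

identical = jsd([[0.5, 0.5], [0.5, 0.5]])             # indistinguishable
disjoint = jsd([[1.0, 0.0], [0.0, 1.0]])              # perfectly distinguishable
weighted = jsd([[1.0, 0.0], [0.0, 1.0]], [0.25, 0.75])
```

With equal weights, identical distributions give a divergence of zero and distributions with disjoint support give one bit. With unequal weights and disjoint support, the divergence equals the entropy of the weights themselves.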


## Derivation

Where does this equation come from? Consider Jensen’s inequality:

$\Psi \left( \mathbb{E}(x) \right) \geq \mathbb{E} \left( \Psi(x) \right)$

where $$\Psi$$ is a concave function. Subtracting the right side from the left gives a non-negative quantity:

$\Psi \left( \mathbb{E}(x) \right) - \mathbb{E} \left( \Psi(x) \right) \geq 0$

If we make that concave function $$\Psi$$ the Shannon entropy $$\H{}$$, and take the expectation over the weights $$w_i$$, we get the Jensen-Shannon divergence. The name combines "Jensen", from Jensen’s inequality, and "Shannon", from the use of the Shannon entropy.
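The non-negativity of this gap can be checked numerically for scalar inputs. The sketch below is illustrative only; `jensen_gap` is a hypothetical helper, and the functions and sample points are arbitrary choices.

```python
import math

def jensen_gap(psi, xs, ws):
    """psi(E[x]) - E[psi(x)] for points xs taken with probabilities ws."""
    mean = sum(w * x for w, x in zip(ws, xs))
    return psi(mean) - sum(w * psi(x) for w, x in zip(ws, xs))

# Concave functions give a non-negative gap.
gap_log = jensen_gap(math.log2, [1.0, 2.0, 4.0], [1 / 3, 1 / 3, 1 / 3])
gap_sqrt = jensen_gap(math.sqrt, [0.0, 4.0], [0.5, 0.5])

# A linear function is both concave and convex, so its gap vanishes.
gap_linear = jensen_gap(lambda x: 3 * x + 1, [0.0, 4.0], [0.5, 0.5])
```

Replacing the scalar points with distributions, the mean with their mixture, and `psi` with the Shannon entropy turns this gap into the Jensen-Shannon divergence.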

Note

Related measures include the Jensen-Rényi divergence (where $$\Psi$$ is the Rényi entropy) and the Jensen-Tsallis divergence (where $$\Psi$$ is the Tsallis entropy).

## Metric

The square root of the Jensen-Shannon divergence, $$\sqrt{\JSD{}}$$, is a true metric between distributions.
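As a spot check of the metric property, the sketch below evaluates symmetry and the triangle inequality for three arbitrarily chosen distributions. It checks a single example and proves nothing; the helper names are hypothetical and entropy is assumed to be in bits.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def jsd(p, q):
    """Equal-weight Jensen-Shannon divergence of two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(m) - (entropy(p) + entropy(q)) / 2

def js_distance(p, q):
    """Square root of the divergence; clamp guards against rounding below zero."""
    return math.sqrt(max(jsd(p, q), 0.0))

p = [0.9, 0.1, 0.0]
q = [0.1, 0.8, 0.1]
r = [0.0, 0.2, 0.8]

d_pq, d_qr, d_pr = js_distance(p, q), js_distance(q, r), js_distance(p, r)

symmetric = abs(js_distance(p, q) - js_distance(q, p)) < 1e-12
triangle_holds = (
    d_pr <= d_pq + d_qr and d_pq <= d_pr + d_qr and d_qr <= d_pq + d_pr
)
```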

## Relationship to the Other Measures

The Jensen-Shannon divergence can be derived from other, better-known information measures, notably the Kullback-Leibler divergence and the mutual information.

### Kullback-Leibler divergence

The Jensen-Shannon divergence is the average Kullback-Leibler divergence of $$X$$ and $$Y$$ from their mixture distribution, $$M$$:

$\begin{split}\JSD{X || Y} &= \frac{1}{2} \left( \DKL{X || M} + \DKL{Y || M} \right) \\ M &= \frac{X + Y}{2}\end{split}$
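The agreement between the Kullback-Leibler form and the entropy form can be verified numerically. The following is a sketch with hypothetical helper names and arbitrarily chosen example distributions, using base-2 logarithms.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits.

    Assumes q is positive wherever p is, which always holds when q is a
    mixture that includes p.
    """
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)

p = [0.5, 0.5, 0.0]
q = [0.0, 0.25, 0.75]
m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution M

jsd_from_entropy = entropy(m) - (entropy(p) + entropy(q)) / 2
jsd_from_kl = 0.5 * (kl(p, m) + kl(q, m))
```

In this example `p` and `q` each assign zero probability to an outcome the other supports, so the Kullback-Leibler divergence between them directly would be infinite. Measuring each against the mixture keeps the result finite.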

### Mutual Information

$\JSD{X || Y} = \I{Z : M}$

where $$M$$ is the mixture distribution as before, and $$Z$$ is an indicator variable over $$X$$ and $$Y$$. In essence, if $$X$$ and $$Y$$ are each an urn containing colored balls, and I randomly select one of the urns and draw a ball from it, then the Jensen-Shannon divergence is the mutual information between which urn I drew the ball from and the color of the ball drawn.
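The urn picture can be made concrete by building the joint distribution of (urn, color) and computing the mutual information from it. This is a sketch under assumptions: each urn is chosen with probability 1/2, the color proportions are made up for illustration, and the helper names are hypothetical.

```python
import math

def entropy(p):
    """Shannon entropy, in bits, of a probability vector."""
    return -sum(x * math.log2(x) for x in p if x > 0)

# Color proportions (red, green, blue) in each urn.
urn_x = [0.5, 0.5, 0.0]
urn_y = [0.0, 0.25, 0.75]

# Joint distribution p(urn, color): pick an urn uniformly, then draw a ball.
joint = [[0.5 * c for c in urn_x], [0.5 * c for c in urn_y]]

h_urn = entropy([sum(row) for row in joint])          # H(Z)
h_color = entropy([sum(col) for col in zip(*joint)])  # H(M), the mixture
h_joint = entropy([c for row in joint for c in row])  # H(Z, M)

mutual_information = h_urn + h_color - h_joint

# Entropy form of the divergence, for comparison.
mixture = [(a + b) / 2 for a, b in zip(urn_x, urn_y)]
jsd = entropy(mixture) - (entropy(urn_x) + entropy(urn_y)) / 2
```

The marginal over colors is exactly the mixture $$M$$, which is why the two quantities coincide.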

## API

jensen_shannon_divergence(dists, weights=None)[source]

The Jensen-Shannon Divergence: H(sum(w_i*P_i)) - sum(w_i*H(P_i)).

The square root of the Jensen-Shannon divergence is a distance metric.

Parameters
• dists ([Distribution]) – The distributions, P_i, to take the Jensen-Shannon Divergence of.

• weights ([float], None) – The weights, w_i, to give the distributions. If None, the weights are assumed to be uniform.

Returns

jsd – The Jensen-Shannon Divergence

Return type

float

Raises
• ditException – Raised if dists and weights have unequal lengths.

• InvalidNormalization – Raised if the weights do not sum to unity.

• InvalidProbability – Raised if the weights are not valid probabilities.
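The weight handling described above can be illustrated with a standalone sketch. It is not `dit`'s code: `resolve_weights` is a hypothetical helper, plain `ValueError` stands in for the library's exception classes, and the order of the checks is an assumption.

```python
import math

def resolve_weights(n_dists, weights=None):
    """Hypothetical helper: return validated weights for n_dists distributions."""
    if weights is None:
        # No weights given: assume a uniform mixture.
        return [1.0 / n_dists] * n_dists
    if len(weights) != n_dists:
        raise ValueError("dists and weights have unequal lengths")
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("weights are not valid probabilities")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights do not sum to unity")
    return list(weights)

uniform = resolve_weights(4)
explicit = resolve_weights(2, [0.25, 0.75])
```

Passing `[0.7, 0.7]` for two distributions fails the normalization check, while `[-0.5, 1.5]` sums to one but fails the probability check.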