Jensen-Shannon Divergence

The Jensen-Shannon divergence is a principled divergence measure which is always finite for finite random variables. It quantifies how “distinguishable” two or more distributions are from each other. In its basic form it is:

\[\JSD{X || Y} = \H{\frac{X + Y}{2}} - \frac{\H{X} + \H{Y}}{2}\]

That is, it is the entropy of the mixture minus the mixture of the entropies. This can be generalized to an arbitrary number of random variables with arbitrary weights:

\[\JSD{X_{0:n}} = \H{\sum_i w_i X_i} - \sum_i \left( w_i \H{X_i} \right)\]

In [1]: from dit.divergences import jensen_shannon_divergence
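To make the weighted formula above concrete, here is a minimal pure-Python sketch. It is not dit's implementation: the helper names `entropy` and `jsd` are hypothetical, and distributions are assumed to be plain lists of probabilities over a shared, identically ordered alphabet.

```python
from math import log2


def entropy(pmf):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in pmf if p > 0)


def jsd(pmfs, weights=None):
    """Entropy of the weighted mixture minus the weighted mean of the entropies."""
    if weights is None:
        # Assumption: uniform weights when none are given.
        weights = [1 / len(pmfs)] * len(pmfs)
    mixture = [sum(w * pmf[k] for w, pmf in zip(weights, pmfs))
               for k in range(len(pmfs[0]))]
    return entropy(mixture) - sum(w * entropy(pmf) for w, pmf in zip(weights, pmfs))


x = [0.5, 0.5]
y = [1.0, 0.0]

basic = jsd([x, y])                        # two distributions, equal weights
skewed = jsd([x, y], weights=[0.9, 0.1])   # same distributions, unequal weights
disjoint = jsd([[1.0, 0.0], [0.0, 1.0]])   # no shared outcomes
```

With base-2 logarithms the two-distribution value lies between 0 bits (identical distributions) and 1 bit (disjoint supports).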


Where does this equation come from? Consider Jensen’s inequality:

\[\Psi \left( \mathbb{E}(x) \right) \geq \mathbb{E} \left( \Psi(x) \right)\]

where \(\Psi\) is a concave function. If we consider the difference between the left and right sides we find:

\[\Psi \left( \mathbb{E}(x) \right) - \mathbb{E} \left( \Psi(x) \right) \geq 0\]

If we make that concave function \(\Psi\) the Shannon entropy \(\H{}\), we get the Jensen-Shannon divergence. The name takes Jensen from Jensen’s inequality and Shannon from the Shannon entropy.
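Spelled out, and assuming the expectation is taken over the index \(i\) of the distributions with probabilities \(w_i\) (the same weights as in the generalized form), the substitution reads:

\[\begin{split}\mathbb{E}(x) &= \sum_i w_i X_i \\ \Psi \left( \mathbb{E}(x) \right) - \mathbb{E} \left( \Psi(x) \right) &= \H{\sum_i w_i X_i} - \sum_i w_i \H{X_i} \geq 0\end{split}\]

Choosing two distributions with \(w_0 = w_1 = \frac{1}{2}\) recovers the basic form.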


The same construction with other entropies gives related measures: the Jensen-Rényi divergence (where \(\Psi\) is the Rényi entropy) and the Jensen-Tsallis divergence (where \(\Psi\) is the Tsallis entropy).
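As an illustration of swapping the entropy, the sketch below computes the Jensen gap for an arbitrary function \(\Psi\). The names `jensen_divergence`, `shannon`, and `tsallis` are hypothetical helpers, not part of dit, and the Tsallis order `q=2.0` is an arbitrary choice for the example.

```python
from math import log2


def shannon(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf if p > 0)


def tsallis(pmf, q=2.0):
    """Tsallis entropy of order q (q != 1): (1 - sum(p**q)) / (q - 1)."""
    return (1 - sum(p ** q for p in pmf)) / (q - 1)


def jensen_divergence(psi, pmfs, weights):
    """psi(mixture) - weighted mean of psi over the individual distributions."""
    mixture = [sum(w * pmf[k] for w, pmf in zip(weights, pmfs))
               for k in range(len(pmfs[0]))]
    return psi(mixture) - sum(w * psi(pmf) for w, pmf in zip(weights, pmfs))


x = [0.5, 0.5]
y = [1.0, 0.0]
half = [0.5, 0.5]

js = jensen_divergence(shannon, [x, y], half)   # Jensen-Shannon
jt = jensen_divergence(tsallis, [x, y], half)   # Jensen-Tsallis with q = 2
```

Both values are non-negative here because each entropy used is concave; the two measures are on different scales and are not directly comparable.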


The square root of the Jensen-Shannon divergence, \(\sqrt{\JSD{}}\), is a true metric between distributions.
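A quick numerical spot check of the metric property, assuming three example distributions chosen arbitrarily for illustration. The helper `js_distance` is hypothetical, and a check on a handful of points is of course not a proof.

```python
from itertools import permutations
from math import log2, sqrt


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf if p > 0)


def js_distance(p, q):
    """Square root of the two-distribution Jensen-Shannon divergence."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    jsd = entropy(m) - (entropy(p) + entropy(q)) / 2
    return sqrt(max(jsd, 0.0))  # clamp tiny negative rounding error


dists = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]]

# Triangle inequality d(a, c) <= d(a, b) + d(b, c) for every ordering.
triangle_holds = all(
    js_distance(a, c) <= js_distance(a, b) + js_distance(b, c) + 1e-12
    for a, b, c in permutations(dists, 3)
)

# Symmetry d(a, b) == d(b, a), up to floating-point rounding.
symmetric = all(
    abs(js_distance(a, b) - js_distance(b, a)) < 1e-12
    for a, b in permutations(dists, 2)
)
```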

Relationship to Other Measures

The Jensen-Shannon divergence can be derived from other, better-known information measures, notably the Kullback-Leibler divergence and the mutual information.

Kullback-Leibler divergence

The Jensen-Shannon divergence is the average Kullback-Leibler divergence of \(X\) and \(Y\) from their mixture distribution, \(M\):

\[\begin{split}\JSD{X || Y} &= \frac{1}{2} \left( \DKL{X || M} + \DKL{Y || M} \right) \\ M &= \frac{X + Y}{2}\end{split}\]
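The identity can be checked numerically. The sketch below is a pure-Python illustration with hypothetical helpers `entropy` and `kl`, assuming distributions given as lists over a shared alphabet; the example values are arbitrary.

```python
from math import log2


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf if p > 0)


def kl(p, q):
    """D_KL(p || q) in bits; assumes q[k] > 0 wherever p[k] > 0."""
    return sum(a * log2(a / b) for a, b in zip(p, q) if a > 0)


x = [0.5, 0.5, 0.0]
y = [0.0, 0.25, 0.75]
m = [(a + b) / 2 for a, b in zip(x, y)]  # mixture distribution M

jsd_from_kl = 0.5 * (kl(x, m) + kl(y, m))
jsd_from_entropy = entropy(m) - (entropy(x) + entropy(y)) / 2
```

Here the Kullback-Leibler divergence between \(X\) and \(Y\) directly would be infinite, since each assigns zero probability to an outcome the other can produce. Measuring against the mixture avoids this, because \(M\) is positive wherever \(X\) or \(Y\) is.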

Mutual Information

\[\JSD{X || Y} = \I{Z : M}\]

where \(M\) is the mixture distribution as before, and \(Z\) is an indicator variable recording which of \(X\) and \(Y\) was sampled. In essence, if \(X\) and \(Y\) are each an urn containing colored balls, and I randomly select one of the urns and draw a ball from it, then the Jensen-Shannon divergence is the mutual information between which urn I drew the ball from and the color of the ball drawn.
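The urn picture can be simulated exactly by building the joint distribution of (urn, color) and computing its mutual information. This is a sketch under assumed example contents for the two urns; all names and numbers are illustrative.

```python
from math import log2


def entropy(pmf):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * log2(p) for p in pmf if p > 0)


# Assumed example urns: color -> probability of drawing that color.
urns = {
    "X": {"red": 0.5, "blue": 0.5},
    "Y": {"red": 0.9, "green": 0.1},
}
p_urn = {"X": 0.5, "Y": 0.5}  # the indicator Z: each urn picked with equal chance

# Joint distribution p(urn, color) = p(urn) * p(color | urn).
joint = {(u, c): p_urn[u] * p for u, balls in urns.items() for c, p in balls.items()}

# Marginal over colors; this is the mixture distribution M.
p_color = {}
for (u, c), p in joint.items():
    p_color[c] = p_color.get(c, 0.0) + p

# Mutual information between the urn choice and the color drawn.
mutual_information = sum(
    p * log2(p / (p_urn[u] * p_color[c])) for (u, c), p in joint.items() if p > 0
)

# Entropy form of the Jensen-Shannon divergence, for comparison.
jsd = entropy(p_color.values()) - sum(
    p_urn[u] * entropy(urns[u].values()) for u in urns
)
```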


jensen_shannon_divergence(dists, weights=None)

The Jensen-Shannon Divergence: H(sum(w_i*P_i)) - sum(w_i*H(P_i)).

The square root of the Jensen-Shannon divergence is a distance metric.

Parameters

  • dists ([Distribution]) – The distributions, P_i, to take the Jensen-Shannon Divergence of.

  • weights ([float], None) – The weights, w_i, to give the distributions. If None, the weights are assumed to be uniform.

Returns

jsd – The Jensen-Shannon Divergence

Return type

float

Raises

  • ditException – Raised if dists and weights have unequal lengths.

  • InvalidNormalization – Raised if the weights do not sum to unity.

  • InvalidProbability – Raised if the weights are not valid probabilities.
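The argument checks described above can be mimicked in a few lines. The function `jsd_checked` below is a hypothetical pure-Python stand-in, not dit's code: it takes plain lists instead of Distribution objects and raises `ValueError` where dit raises its own exception types. The order of the checks is also an assumption.

```python
from math import isclose, log2


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf if p > 0)


def jsd_checked(pmfs, weights=None):
    """Weighted Jensen-Shannon divergence with basic argument validation."""
    if weights is None:
        weights = [1 / len(pmfs)] * len(pmfs)  # uniform weights by default
    if len(weights) != len(pmfs):
        raise ValueError("dists and weights have unequal lengths")
    if any(w < 0 or w > 1 for w in weights):
        raise ValueError("weights are not valid probabilities")
    if not isclose(sum(weights), 1.0):
        raise ValueError("weights do not sum to unity")
    mixture = [sum(w * pmf[k] for w, pmf in zip(weights, pmfs))
               for k in range(len(pmfs[0]))]
    return entropy(mixture) - sum(w * entropy(pmf) for w, pmf in zip(weights, pmfs))


# Three mutually disjoint distributions with default (uniform) weights.
three_way = jsd_checked([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
```

With more than two distributions the divergence can exceed 1 bit; for three disjoint distributions with uniform weights it equals the entropy of a uniform three-outcome mixture.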