Jensen-Shannon Divergence¶
The Jensen-Shannon divergence is a principled divergence measure which is always finite for finite random variables. It quantifies how “distinguishable” two or more distributions are from each other. In its basic form it is:

\[\JSD[X || Y] = \H{\frac{X + Y}{2}} - \frac{\H{X} + \H{Y}}{2}\]

That is, it is the entropy of the mixture minus the mixture of the entropies. This can be generalized to an arbitrary number of random variables with arbitrary weights:

\[\JSD[X_{0:n}] = \H{\sum_i w_i X_i} - \sum_i w_i \H{X_i}\]
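The mixture-minus-mean-entropy formula can be sketched from scratch in plain Python. This is an illustrative sketch using dictionaries as distributions, not dit's own implementation:

```python
from math import log2

def entropy(dist):
    """Shannon entropy (bits) of a {outcome: probability} mapping."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def jsd(dists, weights=None):
    """Jensen-Shannon divergence: entropy of the mixture minus
    the weighted average of the component entropies."""
    if weights is None:
        weights = [1 / len(dists)] * len(dists)
    # Build the mixture distribution sum(w_i * P_i).
    mixture = {}
    for w, dist in zip(weights, dists):
        for outcome, p in dist.items():
            mixture[outcome] = mixture.get(outcome, 0.0) + w * p
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))

X = {'red': 0.5, 'blue': 0.5}
Y = {'blue': 0.5, 'green': 0.5}
print(jsd([X, Y]))  # 0.5
```

With uniform weights the mixture is {red: 1/4, blue: 1/2, green: 1/4}, whose entropy is 1.5 bits, while each component has entropy 1 bit, giving 0.5 bits of divergence.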
In [1]: import dit
In [2]: from dit.divergences import jensen_shannon_divergence
In [3]: X = dit.ScalarDistribution(['red', 'blue'], [1/2, 1/2])
In [4]: Y = dit.ScalarDistribution(['blue', 'green'], [1/2, 1/2])
In [5]: jensen_shannon_divergence([X, Y])
Out[5]: 0.5
In [6]: jensen_shannon_divergence([X, Y], [3/4, 1/4])
Out[6]: 0.40563906222956647
In [7]: Z = dit.ScalarDistribution(['blue', 'yellow'], [1/2, 1/2])
In [8]: jensen_shannon_divergence([X, Y, Z])
Out[8]: 0.7924812503605778
In [9]: jensen_shannon_divergence([X, Y, Z], [1/2, 1/4, 1/4])
Out[9]: 0.75
Derivation¶
Where does this equation come from? Consider Jensen’s inequality:

\[\Psi\left(\sum_i w_i x_i\right) \ge \sum_i w_i \Psi(x_i)\]

where \(\Psi\) is a concave function and the weights \(w_i\) sum to one. If we consider the difference of the left and right sides we find:

\[\Psi\left(\sum_i w_i x_i\right) - \sum_i w_i \Psi(x_i) \ge 0\]
If we make that concave function \(\Psi\) the Shannon entropy \(\H\), we get the Jensen-Shannon divergence: Jensen from Jensen’s inequality, and Shannon from the use of the Shannon entropy.
Note
Some authors also consider the Jensen-Rényi divergence (where \(\Psi\) is the Rényi entropy) and the Jensen-Tsallis divergence (where \(\Psi\) is the Tsallis entropy).
Metric¶
The square root of the Jensen-Shannon divergence, \(\sqrt{\JSD}\), is a true metric between distributions.
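As a quick numerical illustration (a from-scratch sketch with plain dicts, not dit's API), one can check the triangle inequality for \(\sqrt{\JSD}\) on the three example distributions:

```python
from math import log2, sqrt

def entropy(d):
    """Shannon entropy (bits) of a {outcome: probability} dict."""
    return -sum(p * log2(p) for p in d.values() if p > 0)

def jsd(p, q):
    """Pairwise Jensen-Shannon divergence with uniform weights."""
    mix = {k: 0.5 * (p.get(k, 0) + q.get(k, 0)) for k in set(p) | set(q)}
    return entropy(mix) - 0.5 * (entropy(p) + entropy(q))

X = {'red': 0.5, 'blue': 0.5}
Y = {'blue': 0.5, 'green': 0.5}
Z = {'blue': 0.5, 'yellow': 0.5}

def dist(a, b):
    # The metric is the square root of the divergence.
    return sqrt(jsd(a, b))

# The metric property includes the triangle inequality:
assert dist(X, Z) <= dist(X, Y) + dist(Y, Z)
```

Here each pairwise divergence is 0.5 bits, so each pairwise distance is \(\sqrt{0.5} \approx 0.707\), and the triangle inequality holds comfortably.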
Relationship to the Other Measures¶
The Jensen-Shannon divergence can be derived from other, better-known information measures; notably the Kullback-Leibler divergence and the mutual information.
Kullback-Leibler divergence¶
The Jensen-Shannon divergence is the average Kullback-Leibler divergence of \(X\) and \(Y\) from their mixture distribution, \(M\):

\[\JSD[X || Y] = \frac{1}{2} \left( \DKL[X || M] + \DKL[Y || M] \right)\]

where \(M = \frac{X + Y}{2}\) is the mixture distribution.
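This identity can be checked numerically with a from-scratch sketch (plain dicts rather than dit distributions):

```python
from math import log2

def entropy(d):
    """Shannon entropy (bits) of a {outcome: probability} dict."""
    return -sum(p * log2(p) for p in d.values() if p > 0)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits."""
    return sum(pk * log2(pk / q[k]) for k, pk in p.items() if pk > 0)

X = {'red': 0.5, 'blue': 0.5}
Y = {'blue': 0.5, 'green': 0.5}
# The mixture distribution M = (X + Y) / 2.
M = {k: 0.5 * (X.get(k, 0) + Y.get(k, 0)) for k in set(X) | set(Y)}

# Average KL divergence from the mixture...
jsd_via_kl = 0.5 * (kl(X, M) + kl(Y, M))
# ...agrees with the entropy-of-mixture formula.
jsd_direct = entropy(M) - 0.5 * (entropy(X) + entropy(Y))
print(jsd_via_kl, jsd_direct)  # 0.5 0.5
```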
Mutual Information¶
\[\JSD[X || Y] = \I[Z : M]\]

where \(M\) is the mixture distribution as before, and \(Z\) is an indicator variable over \(X\) and \(Y\). In essence, if \(X\) and \(Y\) are each an urn containing colored balls, and I randomly select one of the urns and draw a ball from it, then the Jensen-Shannon divergence is the mutual information between which urn I drew the ball from and the color of the ball drawn.
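The urn story can be made concrete with a small sketch (illustrative helper code with plain dicts, not dit's API): build the joint distribution over (urn, color), then compute the mutual information from marginal and joint entropies.

```python
from math import log2

def entropy(d):
    """Shannon entropy (bits) of a {outcome: probability} dict."""
    return -sum(p * log2(p) for p in d.values() if p > 0)

# Two urns of colored balls; pick an urn uniformly, then draw a ball.
X = {'red': 0.5, 'blue': 0.5}    # urn 0
Y = {'blue': 0.5, 'green': 0.5}  # urn 1

# Joint distribution over (urn, color).
joint = {}
for z, urn in enumerate([X, Y]):
    for color, p in urn.items():
        joint[(z, color)] = 0.5 * p

urn_marginal = {0: 0.5, 1: 0.5}
color_marginal = {}
for (z, color), p in joint.items():
    color_marginal[color] = color_marginal.get(color, 0.0) + p

# I(urn ; color) = H(urn) + H(color) - H(urn, color)
mi = entropy(urn_marginal) + entropy(color_marginal) - entropy(joint)
print(mi)  # 0.5, matching the Jensen-Shannon divergence of X and Y
```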
API¶
jensen_shannon_divergence(*args, **kwargs)[source]¶

The Jensen-Shannon Divergence: H(sum(w_i*P_i)) - sum(w_i*H(P_i)).

The square root of the Jensen-Shannon divergence is a distance metric.

Parameters:
dists ([Distribution]) – The distributions, P_i, to take the Jensen-Shannon Divergence of.
weights ([float], None) – The weights, w_i, to give the distributions. If None, the weights are assumed to be uniform.

Returns: jsd – The Jensen-Shannon Divergence

Return type: float

Raises:
ditException – Raised if dists and weights have unequal lengths.
InvalidNormalization – Raised if the weights do not sum to unity.
InvalidProbability – Raised if the weights are not valid probabilities.