Cross Entropy
The cross entropy between two distributions \(p(x)\) and \(q(x)\) is given by:
\[ \xH{p || q} = -\sum_{x \in \mathcal{X}} p(x) \log_2 q(x) \]
This quantifies the average cost, in bits per outcome, of representing a distribution defined by the probabilities \(p(x)\) using a code built for the probabilities \(q(x)\). For example, the cross entropy of a distribution with itself is the entropy of that distribution, because the entropy quantifies the average cost of representing a distribution optimally:
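In [1]: import dit
   ...: from dit.divergences import cross_entropy
In [2]: p = dit.Distribution(['0', '1'], [1/2, 1/2])
In [3]: cross_entropy(p, p)
Out[3]: 1.0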
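The same quantity can be evaluated directly from the definition, without dit. The following is a minimal sketch in plain Python; the helper name `cross_entropy_bits` and the dictionary representation of a distribution are assumptions made for illustration and are not part of dit's API. It reproduces the cross entropy of a fair coin with itself.

```python
import math

def cross_entropy_bits(p, q):
    """Cross entropy of p relative to q, in bits (illustrative helper, not dit's API).

    p and q map outcomes to probabilities. Outcomes with p(x) == 0
    contribute nothing; if p(x) > 0 while q(x) == 0, this helper
    returns infinity.
    """
    total = 0.0
    for outcome, px in p.items():
        if px == 0:
            continue
        qx = q.get(outcome, 0.0)
        if qx == 0:
            return math.inf
        total -= px * math.log2(qx)
    return total

fair = {'0': 0.5, '1': 0.5}

# Cross entropy of a distribution with itself is its entropy.
self_xh = cross_entropy_bits(fair, fair)
print(self_xh)
```

How this helper treats outcomes with zero probability under \(q\) is a choice made for this sketch, not a statement about dit's behavior.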
If, however, we attempted to model a fair coin with a biased one, we could quantify this mismatch with the cross entropy:
In [4]: q = dit.Distribution(['0', '1'], [3/4, 1/4])
In [5]: cross_entropy(p, q)
Out[5]: 1.207518749639422
This means that we will use about \(1.2\) bits on average to represent each flip of a fair coin when the representation is built for the biased coin. Turning things around, suppose we had a biased coin that we attempted to represent with a fair coin:
In [6]: cross_entropy(q, p)
Out[6]: 1.0
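This value can be checked by hand. Writing \(\xH{q || p}\) for the cross entropy of \(q\) relative to \(p\), that is \(-\sum_x q(x) \log_2 p(x)\), and using the probabilities \(3/4, 1/4\) for \(q\) and \(1/2, 1/2\) for the fair coin \(p\), a short worked calculation gives:

```latex
\begin{align*}
\xH{q || p} &= -\tfrac{3}{4} \log_2 \tfrac{1}{2} - \tfrac{1}{4} \log_2 \tfrac{1}{2} \\
            &= \tfrac{3}{4} + \tfrac{1}{4} \\
            &= 1
\end{align*}
```

Because \(p\) assigns probability \(1/2\) to both outcomes, every flip costs exactly one bit no matter how biased \(q\) is.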
So although the entropy of \(q\) is less than \(1\) bit, we will use a full bit to represent each of its outcomes. Both of these results can easily be seen by considering the following identity:
\[ \xH{p || q} = \H{p} + \DKL{p || q} \]
So in representing \(p\) using \(q\), we must use at least \(\H{p}\) bits, the minimum required to represent \(p\), plus the Kullback-Leibler divergence of \(q\) from \(p\).
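The decomposition into entropy plus Kullback-Leibler divergence can be checked numerically for the two coins used above. This is a minimal sketch in plain Python rather than dit; the helper names are hypothetical, distributions are plain dictionaries, and it assumes \(q(x) > 0\) wherever \(p(x) > 0\).

```python
import math

def entropy_bits(p):
    # Shannon entropy in bits; zero-probability outcomes are skipped.
    return -sum(px * math.log2(px) for px in p.values() if px > 0)

def kl_divergence_bits(p, q):
    # Kullback-Leibler divergence sum_x p(x) log2(p(x) / q(x)), in bits.
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)

def cross_entropy_bits(p, q):
    # Cross entropy -sum_x p(x) log2 q(x), in bits.
    return -sum(px * math.log2(q[x]) for x, px in p.items() if px > 0)

fair = {'0': 1/2, '1': 1/2}
biased = {'0': 3/4, '1': 1/4}

# Compare both sides of the identity in each direction.
results = []
for a, b in [(fair, biased), (biased, fair)]:
    lhs = cross_entropy_bits(a, b)
    rhs = entropy_bits(a) + kl_divergence_bits(a, b)
    results.append((lhs, rhs))
    print(lhs, rhs)
```

In both directions the two printed values should agree up to floating-point rounding. Only the divergence term depends on the model distribution, which is why the cross entropy can never fall below the entropy of the distribution being represented.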
API

cross_entropy(dist1, dist2, rvs=None, crvs=None, rv_mode=None)
The cross entropy between dist1 and dist2.
 Parameters
dist1 (Distribution) – The first distribution in the cross entropy.
dist2 (Distribution) – The second distribution in the cross entropy.
rvs (list, None) – The indexes of the random variables over which the cross entropy is calculated. If None, then the cross entropy is calculated over all random variables.
rv_mode (str, None) – Specifies how to interpret rvs and crvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of crvs and rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted, which defaults to ‘indices’.
 Returns
xh – The cross entropy between dist1 and dist2.
 Return type
float
 Raises
ditException – Raised if either dist1 or dist2 doesn’t have rvs or, if rvs is None, if dist2 has an outcome length different from that of dist1.