General Information

Documentation:

http://docs.dit.io

Downloads:

https://pypi.org/project/dit/

https://anaconda.org/conda-forge/dit

Dependencies:

Optional Dependencies

• colorama: colored column heads in PID indicating failure modes

• cython: faster sampling from distributions

• hypothesis: random sampling of distributions

• matplotlib, python-ternary: plotting of various information-theoretic expansions

• numdifftools: numerical evaluation of gradients and hessians during optimization

• pint: add units to informational values

• scikit-learn: faster nearest-neighbor lookups during entropy/mutual information estimation from samples

Mailing list:

None

Code and bug tracker:

https://github.com/dit/dit

License:

BSD 3-Clause, see LICENSE.txt for details.

Quickstart

Basic usage of dit consists of creating distributions, modifying them if need be, and then computing properties of those distributions. First, import the package:

In [1]: import dit


Suppose we have a really thick coin, one so thick that there is a reasonable chance of it landing on its edge. Here is how we might represent the coin in dit.

In [2]: d = dit.Distribution(['H', 'T', 'E'], [.4, .4, .2])

In [3]: print(d)
Class:          Distribution
Alphabet:       ('E', 'H', 'T') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 1
RV Names:       None

x   p(x)
E   1/5
H   2/5
T   2/5
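
The printed table lists outcomes in sorted order and shows the probabilities as exact fractions. A minimal plain-Python sketch of that bookkeeping follows; it does not use dit, and the names `outcomes`, `probs`, and `pmf` are illustrative only, not part of dit's API.

```python
from fractions import Fraction

# Hypothetical stand-in for the thick coin; this is not dit's internal representation.
outcomes = ['H', 'T', 'E']
probs = [0.4, 0.4, 0.2]

# Convert each float to the nearest simple fraction (0.4 -> 2/5, 0.2 -> 1/5).
pmf = {o: Fraction(p).limit_denominator(1000) for o, p in zip(outcomes, probs)}

# A valid distribution must sum to one.
total = sum(pmf.values())

# List outcomes in sorted order, as in the printed table.
for outcome in sorted(pmf):
    print(outcome, pmf[outcome])
```

Exact fractions make the normalization check an equality rather than a floating-point comparison.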



Calculate the probability of $$H$$ and also of the combination: $$H~\mathbf{or}~T$$.

In [4]: d['H']
Out[4]: 0.4

In [5]: d.event_probability(['H','T'])
Out[5]: 0.8
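
As a cross-check on the two values above, the probability of an event is the sum of the probabilities of the outcomes it contains. The sketch below uses only plain Python; the helper `event_probability` here is a hypothetical stand-in written for illustration, not dit's method of the same name.

```python
# Plain-Python cross-check; does not use dit.
coin = {'H': 0.4, 'T': 0.4, 'E': 0.2}


def event_probability(pmf, event):
    """Sum the probabilities of the distinct outcomes in `event`."""
    return sum(pmf[outcome] for outcome in set(event))


p_heads = coin['H']
p_heads_or_tails = event_probability(coin, ['H', 'T'])
print(p_heads, p_heads_or_tails)
```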


Calculate the Shannon entropy and extropy of the joint distribution.

In [6]: dit.shannon.entropy(d)
Out[6]: 1.5219280948873621

In [7]: dit.other.extropy(d)
Out[7]: 1.1419011889093373
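
Both numbers can be reproduced by hand from the definitions, assuming base-2 logarithms: the entropy is the sum of -p log2(p) over outcomes, and the extropy is the sum of -(1 - p) log2(1 - p). The following is a plain-Python sketch for checking the values, not dit's implementation; the function names are illustrative.

```python
import math

# Probabilities of the thick coin: H, T, E.
pmf = [0.4, 0.4, 0.2]


def shannon_entropy(probs):
    """H = -sum p * log2(p), skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def extropy(probs):
    """J = -sum (1 - p) * log2(1 - p), skipping outcomes with p == 1."""
    return -sum((1 - p) * math.log2(1 - p) for p in probs if p < 1)


H = shannon_entropy(pmf)
J = extropy(pmf)
print(H, J)
```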


Create a distribution representing the $$\mathbf{xor}$$ logic function. Here, we have two inputs, $$X$$ and $$Y$$, and then an output $$Z = \mathbf{xor}(X,Y)$$.

In [8]: import dit.example_dists

In [9]: d = dit.example_dists.Xor()

In [10]: d.set_rv_names(['X', 'Y', 'Z'])


Calculate the Shannon mutual informations $$\operatorname{I}[X:Z]$$, $$\operatorname{I}[Y:Z]$$, and $$\operatorname{I}[X,Y:Z]$$.

In [11]: dit.shannon.mutual_information(d, ['X'], ['Z'])
Out[11]: 0.0

In [12]: dit.shannon.mutual_information(d, ['Y'], ['Z'])
Out[12]: 0.0

In [13]: dit.shannon.mutual_information(d, ['X', 'Y'], ['Z'])
Out[13]: 1.0
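
These three values follow from the identity I[A:B] = H[A] + H[B] - H[A,B]. The sketch below rebuilds the xor distribution in plain Python and evaluates that identity. It assumes the two inputs are independent and uniform, which is consistent with the results above but is an assumption of this example; the helper names are illustrative and not dit's API.

```python
import math
from collections import defaultdict
from itertools import product

# Joint pmf over (x, y, z) with uniform, independent inputs and z = x XOR y.
joint = {(x, y, x ^ y): 0.25 for x, y in product((0, 1), repeat=2)}


def entropy(pmf, indices):
    """Shannon entropy (bits) of the marginal over the given positions."""
    marg = defaultdict(float)
    for outcome, p in pmf.items():
        marg[tuple(outcome[i] for i in indices)] += p
    return -sum(p * math.log2(p) for p in marg.values() if p > 0)


def mutual_information(pmf, a, b):
    """I[A:B] = H[A] + H[B] - H[A,B]."""
    union = sorted(set(a) | set(b))
    return entropy(pmf, a) + entropy(pmf, b) - entropy(pmf, union)


X, Y, Z = 0, 1, 2
i_xz = mutual_information(joint, [X], [Z])
i_yz = mutual_information(joint, [Y], [Z])
i_xyz = mutual_information(joint, [X, Y], [Z])
print(i_xz, i_yz, i_xyz)
```

Neither input alone says anything about the output, yet the pair determines it completely, which is why xor is a standard example of purely synergistic information.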


Calculate the marginal distribution $$P(X,Z)$$. Then print its probabilities as fractions, showing the mask.

In [14]: d2 = d.marginal(['X', 'Z'])

In [15]: print(d2.to_string(show_mask=True, exact=True))
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 2 (mask: 3)
RV Names:       ('X', 'Z')
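
Marginalizing means summing the joint probabilities over the variables that are dropped, here $$Y$$. A plain-Python sketch with exact fractions is given below. It assumes the xor distribution has uniform, independent inputs, and the `marginal` helper is a hypothetical illustration rather than dit's method.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint pmf over 'xyz' strings, assuming uniform inputs and z = x XOR y.
joint = {f"{x}{y}{x ^ y}": Fraction(1, 4) for x, y in product((0, 1), repeat=2)}


def marginal(pmf, keep):
    """Sum probabilities over every position not listed in `keep`."""
    out = defaultdict(Fraction)
    for outcome, p in pmf.items():
        out[''.join(outcome[i] for i in keep)] += p
    return dict(out)


# Keep positions 0 (X) and 2 (Z); position 1 (Y) is summed out.
p_xz = marginal(joint, [0, 2])
for outcome in sorted(p_xz):
    print(outcome, p_xz[outcome])
```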


Convert the distribution probabilities to log (base 3.5) probabilities, and access its probability mass function.

In [16]: d2.set_base(3.5)

In [17]: d2.pmf
Out[17]: array([-1.10658951, -1.10658951, -1.10658951, -1.10658951])
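
The value -1.10658951 is the base-3.5 logarithm of 1/4. The sketch below checks this by hand in plain Python, assuming each of the four outcomes of $$P(X,Z)$$ has linear probability 1/4; the variable names are illustrative only.

```python
import math

base = 3.5

# Assumed linear probabilities of the four outcomes of P(X, Z).
linear_pmf = [0.25, 0.25, 0.25, 0.25]

# Log probabilities in the chosen base: log_b(p) = ln(p) / ln(b).
log_pmf = [math.log(p, base) for p in linear_pmf]

# Exponentiating recovers the linear probabilities.
recovered = [base ** lp for lp in log_pmf]

print(log_pmf)
print(recovered)
```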


Draw 5 random samples from this distribution.

In [18]: d2.rand(5)
Out[18]: ['01', '10', '00', '01', '00']
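
Sampling draws outcomes in proportion to their probabilities, so the exact list differs from run to run. The standard-library sketch below shows the idea, assuming the four outcomes of $$P(X,Z)$$ are equally likely. It is not how dit samples internally, and the seed is there only to make this example repeatable.

```python
import random

# Assumed outcomes and probabilities of P(X, Z).
outcomes = ['00', '01', '10', '11']
weights = [0.25, 0.25, 0.25, 0.25]

# A seeded generator keeps the illustration reproducible.
rng = random.Random(0)
samples = rng.choices(outcomes, weights=weights, k=5)
print(samples)
```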


Enjoy!