Optimization

It is often useful to construct a distribution \(d^\prime\) which is consistent with some marginal aspects of \(d\), but otherwise optimizes some information measure. For example, we may want a distribution whose pairwise marginals match those of another distribution, but which otherwise has maximum entropy:

In [1]: from dit.algorithms.distribution_optimizers import MaxEntOptimizer
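To make the idea concrete without relying on any particular optimizer interface, the following is a minimal, library-independent sketch. It does not use dit's optimizer; it swaps in iterative proportional fitting (IPF), which, when started from the uniform distribution, converges to the maximum entropy distribution with the requested marginals. The helper names `marginal` and `maxent_ipf`, and the dict representation of the XOR distribution, are assumptions made for this illustration only.

```python
from itertools import product


def marginal(dist, idxs):
    """Marginalize a {outcome: probability} dict onto the positions in idxs."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out


def maxent_ipf(dist, marginals, sweeps=200):
    """Hypothetical helper: IPF started from the uniform distribution."""
    n = len(next(iter(dist)))
    # Per-variable alphabets, inferred from the outcomes of dist.
    alphabets = [sorted({o[i] for o in dist}) for i in range(n)]
    outcomes = ["".join(o) for o in product(*alphabets)]
    q = {o: 1.0 / len(outcomes) for o in outcomes}
    # The marginals of dist that the result must reproduce.
    targets = [(idxs, marginal(dist, idxs)) for idxs in marginals]
    for _ in range(sweeps):
        for idxs, target in targets:
            current = marginal(q, idxs)
            for o in q:
                key = tuple(o[i] for i in idxs)
                if current[key] > 0:
                    # Rescale so that this marginal of q matches the target.
                    q[o] *= target.get(key, 0.0) / current[key]
    return q


# Three-variable XOR: the third bit is the XOR of the first two.
xor = {"000": 0.25, "011": 0.25, "101": 0.25, "110": 0.25}

# Fix all pairwise marginals, as in the example described above.
dp = maxent_ipf(xor, [[0, 1], [0, 2], [1, 2]])
for outcome in sorted(dp):
    print(outcome, round(dp[outcome], 6))
```

Every pairwise marginal of XOR is uniform, so the pairwise constraints do not pull the result away from the uniform distribution over all eight outcomes. A general-purpose numerical optimizer handles other objectives as well; IPF is used here only because it is short and applies directly to the maximum entropy case.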

Helper Functions

There are three special functions to handle common optimization problems:

In [7]: from dit.algorithms import maxent_dist, marginal_maxent_dists

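The first, maxent_dist, constructs the maximum entropy distribution with the specified marginals fixed, encapsulating the steps of running the optimizer by hand. Here xor is taken to be the three-variable XOR distribution (for example, dit.example_dists.Xor()):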

In [8]: print(maxent_dist(xor, [[0,1], [0,2], [1,2]]))
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 3
RV Names:       None

The second, marginal_maxent_dists, constructs a sequence of maximum entropy distributions, one for each subset size k, with the marginals over all subsets of k variables fixed:

In [9]: k0, k1, k2, k3 = marginal_maxent_dists(xor)

where k0 is the maxent distribution constrained only to have the same alphabets as xor; k1 fixes \(p(x_0)\), \(p(x_1)\), and \(p(x_2)\); k2 fixes \(p(x_0, x_1)\), \(p(x_0, x_2)\), and \(p(x_1, x_2)\) (as in the maxent_dist example above); and finally k3 fixes \(p(x_0, x_1, x_2)\) (i.e., it is the distribution we started with).
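The hierarchy of constraints can be illustrated with a small, self-contained sketch. As an assumption for illustration, it again uses iterative proportional fitting from a uniform start rather than dit's own routines, and the names `marginal`, `maxent_ipf`, and `entropy` are hypothetical helpers, not part of dit. For each k it fixes the marginals over every k-element subset of the three variables and reports the entropy of the result.

```python
from itertools import combinations, product
from math import log2


def marginal(dist, idxs):
    """Marginalize a {outcome: probability} dict onto the positions in idxs."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out


def maxent_ipf(dist, marginals, sweeps=200):
    """Hypothetical helper: IPF started from the uniform distribution."""
    n = len(next(iter(dist)))
    alphabets = [sorted({o[i] for o in dist}) for i in range(n)]
    q = {"".join(o): 1.0 for o in product(*alphabets)}
    q = {o: p / len(q) for o, p in q.items()}
    targets = [(idxs, marginal(dist, idxs)) for idxs in marginals]
    for _ in range(sweeps):
        for idxs, target in targets:
            current = marginal(q, idxs)
            for o in q:
                key = tuple(o[i] for i in idxs)
                if current[key] > 0:
                    q[o] *= target.get(key, 0.0) / current[key]
    return q


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


xor = {"000": 0.25, "011": 0.25, "101": 0.25, "110": 0.25}
n = 3

# One maxent distribution per subset size k = 0, 1, 2, 3.
dists = []
for k in range(n + 1):
    constraints = [list(c) for c in combinations(range(n), k)]
    dists.append(maxent_ipf(xor, constraints))

entropies = [entropy(d) for d in dists]
for k, h in enumerate(entropies):
    print(f"k={k}: H = {h:.3f} bits")
```

For XOR the entropy stays at 3 bits for k = 0, 1, and 2, and drops to 2 bits only at k = 3, which reflects the fact that all of the structure in XOR lives in the three-way interaction.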