# Optimization

It is often useful to construct a distribution $$d^\prime$$ that is consistent with some marginal aspects of $$d$$ but otherwise optimizes some information measure. For example, we might want a distribution that has the same pairwise marginals as a given distribution but otherwise has maximum entropy:

In : import dit

In : from dit.algorithms.distribution_optimizers import MaxEntOptimizer

In : xor = dit.example_dists.Xor()

In : meo = MaxEntOptimizer(xor, [[0,1], [0,2], [1,2]])

In : meo.optimize()
Out:
fun: -3.0000017320700905
jac: array([-3.00000018, -3.00000015, -3.00000006, -2.9999997 , -2.99999976,
-2.99999982, -2.99999994, -2.99999964])
message: 'Optimization terminated successfully.'
nfev: 892
nit: 81
njev: 81
status: 0
success: True
x: array([0.1250001 , 0.12500008, 0.12500006, 0.12500008, 0.12500008,
0.12500005, 0.12500006, 0.12500007])

In : dp = meo.construct_dist()

In : print(dp)
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 3
RV Names:       None

x     p(x)
000   1793883/14351063
001   2538379/20307033
010   1569035/12552281
011   6389891/51119127
100   7040815/56326519
101   2856306/22850449
110   1/8
111   1845800/14766399
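
The fractions printed above are all close to 1/8, so the optimizer has effectively returned the uniform distribution. The following is a minimal sketch, using plain dictionaries rather than dit objects, that checks why this is plausible: the uniform distribution has the same pairwise marginals as xor and a larger entropy (3 bits rather than 2), consistent with the reported objective value of about -3 if the optimizer minimizes negative entropy in bits (an assumption). The helpers `marginal` and `entropy` are hypothetical names written for this illustration and are not part of dit.

```python
from itertools import product
from math import log2

# Outcomes are tuples of bits; xor puts mass 1/4 on each outcome with x2 = x0 ^ x1.
xor = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
uniform = {outcome: 0.125 for outcome in product((0, 1), repeat=3)}


def marginal(dist, idxs):
    """Marginalize a dict-based distribution onto the variables in idxs."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


pairs = [[0, 1], [0, 2], [1, 2]]
# True when every pairwise marginal of xor equals that of the uniform distribution.
same_pairwise = all(marginal(xor, m) == marginal(uniform, m) for m in pairs)

print(same_pairwise)     # True
print(entropy(xor))      # 2.0
print(entropy(uniform))  # 3.0
```

Since no distribution over three binary variables can exceed 3 bits of entropy, the uniform distribution is the maximum entropy solution under these pairwise constraints.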


## Helper Functions

Special helper functions handle common optimization problems; two of them are imported here:

In : from dit.algorithms import maxent_dist, marginal_maxent_dists


The first, maxent_dist, constructs the maximum entropy distribution with the specified marginals fixed. It encapsulates the steps run above:

In : print(maxent_dist(xor, [[0,1], [0,2], [1,2]]))
Class:          Distribution
Alphabet:       ('0', '1') for all rvs
Base:           linear
Outcome Class:  str
Outcome Length: 3
RV Names:       None

x     p(x)
000   1257477/10059817
001   1093061/8744487
010   1576401/12611207
011   1184638/9477105
100   1254660/10037279
101   994733/7957865
110   983321/7866569
111   760230/6081839
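
For intuition about what fixing marginals while maximizing entropy involves, here is a small sketch of iterative proportional fitting (IPF), a classical technique that starts from the uniform distribution and repeatedly rescales it until each requested marginal matches the target. This is not how dit computes its result (the output earlier suggests a general numerical optimizer); it is a swapped-in alternative shown only for illustration. The names `ipf_maxent`, `marginal`, `entropy`, and the `skewed` distribution are all hypothetical.

```python
from itertools import product
from math import log2


def marginal(dist, idxs):
    """Marginalize a dict-based distribution onto the variables in idxs."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(dist):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def ipf_maxent(target, marginals, sweeps=500):
    """Hypothetical helper: rescale a uniform start until each listed
    marginal agrees with the corresponding marginal of target."""
    n = len(next(iter(target)))
    alphabet = sorted({sym for outcome in target for sym in outcome})
    outcomes = list(product(alphabet, repeat=n))
    q = {o: 1.0 / len(outcomes) for o in outcomes}
    wanted = [(idxs, marginal(target, idxs)) for idxs in marginals]
    for _ in range(sweeps):
        for idxs, want in wanted:
            have = marginal(q, idxs)
            for o in outcomes:
                key = tuple(o[i] for i in idxs)
                if have[key] > 0:
                    # Scale so this marginal of q matches the target's.
                    q[o] *= want.get(key, 0.0) / have[key]
    return q


pairs = [[0, 1], [0, 2], [1, 2]]

# xor: the uniform start already matches every pairwise marginal.
xor = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
fitted_xor = ipf_maxent(xor, pairs)

# Hypothetical full-support distribution (weights 1..8, normalized),
# used only to show a case where the fit is not trivial.
outcomes = list(product((0, 1), repeat=3))
skewed = {o: (i + 1) / 36 for i, o in enumerate(outcomes)}
fitted_skewed = ipf_maxent(skewed, pairs)

print(max(abs(p - 0.125) for p in fitted_xor.values()))
print(entropy(skewed), entropy(fitted_skewed))
```

For xor the fit stays at the uniform distribution, in line with the output above. For the skewed example the fitted distribution keeps the pairwise marginals while having at least as much entropy as the distribution it was fitted to.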


The second, marginal_maxent_dists, constructs several maximum entropy distributions, one for each subset size $$k$$; the $$k$$-th fixes the marginals over every subset of $$k$$ variables:

In : k0, k1, k2, k3 = marginal_maxent_dists(xor)


where k0 is the maximum entropy distribution over the same alphabets as xor; k1 fixes $$p(x_0)$$, $$p(x_1)$$, and $$p(x_2)$$; k2 fixes $$p(x_0, x_1)$$, $$p(x_0, x_2)$$, and $$p(x_1, x_2)$$ (as in the maxent_dist example above); and finally k3 fixes $$p(x_0, x_1, x_2)$$ (i.e., it is the distribution we started with).
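
To make the correspondence between each order and its fixed marginals concrete, the sketch below lists the variable subsets of each size for three variables. The helper `subsets_of_size` is a hypothetical name; how marginal_maxent_dists enumerates subsets internally is an assumption, and this only mirrors the description above.

```python
from itertools import combinations


def subsets_of_size(n_vars, k):
    """Hypothetical helper: all k-variable index subsets of n_vars variables."""
    return [list(c) for c in combinations(range(n_vars), k)]


for k in range(4):
    print(k, subsets_of_size(3, k))
# 0 [[]]
# 1 [[0], [1], [2]]
# 2 [[0, 1], [0, 2], [1, 2]]
# 3 [[0, 1, 2]]
```

The list for size 2 is the same list of marginals passed to maxent_dist earlier, and the size 0 case fixes no variables at all, leaving only the alphabets.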