Optimization

It is often useful to construct a distribution \(d^\prime\) that is consistent with some marginal aspects of \(d\) but otherwise optimizes some information measure. For example, we might want a distribution that matches the pairwise marginals of another distribution but otherwise has maximum entropy:

In [1]: import dit
   ...: from dit.algorithms.distribution_optimizers import MaxEntOptimizer

In [2]: xor = dit.example_dists.Xor()

In [3]: meo = MaxEntOptimizer(xor, [[0,1], [0,2], [1,2]])

In [4]: meo.optimize()
Out[4]: 
     message: Optimization terminated successfully
     success: True
      status: 0
         fun: -3.000001615505876
           x: [ 1.250e-01  1.250e-01  1.250e-01  1.250e-01  1.250e-01
                1.250e-01  1.250e-01  1.250e-01]
         nit: 87
         jac: [-3.000e+00 -3.000e+00 -3.000e+00 -3.000e+00 -3.000e+00
               -3.000e+00 -3.000e+00 -3.000e+00]
        nfev: 175
        njev: 87
 multipliers: [-1.771e+06]

In [5]: dp = meo.construct_dist()

In [6]: print(dp)
Class:    Distribution
Alphabet: (('0', '1'), ('0', '1'), ('0', '1'))
Base:     linear

x                 p(X0,X1,X2)
('0', '0', '0')   1/8
('0', '0', '1')   1784702/14277617
('0', '1', '0')   2075854/16606833
('0', '1', '1')   2309353/18474823
('1', '0', '0')   1897263/15178103
('1', '0', '1')   1/8
('1', '1', '0')   5109551/40876407
('1', '1', '1')   1/8
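
The fractions printed above are all approximately 1/8, so up to numerical noise the optimizer returns the uniform distribution over the eight outcomes. As a sanity check, the following standard-library sketch confirms that the uniform distribution shares every pairwise marginal with XOR while having one more bit of entropy. It does not use dit: the dict-based pmf representation and the helper names marginal and entropy are assumptions made for illustration only.

```python
from itertools import product
from math import log2


def marginal(pmf, idxs):
    """Marginalize a joint pmf (dict: outcome tuple -> probability) onto idxs."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


# XOR: X2 = X0 xor X1, with X0 and X1 uniform and independent.
xor = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}
# The optimizer's answer, idealized: uniform over all 8 outcomes.
uniform = {o: 0.125 for o in product((0, 1), repeat=3)}

pairs = [(0, 1), (0, 2), (1, 2)]
same_pairwise = all(marginal(xor, ij) == marginal(uniform, ij) for ij in pairs)

print(same_pairwise)                    # True
print(entropy(xor), entropy(uniform))   # 2.0 3.0
```

The 3-bit entropy of the uniform distribution is consistent with the fun value of roughly -3 reported above, assuming the optimizer minimizes the negative entropy in bits.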

Helper Functions

Two helper functions handle common optimization problems:

In [7]: from dit.algorithms import maxent_dist, marginal_maxent_dists

The first, maxent_dist, constructs the maximum entropy distribution consistent with a specified set of fixed marginals. It encapsulates the steps run above:

In [8]: print(maxent_dist(xor, [[0,1], [0,2], [1,2]]))
Class:    Distribution
Alphabet: (('0', '1'), ('0', '1'), ('0', '1'))
Base:     linear

x                 p(X0,X1,X2)
('0', '0', '0')   1/8
('0', '0', '1')   1/8
('0', '1', '0')   1/8
('0', '1', '1')   1/8
('1', '0', '0')   1/8
('1', '0', '1')   1/8
('1', '1', '0')   1/8
('1', '1', '1')   1/8

The second, marginal_maxent_dists, constructs several maximum entropy distributions, one for each subset size k, with the marginals over all subsets of k variables fixed:

In [9]: k0, k1, k2, k3 = marginal_maxent_dists(xor)

where k0 is the maxent dist with no marginals fixed, sharing only the alphabets of xor; k1 fixes \(p(x_0)\), \(p(x_1)\), and \(p(x_2)\); k2 fixes \(p(x_0, x_1)\), \(p(x_0, x_2)\), and \(p(x_1, x_2)\) (as in the maxent_dist example above); and finally k3 fixes \(p(x_0, x_1, x_2)\) (i.e. it is the distribution we started with).
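
The two lowest orders have well-known closed forms: with nothing fixed, the maximum entropy distribution is uniform over the product of the alphabets, and with only single-variable marginals fixed it is the product of those marginals. The sketch below illustrates this with the standard library only. The function names order0 and order1, the dict-based pmf, and the AND-gate example are assumptions chosen for illustration; none of this is dit's implementation.

```python
from itertools import product
from math import log2, prod


def single_marginals(pmf):
    """Return one dict per variable mapping symbol -> marginal probability."""
    n_vars = len(next(iter(pmf)))
    margs = [dict() for _ in range(n_vars)]
    for outcome, p in pmf.items():
        for i, symbol in enumerate(outcome):
            margs[i][symbol] = margs[i].get(symbol, 0.0) + p
    return margs


def order0(pmf):
    """No marginals fixed: uniform over the product of the alphabets."""
    margs = single_marginals(pmf)
    outcomes = list(product(*(sorted(m) for m in margs)))
    return {o: 1.0 / len(outcomes) for o in outcomes}


def order1(pmf):
    """Single-variable marginals fixed: the product of those marginals."""
    margs = single_marginals(pmf)
    return {o: prod(m[x] for m, x in zip(margs, o))
            for o in product(*(sorted(m) for m in margs))}


def entropy(pmf):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in pmf.values() if p > 0)


# AND gate: X2 = X0 and X1, so p(x2 = 1) = 1/4 and the marginals are not all uniform.
and_gate = {(a, b, a & b): 0.25 for a, b in product((0, 1), repeat=2)}

k0, k1 = order0(and_gate), order1(and_gate)
# Entropy can only decrease as more marginals are fixed.
print(entropy(k0), entropy(k1), entropy(and_gate))
```

For XOR every single-variable marginal is uniform, so both of these orders coincide with the uniform distribution.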

Maximum Entropy Solver (IPF)

By default, maxent_dist computes the maximum entropy distribution using Iterative Proportional Fitting (IPF), the classic algorithm from reconstructability analysis and log-linear modeling. Starting from the uniform distribution, IPF cyclically rescales the working distribution so each constrained marginal matches the data, iterating until convergence. It is typically far faster than the general scipy convex optimizer. IPF converges only linearly on cyclic structures with induced structural zeros, however, so when it fails to converge within its iteration budget maxent_dist automatically falls back to the scipy optimizer to preserve accuracy. The scipy backend can also be requested explicitly via method='scipy':

In [10]: print(maxent_dist(xor, [[0,1], [0,2], [1,2]], method='scipy'))
Class:    Distribution
Alphabet: (('0', '1'), ('0', '1'), ('0', '1'))
Base:     linear

x                 p(X0,X1,X2)
('0', '0', '0')   1/8
('0', '0', '1')   1784702/14277617
('0', '1', '0')   2075854/16606833
('0', '1', '1')   2309353/18474823
('1', '0', '0')   1897263/15178103
('1', '0', '1')   1/8
('1', '1', '0')   5109551/40876407
('1', '1', '1')   1/8
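
The IPF procedure described above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not dit's implementation: the function names, the dict-based pmf, and the n_iter and tol parameters are assumptions, and there is no fallback to another solver when the iteration budget runs out.

```python
from itertools import product


def marginal(pmf, idxs):
    """Marginalize a joint pmf (dict: outcome tuple -> probability) onto idxs."""
    out = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0.0) + p
    return out


def ipf_maxent(target, marginals, n_iter=100, tol=1e-12):
    """Maximum entropy pmf matching the given marginals of target, via IPF."""
    n_vars = len(next(iter(target)))
    alphabets = [sorted({o[i] for o in target}) for i in range(n_vars)]
    outcomes = list(product(*alphabets))

    # Start from the uniform distribution over the full outcome space.
    q = {o: 1.0 / len(outcomes) for o in outcomes}
    targets = [marginal(target, idxs) for idxs in marginals]

    for _ in range(n_iter):
        max_change = 0.0
        # One sweep: rescale q so that each constrained marginal matches in turn.
        for idxs, wanted in zip(marginals, targets):
            current = marginal(q, idxs)
            for o in outcomes:
                key = tuple(o[i] for i in idxs)
                have = current[key]
                new = q[o] * wanted.get(key, 0.0) / have if have > 0 else 0.0
                max_change = max(max_change, abs(new - q[o]))
                q[o] = new
        if max_change < tol:
            break
    return q


xor = {
    ('0', '0', '0'): 0.25,
    ('0', '1', '1'): 0.25,
    ('1', '0', '1'): 0.25,
    ('1', '1', '0'): 0.25,
}
q = ipf_maxent(xor, [[0, 1], [0, 2], [1, 2]])
for outcome in sorted(q):
    print(outcome, q[outcome])
```

For XOR the uniform starting point already matches every pairwise marginal, so the first sweep changes nothing and the loop stops immediately with each outcome at probability 1/8.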

Reconstructability Analysis

The maximum entropy reconstruction underlies reconstructability analysis, which decomposes a distribution into a structure of marginals and assesses each structure by two quantities: its error (transmission, the information lost relative to the data) and its complexity (degrees of freedom, the number of free parameters). The dependency decomposition (see Information Profiles) evaluates these over the whole lattice of structures, yielding the “decomposition spectrum”:

In [11]: from dit.algorithms import degrees_of_freedom

In [12]: from dit.multivariate import transmission

In [13]: from dit.profiles import DependencyDecomposition

In [14]: from dit.multivariate import entropy

In [15]: print(DependencyDecomposition(xor, measures={'H': entropy, 'T': transmission, 'df': degrees_of_freedom}))
+-----------------------------------+
|      Dependency Decomposition     |
+------------+--------+--------+----+
| dependency |   H    |   T    | df |
+------------+--------+--------+----+
|    012     |  2.000 |  0.000 | 7  |
|  01:02:12  |  3.000 |  1.000 | 6  |
|   01:02    |  3.000 |  1.000 | 5  |
|   01:12    |  3.000 |  1.000 | 5  |
|   02:12    |  3.000 |  1.000 | 5  |
|    01:2    |  3.000 |  1.000 | 4  |
|    02:1    |  3.000 |  1.000 | 4  |
|    12:0    |  3.000 |  1.000 | 4  |
|   0:1:2    |  3.000 |  1.000 | 3  |
+------------+--------+--------+----+
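
The df column can be reproduced with a short counting argument, assuming df is the number of free parameters of the hierarchical log-linear model for the structure: every non-empty subset of variables contained in some component contributes the product of (alphabet size - 1) over its variables. The function name df_of_structure below is hypothetical and the sketch is independent of dit's degrees_of_freedom.

```python
from itertools import combinations


def df_of_structure(structure, alphabet_sizes):
    """Count free parameters of a structure given as a list of variable groups."""
    # Every non-empty subset of some component is an allowed interaction term;
    # the set removes subsets shared between overlapping components.
    covered = set()
    for component in structure:
        for r in range(1, len(component) + 1):
            covered.update(combinations(sorted(component), r))

    total = 0
    for subset in covered:
        term = 1
        for i in subset:
            term *= alphabet_sizes[i] - 1
        total += term
    return total


sizes = [2, 2, 2]  # three binary variables, as in xor
structures = {
    '012': [[0, 1, 2]],
    '01:02:12': [[0, 1], [0, 2], [1, 2]],
    '01:02': [[0, 1], [0, 2]],
    '01:2': [[0, 1], [2]],
    '0:1:2': [[0], [1], [2]],
}
for name, structure in structures.items():
    print(name, df_of_structure(structure, sizes))
```

The T column follows from the H column in the same spirit: for a maximum entropy model that matches the data on its fixed marginals, the information lost equals the model entropy minus the data entropy, which here is 3 - 2 = 1 bit for every structure other than 012.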

Optimization Backends

By default, dit uses NumPy and SciPy for numerical optimization. Three additional backends are available that leverage automatic differentiation for computing exact gradients, which can improve convergence for large or complex problems:

JAX Backend

Install with pip install "dit[jax]". The JAX backend (dit.algorithms.optimization_jax) provides:

Automatic differentiation via jax.grad for exact gradient computation
JIT compilation for improved performance
GPU/TPU acceleration when available

PyTorch Backend

Install with pip install "dit[torch]". The PyTorch backend (dit.algorithms.optimization_torch) provides:

Automatic differentiation via torch.autograd for exact gradient computation
GPU acceleration via CUDA or MPS when available
torch.compile support for PyTorch 2.0+

PyTensor Backend

Install with pip install "dit[pytensor]". The PyTensor backend (dit.algorithms.optimization_pytensor) uses PyTensor (the maintained successor to the archived Aesara) and provides:

Symbolic graph compilation of the objective, its exact gradient via pytensor.grad, and each constraint Jacobian, compiled once and reused across the optimization
A native augmented-Lagrangian solver (compiled value/gradient with L-BFGS-B inner solves) for moderate problem sizes, with a SciPy SLSQP fallback
Optional Numba compilation of the compiled functions (set DIT_PYTENSOR_MODE=NUMBA)

Set DIT_PYTENSOR_COMPILEDIR to persist PyTensor’s compilation cache across process launches.

For measures that use the Markov variable optimizer (e.g. common informations), the backend can be selected via the backend parameter:

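import dit
from dit.multivariate import wyner_common_information

# d can be any dit distribution; Xor is used here only as an example
d = dit.example_dists.Xor()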

# Default NumPy backend
wyner_common_information(d)

# JAX backend (requires jax)
wyner_common_information(d, backend='jax')

# PyTorch backend (requires torch)
wyner_common_information(d, backend='torch')

# PyTensor backend (requires pytensor)
wyner_common_information(d, backend='pytensor')

Logging

The optimization modules emit structured log messages via loguru. Logging is disabled by default. To enable it:

from loguru import logger
logger.enable("dit")

This will show optimization progress including problem dimensions, convergence status, and objective values.