Partial Information Decomposition

The partial information decomposition (PID), put forth by Williams & Beer [WB10], is a framework for decomposing the information shared between a set of variables we will refer to as inputs, \(X_0, X_1, \ldots\), and another random variable we will refer to as the output, \(Y\). This decomposition seeks to partition the information \(\I{X_0,X_1,\ldots : Y}\) among the antichains of the inputs.

Background

It is often desirable to determine how a set of inputs influences the behavior of an output. Consider the exclusive-or logic gate, for example:

In [1]: import dit
   ...: from dit.pid.distributions import bivariates, trivariates

In [2]: xor = bivariates['synergy']

In [3]: print(xor)
Class:    Distribution
Alphabet: (('0', '1'), ('0', '1'), ('0', '1'))
Base:     linear

x                 p(X0,X1,X2)
('0', '0', '0')   1/4
('0', '1', '1')   1/4
('1', '0', '1')   1/4
('1', '1', '0')   1/4

We can see from inspection that either input (the first two indexes) is independent of the output (the final index), yet the two inputs together determine the output. One could call this “synergistic” information. Next, consider the giant bit distribution:

In [4]: gb = bivariates['redundant']

In [5]: print(gb)
Class:    Distribution
Alphabet: (('0', '1'), ('0', '1'), ('0', '1'))
Base:     linear

x                 p(X0,X1,X2)
('0', '0', '0')   1/2
('1', '1', '1')   1/2

Here, we see that either input informs us of exactly what the output is. One could call this “redundant” information. Furthermore, consider the Co-Information of these distributions:

In [6]: from dit.multivariate import coinformation as I

In [7]: I(xor)
Out[7]: -1.0

In [8]: I(gb)
Out[8]: 1.0

This could lead one to intuit that negative values of the coinformation correspond to synergistic effects in a distribution, while positive values correspond to redundant effects. This intuition, however, is at best misleading: the coinformations of a 4-variable giant bit and a 4-variable parity distribution are both positive:

In [9]: I(dit.example_dists.giant_bit(4, 2))
Out[9]: 1.0

In [10]: I(dit.example_dists.n_mod_m(4, 2))
Out[10]: 1.0

This, as well as other issues, led Williams & Beer [WB10] to propose the partial information decomposition.
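The coinformation values quoted above can be reproduced without dit by an alternating sum of subset entropies. The following is a minimal sketch, not dit's implementation: the helper names and the hand-built dictionaries standing in for the XOR, giant bit, and parity distributions are assumptions made for illustration.

```python
from itertools import combinations, product
from math import log2


def entropy(pmf, idxs):
    """Shannon entropy (bits) of the marginal over the positions in idxs."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idxs)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def coinformation(pmf):
    """Alternating sum of subset entropies: +singles, -pairs, +triples, ..."""
    n = len(next(iter(pmf)))
    return sum(
        (-1) ** (k + 1) * entropy(pmf, idxs)
        for k in range(1, n + 1)
        for idxs in combinations(range(n), k)
    )


# Hand-built stand-ins for the distributions used above (an assumption).
xor = {(a, b, a ^ b): 1 / 4 for a, b in product((0, 1), repeat=2)}
giant_bit_3 = {(0, 0, 0): 1 / 2, (1, 1, 1): 1 / 2}
giant_bit_4 = {(0,) * 4: 1 / 2, (1,) * 4: 1 / 2}
parity_4 = {b: 1 / 8 for b in product((0, 1), repeat=4) if sum(b) % 2 == 0}

for name, pmf in [("xor", xor), ("giant bit (3)", giant_bit_3),
                  ("giant bit (4)", giant_bit_4), ("parity (4)", parity_4)]:
    print(f"{name}: {coinformation(pmf):+.4f}")
```

The sign flips with the number of variables for the parity distribution, which is why the sign of the coinformation alone cannot be read as "synergy" or "redundancy".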

Framework

The goal of the partial information decomposition is to assign some non-negative portion of \(\I{\{X_i\} : Y}\) to each antichain over the inputs. An antichain over the inputs is a set of sets, where each of those sets is not a subset of any of the others. For example, \(\left\{ \left\{X_0, X_1\right\}, \left\{X_1, X_2\right\} \right\}\) is an antichain, but \(\left\{ \left\{X_0, X_1\right\}, \left\{X_0, X_1, X_2\right\} \right\}\) is not.

The antichains form a lattice based on this partial order:

\[\alpha \leq \beta \iff \forall \mathbf{b} \in \beta, \exists \mathbf{a} \in \alpha, \mathbf{a} \subseteq \mathbf{b}\]
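To make the antichain condition and the partial order concrete, here is a small sketch that enumerates the antichains over two and three inputs and checks the ordering. The function names are hypothetical and inputs are represented by integer indices; this is an illustration, not how dit builds its lattice.

```python
from itertools import combinations


def nonempty_subsets(n):
    """All non-empty subsets of the input indices 0..n-1."""
    return [frozenset(c) for k in range(1, n + 1)
            for c in combinations(range(n), k)]


def is_antichain(sets):
    """True if no member is a subset of another (members assumed distinct)."""
    return all(not (a <= b or b <= a) for a, b in combinations(sets, 2))


def antichains(n):
    """Every non-empty antichain of non-empty subsets of n inputs."""
    subsets = nonempty_subsets(n)
    return [frozenset(combo)
            for k in range(1, len(subsets) + 1)
            for combo in combinations(subsets, k)
            if is_antichain(combo)]


def leq(alpha, beta):
    """alpha <= beta iff every b in beta contains some a in alpha."""
    return all(any(a <= b for a in alpha) for b in beta)


print(len(antichains(2)), len(antichains(3)))  # 4 18

# The antichain {{X0}, {X1}} sits below every other element for two inputs.
bottom = frozenset({frozenset({0}), frozenset({1})})
print(all(leq(bottom, a) for a in antichains(2)))  # True
```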

From here, we wish to find a redundancy measure, \(\Icap{\bullet}\), which would assign a fraction of \(\I{\{X_i\} : Y}\) to each antichain, intuitively quantifying what portion of the information in the output could be learned by observing any of the sets of variables within the antichain. In order to be a viable measure of redundancy, there are several axioms a redundancy measure must satisfy.

Bivariate Lattice

Let us consider the special case of two inputs. The lattice consists of four elements: \(\left\{\left\{X_0\right\}, \left\{X_1\right\}\right\}\), \(\left\{\left\{X_0\right\}\right\}\), \(\left\{\left\{X_1\right\}\right\}\), and \(\left\{\left\{X_0, X_1\right\}\right\}\). We can interpret these elements as the redundancy provided by both inputs, the information uniquely provided by \(X_0\), the information uniquely provided by \(X_1\), and the information synergistically provided only by both inputs together. Together these four elements decompose the input-output mutual information:

\[\I{X_0, X_1 : Y} = \Ipart{\left\{X_0\right\}, \left\{X_1\right\} : Y} + \Ipart{\left\{X_0\right\} : Y} + \Ipart{\left\{X_1\right\} : Y} + \Ipart{\left\{X_0, X_1\right\} : Y}\]

Furthermore, due to the self-redundancy axiom (described ahead), the single-input mutual informations decompose in the following way:

\[ \begin{align}\begin{aligned}\I{X_0 : Y} = \Ipart{\left\{X_0\right\}, \left\{X_1\right\} : Y} + \Ipart{\left\{X_0\right\} : Y}\\\I{X_1 : Y} = \Ipart{\left\{X_0\right\}, \left\{X_1\right\} : Y} + \Ipart{\left\{X_1\right\} : Y}\end{aligned}\end{align} \]

Colloquially, from input \(X_0\) one can learn what is redundantly provided by either input, plus what is uniquely provided by \(X_0\), but not what is uniquely provided by \(X_1\) or what can only be learned synergistically from both inputs.
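Because there are three mutual informations and four atoms, fixing the redundancy determines the remaining three atoms. The sketch below inverts the equations above; `bivariate_atoms` is a hypothetical helper, not part of dit, and the numbers fed to it are simply example inputs.

```python
def bivariate_atoms(i_joint, i_0, i_1, redundancy):
    """Solve the bivariate lattice for its four atoms given the redundancy."""
    unique_0 = i_0 - redundancy
    unique_1 = i_1 - redundancy
    synergy = i_joint - redundancy - unique_0 - unique_1
    return {
        "{0}{1}": redundancy,
        "{0}": unique_0,
        "{1}": unique_1,
        "{0:1}": synergy,
    }


# Two inputs carrying 1 bit each about a 2-bit output, with 1 bit declared
# redundant: nothing is left over as unique, so 1 bit must be synergistic.
print(bivariate_atoms(i_joint=2.0, i_0=1.0, i_1=1.0, redundancy=1.0))
```

This is why the various proposals below differ only in how they define a single quantity (redundancy, or equivalently a unique information) for two inputs.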

Axioms

The following three axioms were provided by Williams & Beer.

Symmetry

The redundancy \(\Icap{X_{0:n} : Y}\) is invariant under reorderings of \(X_i\).

Self-Redundancy

The redundancy of a single input is its mutual information with the output:

\[\Icap{X_i : Y} = \I{X_i : Y}\]

Monotonicity

The redundancy should only decrease with the inclusion of more inputs:

\[\Icap{\mathcal{A}_1, \ldots, \mathcal{A}_{k-1}, \mathcal{A}_k : Y} \leq \Icap{\mathcal{A}_1, \ldots, \mathcal{A}_{k-1} : Y}\]

with equality if \(\mathcal{A}_{k-1} \subseteq \mathcal{A}_k\).

There have been other axioms proposed following from those of Williams & Beer.

Identity

The identity axiom [HSP13] states that if the output is identical to the inputs, then the redundancy is the mutual information between the inputs:

\[\Icap{X_0, X_1 : \left(X_0, X_1\right)} = \I{X_0 : X_1}\]

Target (output) Monotonicity

This axiom states that redundancy cannot increase when replacing the output by a function of itself.

\[\Icap{X_{0:n} : Y} \ge \Icap{X_{0:n} : f(Y)}\]

It first appeared in [BROJ13] and was expanded upon in [RBO+17].

Measures

We now turn our attention to a variety of methods proposed to flesh out this partial information decomposition.

In [11]: from dit.pid import *

\(\Imin{\bullet}\)

\(\Imin{\bullet}\) [WB10] was Williams & Beer’s initial proposal for a redundancy measure. It is given by:

\[\Imin{\mathcal{A}_1, \mathcal{A}_2, \ldots : Y} = \sum_{y \in Y} p(y) \min_{\mathcal{A}_i} \I{\mathcal{A}_i : Y=y}\]

However, this measure has been criticized for acting in an unintuitive manner [GK14]:

In [12]: d = dit.Distribution(['000', '011', '102', '113'], [1/4]*4)

In [13]: PID_WB(d)
Out[13]: 
+--------------------------+
|          I_min           |
+--------+--------+--------+
| I_min  |  I_r   |   pi   |
+--------+--------+--------+
| {0:1}  | 2.0000 | 1.0000 |
|  {0}   | 1.0000 | 0.0000 |
|  {1}   | 1.0000 | 0.0000 |
| {0}{1} | 1.0000 | 1.0000 |
+--------+--------+--------+

We have constructed a distribution whose inputs are independent random bits, and whose output is the concatenation of those inputs. Intuitively, the output should then be informed by one bit of unique information from \(X_0\) and one bit of unique information from \(X_1\). However, \(\Imin{\bullet}\) assesses that there is one bit of redundant information, and one bit of synergistic information. This is because \(\Imin{\bullet}\) quantifies redundancy as the least amount of information one can learn about an output given any single input. Here, however, the one bit we learn from \(X_0\) is, in a sense, orthogonal to the one bit we learn from \(X_1\). This observation has led to much of the follow-on work.
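The one bit of redundancy reported by \(\Imin{\bullet}\) can be traced by evaluating the definition directly. The following sketch is an independent, plain-Python reading of the formula above (the helper names are hypothetical, and sources are given as tuples of outcome positions); it is not dit's `PID_WB` code.

```python
from math import log2


def specific_information(pmf, source, target, y):
    """I[A : Y=y] = sum_a p(a|y) * log2(p(y|a) / p(y)) for source indices A."""
    p_y = sum(p for o, p in pmf.items() if o[target] == y)
    p_a, p_ay = {}, {}
    for o, p in pmf.items():
        a = tuple(o[i] for i in source)
        p_a[a] = p_a.get(a, 0.0) + p
        if o[target] == y:
            p_ay[a] = p_ay.get(a, 0.0) + p
    return sum((q / p_y) * log2((q / p_a[a]) / p_y)
               for a, q in p_ay.items() if q > 0)


def i_min(pmf, sources, target):
    """Expected (over y) minimum specific information across the sources."""
    p_y = {}
    for o, p in pmf.items():
        p_y[o[target]] = p_y.get(o[target], 0.0) + p
    return sum(p * min(specific_information(pmf, s, target, y) for s in sources)
               for y, p in p_y.items() if p > 0)


# Same outcomes as the distribution d above: output = concatenated inputs.
concat = {'000': 1 / 4, '011': 1 / 4, '102': 1 / 4, '113': 1 / 4}
print(i_min(concat, [(0,), (1,)], 2))  # redundancy of {0}{1}
print(i_min(concat, [(0, 1)], 2))      # self-redundancy of {0:1}
```

For every output value, each input alone supplies exactly one bit of specific information, so the minimum is one bit even though the two inputs resolve different halves of the output.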

\(\Immi{\bullet}\)

One potential measure of redundancy is the minimum mutual information [BROJ13]:

\[\Immi{X_{0:n} : Y} = \min_{i} \I{X_i : Y}\]

This measure, though crude, is known to be correct for multivariate Gaussian variables [OBR15].
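As a minimal sketch of this definition (plain Python with hypothetical helper names, not dit's implementation), the minimum mutual information can be computed from marginal entropies. The example uses the three outcomes of the "reduced or" distribution that appears in the BROJA discussion on this page.

```python
from math import log2


def entropy(pmf, idxs):
    """Shannon entropy (bits) of the marginal over the positions in idxs."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idxs)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def mutual_information(pmf, a, b):
    """I[A : B] = H[A] + H[B] - H[A, B] for index tuples a and b."""
    return entropy(pmf, a) + entropy(pmf, b) - entropy(pmf, a + b)


def i_mmi(pmf, sources, target):
    """Redundancy as the smallest single-source mutual information."""
    return min(mutual_information(pmf, s, target) for s in sources)


reduced_or = {(0, 0, 0): 1 / 2, (0, 1, 1): 1 / 4, (1, 0, 1): 1 / 4}
print(round(i_mmi(reduced_or, [(0,), (1,)], (2,)), 4))  # 0.3113
```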

\(\Iwedge{\bullet}\)

Redundancy seems to be intuitively related to common information (see Common Informations). This intuition led to the development of \(\Iwedge{\bullet}\) [GCJ+14]:

\[\Iwedge{X_{0:n} : Y} = \I{ \meet X_i : Y}\]

That is, redundancy is the information the Gács-Körner Common Information of the inputs shares with the output.
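For two inputs, the Gács-Körner common variable can be found as the connected component of a bipartite graph that links values of \(X_0\) and \(X_1\) whenever they co-occur with positive probability. The sketch below takes that route; it is an illustrative plain-Python construction with hypothetical names, not the code dit uses.

```python
from math import log2


def mutual_information(joint):
    """I[A : B] in bits from a dict {(a, b): p}."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    return sum(p * log2(p / (p_a[a] * p_b[b]))
               for (a, b), p in joint.items() if p > 0)


def i_wedge(pmf):
    """I[X0 meet X1 : Y] for a dict {(x0, x1, y): p}."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    # Join x0 and x1 whenever they co-occur; each connected component is
    # one value of the common (meet) variable.
    for (x0, x1, _), p in pmf.items():
        if p > 0:
            parent[find(("x0", x0))] = find(("x1", x1))

    joint = {}
    for (x0, _, y), p in pmf.items():
        key = (find(("x0", x0)), y)
        joint[key] = joint.get(key, 0.0) + p
    return mutual_information(joint)


giant_bit = {(0, 0, 0): 1 / 2, (1, 1, 1): 1 / 2}
concat = {(0, 0, 0): 1 / 4, (0, 1, 1): 1 / 4, (1, 0, 2): 1 / 4, (1, 1, 3): 1 / 4}
print(i_wedge(giant_bit), i_wedge(concat))
```

Independent inputs have a trivial common variable, so the concatenation example gets zero redundancy here, in contrast to \(\Imin{\bullet}\).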

Warning

This measure can result in a negative PID.

\(\Iproj{\bullet}\)

Utilizing information geometry, Harder et al. [HSP13] have developed a strictly bivariate measure of redundancy, \(\Iproj{\bullet}\):

\[\Iproj{\left\{X_0\right\}\left\{X_1\right\} : Y} = \min \{ I^\pi_Y[X_0 \mss X_1], I^\pi_Y[X_1 \mss X_0] \}\]

where

\[ \begin{align}\begin{aligned}I^\pi_Y[X_0 \mss X_1] = \sum_{x_0, y} p(x_0, y) \log \frac{p_{(x_0 \mss X_1)}(y)}{p(y)}\\p_{(x_0 \mss X_1)}(Y) = \pi_{C_{cl}(\langle X_1 \rangle_Y)}(p(Y | x_0))\\\pi_B(p) = \arg \min_{r \in B} \DKL{p || r}\\C_{cl}(\langle X_1 \rangle_Y) = C_{cl}(\left\{p(Y | x_1) : x_1 \in X_1 \right\})\end{aligned}\end{align} \]

where \(C_{cl}(\bullet)\) denotes closure. Intuitively, this measure seeks to quantify redundancy as the minimum of how much \(p(Y | X_0)\) can be expressed when \(X_0\) is projected onto \(X_1\), and vice versa.

\(\Ibroja{\bullet}\)

In a very intuitive effort, Bertschinger et al. (henceforth BROJA) [BRO+14, GK14] defined unique information as the minimum conditional mutual information obtainable while holding the input-output marginals fixed:

\[ \begin{align}\begin{aligned}\Delta = \{ Q : \forall i : p(x_i, y) = q(x_i, y) \}\\\Ibroja{X_{0:n} : Y} = \min_{Q \in \Delta} \I{X_i : Y | X_{\overline{\{i\}}}}\end{aligned}\end{align} \]

For bivariate sources (two inputs), PID_BROJA accepts a method keyword selecting how the marginal-matching optimization is solved:

'scipy' (default for small alphabets under 'auto') — SLSQP on the free joint pmf parameters.
'admui' — alternating divergence minimization [BRMontufar17].
'cone' — exponential cone program solved with ECOS [MTV18] (requires pip install dit[broja]).
'auto' — picks scipy or admui by joint alphabet size, with fallback to the other methods on failure.

Note

In the bivariate case, Griffith independently suggested the same decomposition but from the viewpoint of synergy [GK14].

The BROJA measure has recently been criticized for behaving in an unintuitive manner on some examples. Consider the reduced or distribution:

In [14]: bivariates['reduced or']
Out[14]: 
Class:    Distribution
Alphabet: (('0', '1'), ('0', '1'), ('0', '1'))
Base:     linear

x                 p(X0,X1,X2)
('0', '0', '0')   1/2
('0', '1', '1')   1/4
('1', '0', '1')   1/4

In [15]: print(PID_BROJA(bivariates['reduced or']))
+---------------------------+
|          I_broja          |
+---------+--------+--------+
| I_broja |  I_r   |   pi   |
+---------+--------+--------+
|  {0:1}  | 1.0000 | 0.6887 |
|   {0}   | 0.3113 | 0.0000 |
|   {1}   | 0.3113 | 0.0000 |
|  {0}{1} | 0.3113 | 0.3113 |
+---------+--------+--------+

We see that in this instance BROJA assigns no partial information to either unique information. However, it is not difficult to argue that in the case that either input is a 1, that input then has unique information regarding the output.

\(\Iproj{\bullet}\) and \(\Ibroja{\bullet}\) are Distinct

In the BROJA paper [BRO+14] the only example given where their decomposition differs from that of Harder et al. is dit.example_dists.summed_dice(). We can find a simpler example where they differ using hypothesis:

In [16]: from hypothesis import find

In [17]: from dit.utils.testing import distribution_structures

In [18]: find(distribution_structures(3, 2, True), lambda d: PID_Proj(d) != PID_BROJA(d))
---------------------------------------------------------------------------
InvalidOutcome                            Traceback (most recent call last)
Cell In[18], line 1
----> 1 find(distribution_structures(3, 2, True), lambda d: PID_Proj(d) != PID_BROJA(d))

    [... intermediate frames omitted ...]

File ~/checkouts/readthedocs.org/user_builds/dit/envs/latest/lib/python3.12/site-packages/dit/pid/measures/iproj.py:184, in projected_information(dist, X, Y, Z)
    182 for x, p_proj_z in zip(p_x.outcomes, projections, strict=True):
    183     for z in p_z.outcomes:
--> 184         vals.append(p_xz[(x, z)] * np.log2(p_proj_z[z] / p_z[z]))
    185 val = np.nansum(vals)
    187 return val

File ~/checkouts/readthedocs.org/user_builds/dit/envs/latest/lib/python3.12/site-packages/dit/distribution.py:2402, in Distribution.__getitem__(self, key)
   2400         return self._coerce_prob(self.data.sel(sel2))
   2401     except KeyError as exc:
-> 2402         raise InvalidOutcome(msg=f"Outcome {key!r} is not in the sample space.") from exc
   2403 raise InvalidOutcome(msg=f"Invalid outcome: {key!r}")

InvalidOutcome: Outcome (0, 0) is not in the sample space.
Falsifying example: test(
    v=Class:    Distribution
    Alphabet: ((0, 1), (0, 1), (0, 1))
    Base:     linear
    
    x           p(X0,X1,X2)
    (0, 0, 0)   1/2
    (1, 1, 1)   1/2,  # or any other generated value
)

\(\Iccs{\bullet}\)

Taking a pointwise point of view, Ince has proposed a measure of redundancy based on the Co-Information [Inc17a]:

\[\Iccs{X_{0:n} : Y} = \sum p(x_0, \ldots, x_n, y) \I{x_0 : \ldots : x_n : y}~~\textrm{if}~~\operatorname{sign}(\I{x_i : y}) = \operatorname{sign}(\I{x_0 : \ldots : x_n : y})\]

While this measure behaves intuitively in many examples, it also assigns negative values to some partial information atoms in some instances.

This decomposition also displays an interesting phenomenon, that of subadditive redundancy. The gband distribution is an independent mix of a giant bit (redundancy of 1 bit) and the and distribution (redundancy of 0.1038 bits), and yet gband has 0.8113 bits of redundancy:

In [19]: PID_CCS(bivariates['gband'])
Out[19]: 
+--------------------------+
|          I_ccs           |
+--------+--------+--------+
| I_ccs  |  I_r   |   pi   |
+--------+--------+--------+
| {0:1}  | 1.8113 | 0.0000 |
|  {0}   | 1.3113 | 0.5000 |
|  {1}   | 1.3113 | 0.5000 |
| {0}{1} | 0.8113 | 0.8113 |
+--------+--------+--------+

Warning

This measure can result in a negative PID.

\(\Idep{\bullet}\)

James et al. [JEC17] have developed a method of quantifying unique information based on the Dependency Decomposition. Unique information from variable \(X_i\) is evaluated as the least change in sources-target mutual information when adding the constraint \(X_i Y\).

In [20]: PID_dep(bivariates['not two'])
Out[20]: 
+--------------------------+
|          I_dep           |
+--------+--------+--------+
| I_dep  |  I_r   |   pi   |
+--------+--------+--------+
| {0:1}  | 0.5710 | 0.5364 |
|  {0}   | 0.0200 | 0.0146 |
|  {1}   | 0.0200 | 0.0146 |
| {0}{1} | 0.0054 | 0.0054 |
+--------+--------+--------+

\(\Ipm{\bullet}\)

Also taking a pointwise view, Finn & Lizier’s \(\Ipm{\bullet}\) [FL18] instead splits the pointwise mutual information into two components:

\[i(s, t) = h(s) - h(s|t)\]

They then define two partial information lattices, one quantified locally by \(h(s)\) and the other by \(h(s|t)\). By averaging these local lattices and then recombining them, we arrive at a standard Williams & Beer redundancy lattice.

In [21]: PID_PM(bivariates['pnt. unq.'])
Out[21]: 
+--------------------------+
|           I_±            |
+--------+--------+--------+
|  I_±   |  I_r   |   pi   |
+--------+--------+--------+
| {0:1}  | 1.0000 | 0.0000 |
|  {0}   | 0.5000 | 0.5000 |
|  {1}   | 0.5000 | 0.5000 |
| {0}{1} | 0.0000 | 0.0000 |
+--------+--------+--------+

Warning

This measure can result in a negative PID.

\(\Isx{\bullet}\)

Shared Exclusions PID (\(I^{\mathrm{sx}}\)) by Makkeh et al.

Makkeh et al. propose a pointwise redundancy measure built on the notion of shared exclusions in probability space [MCTV21]. Intuitively, the measure quantifies how much probability mass for the target Y is commonly excluded by observing sets of sources simultaneously; that common excluded mass is taken as the redundant (shared) contribution.

\[\Isx{X_{0:1} : Y} \;=\; \sum_{x_0,x_1,y} p(x_0,x_1,y)\; \log_2 \frac{p(x_0 \cup x_1 | y)}{p(x_0 \cup x_1)}\]
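The bivariate formula can be evaluated term by term, reading \(p(x_0 \cup x_1)\) as the probability of the event "\(X_0 = x_0\) or \(X_1 = x_1\)". The sketch below follows that reading in plain Python; the function name is hypothetical and the code is an illustration of the formula rather than dit's or SxPID's implementation.

```python
from math import log2


def i_sx(pmf):
    """Bivariate shared-exclusion redundancy for a dict {(x0, x1, y): p}."""
    total = 0.0
    for (x0, x1, y), p in pmf.items():
        if p == 0:
            continue
        # Mass of the event (X0 == x0 or X1 == x1), overall and with Y == y.
        p_union = sum(q for (a, b, _), q in pmf.items() if a == x0 or b == x1)
        p_union_y = sum(q for (a, b, c), q in pmf.items()
                        if c == y and (a == x0 or b == x1))
        p_y = sum(q for (_, _, c), q in pmf.items() if c == y)
        total += p * log2((p_union_y / p_y) / p_union)
    return total


giant_bit = {(0, 0, 0): 1 / 2, (1, 1, 1): 1 / 2}
xor = {(0, 0, 0): 1 / 4, (0, 1, 1): 1 / 4, (1, 0, 1): 1 / 4, (1, 1, 0): 1 / 4}
print(i_sx(giant_bit))  # one full bit is shared
print(i_sx(xor))        # negative: equals log2(2/3)
```

The XOR value illustrates how a pointwise measure of this kind can produce a negative atom.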

For the general multivariate definition (which is implemented in dit), refer to [MCTV21]. For large datasets with up to five source variables, refer to the reference implementation at https://github.com/Abzinger/SxPID

Warning

This measure can result in a negative PID.

\(\Irav{\bullet}\)

Taking a functional perspective as in \(\Iwedge{\bullet}\), \(\Irav{\bullet}\) defines bivariate redundancy as the maximum coinformation between the two sources \(X_0, X_1\), a target \(Y\), and a deterministic function of the inputs \(f(X_0,X_1)\).

\[\Irav{X_{0:2} : Y} = \max_f\left(\I{X_0\!:\!X_1\!:\!Y\!:\!f(X_0,X_1)}\right)\]

This measure is designed to exploit the conflation of synergy and redundancy in the three variable coinformation: \(\I{X_0\!:\!X_1\!:\!Y} = R - S\).

Note

TODO: reference. No canonical publication for \(\Irav{\bullet}\) has been identified in the dit source; a citation should be added if/when the origin paper is confirmed.

In [22]: PID_RAV(bivariates['pnt. unq.'])
Out[22]: 
+--------------------------+
|          I_RAV           |
+--------+--------+--------+
| I_RAV  |  I_r   |   pi   |
+--------+--------+--------+
| {0:1}  | 1.0000 | 0.0000 |
|  {0}   | 0.5000 | 0.5000 |
|  {1}   | 0.5000 | 0.5000 |
| {0}{1} | 0.0000 | 0.0000 |
+--------+--------+--------+

\(\Irr{\bullet}\)

In order to combine \(\Immi{\bullet}\) with the coinformation, Goodwell and Kumar [GK17] have introduced their rescaled redundancy:

\[ \begin{align}\begin{aligned}\Irr{X_{0:2} : Y} = R_{\text{min}} + I_{S} \left(\Immi{X_{0:2} : Y} - R_{\text{min}}\right)\\R_{\text{min}} = \max\{ 0, \I{X_0 : X_1 : Y} \}\\I_{S} = \frac{\I{X_0 : X_1}}{\min\{ \H{X_0}, \H{X_1} \}}\end{aligned}\end{align} \]

In [23]: PID_RR(bivariates['pnt. unq.'])
Out[23]: 
+--------------------------+
|           I_rr           |
+--------+--------+--------+
|  I_rr  |  I_r   |   pi   |
+--------+--------+--------+
| {0:1}  | 1.0000 | 0.3333 |
|  {0}   | 0.5000 | 0.1667 |
|  {1}   | 0.5000 | 0.1667 |
| {0}{1} | 0.3333 | 0.3333 |
+--------+--------+--------+
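The 0.3333 bits of redundancy in this table can be checked by evaluating the three ingredients of the rescaled redundancy by hand. The sketch below does so in plain Python; it assumes that the pointwise-unique distribution consists of the four equiprobable outcomes listed in `pnt_unq` (an assumption, since the distribution is not printed on this page), and the helper names are hypothetical.

```python
from math import log2


def H(pmf, idxs):
    """Shannon entropy (bits) of the marginal over the positions in idxs."""
    marginal = {}
    for outcome, p in pmf.items():
        key = tuple(outcome[i] for i in idxs)
        marginal[key] = marginal.get(key, 0.0) + p
    return -sum(p * log2(p) for p in marginal.values() if p > 0)


def mi(pmf, a, b):
    """I[A : B] = H[A] + H[B] - H[A, B]."""
    return H(pmf, a) + H(pmf, b) - H(pmf, a + b)


def i_rr(pmf):
    """Rescaled redundancy for outcomes ordered as (x0, x1, y)."""
    x0, x1, y = (0,), (1,), (2,)
    # Conditional mutual information I[X0 : X1 | Y].
    cmi = H(pmf, x0 + y) + H(pmf, x1 + y) - H(pmf, y) - H(pmf, x0 + x1 + y)
    coinfo = mi(pmf, x0, x1) - cmi
    r_min = max(0.0, coinfo)
    i_mmi = min(mi(pmf, x0, y), mi(pmf, x1, y))
    i_s = mi(pmf, x0, x1) / min(H(pmf, x0), H(pmf, x1))
    return r_min + i_s * (i_mmi - r_min)


# Assumed outcomes of the pointwise-unique distribution, as (x0, x1, y).
pnt_unq = {(0, 1, 1): 1 / 4, (1, 0, 1): 1 / 4,
           (0, 2, 2): 1 / 4, (2, 0, 2): 1 / 4}
print(round(i_rr(pnt_unq), 4))
```

Under that assumption the coinformation vanishes, \(I_S = 2/3\), and the minimum mutual information is 0.5 bits, giving \(2/3 \times 0.5 = 1/3\).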

\(\Ira{\bullet}\)

Drawing from the reconstructability analysis work of Zwick [Zwi04], we can define \(\Ira{\bullet}\) as a restricted form of \(\Idep{\bullet}\).

Warning

This measure can result in a negative PID.

\(\Irdr{\bullet}\)

The measure of Mages & Rohner [MR23] can be interpreted as a pointwise version of \(\Ibroja{\bullet}\) that provides a non-negative partial information decomposition for an arbitrary number of sources. It obtains its operational interpretation from evaluating the reachable decision regions (achievable type I/II error pairs) for each state of the target variable.

\(\Ict{\bullet}\)

Sigtermans [Sig20] proposes a bivariate redundancy measure based on causal tensors and path-based information flow between the two sources and the target.

\(\Ideg{\bullet}\)

The degradation intersection information of Kolchinsky [Kol22] (further developed by Gomes & Figueiredo [GF23]) defines redundancy as the maximum mutual information carried by a channel that is a Blackwell degradation of every source channel:

\[\Ideg{X_{0:n} \to Y} = \max_{Q :\; Q \preceq_d X_i \;\forall i} \I{Q : Y}\]

\(\Idelta{\bullet}\)

The \(\delta\)-PID of Banerjee, Olbrich, Jost & Rauh [BOJR18] quantifies unique information via the weighted output KL deficiency, the cost of approximating one source channel from the other via output randomization:

\[\delta(Y : X_0 \setminus X_1) = \inf_{P(X_0'|X_1)} \mathbb{E}_{Y}\!\left[ \DKL{P(X_0|Y) \;\|\; P(X_0'|X_1) \circ P(X_1|Y)} \right]\]

Redundancy is then min-symmetrized across the two sources.

\(\IdeltaLambda{\bullet}\)

Venkatesh, Gurushankar & Schamberg [VGS23] introduce a Lagrangian generalization that smoothly interpolates between the \(\delta\)-PID of [BOJR18] and the BROJA PID of [BRO+14]:

\[\delta_{\lambda}(Y : X_0 \setminus X_1) = \inf_{P(X_0'|X_1 Y)} \mathbb{E}_{Y}\!\left[\DKL{P(X_0|Y) \;\|\; P(X_0'|Y)}\right] + \lambda \, \I{Y : X_0' | X_1}\]

As \(\lambda \to \infty\) the measure recovers \(\Idelta{\bullet}\); as \(\lambda \to 0\) it recovers \(\Ibroja{\bullet}\).

\(\Igh{\bullet}\)

The redundancy measure of Griffith & Ho [GCJ+14] constructs a shared-information quantity via an auxiliary-variable optimization over channels from the joint sources to an intersection variable.

Note

TODO: confirm the canonical Griffith & Ho reference. The implementation in dit only identifies the authors; [GCJ+14] is used here as the best match among published Griffith papers in the PID literature.

\(\Iig{\bullet}\)

Niu & Quinn [NQ19] propose an information-geometric PID: the synergy is defined as the KL divergence from the true joint distribution to the nearest distribution satisfying \(X_0 - X_1 - Y\) or \(X_1 - X_0 - Y\) Markov chains, minimized over a convex mixture of the two.

\(\Iipid{\bullet}\)

The I-PID of Venkatesh, Gurushankar & Schamberg [VGS23] defines unique information via an information deficiency that maximizes the mutual-information gap attainable by a Markov test channel \(T - Y - (X_0, X_1)\):

\[\delta_I(Y : X_0 \setminus X_1) = \sup_{P(T|Y)} \left[ \I{T : X_0} - \I{T : X_1} \right]\]

Redundancy is obtained by min-symmetrization, analogously to \(\Idelta{\bullet}\).

\(\Imc{\bullet}\)

The more-capable intersection information of Gomes & Figueiredo [GF23] (building on [Kol22]) replaces the Blackwell-degradation order in \(\Ideg{\bullet}\) with the weaker more-capable channel preorder:

\[\Imc{X_{0:n} \to Y} = \max_{Q :\; Q \preceq_{\mathrm{mc}} X_i \;\forall i} \I{Q : Y}\]

where \(Q \preceq_{\mathrm{mc}} X_i\) iff \(\I{Q : Y} \leq \I{X_i : Y}\) for every input distribution on \(Y\).

\(\Imes{\bullet}\)

\(\Imes{\bullet}\) is a maximum-entropy PID inspired by BROJA’s “*” marginal-consistency assumption.

Note

TODO: reference. No canonical publication has been identified for \(\Imes{\bullet}\) in the dit source; the measure is described only as “inspired by BROJA’s * assumption”.

\(\Iprec{\bullet}\)

Kolchinsky [Kol22] defines a redundancy measure using the Blackwell precedence order on channels:

\[\Iprec{X_{0:n} \to Y} = \max_{s_{Q|Y}} \I{Y : Q} \quad \text{s.t.} \quad s_{Q|Y} \preceq p_{X_i|Y} \;\;\forall i\]

The constraint set is a convex polytope and the objective is convex in \(s_{Q|Y}\), so the maximum is attained at a vertex.

Secret Key Agreement Rates

One can associate Secret Key Agreement rates with unique informations [BG15] by considering the rate at which one source and the target can agree upon a secret key while the other source eavesdrops. This results in four possibilities:

- neither source nor target communicate
- only the source communicates
- only the target communicates
- both the source and the target communicate

No Communication

\[\Ipart{X_i \rightarrow Y \setminus X_j} = \operatorname{S}[X_i : Y || X_j]\]

Warning

This measure can result in an inconsistent PID.

One-Way Communication

Camel

\[\Ipart{X_i \rightarrow Y \setminus X_j} = \operatorname{S}[X_i \rightarrow Y || X_j]\]

Elephant

\[\Ipart{X_i \rightarrow Y \setminus X_j} = \operatorname{S}[X_i \leftarrow Y || X_j]\]

Warning

This measure can result in an inconsistent PID.

Two-Way Communication

\[\Ipart{X_i \rightarrow Y \setminus X_j} = \operatorname{S}[X_i \leftrightarrow Y || X_j]\]

Warning

This measure can result in an inconsistent PID.

Partial Entropy Decomposition

Ince [Inc17b] proposed applying the PID framework to decompose multivariate entropy (without considering information about a separate target variable). This partial entropy decomposition (PED) seeks to partition a multivariate entropy \(\H{X_0,X_1,\ldots}\) among the antichains of the variables. The PED perspective shows that bivariate mutual information is equal to the difference between redundant entropy and synergistic entropy.

\[\I{X_0 : X_1} = \Hpart{\left\{X_0\right\}, \left\{X_1\right\}} - \Hpart{\left\{X_0,X_1\right\}}\]
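This identity follows in a few lines if one assumes that the bivariate partial entropy lattice has the same four-element structure as the bivariate PID lattice, so that each single-variable entropy is the redundant atom plus that variable's unique atom. A sketch of the derivation under that assumption:

\[ \begin{align}\begin{aligned}\H{X_0} = \Hpart{\left\{X_0\right\}, \left\{X_1\right\}} + \Hpart{\left\{X_0\right\}}\\\H{X_1} = \Hpart{\left\{X_0\right\}, \left\{X_1\right\}} + \Hpart{\left\{X_1\right\}}\\\H{X_0, X_1} = \Hpart{\left\{X_0\right\}, \left\{X_1\right\}} + \Hpart{\left\{X_0\right\}} + \Hpart{\left\{X_1\right\}} + \Hpart{\left\{X_0, X_1\right\}}\\\I{X_0 : X_1} = \H{X_0} + \H{X_1} - \H{X_0, X_1} = \Hpart{\left\{X_0\right\}, \left\{X_1\right\}} - \Hpart{\left\{X_0, X_1\right\}}\end{aligned}\end{align} \]

The two unique atoms cancel, leaving one copy of the redundant atom minus the synergistic atom.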

\(\Hcs{\bullet}\)

Taking a pointwise point of view, following \(\Iccs{\bullet}\), Ince has proposed a measure of redundant entropy based on the Co-Information [Inc17b]:

\[\Hcs{X_{0:n}} = \sum p(x_0, \ldots, x_n) \I{x_0 : \ldots : x_n}~~\textrm{if}~~(\I{x_0 : \ldots : x_n} > 0)\]

While this measure behaves intuitively in many examples, it also assigns negative values to some partial entropy atoms in some instances. However, Ince [Inc17b] argues that concepts such as mechanistic information redundancy (non-zero information redundancy between independent predictors, cf. AND) necessitate negative partial entropy terms.

Like \(\Iccs{\bullet}\), \(\Hcs{\bullet}\) is also subadditive.

In [24]: PED_CS(dit.Distribution(['00','01','10','11'],[0.25]*4))
Out[24]: 
+--------------------------+
|           H_cs           |
+--------+--------+--------+
|  H_cs  |  H_r   |  H_d   |
+--------+--------+--------+
| {0:1}  | 2.0000 | 0.0000 |
|  {0}   | 1.0000 | 1.0000 |
|  {1}   | 1.0000 | 1.0000 |
| {0}{1} | 0.0000 | 0.0000 |
+--------+--------+--------+