Operations
There are several operations possible on joint random variables. Let’s consider the standard xor
distribution:
In [1]: d = dit.Distribution(['000', '011', '101', '110'], [1/4]*4)
In [2]: d.set_rv_names('XYZ')
Marginal
dit supports two ways of selecting only a subset of random variables. marginal() returns a distribution containing only the random variables specified, whereas marginalize() returns a distribution containing all random variables except the ones specified:
In [3]: print(d.marginal('XY'))
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 2
RV Names: ('X', 'Y')
x p(x)
00 1/4
01 1/4
10 1/4
11 1/4
Distribution.marginal(rvs, rv_mode=None)
Returns a marginal distribution.
- Parameters
rvs (list) – The random variables to keep. All others are marginalized.
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of self._rv_mode is consulted.
- Returns
d – A new joint distribution with the random variables in rvs kept and all others marginalized.
- Return type
joint distribution
Distribution.marginalize(rvs, rv_mode=None)
Returns a new distribution after marginalizing random variables.
- Parameters
rvs (list) – The random variables to marginalize. All others are kept.
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of self._rv_mode is consulted.
- Returns
d – A new joint distribution with the random variables in rvs marginalized and all others kept.
- Return type
joint distribution
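The difference between keeping and dropping variables can be illustrated without dit. The following is a minimal pure-Python sketch, not dit's implementation: the distribution is a plain dict from outcome strings to probabilities, and the helper names `marginal` and `marginalize` are hypothetical stand-ins that accept indices only.

```python
from collections import defaultdict
from fractions import Fraction

# The XOR distribution as a plain dict: outcome string -> probability.
xor = {o: Fraction(1, 4) for o in ('000', '011', '101', '110')}

def marginal(pmf, keep):
    """Keep only the positions listed in `keep`, summing out the rest."""
    out = defaultdict(Fraction)
    for outcome, p in pmf.items():
        out[''.join(outcome[i] for i in keep)] += p
    return dict(out)

def marginalize(pmf, drop):
    """Sum out the positions listed in `drop`, keeping all others."""
    length = len(next(iter(pmf)))
    keep = [i for i in range(length) if i not in drop]
    return marginal(pmf, keep)

p_xy = marginal(xor, [0, 1])     # keeps X and Y
p_z = marginalize(xor, [0, 1])   # drops X and Y, leaving only Z
```

Here `p_xy` is uniform over the four two-character outcomes, while `p_z` assigns probability 1/2 to each value of Z.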
Conditional
We can also condition on a subset of random variables:
In [5]: marginal, cdists = d.condition_on('XY')
Distribution.condition_on(crvs, rvs=None, rv_mode=None, extract=False)
Returns distributions conditioned on random variables crvs. Optionally, rvs specifies which random variables should remain.
NOTE: Eventually this will return a conditional distribution.
- Parameters
crvs (list) – The random variables to condition on.
rvs (list, None) – The random variables for the resulting conditional distributions. Any random variable not represented in the union of crvs and rvs will be marginalized. If None, then every random variable not appearing in crvs is used.
rv_mode (str, None) – Specifies how to interpret crvs and rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of crvs and rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of self._rv_mode is consulted, which defaults to ‘indices’.
extract (bool) – If the length of either crvs or rvs is 1 and extract is True, then instead of the new outcomes being 1-tuples, we extract the sole element to create scalar distributions.
- Returns
cdist (dist) – The distribution of the conditioned random variables.
dists (list of distributions) – The conditional distributions for each outcome in cdist.
Examples
First we build a distribution P(X,Y,Z) representing the XOR logic gate.
>>> pXYZ = dit.example_dists.Xor()
>>> pXYZ.set_rv_names('XYZ')
We can obtain the conditional distributions P(X,Z|Y) and the marginal of the conditioned variable P(Y) as follows:
>>> pY, pXZgY = pXYZ.condition_on('Y')
If we specify rvs='Z', then only ‘Z’ is kept and thus ‘X’ is marginalized out:
>>> pY, pZgY = pXYZ.condition_on('Y', rvs='Z')
We can condition on two random variables:
>>> pXY, pZgXY = pXYZ.condition_on('XY')
The equivalent call using indices is:
>>> pXY, pZgXY = pXYZ.condition_on([0, 1], rv_mode='indices')
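To make the two return values concrete, here is a minimal pure-Python sketch of conditioning on a dict-based distribution. It is an illustration under stated assumptions rather than dit's implementation: the function name `condition_on_sketch` is hypothetical, it accepts indices only, and it returns the conditional distributions as a dict keyed by the conditioned outcome instead of a list.

```python
from collections import defaultdict
from fractions import Fraction

# The XOR distribution as a plain dict: outcome string -> probability.
xor = {o: Fraction(1, 4) for o in ('000', '011', '101', '110')}

def condition_on_sketch(pmf, crvs):
    """Return (P(crvs), {outcome of crvs: P(remaining variables | outcome)})."""
    length = len(next(iter(pmf)))
    rest = [i for i in range(length) if i not in crvs]
    marg = defaultdict(Fraction)
    joint = defaultdict(dict)
    for outcome, p in pmf.items():
        c = ''.join(outcome[i] for i in crvs)   # conditioned part
        r = ''.join(outcome[i] for i in rest)   # remaining part
        marg[c] += p
        joint[c][r] = joint[c].get(r, 0) + p
    # Normalize each slice of the joint by the marginal probability of c.
    cond = {c: {r: p / marg[c] for r, p in rows.items()}
            for c, rows in joint.items()}
    return dict(marg), cond

p_xy, p_z_given_xy = condition_on_sketch(xor, [0, 1])
```

For the XOR distribution, every conditional in `p_z_given_xy` is deterministic, because Z is fixed once X and Y are known.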
Join
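We can construct the join of two random variables:
\[X \curlyvee Y = \min \{ V \mid V \succeq X \land V \succeq Y \}\]
Where \(\min\) is understood to be minimizing with respect to the entropy, and \(V \succeq X\) means that \(X\) is a function of \(V\).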
In [11]: from dit.algorithms.lattice import join
join(dist, rvs, rv_mode=None, int_outcomes=True)
Returns the distribution of the join of random variables defined by rvs.
- Parameters
dist (Distribution) – The distribution which defines the base sigma-algebra.
rvs (list) – A list of lists. Each list specifies a random variable to be joined with the other lists. Each random variable can be defined as a series of unique indexes. Multiple random variables can use the same index. For example, [[0,1],[1,2]].
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted.
int_outcomes (bool) – If True, then the outcomes of the join are relabeled as integers instead of as the atoms of the induced sigma-algebra.
- Returns
d – The distribution of the join.
- Return type
ScalarDistribution
insert_join(dist, idx, rvs, rv_mode=None)
Returns a new distribution with the join inserted at index idx.
The join of the random variables in rvs is constructed and then inserted at index idx.
- Parameters
dist (Distribution) – The distribution which defines the base sigma-algebra.
idx (int) – The index at which to insert the join. To append the join, set idx to be equal to -1 or dist.outcome_length().
rvs (list) – A list of lists. Each list specifies a random variable to be joined with the other lists. Each random variable can be defined as a series of unique indexes. Multiple random variables can use the same index. For example, [[0,1],[1,2]].
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted.
- Returns
d – The new distribution with the join at index idx.
- Return type
Distribution
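One way to picture the join is through partitions of the outcome space: two outcomes land in the same atom of the join only if they agree on every one of the random variables being joined. The sketch below follows that idea on a dict-based distribution. It is a hedged illustration, not dit's code: `join_sketch` is a hypothetical name, and the integer relabeling order is an assumption that may differ from what dit produces.

```python
from collections import defaultdict
from fractions import Fraction

# The XOR distribution as a plain dict: outcome string -> probability.
xor = {o: Fraction(1, 4) for o in ('000', '011', '101', '110')}

def join_sketch(pmf, rvs):
    """Distribution of the join of the random variables in `rvs`.

    `rvs` is a list of lists of indices, e.g. [[0], [1]]. Each atom of the
    join is identified by the tuple of values taken by every listed variable.
    """
    atoms = defaultdict(Fraction)
    for outcome, p in pmf.items():
        label = tuple(''.join(outcome[i] for i in rv) for rv in rvs)
        atoms[label] += p
    # Relabel the atoms as integers, in sorted order of their labels.
    return {k: atoms[label] for k, label in enumerate(sorted(atoms))}

p_join = join_sketch(xor, [[0], [1]])
```

For XOR, X and Y are independent fair bits, so their join has four equally likely atoms.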
Meet
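We can construct the meet of two random variables:
\[X \curlywedge Y = \max \{ V \mid V \preceq X \land V \preceq Y \}\]
Where \(\max\) is understood to be maximizing with respect to the entropy, and \(V \preceq X\) means that \(V\) is a function of \(X\).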
In [13]: from dit.algorithms.lattice import meet
meet(dist, rvs, rv_mode=None, int_outcomes=True)
Returns the distribution of the meet of random variables defined by rvs.
- Parameters
dist (Distribution) – The distribution which defines the base sigma-algebra.
rvs (list) – A list of lists. Each list specifies a random variable to be met with the other lists. Each random variable can be defined as a series of unique indexes. Multiple random variables can use the same index. For example, [[0,1],[1,2]].
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted.
int_outcomes (bool) – If True, then the outcomes of the meet are relabeled as integers instead of as the atoms of the induced sigma-algebra.
- Returns
d – The distribution of the meet.
- Return type
ScalarDistribution
insert_meet(dist, idx, rvs, rv_mode=None)
Returns a new distribution with the meet inserted at index idx.
The meet of the random variables in rvs is constructed and then inserted at index idx.
- Parameters
dist (Distribution) – The distribution which defines the base sigma-algebra.
idx (int) – The index at which to insert the meet. To append the meet, set idx to be equal to -1 or dist.outcome_length().
rvs (list) – A list of lists. Each list specifies a random variable to be met with the other lists. Each random variable can be defined as a series of unique indexes. Multiple random variables can use the same index. For example, [[0,1],[1,2]].
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted.
- Returns
d – The new distribution with the meet at index idx.
- Return type
Distribution
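In partition terms, the meet is the finest partition that is coarser than each variable's partition: outcomes are merged whenever they share a value of any one of the variables, and merging is transitive. The sketch below implements this with a small union-find over the outcomes. It is an illustration only: `meet_sketch` and the example distribution `blocks` are hypothetical, and the integer labels are assigned in an arbitrary but deterministic order that need not match dit's.

```python
from collections import defaultdict
from fractions import Fraction

# The XOR distribution as a plain dict: outcome string -> probability.
xor = {o: Fraction(1, 4) for o in ('000', '011', '101', '110')}

# A two-variable example whose variables share one bit of common information:
# both X and Y lie in {0, 1} together, or both lie in {2, 3} together.
blocks = {o: Fraction(1, 8)
          for o in ('00', '01', '10', '11', '22', '23', '32', '33')}

def meet_sketch(pmf, rvs):
    """Distribution of the meet of the random variables in `rvs`."""
    outcomes = sorted(pmf)
    parent = {o: o for o in outcomes}

    def find(o):
        # Follow parent links to the representative, compressing the path.
        while parent[o] != o:
            parent[o] = parent[parent[o]]
            o = parent[o]
        return o

    # Merge outcomes that share a value of any single random variable.
    for rv in rvs:
        first_seen = {}
        for o in outcomes:
            value = ''.join(o[i] for i in rv)
            if value in first_seen:
                parent[find(o)] = find(first_seen[value])
            else:
                first_seen[value] = o

    # Each connected component is one atom of the meet.
    atoms = defaultdict(Fraction)
    for o in outcomes:
        atoms[find(o)] += pmf[o]
    return {k: atoms[root] for k, root in enumerate(sorted(atoms))}

p_meet_xor = meet_sketch(xor, [[0], [1]])
p_meet_blocks = meet_sketch(blocks, [[0], [1]])
```

For XOR, X and Y are independent, so all outcomes collapse into a single atom and the meet is constant. For `blocks`, the meet recovers the shared bit that says which block both variables fall in.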
Minimal Sufficient Statistic
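This method constructs the minimal sufficient statistic of \(X\) about \(Y\), written \(X \searrow Y\):
\[X \searrow Y = \min \{ V \mid V \preceq X \land I[X:Y] = I[V:Y] \}\]
Again, \(\min\) is understood to be over entropies. Here \(V \preceq X\) means that \(V\) is a function of \(X\), and \(I[X:Y]\) denotes the mutual information.
In [18]: from dit.algorithms import insert_mss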
mss(dist, rvs, about=None, rv_mode=None, int_outcomes=True)
- Parameters
dist (Distribution) – The distribution which defines the base sigma-algebra.
rvs (list) – A list of random variables to be compressed into a minimal sufficient statistic.
about (list) – A list of random variables about which the minimal sufficient statistic will retain all information.
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted.
int_outcomes (bool) – If True, then the outcomes of the minimal sufficient statistic are relabeled as integers instead of as the atoms of the induced sigma-algebra.
- Returns
d – The distribution of the minimal sufficient statistic.
- Return type
ScalarDistribution
Examples
>>> d = Xor()
>>> print(mss(d, [0], [1, 2]))
Class: ScalarDistribution
Alphabet: (0, 1)
Base: linear
x p(x)
0 0.5
1 0.5
insert_mss(dist, idx, rvs, about=None, rv_mode=None)
Inserts the minimal sufficient statistic of rvs about about into dist at index idx.
- Parameters
dist (Distribution) – The distribution which defines the base sigma-algebra.
idx (int) – The location in the distribution to insert the minimal sufficient statistic.
rvs (list) – A list of random variables to be compressed into a minimal sufficient statistic.
about (list) – A list of random variables about which the minimal sufficient statistic will retain all information.
rv_mode (str, None) – Specifies how to interpret the elements of rvs. Valid options are: {‘indices’, ‘names’}. If equal to ‘indices’, then the elements of rvs are interpreted as random variable indices. If equal to ‘names’, then the elements are interpreted as random variable names. If None, then the value of dist._rv_mode is consulted.
- Returns
d – The distribution dist modified to contain the minimal sufficient statistic.
- Return type
Distribution
Examples
>>> d = Xor()
>>> print(insert_mss(d, -1, [0], [1, 2]))
Class: Distribution
Alphabet: ('0', '1') for all rvs
Base: linear
Outcome Class: str
Outcome Length: 4
RV Names: None
x p(x)
0000 0.25
0110 0.25
1011 0.25
1101 0.25
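A standard way to build a minimal sufficient statistic is to merge all values of the compressed variables that induce the same conditional distribution over the variables in about. The sketch below does this for a dict-based distribution. It is an assumption-laden illustration rather than dit's implementation: `mss_sketch` is a hypothetical name, it takes indices only, and its integer labels follow an arbitrary deterministic order.

```python
from collections import defaultdict
from fractions import Fraction

# The XOR distribution as a plain dict: outcome string -> probability.
xor = {o: Fraction(1, 4) for o in ('000', '011', '101', '110')}

def mss_sketch(pmf, rvs, about):
    """Distribution of the minimal sufficient statistic of `rvs` about `about`."""
    # Joint weights p(x, y), where x ranges over `rvs` and y over `about`.
    joint = defaultdict(lambda: defaultdict(Fraction))
    for outcome, p in pmf.items():
        x = ''.join(outcome[i] for i in rvs)
        y = ''.join(outcome[i] for i in about)
        joint[x][y] += p

    # Values of x with identical conditionals p(y | x) fall in the same class.
    classes = defaultdict(Fraction)
    for x, row in joint.items():
        px = sum(row.values())
        conditional = frozenset((y, p / px) for y, p in row.items())
        classes[conditional] += px

    # Relabel the equivalence classes as integers.
    ordered = sorted(classes, key=sorted)
    return {k: classes[c] for k, c in enumerate(ordered)}

p_mss_x = mss_sketch(xor, [0], [1, 2])    # X about (Y, Z)
p_mss_xy = mss_sketch(xor, [0, 1], [2])   # (X, Y) about Z
p_mss_indep = mss_sketch(xor, [0], [1])   # X about Y
```

The first call gives two classes of probability 1/2 each, in line with the mss example above. The second merges the four values of (X, Y) into two classes, since only their parity matters for Z. The third collapses to a single class because X alone carries no information about Y.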