NumPy-based ScalarDistribution
ScalarDistributions are used to represent distributions over real numbers, for example a six-sided die or the number of heads when flipping 100 coins.
Playing with ScalarDistributions
First we will enable two optional features: printing fractions by default, and using __str__() as __repr__(). Be careful when using either of these options: they can incur significant performance hits on some distributions. (The session below assumes dit has already been imported with import dit.)
In [1]: dit.ditParams['print.exact'] = dit.ditParams['repr.print'] = True
We next construct a six-sided die:
In [2]: from dit.example_dists import uniform
In [3]: d6 = uniform(1, 7)
In [4]: d6
Out[4]:
Class: ScalarDistribution
Alphabet: (1, 2, 3, 4, 5, 6)
Base: linear
x p(x)
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
6 1/6
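Conceptually, the die above is a mapping from outcomes to probabilities. The sketch below builds that mapping directly with the Python standard library, as a plain dict of fractions.Fraction values. uniform_pmf is a hypothetical helper for illustration only; it is not how dit stores a ScalarDistribution internally (dit's version is NumPy-based). Like uniform(1, 7) above, it excludes the upper bound.

```python
from fractions import Fraction

def uniform_pmf(low, high):
    """Hypothetical helper: uniform pmf over the integers low, ..., high - 1."""
    outcomes = range(low, high)  # upper bound excluded, mirroring uniform(1, 7) above
    p = Fraction(1, len(outcomes))
    return {x: p for x in outcomes}

d6 = uniform_pmf(1, 7)
print(d6)
```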
We can perform standard arithmetic between a distribution and a scalar, such as adding, subtracting (in either order), multiplying, taking the remainder with %, or testing inequalities.
In [5]: d6 + 3
Out[5]:
Class: ScalarDistribution
Alphabet: (4, 5, 6, 7, 8, 9)
Base: linear
x p(x)
4 1/6
5 1/6
6 1/6
7 1/6
8 1/6
9 1/6
In [6]: d6 - 1
Out[6]:
Class: ScalarDistribution
Alphabet: (0, 1, 2, 3, 4, 5)
Base: linear
x p(x)
0 1/6
1 1/6
2 1/6
3 1/6
4 1/6
5 1/6
In [7]: 10 - d6
Out[7]:
Class: ScalarDistribution
Alphabet: (4, 5, 6, 7, 8, 9)
Base: linear
x p(x)
4 1/6
5 1/6
6 1/6
7 1/6
8 1/6
9 1/6
In [8]: 2 * d6
Out[8]:
Class: ScalarDistribution
Alphabet: (2, 4, 6, 8, 10, 12)
Base: linear
x p(x)
2 1/6
4 1/6
6 1/6
8 1/6
10 1/6
12 1/6
In [9]: d6 % 2
Out[9]:
Class: ScalarDistribution
Alphabet: (0, 1)
Base: linear
x p(x)
0 1/2
1 1/2
In [10]: (d6 % 2).is_approx_equal(d6 <= 3)
Out[10]: True
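Each scalar operation above transforms every outcome and then merges outcomes that land on the same value, adding their probabilities; that is why d6 % 2 collapses six outcomes into two. A minimal sketch of the idea, assuming a plain dict-of-Fraction representation and a hypothetical helper map_pmf (not part of dit's API):

```python
from collections import defaultdict
from fractions import Fraction

def map_pmf(pmf, f):
    """Hypothetical helper: distribution of f(X) when X has the given pmf.

    Outcomes that f sends to the same value have their probabilities summed.
    """
    out = defaultdict(Fraction)  # Fraction() is 0, so missing keys start at zero
    for x, p in pmf.items():
        out[f(x)] += p
    return dict(sorted(out.items()))

d6 = {x: Fraction(1, 6) for x in range(1, 7)}

shifted = map_pmf(d6, lambda x: x + 3)     # like d6 + 3
reflected = map_pmf(d6, lambda x: 10 - x)  # like 10 - d6
parity = map_pmf(d6, lambda x: x % 2)      # like d6 % 2
low = map_pmf(d6, lambda x: int(x <= 3))   # like d6 <= 3, with True/False as 1/0
print(parity == low)
```

The final comparison uses exact equality because the probabilities are exact fractions; dit's is_approx_equal instead tolerates small floating-point differences.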
Furthermore, we can perform such operations with two distributions:
In [11]: d6 + d6
Out[11]:
Class: ScalarDistribution
Alphabet: (2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
Base: linear
x p(x)
2 1/36
3 1/18
4 1/12
5 1/9
6 5/36
7 1/6
8 5/36
9 1/9
10 1/12
11 1/18
12 1/36
In [12]: (d6 + d6) % 4
Out[12]:
Class: ScalarDistribution
Alphabet: (0, 1, 2, 3)
Base: linear
x p(x)
0 1/4
1 2/9
2 1/4
3 5/18
In [13]: d6 // d6
Out[13]:
Class: ScalarDistribution
Alphabet: (0, 1, 2, 3, 4, 5, 6)
Base: linear
x p(x)
0 5/12
1 1/3
2 1/9
3 1/18
4 1/36
5 1/36
6 1/36
In [14]: d6 % (d6 % 2 + 1)
Out[14]:
Class: ScalarDistribution
Alphabet: (0, 1)
Base: linear
x p(x)
0 3/4
1 1/4
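When both operands are distributions, the results above are consistent with treating them as independent: each pair of outcomes (x, y) contributes p(x) * p(y) to op(x, y). For instance, if d6 % (d6 % 2 + 1) reused a single roll, the result would be 0 and 1 with probability 1/2 each, not 3/4 and 1/4. The sketch below reproduces the tables above under that independence assumption; combine and map_pmf are hypothetical helpers, not dit functions.

```python
import operator
from collections import defaultdict
from fractions import Fraction
from itertools import product

def combine(p, q, op):
    """Hypothetical helper: distribution of op(X, Y) for independent X ~ p, Y ~ q."""
    out = defaultdict(Fraction)
    for (x, px), (y, py) in product(p.items(), q.items()):
        out[op(x, y)] += px * py  # independence: the joint probability is the product
    return dict(sorted(out.items()))

def map_pmf(pmf, f):
    """Hypothetical helper: distribution of f(X); colliding outcomes are merged."""
    out = defaultdict(Fraction)
    for x, px in pmf.items():
        out[f(x)] += px
    return dict(sorted(out.items()))

d6 = {x: Fraction(1, 6) for x in range(1, 7)}

two_d6 = combine(d6, d6, operator.add)         # like d6 + d6
mod4 = map_pmf(two_d6, lambda s: s % 4)        # like (d6 + d6) % 4
floordiv = combine(d6, d6, operator.floordiv)  # like d6 // d6
# like d6 % (d6 % 2 + 1): the divisor comes from a second, independent die
mixed = combine(d6, map_pmf(d6, lambda x: x % 2 + 1), operator.mod)
print(two_d6[7], mod4[3], floordiv[0], mixed)
```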
There are also statistical functions that can be applied to ScalarDistributions:
In [15]: from dit.algorithms.stats import *
In [16]: median(d6+d6)
Out[16]: 7.0
In [17]: from dit.example_dists import binomial
In [18]: d = binomial(10, 1/3)
In [19]: d
Out[19]:
Class: ScalarDistribution
Alphabet: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
Base: linear
x p(x)
0 409/23585
1 4302/49615
2 1280/6561
3 5120/19683
4 4480/19683
5 896/6561
6 1120/19683
7 320/19683
8 20/6561
9 9/26572
10 1/59046
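The binomial probabilities follow P(K = k) = C(n, k) p^k (1 - p)^(n - k). The sketch below evaluates them exactly with the standard library; binomial_pmf is a hypothetical helper, not dit's binomial. The entries for k = 2 through 8 match the table above exactly, while the printed entries for k = 0, 1, 9, and 10 (for example 409/23585 versus the exact (2/3)^10 = 1024/59049) appear to be close rational approximations of the stored floating-point values.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Hypothetical helper: exact pmf P(K = k) = C(n, k) p^k (1 - p)^(n - k)."""
    p = Fraction(p)
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Pass Fraction(1, 3), not the float 1/3, to keep every probability exact.
exact = binomial_pmf(10, Fraction(1, 3))
for k, pk in exact.items():
    print(k, pk, float(pk))
```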
In [20]: mean(d)
Out[20]: 3.3333333333333335
In [21]: median(d)
Out[21]: 3.0
In [22]: standard_deviation(d)
Out[22]: 1.4907119849998596
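For reference, these three statistics can be written directly in terms of the pmf. The sketch below uses hypothetical helpers pmf_mean, pmf_median, and pmf_std rather than dit's functions. The median follows one common convention (the smallest outcome whose cumulative probability reaches 1/2). That convention agrees with the results above, but it may not match dit's handling of ties, and it returns the outcome itself rather than a float.

```python
from collections import Counter
from fractions import Fraction
from math import comb, sqrt

def pmf_mean(pmf):
    """Expected value: the sum of x * p(x) over all outcomes."""
    return sum(x * p for x, p in pmf.items())

def pmf_std(pmf):
    """Standard deviation: square root of E[(X - mean)^2]."""
    mu = pmf_mean(pmf)
    return sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))

def pmf_median(pmf):
    """Smallest outcome whose cumulative probability reaches 1/2."""
    cumulative = Fraction(0)
    for x in sorted(pmf):
        cumulative += pmf[x]
        if cumulative >= Fraction(1, 2):
            return x
    raise ValueError("probabilities sum to less than 1/2")

# Exact Binomial(10, 1/3) pmf, matching binomial(10, 1/3) above up to display rounding.
p = Fraction(1, 3)
binom = {k: comb(10, k) * p**k * (1 - p) ** (10 - k) for k in range(11)}

# Sum of two independent fair dice, as in d6 + d6 above.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
two_d6 = {s: Fraction(c, 36) for s, c in counts.items()}

print(float(pmf_mean(binom)), pmf_median(binom), pmf_std(binom))
print(pmf_median(two_d6))
```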
API

ScalarDistribution.__init__(outcomes, pmf=None, sample_space=None, base=None, prng=None, sort=True, sparse=True, trim=True, validate=True)
Initialize the distribution.
Parameters:
outcomes (sequence, dict) – The outcomes of the distribution. If outcomes is a dictionary, then its keys are used as the outcomes and its values are used as the pmf instead. Note: an outcome is any hashable object (except None) that is equality comparable. If sort is True, then outcomes must also be orderable.
pmf (sequence) – The outcome probabilities or log probabilities. If None, then outcomes is treated as the probability mass function, and the outcomes are consecutive integers beginning from zero.
sample_space (sequence) – A sequence representing the sample space, corresponding to the complete set of possible outcomes. The order of the sample space is important. If None, then the outcomes are used to determine the sample space instead.
base (float, str, None) – If pmf specifies log probabilities, then base should specify the base of the logarithm. If 'linear', then pmf is assumed to represent linear probabilities. If None, then the value for base is taken from ditParams['base'].
prng (RandomState) – A pseudo-random number generator with a rand method that can generate random numbers. For now, this is assumed to be something with an API compatible with NumPy's RandomState class. This attribute is initialized to equal dit.math.prng.
sort (bool) – If True, then the sample space is sorted before finalizing it. Usually this is desirable, as it normalizes the behavior of distributions that have the same sample space (when considered as a set). Note that addition and multiplication of distributions are defined only if the sample spaces (as tuples) are equal.
sparse (bool) – Specifies the form of the pmf. If True, then after initialization outcomes and pmf will contain entries only for non-null outcomes and probabilities. The order of these entries always obeys the order of sample_space, even if their number is not equal to the size of the sample space. If False, then the pmf will be dense and every outcome in the sample space will be represented.
trim (bool) – Specifies whether null outcomes should be removed from pmf when make_sparse() is called during initialization (assuming sparse is True).
validate (bool) – If True, then validate the distribution. If False, then assume the distribution is valid and perform no checks.
Raises: InvalidDistribution – If the lengths of pmf and outcomes are unequal. See validate() for a list of other potential exceptions.
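The outcomes/pmf conventions described above can be summarized in a few lines. The sketch below is a simplified, hypothetical resolve_pmf helper, not dit's implementation. It ignores sample_space and prng, performs only a length check in place of full validation, always sorts, and treats base as 'linear' by default instead of reading ditParams['base']. It also defines its own InvalidDistribution stand-in rather than importing dit's exception.

```python
class InvalidDistribution(Exception):
    """Stand-in for dit's exception of the same name (illustration only)."""

def resolve_pmf(outcomes, pmf=None, base="linear", sparse=True):
    """Hypothetical helper mirroring the documented outcomes/pmf conventions."""
    if isinstance(outcomes, dict):
        # Keys become the outcomes, values become the pmf.
        outcomes, pmf = list(outcomes.keys()), list(outcomes.values())
    elif pmf is None:
        # outcomes is really the pmf; the outcomes are 0, 1, 2, ...
        pmf, outcomes = list(outcomes), list(range(len(outcomes)))
    else:
        outcomes, pmf = list(outcomes), list(pmf)
    if len(outcomes) != len(pmf):
        raise InvalidDistribution("outcomes and pmf have unequal lengths")
    if base != "linear":
        # pmf holds log probabilities in the given base; convert to linear.
        pmf = [base ** lp for lp in pmf]
    pairs = sorted(zip(outcomes, pmf))  # like sort=True: order by outcome
    if sparse:
        pairs = [(x, p) for x, p in pairs if p > 0]  # drop null outcomes
    return dict(pairs)

print(resolve_pmf({"H": 0.5, "T": 0.5}))
print(resolve_pmf([0.25, 0.0, 0.75]))
print(resolve_pmf([1, 2], [-1, -1], base=2))
```

With sparse=True the zero-probability outcome 1 disappears from the second result, mirroring the sparse description above; passing sparse=False keeps it.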