Notation

dit is a scientific tool, and so much of this documentation contains mathematical expressions. Here we describe the notation used throughout.

Basic Notation

A random variable \(X\) consists of outcomes \(x\) from an alphabet \(\mathcal{X}\). As such, we write the entropy of a distribution as \(\H{X} = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x)\), where \(p(x)\) denotes the probability of the outcome \(x\) occurring.
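As a quick sanity check of the formula, here is a minimal plain-Python sketch (the `entropy` helper is ours for illustration, independent of dit's own API):

```python
import math

def entropy(pmf):
    """Shannon entropy, in bits, of a pmf given as {outcome: probability}."""
    # Outcomes with p(x) == 0 are skipped, matching the convention
    # 0 log2 0 = 0 adopted below.
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

# A fair coin carries exactly one bit of entropy.
assert entropy({'H': 0.5, 'T': 0.5}) == 1.0
```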

Many distributions are joint distributions. In the absence of variable names, we index each random variable with a subscript. For example, a distribution over three variables is written \(X_0X_1X_2\). As a shorthand, we also denote those random variables as \(X_{0:3}\), meaning start with \(X_0\) and go up to, but do not include, \(X_3\), just like Python slice notation.
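For readers unfamiliar with that convention, the exclusive upper bound mirrors how slicing behaves in Python itself:

```python
X = ['X0', 'X1', 'X2', 'X3']
print(X[0:3])  # ['X0', 'X1', 'X2'] -- the stop index, 3, is excluded
```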

If the set of variables \(X_{0:n}\) is independent, we write \(\ind X_{0:n}\). If the set is independent conditioned on \(V\), we write \(\ind X_{0:n} \mid V\).

If we ever need to describe an infinitely long chain of variables, we drop the index from the side that is infinite. So \(X_{:0} = \ldots X_{-3}X_{-2}X_{-1}\) and \(X_{0:} = X_0X_1X_2\ldots\). For an arbitrary set of indices \(A\), the corresponding collection of random variables is denoted \(X_A\). For example, if \(A = \{0, 2, 4\}\), then \(X_A = X_0 X_2 X_4\). The complement of \(A\) (with respect to some universal set) is denoted \(\overline{A}\).
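In code, selecting \(X_A\) and its complement is simple index bookkeeping. A plain-Python illustration (many of dit's functions accept lists of variable indices in this same spirit):

```python
X = ['X0', 'X1', 'X2', 'X3', 'X4']
A = {0, 2, 4}
X_A = [X[i] for i in sorted(A)]   # ['X0', 'X2', 'X4']
A_bar = set(range(len(X))) - A    # complement of A: {1, 3}
```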

Furthermore, we define \(0 \log_2 0 = 0\), which is justified by the limit \(\lim_{p \to 0^+} p \log_2 p = 0\).

Advanced Notation

When there exists a function \(Y = f(X)\) we write \(X \imore Y\), meaning that \(X\) is informationally richer than \(Y\). Similarly, if \(f(Y) = X\) then we write \(X \iless Y\) and say that \(X\) is informationally poorer than \(Y\). If \(X \iless Y\) and \(X \imore Y\), then we write \(X \ieq Y\) and say that \(X\) is informationally equivalent to \(Y\). Of all the variables that are poorer than both \(X\) and \(Y\), there is a richest one. This variable is known as the meet of \(X\) and \(Y\) and is denoted \(X \meet Y\). By definition, for every \(Z\) such that \(Z \iless X\) and \(Z \iless Y\), we have \(Z \iless X \meet Y\). Similarly, of all variables richer than both \(X\) and \(Y\), there is a poorest one. This variable is known as the join of \(X\) and \(Y\) and is denoted \(X \join Y\). The joint random variable \((X, Y)\) and the join are informationally equivalent: \((X, Y) \ieq X \join Y\).
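The meet has a concrete construction: two outcomes receive the same value of \(X \meet Y\) exactly when they are linked by a chain of jointly possible \((x, y)\) pairs (the Gács–Körner common random variable). Below is a plain-Python sketch of that construction; the function name `meet_labels` is ours for illustration, not dit's (dit provides its own helpers for computing meets and joins):

```python
def meet_labels(pmf):
    """Label each outcome (x, y) by the block of X meet Y containing it.

    pmf: dict mapping (x, y) pairs to probabilities.  Two outcomes share
    a label iff they are connected by a chain of pairs with nonzero
    probability; the labels then define the meet, up to renaming.
    """
    support = [xy for xy, p in pmf.items() if p > 0]
    parent = {}  # union-find forest over x-values and y-values

    def find(a):
        while parent.setdefault(a, a) != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for x, y in support:
        parent[find(('x', x))] = find(('y', y))

    return {(x, y): find(('x', x)) for x, y in support}

# Perfectly correlated bits: two blocks -- the meet recovers the shared bit.
print(meet_labels({(0, 0): 0.5, (1, 1): 0.5}))
# Independent bits: one block -- the meet is a constant.
print(meet_labels({(x, y): 0.25 for x in (0, 1) for y in (0, 1)}))
```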

Lastly, we use \(X \mss Y\) to denote the minimal sufficient statistic of \(X\) about the random variable \(Y\).
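The minimal sufficient statistic also has a direct construction: merge the values of \(X\) that induce the same conditional distribution \(p(Y \mid x)\). A plain-Python sketch under that definition (`mss_labels` is our illustrative name, not dit's):

```python
from collections import defaultdict

def mss_labels(pmf):
    """Map each x to its block in the minimal sufficient statistic of X about Y.

    pmf: dict mapping (x, y) pairs to probabilities.  Values of X are
    merged iff they induce the same conditional distribution p(Y | x).
    """
    joint = defaultdict(dict)
    for (x, y), p in pmf.items():
        if p > 0:
            joint[x][y] = joint[x].get(y, 0.0) + p

    def conditional(x):
        total = sum(joint[x].values())
        # Round to tame floating-point noise before comparing distributions.
        return tuple(sorted((y, round(p / total, 12)) for y, p in joint[x].items()))

    return {x: conditional(x) for x in joint}

# X uniform on {0, 1, 2, 3} with Y = X mod 2: the MSS merges {0, 2} and {1, 3}.
print(mss_labels({(x, x % 2): 0.25 for x in range(4)}))
```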