1. Statistics background
Let's fix a finite alphabet $X$ of size $d$. We denote by $X^n$ the set of sequences of length $n$ over $X$. We denote by $X^*$ (resp. $X^{\omega}$) the set of finite (resp. infinite) sequences over $X$. As usual, the concatenation of words is denoted by $u \cdot v$, or simply $uv$. The length of a finite word $u$ is denoted by $|u|$. We assume that $X$ is given the uniform probability measure $\mu(x) = d^{-1}$. This measure naturally extends to $X^n$ and $X^{\omega}$. For instance,
$$
\begin{equation*}
\forall n,~ \forall w \in X^n,~ \mu (w \cdot X^{\omega}) = \mu(w_1) \dots \mu(w_n) = d^{-n}
\end{equation*}
$$
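As a quick illustration of this formula, here is a minimal Python sketch (the function name `cylinder_measure` is ours, introduced only for this example):
```python
# Uniform measure of a cylinder w . X^omega over an alphabet of size d:
# since mu(x) = 1/d for every symbol x, mu(w . X^omega) = d^(-|w|).
def cylinder_measure(w: str, d: int) -> float:
    """Measure of the set of infinite sequences starting with the word w."""
    return d ** (-len(w))

# Binary alphabet X = {0, 1}, so d = 2.
assert cylinder_measure("0110", d=2) == 2 ** -4
```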
We are interested in finding what it means for an individual sequence in $X^{\omega}$ to be random. Martin-Löf's definition is inspired by the notion of statistical tests. We start with a basic example. Among the commonly accepted (or "ought-to-be") features of random sequences are several laws of large numbers. For example, in a "truly random" infinite sequence $S$, the frequency of any symbol $x \in X$ should be close to $\mu(x)$. Put another way, after fixing some arbitrary $k$, we can compute the frequency $freq(x,S \upharpoonright k)$ of the symbol $x$ in the prefix $S \upharpoonright k$ of length $k$, and if this quantity deviates too much from $\mu(x)$, we conclude that the sequence $S$ is not random.
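In code, this statistic is nothing more than a symbol count; the following small Python sketch mirrors the $freq(x, S \upharpoonright k)$ notation above:
```python
def freq(x: str, prefix: str) -> float:
    """Frequency of the symbol x in a finite word (here, a prefix of S)."""
    return prefix.count(x) / len(prefix)

# Over {0, 1}, freq("1", S|k) should approach mu("1") = 1/2 for large k
# if S is to be considered random.
print(freq("1", "0110100110010110"))  # 0.5 on this prefix
```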
Reformulating this in statistics jargon, we are designing a test for the null hypothesis "$H_0$: the sequence is random", with the statistic given by the frequency of the symbol $x$ in the prefix of length $k$ of $S$. Obviously, the fact that this statistic deviates from its expected value does not guarantee with certainty that the sequence is "not random", but we want to bound the probability of such an error. Hence, we fix a confidence level $\epsilon > 0$, and define a function $f(\epsilon)$ and a subset $U(\epsilon) \subseteq X^*$ such that
$$
\begin{align}
~&U(\epsilon) = \{ w \in X^*,~ |freq(x, w) - \mu(x)| > f(\epsilon) \} \\
~&\forall n,~ \mu\big((U(\epsilon) \cap X^n) \cdot X^{\omega}\big) < \epsilon
\end{align}
$$
The function $f$ can be chosen to be the best one satisfying this relation, i.e., any smaller threshold would make the bound $\epsilon$ fail. The equation above means that if $S$ is drawn "randomly", then the probability that the frequency computed from $S \upharpoonright k$ deviates from its expected value by more than $f(\epsilon)$ is less than $\epsilon$. In other words, the probability that a randomly selected infinite sequence $S$ is rejected by the test, i.e., the probability that $S \upharpoonright n \in U(\epsilon)$ for any given $n$, is bounded by $\epsilon$. Statisticians say that the set $U(\epsilon)$ is the critical region of the test at level $\epsilon$. In general, a test can be viewed as a subset $U \subseteq \mathbb{R}^+ \times X^*$ with $U(\epsilon) = \{w \in X^*,~ (\epsilon,w) \in U\}$.
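To make this concrete, here is a sketch of the frequency test for the binary alphabet, where the threshold comes from Hoeffding's inequality. Two caveats: this particular threshold also depends on the word length $n$, and it is certainly not the optimal $f$ mentioned above; it is just one simple choice that guarantees the bound $\epsilon$ at every length.
```python
import math

def f(eps: float, n: int) -> float:
    """Deviation threshold from Hoeffding's inequality:
    P(|freq - mu(x)| > f) <= 2 exp(-2 n f^2) = eps for words of length n."""
    return math.sqrt(math.log(2 / eps) / (2 * n))

def in_critical_region(w: str, x: str, mu_x: float, eps: float) -> bool:
    """Does the word w belong to U(eps) for the frequency-of-x test?"""
    deviation = abs(w.count(x) / len(w) - mu_x)
    return deviation > f(eps, len(w))

# A constant word is rejected (flagged as non-random) at level 0.01.
print(in_critical_region("1" * 100, "1", mu_x=0.5, eps=0.01))  # True
```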
Intuitively, the sequence $S$ is non-random if it is rejected by the test at all levels. Thus $S$ is random if
$$
\begin{equation}
S \in X^{\omega} - \bigcap_{\epsilon > 0} U(\epsilon) \cdot X^{\omega}
\end{equation}
$$
2. Effective tests
One is rightly dubious of a definition of randomness based solely on the frequency of a given symbol in the prefix of length $n$ of an infinite sequence. Following the original intuition, one would define a random sequence as a sequence that passes any imaginable statistical test. But we cannot consider all the tests, i.e., all the subsets of $\mathbb{R}^+ \times X^*$. The major contribution of Martin-Löf was to focus on effective tests, that is, roughly speaking, tests that can be performed by a Turing machine.
To do so, instead of indexing the confidence level by $\epsilon \in \mathbb{R}^+$ and without loss of generality, we use the levels $\epsilon_m = 2^{-m}$, $m \in \mathbb{N}$. An effective test is then defined as a recursively enumerable subset $U \subseteq \mathbb{N} \times X^*$ such that the subsets $U(m) = \{w \in X^*,~ (m,w) \in U\}$ satisfy, for all $n,m$,
$$
\begin{align}
~&U(m) \supseteq U(m+1)\\
~&\mu\big((U(m) \cap X^n) \cdot X^{\omega}\big) < \epsilon_m
\end{align}
$$
The major point is the requirement that the set $U$ be recursively enumerable. Roughly speaking, this means that there exists an algorithm that recognizes the inputs $(m,w) \in U$, i.e., that is able to detect finite words that fail the test at level $\epsilon_m$.
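For instance, the frequency test of the previous section can be packaged as an effective test. The following Python sketch (binary alphabet, Hoeffding thresholds as before) enumerates pairs $(m, w)$ by dovetailing over word lengths and levels; it is only meant to illustrate recursive enumerability, and we gloss over the bookkeeping needed to certify the measure bound at each level.
```python
import itertools
import math

def effective_test():
    """Enumerate pairs (m, w): the word w is rejected at level eps_m = 2^-m
    when its frequency of '1' deviates from 1/2 by more than the
    Hoeffding threshold for its length."""
    for n in itertools.count(1):        # dovetail over word lengths...
        for m in range(1, n + 1):       # ...and over levels up to n
            thr = math.sqrt(math.log(2 ** (m + 1)) / (2 * n))
            for bits in itertools.product("01", repeat=n):
                w = "".join(bits)
                if abs(w.count("1") / n - 0.5) > thr:
                    yield (m, w)

# The first few pairs (m, w) produced by the enumeration.
print(list(itertools.islice(effective_test(), 5)))
```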
To be precise, one also requires that, if $(m,w) \in U$, then $(n,v) \in U$ for all $n \leq m$ and all $v \sqsupseteq w$. This point comes from the fact that we want to test randomness of infinite sequences $S \in X^{\omega}$. If $(m, S \upharpoonright k) \in U$, i.e., $S$ is rejected by the test at level $\epsilon_m$, then it is natural to require that any longer prefix $S \upharpoonright l$, $l \geq k$, is also rejected by the test at the less constrained levels $\epsilon_n$, $n \leq m$. Intuitively, since $S \upharpoonright l$ contains all the information in $S \upharpoonright k$, if $S \upharpoonright k$ provides a clue that $S$ is non-random, then $S \upharpoonright l$ should too.
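This closure condition can also be illustrated in code. The toy sketch below (binary alphabet) saturates a finite table of pairs $(m, w)$ under the rule just stated, truncating word extensions at a maximum length since the true closure is infinite:
```python
from itertools import product

def close(U, max_len):
    """Saturate a finite set of pairs (m, w) under the rule:
    if (m, w) is in U, then (n, v) is in U for every level n <= m
    and every extension v of w (here, up to length max_len)."""
    closed = set()
    for m, w in U:
        for n in range(1, m + 1):
            for k in range(max_len - len(w) + 1):
                for tail in product("01", repeat=k):
                    closed.add((n, w + "".join(tail)))
    return closed

print(sorted(close({(2, "00")}, max_len=3)))
# [(1, '00'), (1, '000'), (1, '001'), (2, '00'), (2, '000'), (2, '001')]
```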
3. Universal test and random sequences
It is well known that there exists a universal Turing machine, i.e., a computer that is able to simulate every other computer. In the same spirit, Martin-Löf proved that there exists a universal effective test $U$ that "simulates" every other effective test. More precisely, if $V$ is any other effective test, then there exists a constant $c$, depending only on $U$ and $V$, such that the critical regions of $U$ and $V$ satisfy, for all $m$,
$$
\begin{equation}
V(m+c) \subseteq U(m)
\end{equation}
$$
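The idea behind the proof is worth sketching (this is only a summary of the standard construction, not Martin-Löf's exact argument). Since effective tests can themselves be recursively enumerated, say $V_1, V_2, \dots$, one can combine them into
$$
\begin{equation*}
U(m) = \bigcup_{i \geq 1} V_i(m + i + 1), \qquad
\mu(U(m) \cdot X^{\omega}) \leq \sum_{i \geq 1} 2^{-(m+i+1)} = 2^{-(m+1)} < \epsilon_m
\end{equation*}
$$
so that $U$ is itself an effective test and $V_i(m+i+1) \subseteq U(m)$ for every $i$, which yields the constant $c$ for each given $V$.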
Now, let $R$ be the set of sequences that are random with respect to the universal test $U$, i.e.
$$
\begin{equation}
R = X^{\omega} - \bigcap_{m} U(m) \cdot X^{\omega}
\end{equation}
$$
Then, by the universal property of $U$, every $S \in R$ is random relative to any effective test $V$: indeed, if $S \notin U(m) \cdot X^{\omega}$ for some $m$, then $S \notin V(m+c) \cdot X^{\omega}$, so $S$ escapes the critical regions of $V$ as well.
A direct consequence of this definition is that $\mu(R) = 1$: one can check that $\mu(U(m) \cdot X^{\omega}) \leq \epsilon_m \rightarrow 0$, so the intersection $\bigcap_m U(m) \cdot X^{\omega}$ is a null set.
Finally, one should notice that the set of random sequences $R$ has been defined in terms of the probability distribution $\mu$. Actually, one could take any probability measure $\mu$ on $X^{\omega}$ and define a set $R_{\mu}$ of infinite sequences random with respect to $\mu$. The only restriction is that $\mu$ must be computable, as otherwise we cannot define effective tests. The main advantage of Martin-Löf's approach is that it can easily be transposed to many other measured topological spaces, as long as an appropriate definition of "computable stuff" is available.
1. Per Martin-Löf, The Definition of Random Sequences, Information and Control, 9(6): 602–619, 1966.