dytuxh

Lovász Local Lemma (non-constructive)

2015-12-08T06:28:00.000-08:00

I read the short (and interesting) presentation of the original local Lovász lemma from a lecture by Alistair Sinclair. I report here the proof of this lemma (which was left as an exercise, although a simpler case was proven whose proof contains all the ingredients).

This lemma is extremely useful in the probabilistic method: if you can prove that the set of objects satisfying a certain number of constraints has positive probability (or density), then this set cannot be empty. In particular, an object satisfying those constraints do exist. However, this method does not explicitly build the desired object. The constructive version of the Lovász local lemma was given by Moser and Tardos, as explained in this previous post.

1. Lovász Local Lemma

Let $A_1,\dots,A_n$ be a set of bad events. These events are not necessarily mutually independent. For each index $1\le i \le n$, we denote by $D_i \subseteq \{1,\dots,n\}$ the set of indices such that the events $A_i, A_j$ with $j \not\in D_i$ are mutually independent. In other words, $D_i$ is the dependency set of the event $A_i$.

The lemma states:

If there exists real values $x_1,\dots,x_n \in [0,1)$ such that $$ \mathbb{P}(A_i) \le x_i \cdot \prod_{j \in D_i} (1-x_j) ~~~~~ (\star) $$ then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge \prod_{i=1}^n (1-x_i).$$

The proof goes as follows. We first show that for any index $1\le i \le n$ and any strict subset $S \subset \{1,\dots,n\}$, we have
$$
\mathbb{P}\left( A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right) \le x_i.
$$
We prove this claim by induction on the size $m = |S|$. The case $m = 0$, i.e., $S = \emptyset$, follows from $(\star)$. Now, assume the result holds for all subset of size less than $m$. We split the set $S$ in two disjoint parts $S_D = S \cap D_i$ and $S_I = S - S_D$. The indices in $S_D$ (resp. $S_I$) are exactly the indices in $S$ of events that are not independent (resp. that are independent) from $A_i$. We have, by the chain rule of conditional probabilities:
$$
\begin{align*}
    \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right)
   &= \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{k \in S_D} \overline{A_k} \land \bigwedge\limits_{l \in S_I} \overline{A_l} \right) \\
   &= \frac{\mathbb{P}\left(A_i \land \bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}\\
   &\le \frac{\mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}
\end{align*}
$$
Since the events $A_i, A_l$ , $l \in S_I$, are mutually independent, we have:
$$
\begin{align*}
    \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right)
   &\le \frac{\mathbb{P}\left(A_i \right)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}\\
   &\le \frac{x_i \cdot \prod_{j\in D_i} (1-x_j)}{\mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)}
\end{align*}
$$
Regarding the denominator, denoting $S_D = \{k_1,\dots,k_s\}$, and $S_D^r = \{k_1,\dots,k_r\}$ for any $1 \le r \le s$, we have
$$
\begin{align*}
    \mathbb{P}\left(\bigwedge\limits_{k \in S_D} \overline{A_k} ~\bigg|~ \bigwedge\limits_{l \in S_I} \overline{A_l} \right)
   &= \prod_{r = 1}^s \mathbb{P}\left( \overline{A}_{k_r} ~\bigg|~ \bigwedge\limits_{l \in S_I \sqcup S_D^r} \overline{A_l} \right)\\
&= \prod_{r = 1}^s \left(1 - \mathbb{P}\left( A_{k_r} ~\bigg|~ \bigwedge\limits_{l \in S_I \sqcup S_D^r} \overline{A_l} \right)\right) \\
      &\ge \prod_{r = 1}^s (1 - x_{k_r}) = \prod_{j \in S_I} (1 - x_j).
\end{align*}
$$
The last inequality follows from the induction hypothesis. We conclude that
$$
\begin{align*}
    \mathbb{P}\left(A_i ~\bigg|~ \bigwedge\limits_{j \in S} \overline{A_j} \right)
   &\le x_i \cdot \frac{\prod_{j \in D_i} (1 - x_j)}{\prod_{j \in S_I} (1 - x_j)} \\
      &\le x_i ~~~ \text{ since } S_I \subseteq D_i.
\end{align*}
$$
This concludes the induction proof.

We apply this inequality to prove the Lovász lemma. Indeed, we have
$$
\begin{align*}
\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right)
&= \prod_{i=1}^n \left( 1 - \mathbb{P}\left( A_i ~\bigg|~ \bigwedge\limits_{1 \le j < i} \overline{A}_j \right)\right) \\
        &\ge \prod_{i=1}^n (1-x_i).
\end{align*}
$$

2. Other formulations

The previous formulation is quite general. A restricted formulation is given by the following:

Assume $\mathbb{P}(A_i) \le p < 1$ for all $1 \le i \le n$. Let $d$ be the maximum size of the dependency sets $D_i$'s. If $e\cdot p\cdot (d+1) \le 1$, then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge (1- \frac{1}{d+1})^n > 0.$$

Indeed,let's take $x_i = 1/(d+1)$. It suffices to prove that
$$
p \le \frac{1}{d+1} \cdot \left(1-\frac{1}{d+1}\right)^d
$$
But we have
$$
\begin{align*}
\left(1-\frac{1}{d+1}\right)^d \ge \left(1-\frac{1}{d}\right)^d
\ge \frac{1}{e}
\end{align*}
$$

Another restricted formulation is given as follows:

If $\sum_{j \in D_i} \mathbb{P}(A_j) \le 1/4$ for all $i$, then $$\mathbb{P}\left(\bigwedge\limits_{i=1}^n \overline{A}_i\right) \ge \prod_{i=1}^n \left(1- 2\mathbb{P}(A_i)\right) > 0.$$

Indeed, let's take $x_i = 2\mathbb{P}(A_i)$. The condition imposes that each $x_i \in [0,1]$. One can easily shows by induction on the size of $D_i$ that
$$
\begin{align*}
\prod_{j \in D_i} \left(1- 2\mathbb{P}(A_j)\right) &\ge 1 - \sum_{j\in D_i} 2\mathbb{P}(A_j) \\
&\ge \frac{1}{2}
\end{align*}
$$
Therefore, $\mathbb{P}(A_i) \le x_i \prod_{D_i} (1-x_j)$ is satisfied for all $i$.

Gibbs inequality applied to binomial inequality

2015-12-03T07:12:00.000-08:00

1. Gibbs inequality

Let $P = \{p_x\}_{x \in X}$ and $Q = \{q_x\}_{x \in X}$ be two probability distributions on a finite set $X$. I simply assume that the $q_x$ are all positive, $q_x > 0$. The Gibbs inequality states that
$$
- \sum_{x \in X} p_x \cdot \log p_x \le - \sum_{x \in X} p_x \cdot \log q_x
$$
with equality iff the two distributions are identical. We use the convention that $x\cdot \log x = 0$ if $x = 0$. This result follows from the classic technique of convex inequality (a.k.a Jensen's inequality). Indeed, the function $f(x) = x\cdot \log x$ is convex on the positive reals, so
$$
\begin{align*}
    \sum_x p_x \cdot \log \frac{p_x}{q_x} &= \sum_x q_x \cdot \frac{p_x}{q_x} \cdot \log \frac{p_x}{q_x} \\
                &= \sum_x q_x \cdot f\left( \frac{p_x}{q_x}\right) \\
                &\ge f\left(\sum_x q_x \frac{p_x}{q_x}\right) \\
                &\ge f(1) = 0
\end{align*}
$$

2. Binomial and entropy

Varying the probability distributions in the Gibbs inequality can yield quite surprising result. For instance, let $n$ be a positive integer, and $0 \le p \le 1$ a real number such that $n\cdot p$ is an integer. Then
$$
\begin{equation*}
\binom{n}{n p} \le 2^{n\cdot h(p)}
\end{equation*}
$$
where $h(p) = - p\cdot \log - (1-p)\cdot \log (1-p)$. The function $h$ represents the entropy of a coin with bias $p$. The intuition behind this inequality is the following: to identify a subset of $np$ elements in a set of $n$ elements $\{1,2,\dots,n\}$, you need at least $\log \binom{n}{np}$ bits. Now, there is an approximate procedure to identify such a subset: for each element $i = 1,2,\dots,n$, you toss the bias coin, and add the element $i$ to your subset iff the coin turns, e.g., head. With such a procedure, on average, you identify a subset of size $np$, and you could associate with this subset the corresponding sequence of coin tosses as an identifier. By Shannon's source coding theorem, you can store such sequences of coin tosses using about $n\cdot h(p)$ bits. The inequality states that the latter procedure uses (slightly) more bits than necessary.

The proof goes as follows. Let $X_1,\dots,X_n$ be independent random variables following the same distribution
$$
\begin{align*}
\mathbb{P}(X_i = 1) &= p \\
\mathbb{P}(X_i = 0) &= 1-p
\end{align*}
$$
Let $X = (X_1,\dots,X_n)$ be the joint variable, with values in $\{0,1\}^n$. Consider also the uniform random variable $U$ with values in the subset of vectors $u \in \{0,1\}^n$ having exactly $k = np$ entries set to $1$ (and the others to $0$). We denote by $|u|$ the number of entries set to $1$ in $u$. The probability distribution of $U$ is given by, for any $u \in \{0,1\}^n$,
$$
\begin{equation*}
    \mathbb{P}(U = u) = \left\{ \begin{array}{cc}
                                                   0 & \text{ if } |u| \ne k\\
                                                   \binom{n}{k}^{-1} & \text{ otherwise}
                                                   \end{array}
                                          \right.
\end{equation*}
$$

Applying the Gibbs inequality to the distributions of $X$ and $U$, we get
$$
\begin{align*}
    \sum_{u \in \{0,1\}^n} - \mathbb{P}(U=u) \cdot \log \mathbb{P}(U = u)
    &\le
    \sum_{u \in \{0,1\}^n} - \mathbb{P}(U=u) \cdot \log \mathbb{P}(X = u)\\
\end{align*}
$$
$$
\begin{align*}
    \log \binom{n}{k}
    &\le
    \sum_{u,~ |u|=k} - \binom{n}{k}^{-1} \cdot \log p^k(1-p)^{n-k}\\

&\le -k \cdot \log p -(n-k)\cdot\log (1-p) \\
   &\le n \cdot h(p) ~~~~\text{ (since } k = np \text{)}.
\end{align*}
$$

3. The bound is quite tight

The bound on $\binom{n}{np}$ is quite tight. Indeed, Stirling's formula states that:
$$
n! \sim \sqrt{2\pi n}\cdot n^n \cdot e^{-n}
$$
Applying this formula to the binomial, we get (with $q = 1-p$)
$$
\begin{align*}
    \binom{n}{np} &= \frac{n!}{(np)!(nq)!} \\
                               &= \frac{\sqrt{2\pi n}\cdot n^n \cdot e^{-n}}{\sqrt{2\pi n p}\cdot (np)^{np} \cdot e^{-np}\cdot \sqrt{2\pi nq}\cdot (nq)^{nq} \cdot e^{-nq}}\\
&= \frac{2^{n\cdot h(p)}}{\sqrt{2\pi \cdot n \cdot p\cdot (1-p)}}.
\end{align*}
$$
Taking the logarithm of both sides, we see that
$$
\frac{1}{n} \log \binom{n}{np} \rightarrow h(p) \text{ as } n \rightarrow \infty
$$
This means that, as $n$ gets larger and larger, the ``coin-tossing procedure'' to identify a subset of size $np$ in $n$ gets more and more precise, in the sense that it uses the optimal number of bits $\log \binom{n}{np} \simeq n \cdot h(p)$.

Weird binomial identities

2015-12-01T14:12:00.000-08:00

In this post, I present some relatively weird binomial identities. I will start by the symmetric case, and then consider the twisted case.

1. Symmetric case

We prove the following identity
$$
C_{l,l}(y) \underset{def}{=} \sum_{i=0}^l \binom{l+i}{l} y^i = \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y}\cdot\frac{\rho^l}{2}\cdot\left( 1+\frac{1}{\rho}\right)^{2l+1}
$$
where $y$ is a formal variable, and $\rho = y/(1-y)$. This identity follows from the classical method of coefficient identification.

Indeed, let's introduce a second variable $x$ and consider the polynomial
$$
P = \sum_{i=0}^l y^i \cdot (1+x)^{l+i}
$$
By applying the binomial formula to $(1+x)^{l+i}$, we see that the coefficient of $x^l$ in $P$ is
$$
C_{l,l}(y) = \sum_{i=0}^l \binom{l+i}{l} y^i
$$

On the other hand, in $P$, we have a geometric series
$$
\begin{align}
    P &= \sum_{i=0}^l y^i (1+x)^{l+i} \\
       &= (1+x)^l \cdot \sum_{i=0}^l (y+y x)^i \\
       &= (1+x)^l \cdot \frac{1-y^{l+1}(1+x)^{l+1}}{1-y-y x} \\
       &= \left( \frac{1}{1-y} (1+x)^l - \frac{y^{l+1}}{1-y} (1+x)^{2l+1} \right) \cdot \frac{1}{1-\rho x} \\
       &= \left( \frac{1}{1-y} (1+x)^l - \frac{y^{l+1}}{1-y} (1+x)^{2l+1} \right) \cdot \sum_{s \ge 0} \rho^s x^s
\end{align}
$$

From this expression, the coefficient of $x^l$ is given by the convolution
$$
\begin{align}
C_{l,l}(y) &= \sum_{s = 0}^l \left( \frac{1}{1-y} \binom{l}{l-s} - \frac{y^{l+1}}{1-y} \binom{2l+1}{l-s}\right) \cdot \rho^s \\
        &= \frac{1}{1-y} \cdot \sum_{s = 0}^l \binom{l}{l-s} \rho^s - \frac{y^{l+1}}{1-y} \sum_{s=0}^l \binom{2l+1}{l-s} \rho^s \\
        &= \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y} \rho^l \sum_{s=0}^l \binom{2l+1}{l-s} \left(\frac{1}{\rho}\right)^{l-s} \\
\end{align}
$$

Thanks to the symmetry $\binom{2l+1}{l-s} = \binom{2l+1}{l + 1 + s}$, we see that the sum in the last expression is
$$
   \frac{1}{2} \sum_{s = 0}^{2l+1} \binom{2l+1}{s} \left(\frac{1}{\rho}\right)^s = \frac{1}{2} \left(1+\frac{1}{\rho}\right)^{2l+1}
$$

Therefore, writing the two expressions for the coefficient yields the claim
$$
\sum_{i=0}^l \binom{l+i}{l} y^i = \frac{(1+\rho)^l}{1-y} - \frac{y^{l+1}}{1-y}\cdot\frac{\rho^l}{2}\cdot\left( 1+\frac{1}{\rho}\right)^{2l+1}
$$

In particular, with $y = 1/2$, we get
$$
   \sum_{i=0}^l \binom{l+i}{l} \frac{1}{2^i} = 2^l
$$

2. Twisted case

We now consider the twisted case
$$
C_{a,b}(y) \underset{def}{=} \sum_{i=0}^a \binom{b+i}{b} y^i.
$$
The method is exactly the same except that we compute the coefficient of $x^b$ in the polynomial
$$
\sum_{i=0}^a y^i (1+x)^{b+i}.
$$
We get
$$
C_{a,b}(y) = \frac{(1+\rho)^b}{1-y} - \frac{y^{a+1}}{1-y} \cdot \sum_{s=0}^b \binom{a+b+1}{b-s} \rho^s.
$$
Unfortunately, the last sum does not simplify as before because of the asymmetry $a \ne b$ (a priori).

However, we can measure this twist by focusing on the quantity $y^b C_{a,b}(y) + y^a C_{b,a}(y)$. Regarding the parts containing the sums, we have
$$
\begin{align}
          \sum_{s=0}^b &\binom{a+b+1}{b-s} \rho^s + \sum_{s=0}^a \binom{a+b+1}{a-s} \rho^s \\
     &= \rho^b \cdot \sum_{s=0}^b \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s + \rho^a \cdot \sum_{s=0}^a \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s \\

     &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \rho^b \cdot \sum_{s=b+1}^{a+b+1} \binom{a+b+1}{s} \left(\frac{1}{\rho}\right)^s + \dots \\

     &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \rho^b \cdot \sum_{s=0}^{a} \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s + \dots \\

     &= \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} + (\rho^a - \rho^b) \cdot \sum_{s=0}^{a} \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s \\ \end{align}
$$

Whence the identities
$$
   y^b C_{a,b}(y) + y^a C_{b,a}(y) \\
     = \frac{y^a(1+\rho)^a + y^b (1+\rho)^b}{1-y}
      - \frac{y^{a+b+1}}{1-y} \rho^b \left(1+\frac{1}{\rho}\right)^{a+b+1} - \frac{y^{a+b+1}}{1-y} (\rho^a - \rho^b) \sum_{s=0}^a \binom{a+b+1}{a-s} \left(\frac{1}{\rho}\right)^s \\

     = \frac{y^a(1+\rho)^a + y^b (1+\rho)^b}{1-y}
     - \frac{y^{a+b+1}}{1-y} \rho^a \left(1+\frac{1}{\rho}\right)^{a+b+1} - \frac{y^{a+b+1}}{1-y} (\rho^b - \rho^a) \sum_{s=0}^b \binom{a+b+1}{b-s} \left(\frac{1}{\rho}\right)^s
$$

In particular, if $y = 1/2$, we have $\rho = 1$, and the twist is canceled so that
$$
\sum_{i=0}^a \binom{b+i}{b} \frac{1}{2^{b+i}} + \sum_{i=0}^b \binom{a+i}{a} \frac{1}{2^{a+i}} = 2
$$
which is, somewhat, funny.

Question. Is there a combinatorial proof of these identities ?

Explicit computations of some direct limits of ordered groups

2015-11-20T13:02:00.001-08:00

The basic definition of direct limits of groups is not so difficult. But concrete computations of these turn out not to be so easy for beginners (like me). I present in this post some computations of this kind.

1. Direct limit - definition and construction

The general definition of direct limit can be found on wikipedia. Here, I will focus on direct limits of partially ordered abelian groups (poa-groups). Recall that a poa-group is simply an abelian group $(G,+,0)$ equipped with a partial order $\le$ which is translation invariant: $a \le b$ implies $a+g \le b+g $ for all $g$. The subset $G_+$ of group elements $a \ge 0$ is the positive cone of $G$. A morphism $f : G \rightarrow H$ of poa-groups is a group homomorphism that agrees with the partial orders: $a \le b$ in $G$ implies $f(a) \le f(b)$ in $H$.

The ingredients for a direct limit are: a directed set $(I,\le)$ of ``indices'', a family $(G_i)_{i \in I}$ of poa-groups indexed by $I$, and a family $f_{i,j} : G_i \rightarrow G_j$ of morphisms for every pair $i \le j$ in $I$. These morphisms have to satisfy a compatibility condition, namely, for any $i \le j \le k$ in $I$, $f_{i,k} = f_{j,k} \circ f_{i,j}$. This data forms what is called a direct system of poa-groups. Then there is a universal poa-group $L$ and morphisms $\phi_i : G_i \rightarrow L$ such that for every pair $i \le j$ in $I$, $\phi_j = f_{i,j} \circ \phi_i$. The term universal refers to the usual property in category theory of ``being the most general entity satisfying the given constraints''. Thanks to this universality, the poa-group $L$ is unique up to isomorphism, and is usually denoted by
$$
L = \lim G_i \xrightarrow{f_{i,j}} G_j
$$
Because I want to perform concrete computations, I won't insist on the universal characterization. Instead, I present a (classical) explicit construction of $L$. We first take the disjoint union $U = \bigsqcup_{i \in I} G_i$. As a set, $U$ is made of elements $(i,a)$ with $i \in I$ and $a \in G_i$. We now define an equivalence relation: $(i,a) \sim (j,b)$ iff there exists $k \ge i,j$ such that $f_{i,k}(a) = f_{j,k}(b)$. Intuitively, two elements are equivalent iff they eventually agree. We denote by $[i,a]$ the equivalence class of $(i,a)$. The poa-structure is defined as follows:

The zero element is defined by $0 = [i,0]$ for any $i \in I$. The choice of $i$ does not matter.
The addition is defined by $[i,a] + [j,b] = [k, f_{i,k}(a) + f_{j,k}(b)]$ for some $k \ge i,j$. The choice of the representatives and $k$ does not matter.
The partial order is defined by: $[i,a] \le [j,b]$ iff for some $k \ge i,j$, $f_{i,k}(a) \le f_{j,k}(b)$ in $G_k$. Again, the choice of the representatives and $k$ does not matter.

In the following examples, we will consider an even more restricted settings. Indeed, any poa-group morphism $f : G \rightarrow G$ yields a direct system $f_{i,j} : G_i \rightarrow G_j$ where the directed set is the set of natural integers $I = \mathbb{N}$ (with the usual order), each $G_i$ is a copy of $G$, and $f_{i,j} = f^{j-i}$ is the $j-i$-th iterate of $f$.By ``computing the direct limit'', I mean finding a poa-group isomorphic to the direct limit, but which is easier to work with.

2. Dyadic rationals

We consider the direct limit generated by the multiplication by $2$ on integers, denoted by $\mathbb{Z} \xrightarrow{2} \mathbb{Z}$.
$$
L = \lim \mathbb{Z} \xrightarrow{2} \mathbb{Z} \xrightarrow{2} \dots
$$
The intuition goes as follows. By the definition given above, we have $[i,a] = [i+1,2 \cdot a]$, i.e., each time we move one step forward, we multiply the data by $2$. Therefore, intuitively, moving one step backward amounts to dividing by $2$. Abusing the notations, we could write $[i,a] = [i-1,a/2]$, and thus $[i,a] = [0,a/2^i]$. This suggests considering the poa-group $\mathbb{Z}[\frac{1}{2}]$ of dyadic rationals:

Its elements are the fractions $\frac{a}{2^i}$ in $\mathbb{Q}$ with $a \in \mathbb{Z}$ and $i \in \mathbb{Z}$.
The poa-structure is the one induced by $\mathbb{Q}$.

We now define $\phi : \mathbb{Z}[\frac{1}{2}] \rightarrow L$ by
$$
\phi : \frac{a}{2^i} \mapsto [i,a]
$$
We show that $\phi$ is an isomorphism of poa-groups. First, it is well defined: if $a/2^i = b/2^j$ in $\mathbb{Z}[\frac{1}{2}]$, then for any $k \ge i,j$, we have $2^{k-i} \cdot a = 2^{k-j}\cdot b$, whence $[i,a] = [j,b]$. Second, $\phi$ agrees with addition since
$$
\frac{a}{2^i} + \frac{b}{2^j} = \frac{2^{k-i}\cdot a + 2^{k-j}\cdot b}{2^k}
$$
Third, $\phi$ agrees with the partial order since
$$
   \frac{a}{2^i} \le \frac{b}{2^j} \Leftrightarrow \frac{2^{k-i}\cdot a}{2^k} \le \frac{2^{k-j}\cdot b}{2^k}
$$
Finally, $\phi(a/2^i) = 0 = [i,0]$ implies that $a = 0$. Since $\phi$ is obviously surjective, $\phi$ is an isomorphism of poa-groups.

3. ``Fibonacci'' integers

I do not know if this name is appropriate, but it turns out that the construction below is related to the famous Fibonacci sequence; yet, I will not cover this topic here.

Consider the poa-group $\mathbb{Z}^2$ with $(a,b) \le (c,d)$ iff $a \le b$ and $c \le d$, and the multiplication $\mathbb{Z}^2 \xrightarrow{A} \mathbb{Z}^2$ by the matrix
$$
A = \left(\begin{array}{cc}
                   1 & 1 \\
                   1 & 0 \\
                   \end{array}\right)
$$
We compute the direct limit $L = \lim \mathbb{Z}^2 \xrightarrow{A} \mathbb{Z}^2 \dots$. As in the previous section, the idea is consider the element $[k,u]$ as the informal element $u/A^k$. To give a coherent meaning to this element, we ``notice'' the following. Let $\tau = (1+\sqrt{5})/2$ denote the golden mean. We have the $\tau^2 = \tau + 1$. Therefore, the group $G = \mathbb{Z}[\tau]$ of integral combinations of powers of $\tau$ decomposes as $G = \mathbb{Z}\tau + \mathbb{Z}$. If we identify the vectors $(1,0)$ and $(0,1)$ in $\mathbb{Z}^2$ with $\tau$ and $1$ in $\mathbb{Z}[\tau]$ respectively, then multiplication by $A$ on $\mathbb{Z}^2$ translates into multiplication by $\tau$ in $\mathbb{Z}[\tau]$. Also, the order structure on $G$ is defined by: $\tau\cdot a+ b \le \tau\cdot c + d$ iff $a \le c$ and $b \le d$. By the matrix form, we see that multiplication by $\tau$ agrees with this order: $u \le v$ implies $\tau\cdot u \le \tau\cdot v$. Thanks to this trick, the direct limit can be written (is isomorphic to)
$$
L = \lim G \xrightarrow{\tau} G \dots
$$
and we can compute it as in the case of dyadic integers. We consider the poa-group defined as follows:

Its elements are $\frac{\tau\cdot a + b}{\tau^k}$ (the quotient being taken in $\mathbb{R}$) with $a,b,k \in \mathbb{Z}$.
Its poa-structure is the one induced by $\mathbb{R}$.
Since $1/\tau = \tau-1$, this poa-group is actually $\mathbb{Z}[\tau] = \tau\mathbb{Z} + \mathbb{Z}$ with the order induced by the one of $\mathbb{R}$. Note that it is important to distinguish $G$ and $\mathbb{Z}[\tau]$ although they have the same underlying group structure. The only difference is between their order relations.

We then define the function $\phi : \mathbb{Z}[\tau] \rightarrow L$ by
$$
\frac{\tau \cdot a + b}{\tau^k} \mapsto [k, \tau\cdot a + b]
$$
As in the case of dyadic integers, we verify that $\phi$ is a poa-isomorphism. First, it is well defined: if $(\tau\cdot a + b)/\tau^k = (\tau\cdot c + d)/\tau^l$, then $\tau^{m-k}\cdot(\tau\cdot a + b) = \tau^{m-l}\cdot(\tau\cdot c + d)$ for some $m \ge k,l$, and $[k,\tau\cdot a+b] = [l,\tau\cdot c + d]$. Second, $\phi$ agrees with addition since
$$
\frac{\tau\cdot a + b}{\tau^k} + \frac{\tau\cdot c + d}{\tau^l} = \frac{\tau^{m-k}\cdot(\tau\cdot a + b) + \tau^{m-l}\cdot(\tau\cdot c + d)}{\tau^m}.
$$
The fact that $\phi$ agrees with the order is less trivial. Since $\phi$ agrees with addition, it suffices to check that $\phi$ sends the positive cone of $\mathbb{Z}[\tau]$ to the positive cone of $L$. This amounts to prove that if $\tau\cdot a + b \ge 0$ in $\mathbb{R}$ with $a,b \in \mathbb{Z}$, then there exists $k\in \mathbb{Z}$ and two non-negative integers $a',b' \in \mathbb{N}$ such that
$$
\tau\cdot a + b = \frac{\tau\cdot a' + b'}{\tau^k} ~~~~(\bigstar)
$$ To prove this, we shall turn back to the matrix form. In the base $(\tau,1)$, multiplication by $\tau$ is modeled by the matrix $A$. Consider the action of $A$ on the plane $\mathbb{R}^2$. Let $\Delta$ denote the line $\tau\cdot x + y = 0$, and $\Delta^+$ the half-plane $\tau\cdot x + y \ge 0$. The proof of $(\bigstar)$ amounts to show that iterating $A$ on any point of $\Delta^+$ eventually leads to a point of the positive quadrant $\{(x,y)~|~ x,y \ge 0\}$.

Basic matrix algebra shows that the eigenvalues of $A$ are $\tau$ and $\overline{\tau} = (1-\sqrt{5})/2$. The eigenspace associated with $\tau$ is $\nabla ~:~ \overline{\tau}\cdot x + y = 0$, while the eigenspace associated with $\overline{\tau}$ is $\Delta ~:~ \tau\cdot x + y = 0$. We have $|\tau| > 1$ and $|\overline{\tau}| < 1$, so $A$ dilates $\nabla$, while $A$ contracts $\Delta$. By Figure 1, we see that iterating $A$ sufficiently enough moves any point of the half-plane into the positive quadrant.

Fig. 1 - Action of $A$

Therefore, we just showed that the direct limit $L$ is isomorphic to $\mathbb{Z}\tau + \mathbb{Z}$, with positive cone $\{a\cdot\tau + b \ge 0 ~|~ a,b \in \mathbb{Z}\}$.

Distributed interpretation of the constructive Lovász Local Lemma

2015-11-03T07:48:00.001-08:00

I follow the paper A Kolmogorov complexity proof of the Lovász local lemma for satisfiability, by J. Messner and T. Thierauf. Let $\phi$ be a $k$-CNF formula with $n$ variables, and $m$ clauses, each clause containing exactly $k$ literals (i.e., a variable, or the negation of a variable). The goal is to explicitly build an assignment of truth values to the variables so that each clause contains at least one literal evaluating to true.

1. Original algorithm

We define the graph $\Gamma'$, having the clauses of $\phi$ as nodes. In $\Gamma'$, two clauses $C$ and $D$ define an edge if $C$ has a literal that occurs negated in $D$. We denote by $d$ the maximum degree of $\Gamma'$.

The symmetric version of the Lovász local lemma states that $\phi$ is satisfiable if the clauses do not "interact too much", i.e., more precisely
$$
\frac{d^d}{(d-1)^{(d-1)}} \leq 2^k - 1
$$
Moser and Tardos gave a constructive proof of the previous result, in the sense that they defined a (randomized) algorithm that (efficiently) produces a satisfying assignment for $\phi$ within $O(m)$ time steps. This algorithm runs as follows:

Pick a random assignment for the variables of $\phi$
While some clause in $\phi$ is not satisfied

Choose (deterministically) an unsatisfied clause $C$
Reassign the variables in $C$ independently at random

Output the satisfying assignment

2. Reformulation

We now proceed to the interpretation of the proof from the point of view of distributed algorithms. We define an event $e$ as a pair $(C,b)$ where $C$ is a clause, and $b$ is a bit assignment of the variables in $C$. Two events are said to be independent if their clauses are not neighbours in $\Gamma'$.

Let $\gamma$ be a configuration, i.e., a bit assignment of the $n$ variables in the formula. The event $e = (C,b)$ is enabled in $\gamma$ if $\gamma$ does not satisfy $C$, i.e., all the literals in $C$ evaluate to false. We say that the event is applied to $\gamma$ when the variables of clause $C$ are reassigned according to $b$. If $\gamma'$ is the resulting configuration, we denote by $\gamma \xrightarrow{e} \gamma'$ this transition.

We now introduce a normal form for executions inspired by the work of Cartier and Foata in trace monoids. The idea stems from the fact that if two events $e,e'$ are independent and enabled in $\gamma$, then these events can be applied in any order, and yield the same configuration. One can model a schedule of events as a word $e_0\cdot e_1 \dots e_{s-1}$ in the trace monoid generated by the events, i.e., the quotient of the free monoid over the event alphabet by the commutativity relations induced by event independence. Cartier and Foata showed that any word $w$ in the trace monoid can be uniquely represented by the following normal form
$$
V_0 | \dots | V_{t-1}
$$
where each $V_i$ is composed of pairwise independent events, and each event in $V_{i+1}$ depends on some event in $V_i$. An execution can then be defined as a pair $(\gamma_0, S)$ where $\gamma_0$ is the initial configuration, and $S$ is a schedule given in the normal form above. If $S$ contains $s$ events, then the execution has consumed exactly $n + s\cdot k$ random bits. The normal form can be depicted as a forest:

Fig. 1 Compact representation of a schedule.

The dependence relations are subsumed by the graph $\Gamma'$. Each dot on a line $C_i$ represents an event with clause $C_i$. The bit assignments are omitted in the picture. The arrows represent dependences. In particular, there are at most $d$ arrows out of any node. Let $[S]$ denote the the schedule $S$ without the bit assignments, i.e., $[S]$ retains only the causal order of clauses, pretty much as depicted in Fig. 1. We will refer to $[S]$ as the causal structure of $S$.

I don't know if Messner, Thierauf were aware of this, but it turns out that the forest construction they give in their paper corresponds almost (I have not checked it thoroughly) to the method of Cartier and Foata for computing a normal form of a word in a trace monoid.

3. Final argument

The crucial point in the proof of Messner and Thierauf consists in the following observation: a finite execution $(\gamma_0,S)$ is entirely determined by the final configuration $\gamma_{t-1}$ and the causal structure $[S]$.

Indeed, if you know $\gamma_{t-1}$ and the last group $U_{t-1}$ of pairwise independent clause in $[S]$, then the bit assignment associated with any clause $C \in U_{t-1}$ is simply the restriction of $\gamma_{t-1}$ to the variables of $C$ since they were lastly modified. In particular, $\gamma_{t-2}$ is obtained from $\gamma_{t-1}$ by simply inverting the value of every variable occurring in one of the clauses of $U_{t-1}$.

The consequence of this is that the $n + s\cdot k$ (Kolmogorov) random bits can be computed from the pair $(\gamma_{t-1},[S])$. The classic compression method in algorithmic information theory yields
$$
n + s\cdot k \leq K(\gamma_{t-1},[S]) + O(1)
$$
where $K(\cdot)$ is the Kolmogorov complexity. An upper bound for the right-hand side is given by $n$ (number of bits to encode $\gamma_{t-1}$), and the logarithm of the number of causal structures $[S]$ with exactly $s$ (bit-free) events. Combinatorics yields
$$
n + s\cdot k \leq n + (d\cdot s + m) \cdot h\left(\frac{1}{d}\right)+ O(1)
$$
where $h(p) = - p\cdot \log p - (1-p)\cdot \log (1-p)$. The bound on $d$ gives $s = O(m)$.

[1] Robin A. Moser, Gábor Tardos, A constructive proof of the Lovász local lemma, arxiv.org/abs/0903.0544, 2009

[2] Jochen Messner, Thomas Thierauf, A Kolmogorov complexity proof of the Lovász local lemma for satisfiability, Theoretical Computer Science, volume 461, pages 55-64, 2012

[3] Pierre Cartier, Dominique Foata, Problèmes combinatoire de commutation et de réarrangements, Lecture notes in Mathematics, 85, Springer Verlag, 1969

The relativization obstacle

2015-11-02T15:10:00.001-08:00

I was wondering what was exactly the meaning of trying to compare complexity classes relatively to some oracle, and, more precisely, how such an approach may help to compare these classes without oracles. In this post, I will focus on the celebrated result by Baker, Gill and Solovay, stating that:

(BGS) There exist oracles $A$ and $B$ such that $P^A = NP^A$ and $P^B \neq NP^B$.

This result imposes a strong constraint on any attempt to prove or refute $P ~=?~ NP$. Indeed, such a proof has to be non-relativizable. Roughly speaking, a proof $\pi$ of a statement $S$ about complexity classes is said to be relativizable if for any oracle $O$ the same proof $\pi$ proves the statement $S^O$ which is $S$ with all complexity classes being relativized to the oracle $O$. For example, if there were a relativizable proof that $P = NP$, then we would have $P^O = NP^O$ for all oracles $O$; which contradicts the BGS theorem above.

The lesson to retain is the following:

For any statement $S$ about complexity classes, if there exists an oracle $O$ such that $S^O$ holds (resp. does not hold), then there are no relativizable proofs that $S$ does not hold (resp. holds).

But what does a relativizable proof looks like ? Basically, it is a proof that combines composition of functions, and use of universal machines.

Let's try to prove that $P$ is a strict subset of $NP$ (sic!). More precisely, let's try to prove this claim using diagonal/simulation arguments as in the proof of the time hierarchy theorem. We could reduce (polynomial-time reductions) some $NP$-complete problem to some problem $H$ consisting of pairs $(i,n)$ such that the deterministic machine $i$ accepts input $n$ within $poly(|n|)$, plus, perhaps, additional properties. Then, we would assume that $H$ belongs to $P$, and try to derive a contradiction. We would have a deterministic polynomial-time machine $F$ which accepts $(i,n)$ if $(i,n)$ belongs to $H$, or rejects it otherwise. Then we would build a machine $K$ that satisfies the specification of $H$ on input $i$ if $F(i,i) = 0$, or does something contradicting the specification of $H$ otherwise. If $K$ on input $K$ satisfies the specification of $H$, then $F(K,K) = 0$, i.e., the machine $K$ on input $K$ does not satisfy the specification of $H$. If $K$ on input $K$ does not satisfy the specification of $H$, then $F(K,K) = 1$, i.e., the machine $K$ on input $K$ does satisfy the specification of $H$. Whence the 1'000'000 dollar contradiction.

The issue is, if we look at the proof above, we see that we can replace each Turing machine $M$ involved by the same machine $M^O$ augmented with some oracle $O$, without affecting the flow of arguments. In other words, such a proof is relativizing. The BGS theorem above prevents any such proof from solving the $P$ vs $NP$ problem. Bye bye dear 1'000'000 dollars.

Occurrences of the diagonal argument

2015-11-02T13:13:00.001-08:00

Diagonalization occurs in many places, since its first use by Cantor to prove that the real numbers are uncountable. I will try to maintain in this post, a list of the occurrences of this argument. Just for, so to speak: fun.

The main idea is the following. We have a set $I$ of ``codes'', and a set, e.g., N of ``inputs''. Very often, if the goal were not true, then there would be a function $F : I \times N \rightarrow \{0,1\}$ that ``efficiently'' encodes some property about the pair $(i,n) \in I \times N$. One can then build from the diagonal a new code $K \in I$ such that, for all $n \in N$, $F(K,n) = 1 - F(n,n)$. Very often, every element $i \in I$ can be encoded as an element of $N$. The contradiction is obtained with $F(K,K) = 1-F(K,K)$.

Let's see some concrete examples.

    #1. Real numbers are uncountable

Let $I = N = \mathbb{N}$. Assume there exists a function $F : I \times N \rightarrow \{0,1\}$ such that for every real number $x \in [0,1]$, there exists a code $i \in I$ such that $F(i,n)$ is the $n$-th bit of $x$ in its binary expansion.

We now build a member $K \in I$ from the diagonal of $F$ by $F(K,n) = 1-F(n,n)$, i.e., the real number encoded by K is the one whose $n$-th bit is $1-F(n,n)$. But then the $K$-th bit of $K$ is $F(K,K) = 1 - F(K,K)$. Whence a contradiction.

    #2. The Halting problem

Let $I$ be the set of codes of Turing machines. Assume that there exists a machine $F$ which solves the Halting problem, i.e., for any $(i,n)$, $F$ terminates on $(i,n)$, and $F(i,n) = 1$ if the $i$-th machine terminates on input $n$, and $0$ otherwise. Again, we can think of $F$ as an array with rows indexed by Turing machines, and columns indexed by all possible inputs.

We now build a machine $K \in I$ from the diagonal of $F$. On the input $n$ (a code for a Turing machine), the machine $K$ returns (any value is ok) if $F(n,n) = 0$ and runs forever otherwise. Since $K \in I$, the value $F(K,K)$ is defined. But, $K$ does not terminate on input $K$ iff $F(K,K) = 0$ iff $K$ terminates on input $K$. Whence a contradiction. Note that here the expression ``$F(K,n) = 1 - F(n,n)$'' from the introduction is to be understood as: $K$ does not terminate on input $n$ if $F(n,n)$ returns $1$.

    #3. The Time Hierarachy theorem

Let $I$ be the set of (codes of) Turing machines. I am merely restating the proof on wikipedia. Let $f : \mathbb{N} \rightarrow \mathbb{N}$ be a time constructible function. Consider the following problem
$$
H = \{ (i,n),~\text{the machine } i\in I \text{ accepts } n \text{ within } f(|n|) \text{ steps}\}
$$
Here, ``accept'' means returning $1$. It is not difficult to see that $H \in \mathsf{DTIME}(f(m)^3)$ where $m$ denotes the size of $(i,n)$. The goal is to prove that $H$ does not belong to $\mathsf{DTIME}(f(m/2))$. Assume it is the case. Then, there exists a machine $F$ which terminates on each input $(i,n)$ within $f(m/2)$ steps such that $F(i,n) = 1$ if $(i,n) \in H$ or $0$ otherwise.

We now build a machine $K$ from the diagonal of $F$. The machine $K$ accepts the input $i$ if $F$ rejects $(i,i)$, and rejects it otherwise. Let $s$ be the size of $K$, and $m \simeq 2s+1$ be the size of $(K,K)$. The function $F$ takes about $f(m/2) \simeq f(s)$ steps to compute $F(K,K)$. In particular, $K$ terminates within $f(s)$ steps on input $K$. Now, $K$ accepts input $K$ (which it does within $f(s)$ steps) iff $F$ rejects input $(K,K)$ iff $K$ does not accept $K$ within $f(s)$ steps iff $K$ rejects $K$ (which it does within $f(s)$ steps). Whence a contradiction. Here, the expression ``$F(K,n) = 1 - F(n,n)$'' is to be understood as: $K$ rejects (accepts) $n$ if $F$ accepts (rejects) $(n,n)$.

Ranking function for multiset combinations

2015-06-28T03:55:00.001-07:00

Consider a totally ordered finite alphabet $A$. Fix a vector $d = (d_s)_{s \in A}$ of non-negative integers with $n = \sum d_s$. In this post, I would like to present a somewhat simple ranking function for the set $S(A^n,d)$ of vectors $v \in A^n$ satisfying
$$
     d_s = \text{ number of occurrences of } s \text{ in } v
$$
Intuitively, if $A= \{0, \dots, k-1\}$, then $S(A^n,d)$ represents all the ways to choose $d_0$ elements of type $0$, $d_1$ elements of type $1$, etc. such that $d_0 + \dots + d_{k-1} = n$. The size of $S(A^n,d)$ is the multinomial coefficient
$$
    |S(A^n,d)| = \binom{n}{(d_s)_{s \in A}}
$$

By ranking function, I mean an explicit bijection
$$
   r(A^n,d) : S(A^n,d) \rightarrow \{0, \dots, |S(A^n,d)| - 1\}
$$

1. Case $|A| = 2$

Without loss of generality, we assume $A = \{0,1\}$. Let $d = (b,a) = (n-a, a)$. Our goal is to rank the set of vectors $v = (v_0,\dots,v_{n-1}) \in \{0,1\}^n$ having exactly $a$ entries set to $1$ (the others being set to $0$). There are exactly $\binom{n}{a}$ such vectors.

To see what is going on, we will try to build progressively a vector $v \in S(A^n,b,a)$. Let's focus, say, on the last entry $v_{n-1}$. The trick stems from the following observation:

(a) If $v_{n-1} = 1$ then it remains to build a vector $v' = (v_0,\dots,v_{n-2}) \in S(A^{n-1}, b,a-1)$, i.e., a vector of dimension $n-1$ containing exactly $a-1$ entries set to $1$.
(b) If $v_{n-1} = 0$ then it remains to build a vector $v' = (v_0,\dots,v_{n-2}) \in S(A^{n-1}, b,a)$, i.e., a vector of dimension $n-1$ containing exactly $a$ entries set to $1$.

We see that we can define the ranking function by induction (on the dimension $n$). Since there are exactly $\binom{n-1}{a-1}$ vectors of type (a), we can reserve the ranks $0,\dots,\binom{n-1}{a-1}-1$ for them, whereas the remaining ranks are reserved for the vectors of type (b). The induction formula is thus:
$$
r(A^n,b,a)(v) = \left\{
                                  \begin{array}{lr}
r(A^{n-1},b,a-1)(v') & \text{if } v_{n-1} = 1 \\
\binom{n-1}{a-1} + r(A^{n-1},b,a)(v')    & \text{otherwise}
\end{array}
   \right.
$$
Note that, if we had used any 2-element alphabet instead of $\{0,1\}$, the above formula would remain the same.

2. General case

We now consider an arbitrary alphabet size. Let $v \in S(A^n,d)$ and $s_* = \min A$. We can associate with $v$ the two following vectors. First, the vector $u \in \{0,1\}^n$ obtained from $v$ by replacing every entry containing $s_*$ with a $0$, and the others with a $1$. Second, the vector $w \in (A - \{s_*\})^{n - d_{s_*}}$ obtained from $v$ by removing all the entries containing $s_*$ and concatenating the remaining parts (orderly). For instance, if $v = (0,2,1,2,0,1,1,0,2)$ on the alphabet $A = \{0,1,2\}$ then
$u = (0,1,1,1,0,1,1,0,1)$ and $w = (2,1,2,1,1,2)$. Note that we can easily rebuild $v$ from the pair $(u,w)$.

Again, we build the ranking function by induction. The trick stems from the following observation. For each $v \in S(A^n, d)$
$$
   u \in S(\{0,1\}^n, d_{s_*}, n-d_{s_*})\\
   w \in S((A-\{s_*\})^{n-d_{s_*}}, d|_{A - \{s_*\}})
$$

To build $v$, these two vectors can be chosen independently. In particular, once $w$ is fixed, there remain exactly $\binom{n}{d_{s_*}}$ possible choices for $v$. Therefore, we can define
$$
r(A^n,d)(v) = r(D_*)(w) \cdot \binom{n}{d_{s_*}} + r(D_0)(u)
$$
where $D_* = ((A-\{s_*\})^{n-d_{s_*}}, d|_{A - \{s_*\}})$ and $D_0 = (\{0,1\}^n, d_{s_*}, n-d_{s_*})$. Note that the rank $r(D_0)(u)$ corresponds to the case of a $2$-element alphabet.

3. Conclusion

The inductive definitions given above cannot be used directly in real code because the recursive calls will blow the stack. However, it is not difficult to encode these definitions using basic loops.

Funny nonsense about Chaitin $\Omega$ and Riemann $\zeta$

2014-11-07T02:36:00.001-08:00

This post will be brief. I don't know anything about Riemann $\zeta$, except the basic definition and the million dollar prize which attracts the most terrible bounty hunters mathematicians. I hardly know more about Chaitin $\Omega$. Yet, I found that they look similar in some sense. Maybe, it's completely obvious, and trivial. Maybe, it's related to something slightly deeper. Anyway, it looks funny.

The celebrated Riemann $\zeta(s)$ has the following form, for all $s \in \mathbb{C}$ with real part greater than $1$
$$
\begin{equation}
    \zeta(s) = \sum_{n \geq 1} n^{-s}
\end{equation}
$$
Actually, $\zeta(s)$ can be analytically continued over all $\mathbb{C} - \{1\}$, with a unique pole at $1$.

To define Chaitin $\Omega$, we first have to fix some universal prefix-free partial recursive function $U : \{0,1\}^* \rightarrow \{0,1\}^*$. This simply means that we fix some computer whose legal programs, encoded as binary strings, have some sort of "end of file" marker: no legal program is the prefix of another legal program. In addition, this computer is universal, i.e., is able to simulate every other computer. Denoting by $Dom(U)$ the set of legal programs of $U$, the Chaitin's number is
$$
\Omega = \sum_{p \in Dom(U)} 2^{-|p|}
$$
where $|p|$ denotes the length of the program $p$. This number can be interpreted as a probability. Indeed, if we flip infinitely many times a fair coin, then $2^{-|p|}$ is the probability that the prefix of length $|p|$ in the generated infinite binary sequence is equal to $p$. Since $U$ is prefix-free, this also represents the probability that $U$ will execute the program $p$ and halt. Hence, $\Omega$ is the probability that $U$ halts when given the generated infinite binary sequence. It can be shown that the binary expansion of $\Omega$ is a Martin-Löf random sequence.

In [1], Tadaki generalizes Chaitin $\Omega$, and define, for all $0 < D < 1$
$$
\begin{equation}
    \Omega(D) = \sum_{p \in Dom(U)} 2^{-\frac{|p|}{D}}
\end{equation}
$$

Now, I guess everyone has caught it, but let's put bluntly the (not so) funny part. For all (real) $s > 1$
$$
\begin{align}
   \zeta(s) &= \sum_{n \geq 1} \left( 2^{\log n}\right)^{-s} \\
     \Omega(s^{-1}) &= \sum_{p \in Dom(U)} \left( 2^{|p|} \right)^{-s}
\end{align}
$$
Of course, these two quantities are not equal: $\Omega(s^{-1})$ is a sub-sum of $\zeta(s)$ which does not pick any term with $n$ not a power of $2$. We can make them look even more similar by introducing, for every $n$, the function $m(l) = |\{ p \in Dom(U),~ |p| = l\}|$ counting the number of legal programs of length $l$.
$$
\begin{align}
   \zeta(s) &= \sum_{n \geq 1} n^{-s} \\
     \Omega(s^{-1}) &= \sum_{n \geq 1} m(\log n) \cdot n^{-s}
\end{align}
$$
The definition of $\Omega(s^{-1})$ really looks like a form of $\zeta$ regularization of the counting function $m(\log n)$. Actually, we could even define, for any universal (not necessarily prefix-free) function $C$, with $m_C(\log n)$ counting the number of legal programs of $C$ with length $\log n$
$$
\begin{equation}
    \Omega_C(s^{-1}) = \sum_{n \geq 1} m_C(\log n) \cdot n^{-s}
\end{equation}
$$
provided that the real part of $s$ is sufficiently large. In particular, because the domain of $C$ is not necessarily prefix-free, the function may diverge when $s \rightarrow 1$.

Question. Does such a regularization give more insight on random strings ?

EDIT: It turns out that Tadaki has already proposed an analogy [2] between $\Omega(s^{-1})$ and partition functions in statistical physics. Since $\zeta$ naturally arises in statistical physics, the above similarity between $\Omega(s^{-1})$ and $\zeta(s)$ is not a surprise.

EDIT: Truly, there is something going on here. I found this paper [3] by Calude, introducing zeta numbers for Turing machines.

[1] Tadaki, Kohtaro. A generalization of Chaitin's halting probability $\Omega$ and halting self-similar sets. Hokkaido Mathematical Journal 31 (2002), no. 1, 219--253. doi:10.14492/hokmj/1350911778. http://projecteuclid.org/euclid.hokmj/1350911778

[2] Tadaki, Kohtaro. A statistical mechanical interpretation of algorithmic information theory. http://arxiv.org/abs/0801.4194

[3] Cristian S. Calude, Michael A. Stay, Natural halting probabilities, partial randomness, and zeta functions, Information and Computation, Volume 204, Issue 11, November 2006, Pages 1718-1739, ISSN 0890-5401, http://dx.doi.org/10.1016/j.ic.2006.07.003.

Martin-Löf randomness

2014-10-29T10:00:00.001-07:00

In this post, I sum up the basic idea of Martin-Löf randomness. Actually, this post is mainly a series of personal notes on the matter. I decided though to publish it here as it may interests someone else. In particular, for the sake of brevity, I rely on informal definitions of computable functions, Turing machines, recursive enumerable sets, etc. The reader should just think about them as any methods which can be translated in a computer program.

    1. Statistics background

Let's fix a finite alphabet $X$ of size $d$. We denote by $X^n$ the set of sequences of length $n$ on $X$. We denote by $X^*$ (resp. $X^{\omega}$) the set of finite (resp. infinite) sequences on $X$. As usual, the concatenation of words is denoted by $u \cdot v$, or simply $uv$. The length of a finite word $u$ is denoted by $|u|$. We assume that $X$ is given the uniform probability measure $\mu(x) = d^{-1}$. This measure naturally extends to $X^n$, and $X^{\omega}$. For instance,
$$
\begin{equation*}
    \forall n,~ \forall w \in X^n,~ \mu (w \cdot X^{\omega}) = \mu(w_1) \dots \mu(w_n)
\end{equation*}
$$
We are interested in finding what it means for an individual sequence in $X^{\omega}$ to be random. Martin-Löf's definition is inspired by the notion of statistical tests. We start with a basic example. Some of the commonly accepted (or "ought to be") features of random sequences comprise several laws of large numbers. For example, in a "truly" infinite random sequence $S$, the frequency of any symbol $x \in X$ should be close to $\mu(x)$. Put another way, after fixing some arbitrary $k$, we can compute the frequency $freq(x,S \upharpoonright k)$ of the symbol $x$ in the prefix $S \upharpoonright k$ of length $k$, and if we find that this quantity deviates too much from $\mu(x)$, we conclude that the sequence $S$ is not random.

Reformulating this in the statististics jargon, we are designing a test for the null hypothesis "$H_0$: the sequence is random" with the statistics given by the frequency of the symbol $x$ in the prefix of length $k$ of $S$. Obviously, the fact that this statistics deviates from its expected value does not ensure at 100% that the sequence is "not random", but we want to bound the probability of such an error. Hence, we fix a confidence level $\epsilon > 0$, define a function $f(\epsilon)$ and subset $U(\epsilon) \subseteq X^*$ such that, for all $n$,
$$
\begin{align}
    ~&U(\epsilon) = \{ w \in X^*,~ |freq(x, w) - \mu(x)| > f(\epsilon) \} \\
~&\mu(U(\epsilon)) < \epsilon
\end{align}
$$
The function $f$ can be chosen to be the best one satisfying this relation, i.e., any smaller function would increase $\mu(U(\epsilon))$ above $\epsilon$. The equation above means that if $S$ is drawn "randomly", then the probability that the frequency computed from $S \upharpoonright k$ deviates from its expected value by the quantity $f(\epsilon)$ is at most $\epsilon$. In other words, the probability that a randomly selected infinite sequence $S$ is rejected by the test, i.e., the probability that $S \upharpoonright n \in U(\epsilon)$, is bounded by $\epsilon$. Statisticians say that the set $U(\epsilon)$ is the critical region of the test at level $\epsilon$. Actually, a test can be viewed as a subset $U \subseteq \mathbb{R}^+ \times X^*$ with $U(\epsilon) = \{S,~ (\epsilon,S) \in U\}$.

Intuitively, the sequence $S$ is non-random if it is rejected by the test at all levels. Thus $S$ is random if
$$
\begin{equation}
S \in X^{\omega} - \bigcap_{\epsilon > 0} U(\epsilon) \cdot X^{\omega}
\end{equation}
$$

    2. Effective tests

One is rightly dubious about the definition of randomness based solely on the frequency of a given symbol in the prefix of length $n$ of an infinite sequence. Following the original intuition, one would define a random sequence as a sequence that passes any imaginable statistical test. But we cannot consider all the tests, i.e., all the subsets of $\mathbb{R}^+ \times X^*$. The major contribution of Martin-Löf was to focus on the effective tests, that is, roughly speaking, tests that can be performed by a Turing machine.

To do so, instead of indexing the confidence level by $\epsilon \in \mathbb{R}^+$ and without loss of generality, we use the levels $\epsilon_m = 2^{-m}$, $m \in \mathbb{N}$. An effective test is then defined as a recursively enumerable subset $U \subseteq \mathbb{N} \times X^*$ such that the subsets $U(m) = \{w \in X^*,~ (m,w) \in U\}$ satisfies, for all $n,m$
$$
\begin{align}
~&U(m) \supseteq U(m+1)\\
    ~&\mu(U(m)) < \epsilon_m
\end{align}
$$
The major point is the requirement that the set $U$ is recursively enumerable. Roughly speaking, this means that there exists an algorithm that recognizes the inputs $(m,w) \in U$, i.e., that is able to detect sequences that fail the test at level $\epsilon_m$.

To be precise, one also requires that, if $(m,w) \in U$, then for all $n \leq m$, $v \sqsupseteq w$, $(n,v) \in U$. This point comes from the fact we want to test randomness for infinite sequences $S \in X^{\omega}$. If $(m, S \upharpoonright k) \in U$, i.e., $S$ is rejected by the test at level $\epsilon_m$, then it is natural to require that any longer prefix $S \upharpoonright l$, $l \geq k$, is also rejected by the test at less constrained levels $\epsilon_n$, $n \leq m$. Intuitively, since $S \upharpoonright l$ comprises the information in $S \upharpoonright k$, and if $S \upharpoonright k$ provides a clue to consider $S$ as being non-random, then $S \upharpoonright l$ should also cause $S$ to be considered non-random.

3. Universal test and random sequences

It is well known that there exists a universal Turing machine, i.e., a computer that is able to simulate every other computer. In the same spirit, Martin-Löf has proved that there exists a universal effective test $U$ that "simulates" every other effective test. More precisely, if $V$ is any other effective test, then there exists a constant $c$, depending only on $U$ and $V$, such that the critical regions of $U$ and $V$ satisfies, for all $m$,
$$
\begin{equation}
V_{m+c} \subseteq U_m
\end{equation}
$$
Now, let $R$ be the set of random sequences considering the universal test $U$, i.e.
$$
\begin{equation}
R = X^{\omega} - \bigcap_{m} U(m) \cdot X^{\omega}
\end{equation}
$$
Then, by the universal property of $U$, every $S \in R$ is random relatively to any effective test $V$.

A direct consequence of this definition is that $\mu(R) = 1$ since $\mu(U(m)) \rightarrow 0$. In particular, $R$ is dense in $X^{\omega}$. This means that, for any finite sequence $w \in X^*$, there is a random sequence $S \in R$ that extends $w$, i.e., $w \sqsubset S$. Otherwise, $w\cdot X^{\omega} \cap R = \emptyset$, i.e., $w \cdot X^{\omega}$ is included in the complement of $R$, and thus $0 < \mu(w \cdot X^{\omega}) \leq 0$. In other words, from a measure theoretical point of view, $R$ is large.

Finally, one should notice that the set of random sequences $R$ has been defined in terms of the probability distribution $\mu$. Actually, one could also take any probability measure on $X^{\omega}$, and define a set $R_{\mu}$ of infinite sequences random with respect to $\mu$. The only restriction is that $\mu$ must be computable, as otherwise, we cannot define effective tests. The main advantage of Martin-Löf's approach is that it can be easily transposed in many other measured topological space, as long as an appropriate definition of "computable stuff" is available.

1. Per Martin-Löf, The Definition of Random Sequences, Information and Control, 9(6): 602 - 619, 1966.