Friday, November 20, 2015

Explicit computations of some direct limits of ordered groups

The basic definition of direct limits of groups is not difficult. However, concrete computations turn out to be less easy for beginners (like me). In this post, I present some computations of this kind.

    1. Direct limit - definition and construction

The general definition of direct limit can be found on Wikipedia. Here, I will focus on direct limits of partially ordered abelian groups (poa-groups). Recall that a poa-group is simply an abelian group $(G,+,0)$ equipped with a partial order $\le$ which is translation invariant: $a \le b$ implies $a+g \le b+g$ for all $g$. The subset $G_+$ of group elements $a \ge 0$ is the positive cone of $G$. A morphism $f : G \rightarrow H$ of poa-groups is a group homomorphism that agrees with the partial orders: $a \le b$ in $G$ implies $f(a) \le f(b)$ in $H$.

The ingredients for a direct limit are: a directed set $(I,\le)$ of ``indices'', a family $(G_i)_{i \in I}$ of poa-groups indexed by $I$, and a family of morphisms $f_{i,j} : G_i \rightarrow G_j$, one for every pair $i \le j$ in $I$. These morphisms have to satisfy a compatibility condition, namely, $f_{i,i}$ is the identity and, for any $i \le j \le k$ in $I$, $f_{i,k} = f_{j,k} \circ f_{i,j}$. This data forms what is called a direct system of poa-groups. Then there is a universal poa-group $L$ together with morphisms $\phi_i : G_i \rightarrow L$ such that $\phi_i = \phi_j \circ f_{i,j}$ for every pair $i \le j$ in $I$. The term universal refers to the usual property in category theory of ``being the most general entity satisfying the given constraints''. Thanks to this universality, the poa-group $L$ is unique up to isomorphism, and is usually denoted by
  L = \lim G_i \xrightarrow{f_{i,j}} G_j
Because I want to perform concrete computations, I won't insist on the universal characterization. Instead, I present a (classical) explicit construction of $L$. We first take the disjoint union $U = \bigsqcup_{i \in I} G_i$. As a set, $U$ is made of elements $(i,a)$ with $i \in I$ and $a \in G_i$. We now define an equivalence relation: $(i,a) \sim (j,b)$ iff there exists $k \ge i,j$ such that $f_{i,k}(a) = f_{j,k}(b)$. Intuitively, two elements are equivalent iff they eventually agree. We denote by $[i,a]$ the equivalence class of $(i,a)$. The poa-structure is defined as follows:
  • The zero element is defined by $0 = [i,0]$ for any $i \in I$. The choice of $i$ does not matter.
  • The addition is defined by $[i,a] + [j,b] = [k, f_{i,k}(a) + f_{j,k}(b)]$ for some $k \ge i,j$. The choice of the representatives and $k$ does not matter.
  • The partial order is defined by: $[i,a] \le [j,b]$ iff for some $k \ge i,j$, $f_{i,k}(a) \le f_{j,k}(b)$ in $G_k$. Again, the choice of the representatives and $k$ does not matter.

In the following examples, we will consider an even more restricted setting. Indeed, any poa-group morphism $f : G \rightarrow G$ yields a direct system $f_{i,j} : G_i \rightarrow G_j$ where the directed set is the set of natural numbers $I = \mathbb{N}$ (with the usual order), each $G_i$ is a copy of $G$, and $f_{i,j} = f^{j-i}$ is the $(j-i)$-th iterate of $f$. By ``computing the direct limit'', I mean finding a poa-group isomorphic to the direct limit, but which is easier to work with.
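To make the explicit construction concrete, here is a minimal Python sketch of the direct limit of $G \xrightarrow{f} G \xrightarrow{f} \dots$, assuming (for illustration only) that $G = \mathbb{Z}^d$ with the coordinatewise order and that $f$ is given by a square integer matrix `M`. An element $[i,a]$ is stored as a pair `(i, a)`. Since the definitions of $\sim$ and $\le$ only ask for *some* index $k$, the comparisons search a bounded range of indices: a `True` answer is certain, while `False` only means that no witness was found within the bound. All function names are hypothetical.

```python
# Hypothetical helpers for the direct limit of G -> G -> ... with G = Z^d
# (coordinatewise order) and f given by a square integer matrix M.
# An element [i, a] of the limit is represented by the pair (i, a).

def mat_apply(M, v):
    """Multiply the integer matrix M by the integer vector v."""
    return tuple(sum(M[r][c] * v[c] for c in range(len(v))) for r in range(len(M)))

def push(M, x, k):
    """Image f_{i,k}(a) = M^(k-i) a of the representative x = (i, a), for k >= i."""
    i, a = x
    for _ in range(k - i):
        a = mat_apply(M, a)
    return a

def limit_add(M, x, y):
    """[i,a] + [j,b] = [k, f_{i,k}(a) + f_{j,k}(b)] with k = max(i, j)."""
    k = max(x[0], y[0])
    return k, tuple(p + q for p, q in zip(push(M, x, k), push(M, y, k)))

def _search(M, x, y, test, extra):
    """Look for k in [max(i,j), max(i,j) + extra] with test(f_{i,k}(a), f_{j,k}(b))."""
    k = max(x[0], y[0])
    a, b = push(M, x, k), push(M, y, k)
    for _ in range(extra + 1):
        if test(a, b):
            return True
        a, b = mat_apply(M, a), mat_apply(M, b)
    return False

def limit_eq(M, x, y, extra=50):
    """Semi-decide [i,a] ~ [j,b]; False means 'no witness within the bound'."""
    return _search(M, x, y, lambda a, b: a == b, extra)

def limit_le(M, x, y, extra=50):
    """Semi-decide [i,a] <= [j,b] for the coordinatewise order."""
    return _search(M, x, y, lambda a, b: all(p <= q for p, q in zip(a, b)), extra)
```

With `M = [[2]]` this models the dyadic example below, and with `M = [[1, 1], [1, 0]]` the ``Fibonacci'' example; for instance `limit_le([[1, 1], [1, 0]], (0, (0, 0)), (0, (1, -1)))` finds a witness one step later, at $k = 1$.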

   2. Dyadic rationals

We consider the direct limit generated by the multiplication by $2$ on integers, denoted by $\mathbb{Z} \xrightarrow{2} \mathbb{Z}$.
  L = \lim \mathbb{Z} \xrightarrow{2} \mathbb{Z} \xrightarrow{2} \dots
The intuition goes as follows. By the definition given above, we have $[i,a] = [i+1,2 \cdot a]$, i.e., each time we move one step forward, we multiply the data by $2$. Therefore, intuitively, moving one step backward amounts to dividing by $2$. Abusing notation, we could write $[i,a] = [i-1,a/2]$, and thus $[i,a] = [0,a/2^i]$. This suggests considering the poa-group $\mathbb{Z}[\frac{1}{2}]$ of dyadic rationals:
  • Its elements are the fractions $\frac{a}{2^i}$ in $\mathbb{Q}$ with $a \in \mathbb{Z}$ and $i \in \mathbb{N}$.
  • The poa-structure is the one induced by $\mathbb{Q}$.
We now define $\phi : \mathbb{Z}[\frac{1}{2}] \rightarrow L$ by
  \phi : \frac{a}{2^i} \mapsto [i,a]
We show that $\phi$ is an isomorphism of poa-groups. First, it is well defined: if $a/2^i = b/2^j$ in  $\mathbb{Z}[\frac{1}{2}]$, then for any $k \ge i,j$, we have $2^{k-i} \cdot a = 2^{k-j}\cdot b$, whence $[i,a] = [j,b]$. Second, $\phi$ agrees with addition since
  \frac{a}{2^i} + \frac{b}{2^j} = \frac{2^{k-i}\cdot a + 2^{k-j}\cdot b}{2^k}
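where $k \ge i,j$; by definition of the addition in $L$, the right-hand side is sent by $\phi$ to $[k, 2^{k-i}\cdot a + 2^{k-j}\cdot b] = [i,a] + [j,b]$. Third, $\phi$ agrees with the partial order since
   \frac{a}{2^i} \le \frac{b}{2^j} \Leftrightarrow \frac{2^{k-i}\cdot a}{2^k} \le \frac{2^{k-j}\cdot b}{2^k} \Leftrightarrow 2^{k-i}\cdot a \le 2^{k-j}\cdot b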
Finally, if $\phi(a/2^i) = [i,a] = 0 = [i,0]$, then $2^{k-i}\cdot a = 0$ for some $k \ge i$, hence $a = 0$, so $\phi$ is injective. Since $\phi$ is obviously surjective, $\phi$ is an isomorphism of poa-groups.
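The isomorphism can also be checked mechanically on small examples. The following Python sketch (function names are hypothetical) sends a dyadic rational to a representative $(i,a)$ of $[i,a]$ using `fractions.Fraction`, implements the addition and the order of $L$ by pushing both representatives to the common index $k = \max(i,j)$, and lets us compare the results with the arithmetic of $\mathbb{Q}$. Comparing at $k = \max(i,j)$ suffices here because multiplication by $2$ is injective and reflects the order of $\mathbb{Z}$.

```python
from fractions import Fraction

def phi(q):
    """Send a dyadic rational q = a / 2**i to the representative (i, a) of [i, a]."""
    q = Fraction(q)
    den = q.denominator
    if den & (den - 1):  # a power of two has a single bit set
        raise ValueError(f"{q} is not a dyadic rational")
    return den.bit_length() - 1, q.numerator

def lift(x, k):
    """Push (i, a) forward to index k >= i: f_{i,k}(a) = 2**(k - i) * a."""
    i, a = x
    return a * 2 ** (k - i)

def limit_add(x, y):
    """[i,a] + [j,b] computed at the common index k = max(i, j)."""
    k = max(x[0], y[0])
    return k, lift(x, k) + lift(y, k)

def limit_le(x, y):
    """[i,a] <= [j,b]; checking k = max(i, j) is enough since 2* reflects the order."""
    k = max(x[0], y[0])
    return lift(x, k) <= lift(y, k)

def limit_eq(x, y):
    """[i,a] ~ [j,b]; checking k = max(i, j) is enough since 2* is injective."""
    k = max(x[0], y[0])
    return lift(x, k) == lift(y, k)
```

For example, `phi(Fraction(3, 4))` is `(2, 3)`, and `limit_eq((0, 1), (3, 8))` holds since $1 = 8/2^3$.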

    3. ``Fibonacci'' integers

I do not know if this name is appropriate, but it turns out that the construction below is related to the famous Fibonacci sequence; yet, I will not cover this topic here.

Consider the poa-group $\mathbb{Z}^2$ with $(a,b) \le (c,d)$ iff $a \le c$ and $b \le d$, and the multiplication $\mathbb{Z}^2 \xrightarrow{A} \mathbb{Z}^2$ by the matrix
    A = \left(\begin{array}{cc} 1 & 1 \\ 1 & 0 \end{array}\right)
We compute the direct limit $L = \lim \mathbb{Z}^2 \xrightarrow{A} \mathbb{Z}^2 \dots$. As in the previous section, the idea is to consider the element $[k,u]$ as the informal element $u/A^k$. To give a coherent meaning to this element, we ``notice'' the following. Let $\tau = (1+\sqrt{5})/2$ denote the golden mean. We have $\tau^2 = \tau + 1$. Therefore, the group $G = \mathbb{Z}[\tau]$ of integral combinations of powers of $\tau$ decomposes as $G = \mathbb{Z}\tau + \mathbb{Z}$. If we identify the vectors $(1,0)$ and $(0,1)$ in $\mathbb{Z}^2$ with $\tau$ and $1$ in $\mathbb{Z}[\tau]$ respectively, then multiplication by $A$ on $\mathbb{Z}^2$ translates into multiplication by $\tau$ in $\mathbb{Z}[\tau]$: indeed, $\tau\cdot(\tau\cdot a + b) = \tau\cdot(a+b) + a$, and $A$ sends $(a,b)$ to $(a+b,a)$. We equip $G$ with the order transported from $\mathbb{Z}^2$: $\tau\cdot a+ b \le \tau\cdot c + d$ iff $a \le c$ and $b \le d$ (this is well defined because $\tau$ is irrational, so $a$ and $b$ are uniquely determined by $\tau\cdot a + b$). Since $A$ has non-negative entries, multiplication by $\tau$ agrees with this order: $u \le v$ implies $\tau\cdot u \le \tau\cdot v$. Thanks to this trick, the direct limit is isomorphic to
  L = \lim G \xrightarrow{\tau} G \dots
and we can compute it as in the case of dyadic rationals. We consider the poa-group defined as follows:
  • Its elements are $\frac{\tau\cdot a + b}{\tau^k}$ (the quotient being taken in $\mathbb{R}$) with $a,b \in \mathbb{Z}$ and $k \in \mathbb{N}$.
  • Its poa-structure is the one induced by $\mathbb{R}$.
  • Since $1/\tau = \tau-1$, this poa-group is actually $\mathbb{Z}[\tau] = \tau\mathbb{Z} + \mathbb{Z}$ with the order induced by that of $\mathbb{R}$. It is important to distinguish $G$ from $\mathbb{Z}[\tau]$: they have the same underlying group and differ only in their order relations. For instance, $\tau - 1 > 0$ in $\mathbb{Z}[\tau]$, whereas $\tau - 1 \ge 0$ fails in $G$, since its coordinates are $(1,-1)$.
We then define the function $\phi : \mathbb{Z}[\tau] \rightarrow L$ by
    \frac{\tau \cdot a + b}{\tau^k} \mapsto [k, \tau\cdot a + b]
As in the case of dyadic rationals, we verify that $\phi$ is a poa-isomorphism. First, it is well defined: if $(\tau\cdot a + b)/\tau^k = (\tau\cdot c + d)/\tau^l$, then $\tau^{m-k}\cdot(\tau\cdot a + b) = \tau^{m-l}\cdot(\tau\cdot c + d)$ for any $m \ge k,l$, and thus $[k,\tau\cdot a+b] = [l,\tau\cdot c + d]$. Second, $\phi$ agrees with addition since, for any $m \ge k,l$,
  \frac{\tau\cdot a + b}{\tau^k} + \frac{\tau\cdot c + d}{\tau^l} = \frac{\tau^{m-k}\cdot(\tau\cdot a + b) + \tau^{m-l}\cdot(\tau\cdot c + d)}{\tau^m}.
The fact that $\phi$ agrees with the order is less trivial. Since $\phi$ agrees with addition, it suffices to check that $\phi$ maps the positive cone of $\mathbb{Z}[\tau]$ onto the positive cone of $L$. One inclusion is clear: if $a',b' \ge 0$, then $\tau\cdot a' + b' \ge 0$ in $\mathbb{R}$. For the other one, we need to prove that if $\tau\cdot a + b \ge 0$ in $\mathbb{R}$ with $a,b \in \mathbb{Z}$, then there exist $k\in \mathbb{N}$ and two non-negative integers $a',b' \in \mathbb{N}$ such that
   \tau\cdot a + b = \frac{\tau\cdot a' + b'}{\tau^k} ~~~~(\bigstar)
To prove this, we turn back to the matrix form. In the basis $(\tau,1)$, multiplication by $\tau$ is modeled by the matrix $A$. Consider the action of $A$ on the plane $\mathbb{R}^2$. Let $\Delta$ denote the line $\tau\cdot x + y = 0$, and $\Delta^+$ the half-plane $\tau\cdot x + y \ge 0$. Proving $(\bigstar)$ amounts to showing that iterating $A$ on any integer point of $\Delta^+$ eventually leads to a point of the positive quadrant $\{(x,y)~|~ x,y \ge 0\}$.

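Basic matrix algebra shows that the eigenvalues of $A$ are $\tau$ and $\overline{\tau} = (1-\sqrt{5})/2$. The eigenspace associated with $\tau$ is $\nabla ~:~ \overline{\tau}\cdot x + y = 0$, while the eigenspace associated with $\overline{\tau}$ is $\Delta ~:~ \tau\cdot x + y = 0$. We have $|\tau| > 1$ and $|\overline{\tau}| < 1$, so $A$ dilates $\nabla$, while $A$ contracts $\Delta$. As Figure 1 suggests, iterating $A$ sufficiently many times moves any point of the open half-plane $\tau\cdot x + y > 0$ into the positive quadrant. More precisely, write $(x,y) = \alpha\cdot(\tau,1) + \beta\cdot(\overline{\tau},1)$. Then $A^n(x,y) = \alpha\tau^n\cdot(\tau,1) + \beta\overline{\tau}^n\cdot(\overline{\tau},1)$ and $\tau\cdot x + y = \alpha\cdot(\tau^2+1)$. If $\tau\cdot x + y > 0$, then $\alpha > 0$; since $(\tau,1)$ lies in the open positive quadrant and $\overline{\tau}^n \rightarrow 0$, the point $A^n(x,y)$ lies in the positive quadrant for $n$ large enough. Finally, if $\tau\cdot x + y = 0$ with $x,y \in \mathbb{Z}$, then $x = y = 0$ because $\tau$ is irrational.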

Fig. 1 - Action of $A$
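The proof of $(\bigstar)$ is effective: starting from $(a,b)$ with $\tau\cdot a + b > 0$, apply $A$ until both coordinates are non-negative. The Python sketch below (hypothetical function names) does this with exact integer arithmetic. The positivity test uses $2(\tau\cdot a + b) = a\sqrt{5} + (a+2b)$ and compares squares; the floating-point constant `TAU` is only meant for sanity checks.

```python
import math

TAU = (1 + math.sqrt(5)) / 2  # golden mean, for floating-point sanity checks only

def is_positive(a, b):
    """Exact test of tau*a + b > 0, using 2*(tau*a + b) = a*sqrt(5) + (a + 2*b)."""
    c = a + 2 * b
    if a >= 0 and c >= 0:
        return a > 0 or c > 0
    if a <= 0 and c <= 0:
        return False
    if a > 0:                    # c < 0: compare a*sqrt(5) with -c
        return 5 * a * a > c * c
    return c * c > 5 * a * a     # a < 0 < c: compare c with -a*sqrt(5)

def reach_quadrant(a, b, max_steps=1000):
    """Iterate A: (x, y) -> (x + y, x), i.e. multiplication by tau, until x, y >= 0.

    Returns (k, (a', b')) with tau^k * (tau*a + b) = tau*a' + b', which is (*)."""
    x, y = a, b
    for k in range(max_steps + 1):
        if x >= 0 and y >= 0:
            return k, (x, y)
        x, y = x + y, x
    raise RuntimeError("positive quadrant not reached; is tau*a + b > 0?")
```

For instance, `reach_quadrant(-3, 5)` passes through $(2,-3)$, $(-1,2)$ and $(1,-1)$ before reaching $(0,1)$ after $4$ steps, i.e., $5 - 3\tau = 1/\tau^4$.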

Therefore, we have shown that the direct limit $L$ is isomorphic to $\mathbb{Z}\tau + \mathbb{Z}$, with positive cone $\{a\cdot\tau + b \ge 0 ~|~ a,b \in \mathbb{Z}\}$ (injectivity and surjectivity of $\phi$ follow exactly as for dyadic rationals).


Tuesday, November 3, 2015

Distributed interpretation of the constructive Lovász Local Lemma

I follow the paper A Kolmogorov complexity proof of the Lovász local lemma for satisfiability, by J. Messner and T. Thierauf [2]. Let $\phi$ be a $k$-CNF formula with $n$ variables and $m$ clauses, each clause containing exactly $k$ literals (a literal is a variable or the negation of a variable). The goal is to explicitly build an assignment of truth values to the variables such that each clause contains at least one literal evaluating to true.

  1. Original algorithm

We define the graph $\Gamma'$, having the clauses of $\phi$ as nodes. In $\Gamma'$,  two clauses $C$ and $D$ define an edge if $C$ has a literal that occurs negated in $D$. We denote by $d$ the maximum degree of $\Gamma'$.

The symmetric version of the Lovász local lemma states that $\phi$ is satisfiable if the clauses do not "interact too much"; more precisely, if
  \frac{d^d}{(d-1)^{(d-1)}} \leq 2^k - 1
Moser and Tardos [1] gave a constructive proof of the previous result, in the sense that they defined a randomized algorithm that efficiently produces a satisfying assignment for $\phi$, using an expected number of $O(m)$ resampling steps. This algorithm runs as follows:
  • Pick a random assignment for the variables of $\phi$
  • While some clause in $\phi$ is not satisfied
    • Choose (deterministically) an unsatisfied clause $C$
    • Reassign the variables in $C$ independently at random
  •  Output the satisfying assignment
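The loop above is short enough to transcribe directly. Here is a minimal Python sketch, assuming DIMACS-style clauses (lists of non-zero integers, `v` for $x_v$ and `-v` for its negation) and taking the first unsatisfied clause in list order as the deterministic choice; the `max_steps` guard is an addition for safety, not part of the algorithm.

```python
import random

def moser_tardos(clauses, n, rng=None, max_steps=10**6):
    """Moser-Tardos resampling for a CNF over the variables 1..n.

    A clause is a list of non-zero ints: v stands for x_v and -v for (not x_v).
    Returns (assignment, number_of_resamplings), where assignment[v - 1] is x_v.
    """
    rng = rng or random.Random()
    value = [None] + [rng.random() < 0.5 for _ in range(n)]  # value[0] is unused

    def satisfied(clause):
        return any(value[abs(lit)] == (lit > 0) for lit in clause)

    steps = 0
    while True:
        # Deterministic choice: the first unsatisfied clause in list order.
        bad = next((c for c in clauses if not satisfied(c)), None)
        if bad is None:
            return value[1:], steps
        if steps == max_steps:
            raise RuntimeError("no satisfying assignment found within max_steps")
        for lit in bad:  # reassign the variables of the chosen clause at random
            value[abs(lit)] = rng.random() < 0.5
        steps += 1
```

Passing a seeded `random.Random` makes a run reproducible, which is convenient when comparing executions as in the next section.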

      2. Reformulation

    We now proceed to interpret the proof from the point of view of distributed algorithms. We define an event $e$ as a pair $(C,b)$ where $C$ is a clause and $b$ is a bit assignment of the variables in $C$. Two events are said to be independent if their clauses are distinct and not neighbours in $\Gamma'$.

    Let $\gamma$ be a configuration, i.e., a bit assignment of the $n$ variables in the formula. The event $e = (C,b)$ is enabled in $\gamma$ if $\gamma$ does not satisfy  $C$, i.e., all the literals in $C$ evaluate to false. We say that the event  is applied to $\gamma$ when the variables of clause $C$ are reassigned according to $b$. If $\gamma'$ is the resulting configuration, we denote by $\gamma \xrightarrow{e} \gamma'$ this transition.

    We now introduce a normal form for executions inspired by the work of Cartier and Foata on trace monoids [3]. The idea stems from the fact that if two events $e,e'$ are independent and enabled in $\gamma$, then these events can be applied in any order, and yield the same configuration. One can model a schedule of events as a word $e_0\cdot e_1 \dots e_{s-1}$ in the trace monoid generated by the events, i.e., the quotient of the free monoid over the event alphabet by the commutativity relations induced by event independence. Cartier and Foata showed that any word $w$ in the trace monoid can be uniquely represented by the following normal form
         V_0 | \dots |  V_{t-1}
    where each $V_i$ is composed of pairwise independent events, and each event in $V_{i+1}$ depends on some event in $V_i$. An execution can then be defined as a pair $(\gamma_0, S)$ where $\gamma_0$ is the initial configuration, and $S$ is a schedule given in the normal form above. If $S$ contains $s$ events, then the execution has consumed exactly $n + s\cdot k$ random bits. The normal form can be depicted as a forest:

    Fig. 1 Compact representation of a schedule.
    The dependence relations are subsumed by the graph $\Gamma'$. Each dot on a line $C_i$ represents an event with clause $C_i$. The bit assignments are omitted in the picture. The arrows represent dependences. In particular, there are at most $d$ arrows out of any node. Let $[S]$ denote the schedule $S$ without the bit assignments, i.e., $[S]$ retains only the causal order of clauses, pretty much as depicted in Fig. 1. We will refer to $[S]$ as the causal structure of $S$.
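    Computing the Cartier-Foata normal form of a word is simple: each event is placed one level above the highest earlier event it depends on. Events at the same level are then pairwise independent, and each event at level $i+1$ depends on some event at level $i$. A Python sketch, where events are arbitrary labels and `dependent` is a hypothetical user-supplied predicate (symmetric, and reflexive: an event depends on itself):

```python
def foata_normal_form(word, dependent):
    """Return the Foata normal form V_0 | V_1 | ... of `word` as a list of lists.

    `dependent(x, y)` must be symmetric and reflexive. Each letter is put one
    level above the highest earlier letter it depends on (level 0 if none)."""
    levels = []  # levels[pos] is the level of word[pos]
    for pos, e in enumerate(word):
        lvl = 0
        for prev in range(pos):
            if dependent(word[prev], e):
                lvl = max(lvl, levels[prev] + 1)
        levels.append(lvl)
    steps = [[] for _ in range(max(levels, default=-1) + 1)]
    for e, lvl in zip(word, levels):
        steps[lvl].append(e)  # original order is kept inside each step
    return steps
```

For example, if `a` and `b` are independent while `c` depends on both, the word `abca` has normal form `[['a', 'b'], ['c'], ['a']]`. Grouping by level never swaps two dependent letters, so the result represents the same trace.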

    I don't know whether Messner and Thierauf were aware of this, but the forest construction they give in their paper seems to correspond (I have not checked it thoroughly) to the method of Cartier and Foata for computing the normal form of a word in a trace monoid.

      3. Final argument

    The crucial point in the proof of Messner and Thierauf consists in the following observation: a finite execution $(\gamma_0,S)$ is entirely determined by the final configuration $\gamma_t$ and the causal structure $[S]$, where $\gamma_0 \xrightarrow{V_0} \gamma_1 \cdots \xrightarrow{V_{t-1}} \gamma_t$ denotes the sequence of configurations of the execution.

    Indeed, suppose you know $\gamma_t$ and the last group $U_{t-1}$ of pairwise independent clauses in $[S]$. For any clause $C \in U_{t-1}$, the bit assignment of the corresponding event is simply the restriction of $\gamma_t$ to the variables of $C$, since these variables were last modified by that event. Moreover, $C$ was enabled, i.e., unsatisfied, just before this event, so the restriction of $\gamma_{t-1}$ to the variables of $C$ is the unique assignment falsifying $C$. Hence $\gamma_{t-1}$ is obtained from $\gamma_t$ by setting, for every clause of $U_{t-1}$, its variables to the values falsifying it, and leaving the other variables untouched. Iterating this argument recovers the whole execution.

    The consequence of this is that the $n + s\cdot k$ random bits consumed by the execution can be computed from the pair $(\gamma_t,[S])$. Choosing these bits Kolmogorov-random (i.e., incompressible), the classic compression method in algorithmic information theory yields
        n + s\cdot k \leq K(\gamma_t,[S]) + O(1)
    where $K(\cdot)$ is the Kolmogorov complexity. An upper bound for the right-hand side is given by $n$ (the number of bits needed to encode $\gamma_t$) plus the logarithm of the number of causal structures $[S]$ with exactly $s$ (bit-free) events. Combinatorics yields
      n + s\cdot k \leq n + (d\cdot s + m) \cdot h\left(\frac{1}{d}\right)+ O(1)
    where $h(p) = - p\cdot \log p - (1-p)\cdot \log (1-p)$ is the binary entropy function (logarithms in base $2$). The bound on $d$ gives $s = O(m)$.
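    The last step can be made explicit. One checks that $d\cdot h(1/d) = \log_2 \frac{d^d}{(d-1)^{d-1}}$, so the Lovász condition gives $d\cdot h(1/d) \le \log_2(2^k-1) < k$, and the inequality rearranges to $s\cdot(k - d\cdot h(1/d)) \le m\cdot h(1/d) + O(1)$, with a positive constant factor on the left. The following Python sketch evaluates this bound numerically; the parameter `c`, standing for the unspecified $O(1)$ term, is a hypothetical placeholder.

```python
from math import log2

def h(p):
    """Binary entropy in bits, with h(0) = h(1) = 0."""
    if p in (0, 1):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def lll_condition(k, d):
    """The condition d^d / (d-1)^(d-1) <= 2^k - 1, for d >= 1."""
    return d ** d / (d - 1) ** (d - 1) <= 2 ** k - 1

def resampling_bound(k, d, m, c=0.0):
    """Largest s allowed by n + s*k <= n + (d*s + m)*h(1/d) + c.

    `c` is a hypothetical stand-in for the unspecified O(1) term."""
    slack = k - d * h(1 / d)
    if slack <= 0:
        raise ValueError("k <= d*h(1/d): the inequality does not bound s")
    return (m * h(1 / d) + c) / slack
```

For instance, with $k = 3$ and $d = 2$ the condition reads $4 \le 7$, the slack is $3 - 2\cdot h(1/2) = 1$, and `resampling_bound(3, 2, 100)` gives $100$, i.e., $s \le m$ up to the additive constant.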


    [1] Robin A. Moser, Gábor Tardos, A constructive proof of the Lovász local lemma, 2009
    [2] Jochen Messner, Thomas Thierauf, A Kolmogorov complexity proof of the Lovász local lemma for satisfiability, Theoretical Computer Science, volume 461, pages 55-64, 2012
    [3] Pierre Cartier, Dominique Foata, Problèmes combinatoires de commutation et de réarrangements, Lecture Notes in Mathematics, volume 85, Springer-Verlag, 1969

    Monday, November 2, 2015

    The relativization obstacle

    I was wondering what exactly it means to compare complexity classes relative to some oracle and, more precisely, how such an approach may help to compare these classes without oracles. In this post, I will focus on the celebrated result by Baker, Gill and Solovay, stating that:
    (BGS) There exist oracles $A$ and $B$ such that $P^A = NP^A$ and $P^B  \neq NP^B$.
    This result imposes a strong constraint on any attempt to prove or refute $P ~=?~ NP$. Indeed, such a proof has to be non-relativizable. Roughly speaking, a proof $\pi$ of a statement $S$ about complexity classes is said to be relativizable if for any oracle $O$ the same proof $\pi$ proves the statement $S^O$ which is $S$ with all complexity classes being relativized to the oracle $O$. For example, if there were a relativizable proof that $P = NP$, then we would have $P^O = NP^O$ for all oracles $O$; which contradicts the BGS theorem above.

    The lesson to retain is the following:
    For any statement $S$ about complexity classes, if there exists an oracle $O$ such that $S^O$ holds (resp. does not hold), then there are no relativizable proofs that $S$ does not hold (resp. holds).
    But what does a relativizable proof look like? Roughly speaking, it is a proof that only combines composition of functions and simulation by universal machines: both steps remain valid when every machine is given access to the same oracle.

    Let's try to prove that $P$ is a strict subset of $NP$ (sic!). More precisely, let's try to prove this claim using diagonal/simulation arguments as in the proof of the time hierarchy theorem. We could reduce (via a polynomial-time reduction) some $NP$-complete problem to some problem $H$ consisting of pairs $(i,n)$ such that the deterministic machine $i$ accepts input $n$ within $\mathrm{poly}(|n|)$ steps, plus, perhaps, additional properties. Then, we would assume that $H$ belongs to $P$, and try to derive a contradiction. We would have a deterministic polynomial-time machine $F$ which accepts $(i,n)$ if $(i,n)$ belongs to $H$, and rejects it otherwise. Then we would build a machine $K$ that satisfies the specification of $H$ on input $i$ if $F(i,i) = 0$, and does something contradicting the specification of $H$ otherwise. If $K$ on input $K$ satisfies the specification of $H$, then $F(K,K) = 0$, i.e., the machine $K$ on input $K$ does not satisfy the specification of $H$. If $K$ on input $K$ does not satisfy the specification of $H$, then $F(K,K) = 1$, i.e., the machine $K$ on input $K$ does satisfy the specification of $H$. Whence the 1'000'000 dollar contradiction.

    The issue is that, in the proof above, we can replace each Turing machine $M$ by the same machine $M^O$ augmented with some oracle $O$ without affecting the flow of the argument. In other words, such a proof relativizes, and the BGS theorem prevents any such proof from solving the $P$ vs $NP$ problem. Bye bye dear 1'000'000 dollars.


    Occurrences of the diagonal argument

    Diagonalization appears in many places, ever since Cantor first used it to prove that the real numbers are uncountable. In this post, I will try to maintain a list of occurrences of this argument, just for fun, so to speak.

    The main idea is the following. We have a set $I$ of ``codes'' and a set $N$ of ``inputs''. Typically, if the claim were false, there would be a function $F : I \times N \rightarrow \{0,1\}$ that ``efficiently'' encodes some property of the pair $(i,n) \in I \times N$. Very often, every element $i \in I$ can also be encoded as an element of $N$. One can then build from the diagonal a new code $K \in I$ such that, for all $n \in N$, $F(K,n) = 1 - F(n,n)$. The contradiction is obtained with $F(K,K) = 1-F(K,K)$.

    Let's see some concrete examples.

        #1. Real numbers are uncountable

    Let $I = N = \mathbb{N}$. Assume there exists a function $F : I \times N \rightarrow \{0,1\}$ such that for every real number $x \in [0,1]$, there exists a code $i \in I$ such that $F(i,n)$ is the $n$-th bit of $x$ in its binary expansion.

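    We now build a member $K \in I$ from the diagonal of $F$ by $F(K,n) = 1-F(n,n)$, i.e., $K$ is a code of the real number whose $n$-th bit is $1-F(n,n)$. But then the $K$-th bit of this number is $F(K,K) = 1 - F(K,K)$. Whence a contradiction. (Strictly speaking, some numbers such as $1/2 = 0.1000\dots = 0.0111\dots$ have two binary expansions, a technicality that must be handled separately; the argument as stated directly shows that the set of infinite bit sequences is uncountable.)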
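    The diagonal can be computed from any finite list of rows. A minimal Python sketch, where (as an assumption for illustration) each row $F(i,\cdot)$ is a Python function from indices to bits; the resulting $K$ differs from row $i$ at position $i$, so it equals none of the rows:

```python
def diagonal(rows):
    """Given rows[i] : n -> {0, 1}, return K with K(n) = 1 - rows[n](n).

    K is only defined on the indices 0 .. len(rows) - 1 of the finite list."""
    return lambda n: 1 - rows[n](n)
```

For instance, with the rows `lambda n: 0`, `lambda n: 1` and `lambda n: n % 2`, the diagonal takes the values `1, 0, 1` on the indices `0, 1, 2`.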

        #2. The Halting problem

    Let $I$ be the set of codes of Turing machines. Assume that there exists a machine $F$ which solves the Halting problem, i.e., for any $(i,n)$, $F$ terminates on  $(i,n)$, and $F(i,n) = 1$ if the $i$-th machine terminates on input $n$, and $0$ otherwise. Again, we can think of $F$ as an array with rows indexed by Turing machines, and columns indexed by all possible inputs.

    We now build a machine $K \in I$ from the diagonal of $F$. On the input $n$ (a code for a Turing machine), the machine $K$ returns (any value is ok) if $F(n,n) = 0$ and runs forever otherwise. Since $K \in I$, the value $F(K,K)$ is defined. But $K$ does not terminate on input $K$ iff $F(K,K) = 1$ iff $K$ terminates on input $K$. Whence a contradiction. Note that here the expression ``$F(K,n) = 1 - F(n,n)$'' from the introduction is to be understood as: $K$ does not terminate on input $n$ if $F(n,n) = 1$, and terminates on input $n$ if $F(n,n) = 0$.
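    The construction of $K$ can be written down literally. In this Python sketch, Python functions play the role of machine codes, and `halts` stands for a claimed decider (a hypothetical parameter: no correct one exists, which is the point). We only run $K$ on itself for a candidate that answers ``does not halt'', since for the other answer $K(K)$ loops forever by construction.

```python
def make_diagonal(halts):
    """Build the diagonal machine K from a claimed decider halts(program, input)."""
    def K(program):
        if halts(program, program):
            while True:  # predicted to halt on itself: run forever instead
                pass
        return 0  # predicted to run forever on itself: halt immediately
    return K

def says_never_halts(program, argument):
    """A (necessarily wrong) candidate decider, used only for the demonstration."""
    return False
```

With `K = make_diagonal(says_never_halts)`, the call `K(K)` returns `0`, so $K$ halts on input $K$ although the candidate predicted that it would not.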

        #3. The Time Hierarchy theorem

    Let $I$ be the set of (codes of) Turing machines. I am merely restating the proof on Wikipedia. Let $f : \mathbb{N} \rightarrow \mathbb{N}$ be a time constructible function. Consider the following problem
      H = \{ (i,n),~\text{the machine } i\in I \text{ accepts } n \text{ within } f(|n|) \text{ steps}\}
    Here, ``accept'' means returning $1$. It is not difficult to see that $H \in \mathsf{DTIME}(f(m)^3)$ where $m$ denotes the size of $(i,n)$. The goal is to prove that $H$ does not belong to $\mathsf{DTIME}(f(m/2))$. Assume that it does. Then there exists a machine $F$ which terminates on each input $(i,n)$ within $f(m/2)$ steps such that $F(i,n) = 1$ if $(i,n) \in H$ and $F(i,n) = 0$ otherwise.
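    The membership $H \in \mathsf{DTIME}(f(m)^3)$ rests on clocked simulation: run machine $i$ on $n$ while counting steps, and give up after $f(|n|)$ of them. The following Python sketch illustrates only the clocking idea, not the cubic overhead of simulating Turing machines. As a modelling assumption, a ``machine'' is a Python generator function that yields once per step and returns its output, with $1$ meaning accept; the example machines are hypothetical.

```python
def accepts_within(machine, x, budget):
    """Clocked simulation: does machine(x) return 1 within `budget` steps?

    A step is one `yield` of the generator. The run is abandoned once the
    budget is exhausted, so this always terminates, even on looping machines."""
    run = machine(x)
    for _ in range(budget + 1):
        try:
            next(run)
        except StopIteration as stop:  # the machine halted; stop.value is its output
            return stop.value == 1
    return False  # budget exhausted: not accepted within `budget` steps

def count_down(x):
    """Example machine: accepts after exactly x steps."""
    for _ in range(x):
        yield
    return 1

def loop_forever(x):
    """Example machine: never halts."""
    while True:
        yield
```

For instance, `accepts_within(count_down, 5, 5)` is `True` while `accepts_within(count_down, 5, 4)` is `False`, and `accepts_within(loop_forever, 0, 100)` returns `False` after $101$ calls to `next`.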

    We now build a machine $K$ from the diagonal of $F$. The machine $K$ accepts the input $i$ if $F$ rejects $(i,i)$, and rejects it otherwise. Let $s$ be the size of $K$, and $m \simeq 2s+1$ be the size of $(K,K)$. The function $F$ takes about $f(m/2) \simeq f(s)$ steps to compute $F(K,K)$. In particular, $K$ terminates within $f(s)$ steps on input $K$. Now, $K$ accepts input $K$ (which it does within $f(s)$ steps) iff $F$ rejects input $(K,K)$ iff $K$ does not accept $K$ within $f(s)$ steps iff $K$ rejects $K$ (which it does within $f(s)$ steps). Whence a contradiction. Here, the expression ``$F(K,n) = 1 - F(n,n)$'' is to be understood as: $K$ rejects (accepts) $n$ if $F$ accepts (rejects) $(n,n)$.