Ericpony's Blog: 2013

Tuesday, November 19, 2013

Message authentication code

In almost all situations the damage that modified message can do is far greater than the damage of leaking the plaintext. Therefore, you should always combine encryption with authentication.

A secure MAC must have existential unforgeability under an adaptive chosen-message attack.

Let $F:\mathcal K\times \mathcal X \rightarrow \mathcal Y$ be a PRF. Then if we set $I=(S,V)$ such that $S(k,m)=F(k,m)$ and $V(k,m,t)=yes$ iff $F(k,m)=t$, then for all PPT-adversary $A$ that attacks $I$, there is a PPT-adversary $B$ that attacks $F$ and $$Adv_{MAC}[A,I]\le Adv_{PRF}[B,F] + \frac{1}{|\mathcal Y|}.$$It follows that if $F$ is secure, then the derived MAC is secure given that $|\mathcal Y|$ is large, say $|\mathcal Y|=2^{20}.$

Algorithm: Order Statistics

Finding dominating members

Problem Given $k\ge2$ and array $A$ with $|A|=n$, find the elements with frequency greater than $n/k$ in $O(nk)$ time. (When $k=2$, this problem reduces to finding the majority.)

Solution Given a multiset $A$ and $k>0$, we call a number $(A,k)$-dominating if its frequency in $A$ is greater than $|A|/k$. Observe that if $S\subset A$ and $|S|=k$, then $(A,k)$-dominating implies $(A/S,k)$-dominating.
Now we can solve this problem by removing sets of $k$ distinct elements from $A$ until no such set exists. To do this efficiently, we enumerate $A$ and in the meantime maintain $k-1$ counters to record the frequencies of duplicate elements. For each element, we check if all $k-1$ counters are nonzero. If so, we can remove $k$ distinct elements from $A$ by decrementing all counters by 1. Otherwise, we record the element by incrementing the associated counter by 1. At the end of enumeration, elements that dominate $A$ must have the maximal counter values in the record. Note that the reverse is not true, since it is possible that dominating elements don't exist in the first place. Hence, we collect the elements associated with the maximal counter values and return them if they indeed dominate. [Example Code]

Sunday, August 18, 2013

Cryptography: Notions of security

This article briefly discusses the relations between the following notions of security:

Ciphertext integrity
Ciphertext indistinguishability
Semantic security
Existential unforgeability
Collision resistance

Ciphertext integrity

It is computationally infeasible to create a ciphertext that decrypts correctly. More precisely, if the decryption key is chosen at random, then a ciphertext forged in polynomial time fails to decrypt by that key with very high probability. The main purpose of CI is to ensure that the ciphertext is not modified during the transmission from the sender to the receiver. Note that when we say CI, we actually means CI under CPA.

In general, good encryption does not necessarily imply integrity.
Encryption schemes that never reject cannot be CI. For example, CBC with random IV never rejects a ciphertext, hence the adversary easily wins CI game.

Ciphertext indistinguishability

It is computationally infeasible to distinguish two ciphertexts with respect to their plaintexts. More precisely, given two plaintexts $m_0, m_1$ such that $|m_0|=|m_1|$ and $c\in_r\{E(k,m_0),E(k,m_1)\}$, a polynomial-time adversary has no efficient way to tell the plaintext of $c$ significantly better than random guess. Ciphertext indistinguishability (IND) is identified by the type of attack and can be ordered according to their strength of security as follows: IND-CPA < IND-CCA1 < IND-CCA2. Note that IND-CPA plus CI imply IND-CCA1.

An encryption system with ciphertext indistinguishability must be randomized. For public key encryption system, the adversary have an public key issued by the challenger, and thus he can encrypt any plaintexts at his disposal. However, the probabilistic nature of the encryption means that the ciphertexts are nearly random, and therefore encrypting $m_0, m_1$ and comparing the resulting ciphertexts with the challenge ciphertext does not afford any non-negligible advantage to the adversary.

Semantic security

It is computationally infeasible to learn information about the plaintext from the ciphertext. More precisely, if two plaintexts $m_0, m_1$ satisfy $|m_0|=|m_1|$, then $\{E(k, m_0)\} \approx_p \{E(k, m_1)\}$. Note that when we say semantic security, we actually mean semantic security under CPA. Hence, the two probability distributions above are in fact conditioned on a polynomial number of plaintext-ciphertext pairs where the plaintexts are chosen by the adversary.

Semantic security is shown to be equivalent to IND-CPA, and many cryptographic proofs use these two definitions interchangeably. Like IND, deterministic encryption cannot be semantically secure.

Existential unforgeability

It is computationally infeasible to forge a message along with a correct tag. More precisely, if $(S_k,V_k)$ is a MAC with existential unforgeability and a polynomial-time adversary forges $m,t$ without $k$, then $V_k(m,t)=0$ with very high probability. Note that when we say existential unforgeability, we actually mean existential unforgeability under CPA, ie., the adversary can request a polynomial number of messages for their tags before the challenge.

Fact. Let $(E,D)$ be a symmetric encryption system and define a MAC by letting $T_k(m)=E(k,m)$ and $V_k(m,t)=1$ iff $D(k,m)=t$. Then this MAC has existential unforgeability iff the encryption system has ciphertext integrity.

Collision resistance

It is computationally infeasible to find two inputs with the same output. More precisely, if $H$ is a function with collision resistance, then no polynomial-time adversary can find $m_1\neq m_2$ such that $H(m_1)=H(m_2)$ with a chance significantly better than random guess. Collision resistance is a property of a cryptographic hash function, among the following two properties: (i) Preimage resistance: For any element in the image, it is computationally infeasible to find its preimage; (ii) Second-preimage resistance: For any element in the domain, it is computationally infeasible to find another element in the domain with the same image.

Saturday, July 20, 2013

Cryptography: PRG, PRF and PRP

The main purpose of this post is to show that the three fundamental cryptographic primitives PRG, PRF and PRP are in fact equivalent in the sense of (semantic) security. We shall do this by showing that

PRG to PRF: using GGM PRF construction
PRF to PRP: by the Luby-Rackoff Theorem (ie., a 3-round Feistel cipher)
PRP to PRG: using counter mode

PRG to PRF

Since PRF is somewhat a "random access" version of PRG, one may want to simulate a PRF $F(k,i)$ by enumerating a PRG $G(k)$ to its $i$th value. However, this design takes exponential time and is thus not satisfactory. There are many method to build an efficient secure PRF out of a secure PRG, and here we present the arguably most intuitive one (which is also the first one): the Goldreich-Goldwasser-Micali construction [GGM'84]. Given a PRG $G:\{0,1\}^m\rightarrow\{0,1\}^{2m}$, let $G_0,G_1$ be the left and right halves of $G$, ie., $G(k)=G_0(k)||G_1(k)$. A PRF $F:\{0,1\}^m \times \{0,1\}^n\rightarrow \{0,1\}^m$ is defined by $$F(k, x_1...x_n) = G_{x_n}(G_{x_{n-1}}(...(G_{x_2}(G_{x_1}(k))...)).$$ The security of the construction can be easily proven by induction on $n$. Its fixed parameter security is given by the following theorem, though we omit the proof due to its technicality:

Theorem. If $G$ is a $(t,\epsilon)$-PRG then $F$ is a $(t-cn,\epsilon q n,q)$-PRF for some constant $c$.

Note that the PRF expansion lemma also implies that one can build a secure PRF out of a secure PRG.

PRP to PRG

Counter mode.

Example. (OTP vs PRF) A one-bit PRF defined by $E(k,m)=k\oplus m$ is secure, since its distribution is exact the set of all one-bit PRPs. However, an $n$-bit PRF defined in similar way is no longer a secure PRF, since $E(k,m_1)\oplus E(k,m_2) = m_1\oplus m_2$ is obviously not random.

Example. (PRP vs PRF) A secure invertible PRF is clearly a secure PRP. A secure PRP, however, may not be a secure PRF, since a PRP must be invertible, and thus is not random when looked upon as a PRF. On the other hand, the PRF Switching Lemma tolds as if the input length is sufficiently large, then a secure PRP is indeed a secure PRF by thr PRF Switching Lemma: Let $E$ be an secure PRP over $(\mathcal K, \mathcal M)$. For an adversary that makes at most $q$ queries, then $$|Adv_{PRF}(A,E)-Adv_{PRP}(A,E)|\le q^2/|\mathcal M|.$$

Sunday, July 14, 2013

Exhaustive search attacks

Ideal ciphers

A block cipher is a function $E:\mathcal K \times \mathcal M \rightarrow \mathcal C$. Here we shall let $|\mathcal K| = n$ and $|\mathcal M| = |\mathcal C| = poly(n)$, where $n$ is called the security parameter. A block cipher $E$ is called an ideal cipher if for any $m\in \mathcal M$, $Adv(E(k,m),r)$ is negligible for $r \in_r \{0,1\}^{poly(n)}$ and $k \in_r \mathcal K$. Another characterization of an ideal block cipher $E$ is $\{E(k,m)\} \approx_p \{f_k(m)\}$ for any $m\in \mathcal M$, where ${f_k}$ are a random subset of the family of all invertible functions from $\mathcal M$ to $\mathcal C$. The second characterization is especially helpful, because it allows us to approximate an ideal cipher by a random invertible functions.

Fact. Let $E:\mathcal K\times\mathcal M\rightarrow \mathcal C$ be an ideal cipher. For all $c,m$, the probability that a cipher-text can only be decrypted by the encryption key is $$P\{k'\neq k, E(k',m)=c\} \le \sum_{k'\in\mathcal K,k'\neq k} P\{E(k',m)=c\} \le \frac{|\mathcal K|}{|\mathcal M|}.$$ Assuming that DES is ideal, the probability that one can recover a unique key from $N$ input-output pairs is larger than $1-N2^{56-64N}$. When $n=1$, the probability is roughly 99.5% --- already high enough for an exhaustive search attack to be reasonable.

Attacks on block ciphers

To attack a block cipher $(E,D)$, we are given a few input-output pairs $(m_i, c_i=E(k,m_i))$, $i=1,...,n$ to find the key $k$. The ciphers we are interested here are DES and some of its strengthened versions
DES: $E(k,m)$ for $k\in \{0,1\}^{56}$, $m\in \{0,1\}^{64}$
2DES: $E^2((k_1,k_2),m) = E(k_2,E(k_1,m))$
3DES: $E^3((k_1,k_2,k_3),m) = E(k_3,D(k_2,E(k_1,m)))$
DX: $E^x((k_1,k_2,k_3),m) = k_3\oplus E(k_2, k_1\oplus m)$

DES Challenge: Broken by exhaustive search attack since 1997.

2DES Challenge: Given $c,m$, find $k_1,k_2$ such that $E(k_2,E(k_1,m))=c$.

Build a sorted table $T$ such that for each $k_1$, $T[k_1]=E(k_1,m)$.
For each $k_1$, check whether $D(k_2,c)$ is in the table.

Both steps take $O(56 \cdot 2^{56})=O(2^{62})$ time and $O(2^{56})$ space, which turns out to be the total cost of the attack (called the meet-in-the-middle attack). This complexity is considered tractable with technology of nowadays.

3DES Challenge: Given $c,m$, find $k_1,k_2,k_3$ such that $E(k_3,D(k_2,E(k_1,m)))=c$.

Build a sorted table $T$ such that for each $k_1$, $T[k_1]=E(k_1,m)$.
For each $k_1,k_2$, check whether $E(k_2,D(k_3,c))$ is in the table.

The first step takes $O(56\cdot 2^{56})$ time and the second step takes $O(56\cdot 2^{56}\cdot 2^{56})=O(2^{118})$ time. Both steps take $O(2^{56})$ space. In practice, we regard a cipher as sufficiently secure if it takes $\omega(2^{90})$ time to compromise. This is why 3DES is widely used today while DES and 2DES are not.

Monday, July 1, 2013

Security of stream ciphers

Perfect Security. A cipher $(E,D)$ over $(\mathcal K, \mathcal M, \mathcal C)$ is called perfectly secure if for all $c\in \mathcal C$, $k \in_r \mathcal K$, and $m_1,m_0\in \mathcal M$ with $|m_1| = |m_0|$, $$P\{ E(k,m_0)=c \} = P\{E(k,m_1)=c \}.$$ The canonical example of perfectly secure cipher is the one-time pad (OTP). This definition of security, however, turns out to be too strict for practical ciphers: It is clear that a necessary condition for perfect security is $|\mathcal K| \ge |\mathcal M|$, since for any $c \in\mathcal C$ and any $m \in\mathcal M$ there must exist $k \in\mathcal K$ such that $E(k,m)=c$. Hence perfect security can be guaranteed only if $|k|=\omega(|m|)$, which made perfectly secure ciphers hard to use in practice. (Recall that a stream cipher uses symmetric key. If we have a secure channel to transfer a key as long as a message, why not using it to transfer the message in the first place?)

Semantic Security. A cipher $(E,D)$ over $(\mathcal K, \mathcal M, \mathcal C)$ is called semantically secure if for $k \in_r \mathcal K$ and all $m_0,m_1\in \mathcal M$ with $|m_0| = |m_1|$, there is no efficient algorithm that can distinguish $E(k,m_0)$ from $E(k,m_1)$ with non-negligible probability. This fact can also be written as $\{E(k, m_0)\} \approx_p \{E(k, m_1)\},$ which means the two distributions are computationally indistinguishable.

Recall that a PRG is secure if it is computational indistinguishable from a true RNG. The following theorem connects the definition of secure PRGs and and the definition of semantic security.

Theorem. A stream cipher derived from a secure PRG is semantically secure.

The intuition of behind the theorem is as follows. It is clear that any stream cipher derived from a true RNG is semantically secure. Therefore, if a PRG derives a stream cipher that is not semantically secure, then there exists an efficient way to distinguish the PRG from a true RNG (i.e., by deriving a stream cipher and then testing the semantic security of the derived cipher), which would imply that the PRG is not secure.

Before we proceed to a formal proof, we need to make the statement ‘‘distinguish A from B’’ more precise. We define the advantage of an adversary $A$ over an algorithm $G$ with respect to $G'$ as $$Adv(A,G,G')=|P\{A(G(X))=1\} - P\{A(G'(Y))=1\}|,$$ where $X$ and $Y$ and random variables. The larger the advantage is, the more capable $A$ can distinguish $G$ from $G'$. With this definition, we see that $G(X) \approx_p G'(Y)$ if and only if $Adv(A,G,G')$ is negligible for all efficient adversaries $A$. We shall write $Adv(A,G)$ for simplicity when $G'$ is clear from the context.
Now suppose that $E$ is a stream cipher derived from PRG $G$.

Proof of the theorem. It suffices to show that for any adversary $A$ for the cipher, there is an adversary $B$ for the PRG such that $Adv(A,E) \le 2Adv(B,G).$ Our approach, which is quite standard in proving results concerning strength of security, is to build adversary $B$ using $A$ as an oracle. Formally, we define $B_i(X)=A(m_i\oplus G(X))$ for $i\in \{0,1\}$ and let $B=B_j$ where $j=\arg\max_i\{Adv(B_i,G) \}$. The construction of $B_0$ is illustrated as follows.

When $Y$ is uniformly random, there is no way to tell $m_0\oplus Y$ from $m_1 \oplus Y$, and thus $P\{A(m_0\oplus Y)=1\} = P\{A(m_1 \oplus Y)=1\}$. Now, note that

$Adv(A,E)=|P\{A(m_0 \oplus G(X))=1\} - P\{A(m_1 \oplus G(X))=1\}| \\ = |P\{A(m_0 \oplus G(X))=1\} - P\{A(m_0 \oplus Y)=1\} + P\{A(m_1 \oplus Y)=1\} - P\{A(m_1 \oplus G(X))=1\}| \\ \le |P\{A(m_0 \oplus G(X))=1\} - P\{A(m_0 \oplus Y)=1\}| + |P\{A(m_1 \oplus Y)=1\} - P\{A(m_1 \oplus G(X))=1\}| \\ =Adv(B_0,G) + Adv(B_1,G) \le 2Adv(B,G).$

This completes the proof. $\square$

Sunday, June 30, 2013

A predictable PRG is exactly an insecure PRG

Let $A(x)$ be an algorithm and $n=|x|$ denote the length of the binary encoding of its input $x$. We say that an event regarding $A$ and $x$ holds with negligible probability if it holds with probability $O(1/poly(n))$. We say the event holds with significant probability if it holds with probability $1-O(1/poly(n))$. Note that the property of being negligible or significant is closed under polynomial additions and multiplications. That is, the probability of the union and the interception of a polynomial number of negligible (resp.$\ $significant) events is still negligible (resp.$\ $significant).

We say that a PRG $G$ is predictable if given a random seed $k$, there is an efficient algorithm that can predict some $i$th bit of the output according to first $i-1$ bits with non-negligible probability.

We say that a PRG $G$ is insecure if given a random seed $k$, there is an efficient algorithm that can distinguish it from a true random number generator with non-negligible probability.

Static abstract methods in Java

In Java 6 and earlier, you can't define an abstract static method. For example,

abstract class foo {
    abstract void bar( );        // this is ok
    abstract static void bar2(); // this gives a compile-time error
}

Abstract static methods work fine and are commonly used in many other languages, e.g., @classmethod in Python. Indeed, some methods just don't make sense as instance methods, and it would be much more effective to call directly a static abstract method than creating an instance just for using that abstract method. If Java supported abstract static methods, one could expect that the method (i) must be implemented by subclasses, and (ii) is a class method of the subclass. In Java, however, a static method cannot be inherited or overridden. If a subclass has a static method with the same signature as a static method in the superclass, the former shadows the latter instead of overriding it. (See this document for rules of overriding and shadowing in Java.)
On the other hand, some languages do support static inheritance, just like the way they support instance inheritance. From a syntax perspective, those languages usually require the class name to be included in the statement. For example, in Java, assuming you are writing code in ClassA, where methodA is a static method and there is no instance method with the same signature. Then methodA() and ClassA.methodA() are equivalent statements. In SmallTalk, however, the class name is not optional, so the syntax is (note that SmallTalk does not use the . to separate the "subject" and the "verb", but uses it as the statement terminator): ClassA methodA. Because the class name is always required, the correct "version" of the method can always be determined by traversing the class hierarchy.
Since a Java static method defined in an abstract class would belong to that very class, an abstract static method could not be used anyway. Fortunately, an abstract class can have static fields and static methods starting from Java 7 (see this document). These static members can be invoked with a class reference as you would do with a non-abstract class.

Tuesday, February 19, 2013

A note on omega automata

$\omega$-automata are finite state automata interpreted on infinite words. Non-deterministic $\omega$-automata are quintuples $(Q,\Sigma,I,T,\mathcal F)$, where $Q$ is a finite set of states, $I$ is a non-empty set of initial states, $T:Q\times\Sigma\times Q\to Q$ is a transition relation, and $\mathcal F$ is an acceptance condition.
The Rabin condition is in form of $\mathcal F=\{(A_i,R_i):i\in J\}$, and run is accepting iff there exists $j\in J$ such that it visits $A_j$ i.o. but visits $R_i$ only finitely many times.
The Streett condition is $\mathcal F=\{(A_i,R_i):i\in J\}$ such that a run is accepting iff there exists $j\in J$ such that it visits $A_j$ i.o. or visits $R_i$ only finitely many times.
The parity automata are $\omega$-automata with a priority function $p:Q\to [c]$ for some $c\in\mathbb N.$ A run $\pi$ is accepting iff $\lim\sup p(\pi(n))$ is even. Parity automata can be simulated by $\omega$-automata with the Rabin acceptance conditions: just define $J:=\mathbb N$, $A_n:={s\in S: p(s)\le 2n\}$, and $R_n:={s\in S: p(s)\le 2n-1\}$.

Parity Automata.

Ericpony's Blog