A Language for Describing Games
We have tried so far to be as rigorous as possible in our treatment of cryptographic proofs. We argued that a game-based approach can lead to tidier, more understandable proofs that help eliminate conspicuous errors and make clear the reasoning behind each step in a proof. We moved one step forward, taking a language-based approach and regarding games as programs. But still, our understanding of what a game means remains purely intuitive. In this chapter we make this intuition precise by formally defining the probabilistic language we use to describe games, together with its semantics. This semanticist perspective allows a precise specification of the interaction between an adversary and the challenger in a game, and lets us readily answer questions that often arise in proofs, such as: Which oracles does the adversary have access to? Can the adversary read/write this variable? How many queries can the adversary make to a given oracle? What is the type/length of a value returned by the adversary? Can the adversary repeat a query? Furthermore, the framework enables us to give very precise definitions of fundamental notions such as probabilistic polynomial-time complexity or termination, which are of paramount importance in the specification of security definitions and computational hardness assumptions.
Mathematical Preliminaries
The Unit Interval
The starting point of our formalization is the ALEA Coq library, developed by Paulin-Mohring and described in [Audebaud and Paulin-Mohring 2009]. It provides an axiomatization of the unit interval [0,1], with the following operations:
Addition: x ⊕ y ≜ min(x + y, 1), where + denotes addition over the reals;
Inversion: x̄ ≜ 1 − x, where − denotes subtraction over the reals;
Multiplication: x ⊗ y ≜ x × y, where × denotes multiplication over the reals;
Division: x / y (y ≠ 0) ≜ min(x ÷ y, 1), where ÷ denotes division over the reals; moreover, if y = 0, for convenience division is defined to be 0.
Other useful operations can be derived from these basic operations. For instance, writing x ⊖ y for truncated subtraction (i.e. x ⊖ y = max(x − y, 0), definable as the inversion of x̄ ⊕ y), the absolute value of the difference of two values x, y ∈ [0,1] can be obtained by computing (x ⊖ y) ⊕ (y ⊖ x), and their maximum by computing (x ⊖ y) ⊕ y.
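These derived operations are easy to replay numerically. The following Python sketch is our own illustration (not the ALEA code), writing sub1 for the truncated subtraction x ⊖ y = max(x − y, 0):

```python
# Illustrative model of the truncated operations on the unit interval [0,1]
# and of the derived |x - y| and max(x, y). Not the Coq/ALEA definitions.

def add1(x, y):
    # Truncated addition: min(x + y, 1)
    return min(x + y, 1.0)

def inv1(x):
    # Inversion: 1 - x
    return 1.0 - x

def sub1(x, y):
    # Truncated subtraction: max(x - y, 0)
    return max(x - y, 0.0)

def abs_diff(x, y):
    # (x - y) + (y - x), truncated: one of the two terms is 0,
    # the other is |x - y|
    return add1(sub1(x, y), sub1(y, x))

def max1(x, y):
    # (x - y) + y, truncated: equals max(x, y)
    return add1(sub1(x, y), y)
```

The point is that both derived operations never leave the interval, so the truncations in add1 are harmless here.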
The unit interval can be given an ω-complete partial order (cpo) structure. Recall that an ω-cpo consists of a partially ordered set such that any monotonic sequence has a least upper bound. The unit interval [0,1] can be given the structure of an ω-cpo by taking as order the usual ≤ relation and by defining an operator sup that computes the least upper bound of a monotonic sequence f : N → [0,1]:

sup f ≜ lub_{n∈N} f(n)
More generally, for any complete partially ordered set D, we use sup f to denote the least upper bound of a monotonic sequence f : N → D. Note that a cpo structure on D induces a cpo structure on the function space A → D by taking f ≤_{A→D} g ≜ ∀x. f(x) ≤_D g(x) and computing least upper bounds pointwise.
The Measure Monad
Programs are interpreted as functions from initial memories to sub-probability distributions over final memories. To give semantics to most programs used in cryptographic proofs, it would be sufficient to consider sub-distributions with countable support, which admit a very direct formalization as functions of the form
μ : A → [0,1] such that Σ_{x∈A} μ(x) ≤ 1
However, it is convenient to take a more general approach and represent instead a distribution over a set A as a probability measure, by giving a function that maps a [0,1]-valued random variable (a function in A → [0,1]) to its expected value, i.e. the integral of the random variable with respect to the probability measure. This view of distributions eliminates the need for cluttered definitions and proofs involving summations, and allows us to give a continuation-passing style semantics to programs by defining a suitable monadic structure on distributions. Formally, we represent a distribution on A as a function of type

D(A) ≜ (A → [0,1]) → [0,1]
We do not restrict our attention to distributions with total probability mass 1; we also consider sub-probability distributions, which may have total mass strictly less than 1. As we will see, this is key to giving semantics to non-terminating programs (i.e. programs that do not terminate with probability 1).
Distributions can be interpreted as a monad whose unit and bind operators are defined as follows:

unit : A → D(A) ≜ λx. λf. f x
bind : D(A) → (A → D(B)) → D(B) ≜ λμ. λF. λf. μ (λx. F x f)
These operators satisfy the usual monadic laws:

bind (unit x) F = F x
bind μ unit = μ
bind (bind μ F) G = bind μ (λx. bind (F x) G)
The monad D was proposed by Audebaud and Paulin-Mohring [2009] as a variant of the expectation monad used by Ramsey and Pfeffer [2002], and builds on earlier work by Kozen [1981]. It is, in turn, a specialization of the continuation monad (A → B) → B, with result type B = [0 1].
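As a concrete illustration, the monad above can be sketched in a few lines of Python (an informal model, not the Coq development): a distribution is a function taking a random variable to its expected value, and bind sequences two probabilistic computations.

```python
# Illustrative model of the measure monad: a distribution over A is a
# function mapping a [0,1]-valued random variable (A -> [0,1]) to its
# expected value.

def unit(x):
    # Dirac distribution at x: the expected value of f is f(x)
    return lambda f: f(x)

def bind(mu, F):
    # Sample x from mu, then continue with the distribution F(x)
    return lambda f: mu(lambda x: F(x)(f))

# A fair coin over {0, 1}, written directly as an expectation functional
coin = lambda f: 0.5 * f(0) + 0.5 * f(1)

# Distribution of the sum of two independent fair coins
two_coins = bind(coin, lambda a: bind(coin, lambda b: unit(a + b)))

# The probability of an event is the expectation of its indicator function
prob = two_coins(lambda s: 1.0 if s == 1 else 0.0)  # 0.5
```

The monadic laws can be spot-checked pointwise on such examples; in the formal development they are proved once and for all.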
Lifting Predicates and Relations to Distributions
For a distribution μ : D(A) over a countable set A, we let support(μ) denote the set of values in A with positive probability, i.e. its support:

support(μ) ≜ {x ∈ A | 0 < μ(I_{x})}

where I_X denotes the indicator function of a set X,

I_X(x) ≜ 1 if x ∈ X, 0 otherwise
To lift relations to probability distributions we follow the early work of Jonsson et al. [2001] on probabilistic bisimulations.
Definition 2.1 (Lifting predicates to distributions). Let μ be a distribution on a set A, and P a predicate on A. We define the lifting of P to μ as follows:

range P μ ≜ ∀f. (∀x. P x =⇒ f x = 0) =⇒ μ f = 0
This rather contrived definition is necessary because we consider sub-probability distributions, whose total measure may be less than 1; for distributions of total measure 1, we could have equivalently stated it as:

range P μ ≜ ∀f. (∀x. P x =⇒ f x = 1) =⇒ μ f = 1
Note also that, due to the way distributions are formalized, the above definition is strictly stronger than the following, more intuitive one:

range P μ ≜ ∀x ∈ A. 0 < μ(I_{x}) =⇒ P x
This latter definition makes sense only for distributions with countable support, for which it can be proved equivalent to the above definitions.
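For a concrete discrete case, the two characterizations can be compared directly. The following Python sketch (with an illustrative sub-distribution of total mass 0.75, in the expectation-functional representation) checks the pointwise definition over the support, and spot-checks the functional definition on one function f that vanishes wherever P holds:

```python
# Illustrative comparison of the two definitions of "range P mu" on a
# distribution with countable support. The distribution and predicate are
# made up for the example.

mu = lambda f: 0.5 * f(2) + 0.25 * f(4)   # sub-distribution, total mass 0.75

P = lambda x: x % 2 == 0                  # predicate: "is even"

# Pointwise definition: every point of positive probability satisfies P
support = [x for x in range(10)
           if mu(lambda y, x=x: 1.0 if y == x else 0.0) > 0]
pointwise = all(P(x) for x in support)

# Functional definition, spot-checked on one f:
# if f x = 0 whenever P x holds, then mu f = 0
f = lambda x: 0.0 if P(x) else 1.0
functional_check = (mu(f) == 0.0)
```

Of course the functional definition quantifies over all f; a single f only illustrates the statement, it does not verify it.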
Definition 2.2 (Lifting relations to distributions). Let μ1 be a probability distribution on a set A and μ2 a probability distribution on a set B. We define the lifting of a relation R ⊆ A × B to μ1 and μ2 as follows:

μ1 R# μ2 ≜ ∃μ : D(A × B). π1(μ) = μ1 ∧ π2(μ) = μ2 ∧ range R μ    (2.1)

where range R μ stands for the lifting of R, seen as a predicate on pairs in A × B, to the distribution μ, and the projections π1(μ), π2(μ) of μ are given by

π1(μ) ≜ bind μ (unit ◦ fst)        π2(μ) ≜ bind μ (unit ◦ snd)
In contrast to the definition given by Jonsson et al. [2001], the definition above makes sense even when the distributions do not have countable support. When they do, both definitions coincide; in this case, μ1 R# μ2 amounts to saying that the probability of each element a in the support of μ1 can be divided among the elements related to it, in such a way that summing these shares over the elements related to some b ∈ B yields μ2 I_{b}.
Let us give an example that conveys a better intuition. Suppose one wants to prove U_A R# U_B, where U_X stands for the uniform probability distribution on a finite set X. When A and B have the same size, proving this is equivalent to exhibiting a bijection f : A → B such that for every a ∈ A, R(a, f(a)) holds. Indeed, using such an f it is easy to build a distribution μ on A × B that satisfies the condition in (2.1):

μ ≜ bind U_A (λa. unit (a, f(a)))
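This construction is easy to replay concretely. The following Python sketch reuses the expectation-functional representation of distributions, with illustrative sets and bijection of our own choosing:

```python
# Sketch of the coupling mu = bind U_A (fun a => unit (a, f a)).
# Distributions are expectation functionals; A, B and f are illustrative.

def unit(x):
    return lambda g: g(x)

def bind(mu, F):
    return lambda g: mu(lambda x: F(x)(g))

def uniform(xs):
    # Uniform distribution over a finite list, as an expectation functional
    return lambda g: sum(g(x) for x in xs) / len(xs)

A = [0, 1, 2]
B = ['a', 'b', 'c']
f = {0: 'a', 1: 'b', 2: 'c'}   # a bijection A -> B compatible with R

# Coupling on A x B whose projections are U_A and U_B
mu = bind(uniform(A), lambda a: unit((a, f[a])))

pi1 = lambda g: mu(lambda p: g(p[0]))   # first projection: recovers U_A
pi2 = lambda g: mu(lambda p: g(p[1]))   # second projection: recovers U_B
```

Since f is a bijection, pi2 assigns probability 1/|B| to each element of B, and every pair in the support of mu is of the form (a, f(a)), so range R mu holds for any R containing those pairs.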
This example, as trivial as it may seem, shows that probabilistic reasoning can sometimes be replaced by simpler forms of reasoning. In typical cryptographic proofs, purely probabilistic reasoning is seldom necessary and most mundane steps in proofs can be either entirely automated or reduced to verifying simpler conditions, much like in the above example, e.g. showing the existence of a bijection with particular properties.
The way we chose to lift relations over memories to relations over distributions is a generalization to arbitrary relations of the definition of Sabelfeld and Sands [2001], which applies only to equivalence relations. Indeed, there is a simpler but equivalent (see [Jonsson et al. 2001]) way of lifting an equivalence relation to distributions: if R is an equivalence relation on A, then μ1 R# μ2 holds if and only if for every equivalence class [a] ⊆ A, μ1 I_[a] = μ2 I_[a].
Define two functions f and g to be equal modulo a relation Φ iff

f =Φ g ≜ ∀x y. x Φ y =⇒ f(x) = g(y)
It can easily be shown that the above general definition of lifting satisfies

μ1 Φ# μ2 ∧ f =Φ g =⇒ μ1 f = μ2 g

and analogously, writing f ≤Φ g for ∀x y. x Φ y =⇒ f(x) ≤ g(y),

μ1 Φ# μ2 ∧ f ≤Φ g =⇒ μ1 f ≤ μ2 g
We use these properties to prove rules relating observational equivalence to proba-bility in Section 3.1.
It can be shown that lifting preserves the reflexivity and symmetry of the lifted relation, but proving that it preserves transitivity is not as straightforward. Ideally, one would like to have, for probability measures μ1 : D(A), μ2 : D(B), μ3 : D(C) and relations Ψ ⊆ A × B, Φ ⊆ B × C,

μ1 Ψ# μ2 ∧ μ2 Φ# μ3 =⇒ μ1 (Ψ ◦ Φ)# μ3
Proving this for arbitrary distributions requires proving Fubini's theorem for probability measures, which allows one to compute integrals with respect to a product measure in terms of iterated integrals with respect to the original measures. Since in practice we consider distributions with countable support, we do not need the full generality of this result, and we prove it under the assumption that the distribution μ2 has countable support, i.e. there exist coefficients c_i ∈ [0,1] and points b_i ∈ B such that, for any f,

μ2 f = Σ_i c_i f(b_i)
Lemma 2.3. Consider μ1 : D(A), μ2 : D(B), μ3 : D(C) such that μ2 has countable support. Suppose there exist distributions μ12 : D(A × B) and μ23 : D(B × C) that make μ1 Ψ# μ2 and μ2 Φ# μ3 hold. Then, the following distribution μ13 over A × C is a witness for the existential in μ1 (Ψ ◦ Φ)# μ3:

μ13 ≜ λf. μ2 (λb. μ12 (λp. (1_{snd(p)=b} / μ2 I_{b}) μ23 (λq. (1_{fst(q)=b} / μ2 I_{b}) f(fst(p), snd(q)))))
Proof. The difficult part of the proof is to show that the projections of this distribution coincide with μ1 and μ3. For this, we use the fact that μ2 is discrete to prove that iterated integration with respect to μ2 and another measure commutes. This is the case because we can write integration with respect to μ2 as a summation and we only consider measures that are continuous and linear. For instance, to see that the first projection of μ13 coincides with μ1:

π1(μ13) f = Σ_i c_i μ12 (λp. (1_{snd(p)=b_i} / c_i) μ23 (λq. 1_{fst(q)=b_i} / c_i) f(fst(p)))

Since π1(μ23) = μ2, the factor μ23 (λq. 1_{fst(q)=b_i}) / c_i equals 1 whenever c_i > 0, and the sum collapses to π1(μ12) f, which equals μ1 f.
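For discrete distributions the witness of Lemma 2.3 reduces to the familiar composition of couplings, μ13(a, c) = Σ_b μ12(a, b) μ23(b, c) / μ2(I_{b}). A small Python sketch with made-up probability mass functions (an illustration, not the Coq proof):

```python
# Composing two discrete couplings through their common marginal mu2,
# written with explicit probability mass functions (dicts). The numbers
# are illustrative.

mu12 = {(0, 'x'): 0.5, (1, 'y'): 0.5}      # coupling of mu1 and mu2
mu23 = {('x', 'L'): 0.5, ('y', 'R'): 0.5}  # coupling of mu2 and mu3
mu2 = {'x': 0.5, 'y': 0.5}                 # common middle marginal

# mu13(a, c) = sum over b of mu12(a, b) * mu23(b, c) / mu2(b)
mu13 = {}
for (a, b), p in mu12.items():
    for (b2, c), q in mu23.items():
        if b == b2 and mu2[b] > 0:
            mu13[(a, c)] = mu13.get((a, c), 0.0) + p * q / mu2[b]

# The projections of mu13 recover mu1 and mu3
pi1 = {}
pi3 = {}
for (a, c), p in mu13.items():
    pi1[a] = pi1.get(a, 0.0) + p
    pi3[c] = pi3.get(c, 0.0) + p
```

The division by mu2(b) undoes the double counting of the middle marginal, which is exactly the role of the two 1/μ2 I_{b} factors in the lemma.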
The pWHILE Language
We describe games as programs in the pWHILE language, a probabilistic extension of an imperative language with procedure calls. This language can be regarded as a mild generalization of the language proposed by Bellare and Rogaway [2006], in that our language allows while loops whereas theirs allows only bounded for loops. The formalization of pWHILE is carefully crafted to exploit key features of Coq: it uses modules to support an extensible expression language that can be adapted according to the verification goal, dependent types to ensure that programs are well-typed and have a total semantics, and monads to give semantics to probabilistic programs and capture the cost of executing them.
We formalize programs in a deep-embedding style, i.e. the syntax of the language is encoded within the proof assistant. Deep embeddings offer one tremendous advantage over shallow embeddings, in which the language used to represent programs is the same as the underlying language of the proof assistant: a deep embedding allows us to manipulate programs as syntactic objects. This makes it possible to achieve a high level of automation in reasoning about programs through certified tactics that implement syntactic program transformations. Additionally, a deep embedding allows us to formalize complexity issues neatly and to reason about programs by induction on the structure of their code.
The semantics of programs is given by an interpretation function that takes a program p (an element of the type of programs) and an initial state s, and returns the result of executing p starting from s. In the deterministic case, both s and the result of executing p starting from s would be deterministic states, i.e. memories mapping variables to values. In the case of pWHILE programs, the denotation of a program is instead a function mapping an initial state to a (sub-)probability measure over final states. We use the measure monad described in Section 2.1 to define the denotation of programs.
Syntax
Given a set V of variable identifiers, a set P of procedure identifiers, a set E of deterministic expressions, and a set DE of distribution expressions, the instructions I and commands C of the language can be defined inductively by the clauses:
I ::= V ← E                      deterministic assignment
    | V ← DE                     probabilistic assignment
    | if E then C else C         conditional
    | while E do C               while loop
    | V ← P(E, …, E)             procedure call

C ::= skip                       nop
    | I; C                       sequence
The inductive definition of the language suffices to understand the rest of the presentation, and the reader may prefer to retain it for further reference. In practice, however, variable and procedure identifiers are annotated with their types, and the syntax of programs is dependently typed. Thus, x ← e is well-formed only if the types of x and e coincide, and if e then c1 else c2 is well-formed only if e is a Boolean expression and c1 and c2 are themselves well-formed. An immediate benefit of using dependent types is that the type system of Coq ensures for free the well-typedness of expressions and commands.
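To make the deep-embedding idea concrete, here is a small illustrative sketch in Python (deliberately untyped, and far removed from the dependently-typed Coq formalization): commands are plain syntax trees, and a denotation function maps each command and initial memory to a distribution over final memories, represented as an expectation functional in the style of the measure monad above. The constructor names and the encoding of expressions as Python functions are our own; while loops are omitted since their semantics requires a fixpoint construction.

```python
# Toy deep embedding in the spirit of pWHILE: commands are syntax trees,
# memories are dicts, and the denotation lands in the measure monad.

def unit(m):
    return lambda f: f(m)

def bind(mu, F):
    return lambda f: mu(lambda m: F(m)(f))

# Syntax: ('skip',) | ('assign', x, e) | ('rand', x, dist) |
#         ('seq', c1, c2) | ('if', e, c1, c2)
def denote(c):
    tag = c[0]
    if tag == 'skip':
        return unit                               # identity on memories
    if tag == 'assign':
        _, x, e = c
        return lambda m: unit({**m, x: e(m)})     # evaluate e, update x
    if tag == 'rand':
        _, x, dist = c                            # x <- DE: sample, update
        return lambda m: bind(dist, lambda v: unit({**m, x: v}))
    if tag == 'seq':
        _, c1, c2 = c
        return lambda m: bind(denote(c1)(m), denote(c2))
    if tag == 'if':
        _, e, c1, c2 = c
        return lambda m: denote(c1)(m) if e(m) else denote(c2)(m)

coin = lambda f: 0.5 * f(0) + 0.5 * f(1)          # fair coin distribution

# b <- {0,1}; if b = 1 then x <- 1 else x <- 2
prog = ('seq', ('rand', 'b', coin),
               ('if', lambda m: m['b'] == 1,
                      ('assign', 'x', lambda m: 1),
                      ('assign', 'x', lambda m: 2)))

mu = denote(prog)({})                             # distribution over memories
pr_x1 = mu(lambda m: 1.0 if m['x'] == 1 else 0.0)
```

Because commands are data, program transformations (dead-code elimination, swapping independent instructions, and so on) become ordinary functions on syntax trees, which is precisely what the certified tactics mentioned above exploit.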
In the remainder of this section we describe in detail the formalization of the syntax and semantics of the language. Most readers, particularly those not familiar with Coq, can skim through this section without hindering their understanding of the rest of the dissertation.
Background on the Coq Proof Assistant
We built our framework on top of Coq, a general purpose proof assistant that has been used for over two decades to formalize results in mathematics and computer science [The Coq development team 2009]. Coq provides an expressive specification language based on the Calculus of Inductive Constructions, a higher-order dependently-typed λ-calculus in which mathematical notions can be formalized conveniently. The Coq logic distinguishes between types, of type Type, which represent sets, and propositions, of type Prop, which represent formulae: thus, a : A is interpreted as "a is an element of type A" if A is a set, and as "a is a proof of A" if A is a proposition. In the latter case, we say that a is a proof object. Types can either be introduced by standard constructions, e.g. (generalized) function space and products, or by inductive definitions. Most common inductive types are predefined, including the type N of natural numbers, the type B of Boolean values, and sum and product types. We will also use the inductively defined types of homogeneous and heterogeneous lists. Homogeneous lists are composed of elements of the same type. The polymorphic inductive type of homogeneous lists is defined as follows:
Inductive list A : Type :=
| nil : list A
| cons : A → list A → list A
The list constructor cons is usually represented using an infix notation as the oper-ator “::”. Thus, the list composed of the natural numbers 1, 2 and 3, in that order, has type list N and could be represented as (1 :: 2 :: 3 :: nil). Heterogeneous lists are composed of elements whose type may depend on a value. Given a type A and a type-valued function P : A → Type, the inductive type of heterogeneous lists is defined as follows:
Inductive hlist A (P : A → Type) : list A → Type :=
| dnil : hlist P nil
| dcons : ∀ a l, P a → hlist P l → hlist P (a :: l)
We will use A⋆ to denote the type of A-lists (i.e. list A), and Pl⋆ to denote the type of heterogeneous P -lists over a list of values l (i.e. hlist P l).
Types
We formalize a dependently-typed syntax, and use the underlying type system of Coq to ensure for free that expressions and commands are well-formed. In our experience, the typed syntax provides particularly useful feedback when debugging proofs and makes proofs easier by restricting reasoning about programs to reasoning about their meaningful behaviors.