Entropy
Entropy is a frequently used term in the description of cryptographic primitives. It is a measure of the uncertainty about the output of a data source and was first introduced in 1948 by Shannon as a central concept of his work on communication theory. His definition of entropy was an answer to the question:
Can we find a measure of how much “choice” is involved in the selection of the event or how uncertain we are of the outcome? [SW49]
Entropy belongs to the area of information theory. In this chapter we deal with the definition of entropy and some of its properties.
Shannon’s fundamental work on entropy can be found in [SW49]. Further information on this topic is contained in Cover and Thomas’s monograph [CT91].
Definition and Characteristics
Shannon analyzed the process of communication. In his basic model, a source sends a message that is chosen out of a set Ω. This message is encoded into a symbol by a transmitter and sent over a noisy channel. At the other end of the channel, the received symbol is decoded by the receiver and forwarded to the destination (see Figure 2.1).
We are only interested in the initial part of the transmission. Shannon wanted to know how much information is contained in the choice of a specific element of the source. Hartley [Har28] claimed that if an element is chosen uniformly out of a set $\Omega_n$ of size $|\Omega_n| = n$, then the information of this choice is determined by
$$I(\Omega_n) = \log_2(n)\,. \qquad (2.1)$$
Shannon saw a need to modify this result. He was searching for a measure that takes into account the probability of choosing an element of $\Omega_n$ even if the corresponding distribution is not uniform. Let us assume that we have the sample space $\Omega_n = \{\omega_1, \omega_2, \ldots, \omega_n\}$ and that $p_i$, $1 \leq i \leq n$, is the probability of the occurrence of $\omega_i$. From an appropriate measure of uncertainty $H(p_1, p_2, \ldots, p_n)$ we demand the following properties:
(H 1) H must be continuous in the $p_i$, for all $1 \leq i \leq n$.
(H 2) If $p_i = \frac{1}{n}$ for all $1 \leq i \leq n$, which means that the elements of $\Omega_n$ are uniformly distributed, then H should increase monotonically with n. This property is based on the observation that, for uniformly distributed elements, the uncertainty about the choice of a particular element increases with n.
(H 3) Let us assume that we split the choice into two successive ones, i.e. instead of choosing element x directly out of $\Omega_n = \{\omega_1, \omega_2, \ldots, \omega_n\}$, we first decide if x is one of $\omega_1, \omega_2, \ldots, \omega_{n-2}$ or falls into the set $\{\omega_{n-1}, \omega_n\}$, and in a second choice we determine if $x = \omega_{n-1}$ or $x = \omega_n$. Without loss of generality let $p = p_{n-1} + p_n > 0$. Then, the original measure should be equal to the measure of the first choice plus the weighted measure of the second choice:
$$H(p_1, p_2, \ldots, p_n) = H(p_1, p_2, \ldots, p_{n-2}, p) + p\, H\!\left(\frac{p_{n-1}}{p}, \frac{p_n}{p}\right). \qquad (2.2)$$
We will illustrate this property by an example (see Figure 2.2). Let us assume that $p_1 = \frac{1}{2}$, $p_2 = \frac{1}{3}$, and $p_3 = \frac{1}{6}$. On the left side of the figure we decide between the three elements at once. On the right side we first decide whether we take $\omega_1$ or one of the other two elements; each of these two events has probability $\frac{1}{2}$. If we choose the second alternative, then we additionally have to decide between $\omega_2$ and $\omega_3$, according to the two new probabilities $p_2' = \frac{2}{3}$ and $p_3' = \frac{1}{3}$. Property (H 3) then means that we demand
$$H\!\left(\tfrac{1}{2}, \tfrac{1}{3}, \tfrac{1}{6}\right) = H\!\left(\tfrac{1}{2}, \tfrac{1}{2}\right) + \tfrac{1}{2}\, H\!\left(\tfrac{2}{3}, \tfrac{1}{3}\right).$$
The weight $\frac{1}{2}$ is necessary because the second alternative is only taken with probability $\frac{1}{2}$.
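This grouping property is easy to check numerically. The following Python sketch (my own illustration, using the functional form $\sum_i p_i \log_2 \frac{1}{p_i}$ that Theorem 2.1 below yields for $K = 1$ and base-2 logarithms) evaluates both sides of the equality above:

```python
from math import log2

def H(probs):
    # H(p1, ..., pn) = sum_i p_i * log2(1/p_i), i.e. (2.3) with K = 1
    return sum(p * log2(1 / p) for p in probs)

lhs = H([1/2, 1/3, 1/6])                    # direct choice among three elements
rhs = H([1/2, 1/2]) + 1/2 * H([2/3, 1/3])   # two successive choices, second one weighted
assert abs(lhs - rhs) < 1e-9                # both sides are approximately 1.459
```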
We then have the following theorem.
Theorem 2.1 ([SW49]) Let $P = \{p_i,\ 1 \leq i \leq n\}$ be the probability distribution on $\Omega_n = \{\omega_1, \omega_2, \ldots, \omega_n\}$. The only function H satisfying the three assumptions (H 1), (H 2), and (H 3) above is of the form:
$$H = K \sum_{i=1}^{n} p_i \log \frac{1}{p_i}\,, \qquad (2.3)$$
where K is a positive constant.
The result of Theorem 2.1 motivates the definition of entropy for discrete random variables.
Definition 2.2 (Entropy) Let us assume that X is a discrete random variable on the sample space $\Omega_n = \{\omega_1, \omega_2, \ldots, \omega_n\}$ with probability distribution $P = \{p_i,\ 1 \leq i \leq n\}$. The entropy H(X) of X is defined by
$$H(X) = \sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i}\,.$$
By convention, $0 \log_2 \frac{1}{0} = 0$, which is supported by the asymptotic behavior of the function $f(x) = x \log_2 \frac{1}{x}$ as x tends towards zero: $\lim_{x \to 0^+} x \log_2 \frac{1}{x} = 0$.
Therefore, elements occurring with probability 0 have no effect on entropy.
For the definition of entropy we could have chosen any base for the logarithm. Since $\log_a(x) = \frac{\log_b(x)}{\log_b(a)}$ for every choice of bases $a, b \geq 2$, a different base would change the value of the entropy only by a constant factor, which conforms to (2.3). We will always use the logarithm in base 2, since this choice corresponds to the binary representation of data in a computer. In this case, the unit of entropy is one bit (binary digit). If we had chosen Euler's number e as the base, the entropy would be measured in nats (natural units).
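As a concrete illustration of Definition 2.2 and of the convention for zero probabilities, the following minimal Python sketch (the function name and the sanity check are my own) computes the entropy in bits of a distribution given as a list of probabilities:

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits of a discrete probability distribution.

    Terms with p = 0 are skipped, following the convention
    0 * log2(1/0) = 0, so they do not contribute to the sum.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1/2, 1/2]))    # 1.0 bit: a fair coin flip
print(entropy([1.0, 0.0]))    # 0.0 bits: no uncertainty at all
```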
Shannon did not limit his investigations to the case of a single random variable. He also defined the entropy for sources that can be represented by an ergodic Markov chain. In Section 6.2.4 we will discuss this topic in greater detail. Here, we focus our discussion on the independent case. For a better understanding of the term entropy, we shall consider two different examples.
Example 2.1 In the first example we will discuss three different cases of random variables, $X_1$, $X_2$, and $X_3$. The three variables have the sample spaces $\Omega_1 = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, $\Omega_2 = \{\omega_1, \omega_2, \ldots, \omega_8\}$, and $\Omega_3 = \{\omega_1, \omega_2, \omega_3, \omega_4\}$, respectively. $X_1$ and $X_2$ are uniformly distributed, which means that $p_i^{(1)} = \frac{1}{4}$ for $1 \leq i \leq 4$ and $p_i^{(2)} = \frac{1}{8}$ for $1 \leq i \leq 8$, respectively, where $p_i^{(j)}$ denotes the probability of the event $X_j = \omega_i$. The probability distribution of the last variable $X_3$ is given by $p_1^{(3)} = \frac{1}{2}$, $p_2^{(3)} = \frac{1}{4}$, and $p_3^{(3)} = p_4^{(3)} = \frac{1}{8}$. This implies the following values of entropy.
$$H(X_1) = 2\,, \qquad H(X_2) = 3\,.$$
Generally, if X is uniformly distributed on a sample space Ω, then $H(X) = \log_2 |\Omega|$, which conforms to (2.1). Thus, Hartley's notion of information is a special case of Shannon's notion of entropy. Concerning $X_3$, we have
$$H(X_3) = \tfrac{1}{2} \log_2 2 + \tfrac{1}{4} \log_2 4 + 2 \cdot \tfrac{1}{8} \log_2 8 = \tfrac{7}{4}\,.$$
Let us first compare the entropies of the variables $X_1$ and $X_2$. $H(X_2)$ is larger than $H(X_1)$, which corresponds to condition (H 2): the more elements are available, the larger the measure of uncertainty and thus the value of the entropy.
The variables X1 and X3 have the same sample space, but H(X1) is larger than H(X3). X1 is uniformly distributed, thus if someone wants to guess the outcome of X1, all elements are equally probable. With X3, the occurrence of ω1 is as probable as the occurrence of the other three elements together. If we guess ω1 as the outcome of X3 we would be right in half of the cases. Thus, one may state that the value of X1 is more “uncertain” than the value of X3.
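The three values can be reproduced with the entropy function sketched above (again my own illustration, not code from the source):

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

print(entropy([1/4] * 4))               # H(X1) = 2.0
print(entropy([1/8] * 8))               # H(X2) = 3.0
print(entropy([1/2, 1/4, 1/8, 1/8]))    # H(X3) = 1.75 = 7/4
```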
Theorem 2.3 shows that for a fixed sample space the uniform distribution always has maximum entropy.
Theorem 2.3 ([CT91]) Let X be a random variable on the sample space $\Omega_X$, and let $|\Omega_X|$ denote the number of elements in the range of X. Then $H(X) \leq \log_2 |\Omega_X|$, with equality if and only if X has a uniform distribution over $\Omega_X$.
Remark 2.4 This characteristic of the uniform distribution is often used in connection with random number generators (RNGs, see Chapter 4). A number is called "truly random" if it was chosen according to a uniform distribution, which provides maximal entropy. Let X be the random variable describing the output of an RNG. From a good RNG we would expect H(X) to be as high as possible.
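As a rough illustration of this remark (my own sketch, not an evaluation from the thesis), one can estimate the empirical entropy of the byte values produced by the operating system's RNG and compare it with the maximum of $\log_2 256 = 8$ bits per byte; note that this measures only the single-byte distribution of a finite sample, not the entropy of the generator itself:

```python
import os
from collections import Counter
from math import log2

sample = os.urandom(1 << 20)    # 1 MiB of output from the OS generator
counts = Counter(sample)        # frequency of each byte value 0..255
n = len(sample)

# Empirical entropy of the byte-value distribution, in bits per byte.
h = sum((c / n) * log2(n / c) for c in counts.values())
print(f"{h:.4f} bits per byte (maximum 8 for a uniform distribution)")
```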
Example 2.2 (Minimal number of binary questions [CT91]) In this example we study the number of binary questions that are, on average, necessary to find out which element was produced by our information source. We will discuss the connection of this number to Shannon’s entropy.
Let us assume that the random variable X represents the source, such that Ω is the sample space and $P = \{p(x),\ x \in \Omega\}$ the probability distribution of X. A questioning strategy S specifies which binary questions we ask to determine the chosen element x. For example, if $\Omega = \{1, 2, \ldots, 8\}$, then such a question could be "Is x in the set {1, 3, 4, 7}?". Such a strategy is equivalent to a binary code: each element $x \in \Omega$ is determined by a finite sequence of responses, yes = 1 or no = 0. Let E[Q] denote the expected number of questions in an optimal scheme. From the properties of a Huffman code we know that
$$H(X) \leq E[Q] \leq H(X) + 1\,.$$
This means that Shannon's entropy is approximately equal to the minimal expected number of binary questions necessary to determine a certain element. Consequently, if an element was produced by an information source of high entropy, then an adversary needs, on average, more effort to guess the specific element.
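This bound can be checked for a concrete distribution by constructing a Huffman code, whose expected codeword length equals the expected number of questions of an optimal strategy. The sketch below is my own illustration, applied to the distribution of $X_3$ from Example 2.1, where the bound is even attained with equality:

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword length of each symbol in a binary Huffman code."""
    # Heap entries: (probability, tie-breaker, indices of merged symbols).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)     # two least probable groups
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:                   # each merge costs one more question
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, tie, s1 + s2))
        tie += 1
    return lengths

probs = [1/2, 1/4, 1/8, 1/8]                 # distribution of X3 from Example 2.1
lengths = huffman_lengths(probs)             # [1, 2, 3, 3]
H = sum(p * log2(1 / p) for p in probs)      # 1.75 bits
EQ = sum(p * l for p, l in zip(probs, lengths))
assert H <= EQ <= H + 1                      # here E[Q] = H(X) = 1.75
```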
Until now, we have only considered the case of a single random variable. If we have two random variables X, Y we can define the joint entropy H(X, Y ) as well as the conditional entropy H(X|Y ). The first value gives the entropy of X and Y together, whereas the second one defines the entropy of X if the value of Y is known.
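Both quantities can be computed directly from a joint distribution, using $H(X,Y) = \sum_{x,y} p(x,y) \log_2 \frac{1}{p(x,y)}$ and the chain rule $H(X|Y) = H(X,Y) - H(Y)$. The following sketch uses a small made-up joint distribution purely for illustration:

```python
from math import log2

def entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

# Hypothetical joint distribution p(x, y) on a 2 x 2 sample space.
joint = {("x0", "y0"): 1/2, ("x0", "y1"): 1/4,
         ("x1", "y0"): 1/8, ("x1", "y1"): 1/8}

H_XY = entropy(joint.values())               # joint entropy H(X, Y)

marginal_Y = {}                              # marginal distribution of Y
for (_, y), p in joint.items():
    marginal_Y[y] = marginal_Y.get(y, 0) + p
H_Y = entropy(marginal_Y.values())

H_X_given_Y = H_XY - H_Y                     # chain rule: H(X,Y) = H(Y) + H(X|Y)
print(H_XY, H_Y, H_X_given_Y)                # 1.75, ~0.954, ~0.796
```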
Table of Contents
1 Introduction
I General Notions
2 Entropy
2.1 Definition and Characteristics
3 Stream Cipher
3.1 Classification
3.1.1 One-Time Pad (Vernam Cipher)
3.1.2 Synchronous Stream Cipher
3.1.3 Self-Synchronizing Stream Cipher
3.2 State of the Art
II Studies of the HAVEGE Random Number Generator
4 Random Number Generators
4.1 Tests of Randomness
5 HAVEGE
5.1 Optimization Techniques of the Processor
5.1.1 Functionality of HAVEGE
5.2 General Structure
5.2.1 HAVEG
5.2.2 HAVEGE
5.3 Empirical Results
5.4 Security
5.5 Conclusion
6 Empirical Tests
6.1 Test Setup
6.2 Test Results
6.2.1 Basic Data
6.2.2 Auto Correlation
6.2.3 Entropy
6.2.4 Entropy of a Markov Chain
6.3 Conclusion
III Study of a Specific Stream Cipher
7 The Dragon Stream Cipher
7.1 Introduction
7.1.1 Inner State
7.1.2 Key and IV Setup
7.1.3 State Update
7.1.4 Function F
7.2 Published Distinguisher on Dragon
7.2.1 Statistical Distinguisher
7.2.2 Linear Distinguisher
7.3 Some Properties of Dragon
7.3.1 Properties of G’s and H’s
7.3.2 Relations among Words
7.3.3 Bias of Different Equations
7.3.4 Bias of Equations
7.4 Conclusion
IV Random Functions
8 Characteristics of Random Functions
8.1 Approach to Analyze Random Functions
8.1.1 Use of Generating Function
8.1.2 Singularity Analysis
8.2 Known Properties
9 State Entropy of a Stream Cipher using a Random Function
9.1 Introduction
9.2 Estimation of Entropy
9.2.1 Previous Work
9.2.2 New Entropy Estimation
9.3 Collision Attacks
9.3.1 States after k Iterations
9.3.2 Including Intermediate States
9.3.3 Improvement with Distinguished Points
9.4 Conclusion
V FCSRs
10 Introduction to FCSRs
10.1 Characteristics of LFSRs
10.2 Characteristics of FCSRs
10.3 Applications of FCSRs
10.4 Extensions of FCSRs
10.4.1 d-FCSRs
10.4.2 AFSR
11 Entropy of the Inner State of an FCSR in Galois Setup
11.1 Introduction
11.2 Entropy after One Iteration
11.3 Final State Entropy
11.3.1 Some Technical Terms
11.3.2 Final Entropy Case by Case
11.3.3 Complexity of the Computation
11.4 Lower Bound of the Entropy
11.4.1 Basis of Induction
11.4.2 Induction Step
11.5 Bounds for the Sums
11.6 Conclusion
12 Parallel generation of ℓ-sequences
12.1 Introduction
12.2 Motivation
12.3 Sub-Sequences Generators and m-Sequences
12.4 Sub-Sequences Generators and ℓ-Sequences
12.5 Conclusion
Conclusion and Perspectives
Bibliography