Classical statistical mechanics and ensemble averages
Let us consider a fluid consisting of $N$ classical atoms in a volume $V$. One microscopic state $z(t)$ of the fluid is described by the $6N$ independent position $r_j$ and momentum $p_j$ coordinates of the atoms. We wish to study the density of the fluid at a distance $r$ from a given atom $i$, $\rho_i(r)$. In most experiments, the measurement of precise and instantaneous values $\rho_i(r, z(t))$ is not possible, as it depends on the coordinates $r_j$ of all particles $j$. For example, the pressure that the fluid exerts on a piston is, as a matter of fact, time-averaged by the inertia of the latter. The result of a measurement is then considered to be the time average $\bar\rho_i(r)$ of $\rho_i(r)$. Each measurement of a macroscopic observable at a time $t_0$ is considered to actually take a certain interval of time $\tau$ to be realized. During this interval, $z(t)$ changes according to Newton's laws of motion, and $\rho_i(r, z(t))$ with it. The time average,
\[ \frac{1}{\tau} \int_{t_0}^{t_0+\tau} \rho_i(r, z(t'))\, dt', \tag{1.10} \]
is constant if it is independent of the initial condition, i.e. $t_0$, and of the length of the interval, i.e. $\tau$, given that it is long enough. We consider this to be the case and will discuss this issue later in Section 1.3.1. One can argue that the macroscopic interval of time $\tau$ for the measurement is extremely large from the microscopic point of view, and one may take the limit $\tau \to \infty$,
\[ \bar\rho_i(r) = \lim_{\tau\to\infty} \frac{1}{\tau} \int_{t_0}^{t_0+\tau} \rho_i(r, z(t'))\, dt'. \tag{1.11} \]
As Eq. 1.11 does not depend on the initial condition, it is possible to rewrite it as an average over a collection of initial conditions,
\[ \bar\rho_i(r) = \frac{\sum_{\text{initial conditions}} \lim_{\tau\to\infty} \frac{1}{\tau} \int_{t_0}^{t_0+\tau} \rho_i(r, t')\, dt'}{\text{Total number of initial conditions}}. \tag{1.12} \]
Experimentally, Eq. 1.12 can be realized by observing a large number of copies of the atomic fluid (Gibbs's approach [44]) or by observing one and the same copy of the system on successive intervals, given that the intervals are long enough to be independent from one another (time approach). Following [11, p. 16], we consider the limiting case of averaging over the set $\Omega$ of all possible initial conditions $z(0)$,
\[ \frac{\sum_{\text{initial conditions}} f(z(0))}{\text{Total number of initial conditions}} \to \int_\Omega dz(0)\, p(z(0))\, f(z(0)), \tag{1.13} \]
with $f$ an arbitrary function of the initial coordinates $z(0) = \{r_i(0), p_i(0)\}$. $p(z(0))$ is the probability to start the averaging at $z(0)$. For instance, if we realize the experiment of measuring one copy of the system at intervals, the starting point of the measuring interval can be at any time, and the initial condition can be any state the system will visit. Therefore $p(z(0)) = p(z)$. In the case where the system is ergodic, as previously assumed through the independence from the initial conditions, the system visits during its evolution all states $z$ of the phase space $\Gamma$ for a given energy. The probability weight $p(z)$ can then be understood as the probability to find the system in the state $z$ during its evolution. Using the equivalence of Eq. 1.13, the time average of Eq. 1.11 is now equivalent to the ensemble average,
\[ \bar\rho_i(r) = \langle \rho_i(r) \rangle = \int_\Omega dz\, p(z)\, \rho_i(r, z), \tag{1.14} \]
where $\langle\,\cdot\,\rangle$ identifies with Eq. 1.9.
Eq. 1.14 is the main version of the ergodic hypothesis. $\bar\rho_i(r)$ then describes the equilibrium, without depending on the initial state. As the initial conditions do not matter, only the quantities that are conserved do, hence the link between mesoscopic description and conservation laws. Section 1.3.1 will address the ergodic hypothesis with more care. The systems studied later in Chapter 4 are ergodic, but not all systems are.
If one wishes to compute the average of a function, it is possible to use either the time average, as in molecular dynamics [43], or the ensemble average, as in the Monte Carlo method, see Chapter 2. Both approaches are equivalent, as shown by Eq. 1.14, and have proven extremely versatile. For particle systems, molecular dynamics solves the classical equations of motion and computes the trajectories of the particles from collision to collision. The Monte Carlo method builds a stochastic process that evolves in configuration space and thus samples in a simpler way the different configurations needed for the ensemble average.
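As a minimal numerical illustration of this equivalence (a sketch: the two-state chain and observable below are invented for the example, not taken from this thesis), the time average along one long trajectory of an ergodic Markov chain can be compared with the ensemble average over its stationary distribution:

```python
import numpy as np

# A two-state chain with transfer matrix T[x, x'] = P(X_{t+1} = x' | X_t = x)
# and an observable f; both are invented for this illustration.
rng = np.random.default_rng(0)
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
f = np.array([1.0, -1.0])

# Time average: follow a single long trajectory, as molecular dynamics would.
x, total, n_steps = 0, 0.0, 200_000
for _ in range(n_steps):
    x = rng.choice(2, p=T[x])
    total += f[x]
time_avg = total / n_steps

# Ensemble average: weight f by the stationary distribution pi, obtained here
# as the left eigenvector of T with eigenvalue 1.
w, v = np.linalg.eig(T.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()
ensemble_avg = pi @ f

print(time_avg, ensemble_avg)  # the two estimates agree for an ergodic chain
```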
If time and ensemble averages are equivalent, given that the system is ergodic, dynamical properties are still controlled by Newton's equations of motion. For a closed and isolated system², the microscopic deterministic motion is represented as a trajectory in phase space $\Gamma$. After a time $t$, each point $z \in \Gamma$ is mapped in a unique and determined way into another point $z_t = \phi(z, t) \in \Gamma$. The initial probability density evolves in phase space $\Gamma$ according to Liouville's equation. The deterministic trajectory generated this way is different from the probabilistic trajectories generated in stochastic processes. However, we are interested here in a coarse-grained picture and not in the full atomistic description. At such a mesoscopic scale, only a subset of the degrees of freedom matters. The microscopic variables that evolve on a much faster time scale then act as a heat bath, as was assumed for Eq. 1.11. This induces stochastic transitions among the relevant and slower mesoscopic variables. In the case of a separation of time scales, the Liouville equation that controls the temporal evolution of the initial distribution over $\Gamma$ can be reduced to a Markovian master equation, Eq. 1.56, and the stochastic process is then Markovian, see Section 1.2.
The stochastic nature does not come from setting a probability density $P(x)$ as the initial state. For instance, the irregular motion of a Brownian particle in a bath cannot be linked to a probability distribution of some initial state. This irregularity is actually the work of the bath molecules in the vicinity and is linked to all the microscopic variables describing the total system that have been eliminated in order to get an equation for the Brownian particle alone. It is the elimination of the bath variables that allows one to establish the stochastic process of Brownian motion [40, p. 55-57].
² No energy or particle exchange.
Stochastic processes
In Section 1.1.2, the equivalence between time and ensemble averages was presented. It led to the mesoscopic stochastic description, which contains both the deterministic laws and the fluctuations. Such a description consists of eliminating a large number of microscopic variables, so that a small set of mesoscopic variables obeys an approximately independent set of deterministic equations. The deterministic nature of those equations, such as the hydrodynamic equations for instance, is indeed only approximate, as the now hidden microscopic variables introduce fluctuation terms [38]. The fluctuations, acting like a heat bath, turn the mesoscopic variables into stochastic processes. Following [40], a stochastic process $Y = (X_{t_1}, X_{t_2}, \ldots, X_{t_n})$ is an ordered collection of a stochastic (or random) variable $X$, e.g. the ensemble of positions of one atom at successive times. The random variable $X$ is defined by a set of possible values $\Omega$, called the configuration space, and by a probability distribution (or density) $P(x)$ over this set, with
\[ P(x) \geq 0, \qquad \int_\Omega P(x)\, dx = 1. \tag{1.15} \]
The probability that $X$ has a value between $x$ and $x + dx$ is $P(x)dx$. From Eq. 1.15, it is possible to define the average of any function $f$ defined on the same configuration space $\Omega$, as in Eq. 1.14,
\[ \langle f \rangle = \int_\Omega f(x)\, P(x)\, dx. \tag{1.16} \]
For multivariate distributions $X = (X_1, X_2, \ldots, X_n)$, as for the atomic fluid with the whole set of position and momentum coordinates of all atoms, the probability distribution $P(x)$ can be understood as the joint probability distribution of the $n$ variables $X_1, \ldots, X_n$. A marginal distribution $P(x_{i_1}, \ldots, x_{i_s})$ on a subset of $s < n$ variables $X_{i_1}, \ldots, X_{i_s}$ may be defined as
\[ P(x_{i_1}, \ldots, x_{i_s}) = \int P(x_1, \ldots, x_n) \prod_{j \neq i_1, \ldots, i_s} dx_j. \tag{1.17} \]
For instance, in the atomic fluid, the marginal distribution of the position coordinates is
\[ P(r_1, \ldots, r_N) = \int P(r_1, \ldots, r_N, p_1, \ldots, p_N) \prod_{j=1}^{N} dp_j. \tag{1.18} \]
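In discrete form, the marginalization of Eq. 1.17 and Eq. 1.18 is just a sum of a joint probability array over the axes that are integrated out; the following sketch (a toy 4×5 joint distribution, not the fluid itself) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(1)
P_joint = rng.random((4, 5))   # toy joint distribution P(x1, x2) on a grid
P_joint /= P_joint.sum()       # normalize: probabilities sum to 1

P_x1 = P_joint.sum(axis=1)     # marginal P(x1): "integrate out" x2
P_x2 = P_joint.sum(axis=0)     # marginal P(x2): "integrate out" x1
assert np.isclose(P_x1.sum(), 1.0) and np.isclose(P_x2.sum(), 1.0)
```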
The average of the density function $\rho_i(r)$ is then
\[ \langle \rho_i(r) \rangle = \int_\Omega \rho_i(r)\, P(r_1, \ldots, r_N) \prod_{j=1}^{N} dr_j. \tag{1.19} \]
One can also define the $m$th moment of $X$,
\[ \langle X^m \rangle = \int_\Omega x^m P(x)\, dx, \tag{1.20} \]
and the variance or dispersion $\mathrm{var}(X) = \sigma_X^2$,
\[ \mathrm{var}(X) = \sigma_X^2 = \langle X^2 \rangle - \langle X \rangle^2, \tag{1.21} \]
which will be useful in discussing errors in the Monte Carlo method in Section 2.3. The $m$th moment of $X$ exists if and only if the integral $\int_\Omega |x^m|\, P(x)\, dx$ converges. If the $m$th moment does not exist, neither does any $n$th moment with $n \geq m$. If the $m$th moment does exist, so does every $n$th moment with $n \leq m$.
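Numerically, the moments of Eq. 1.20 and the variance of Eq. 1.21 can be estimated from samples; the sketch below (with invented parameters) also illustrates the existence condition using the Cauchy distribution, whose first moment already diverges:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample estimates of <X> and var(X) for a Gaussian with mean 1, sigma 2.
x = rng.normal(loc=1.0, scale=2.0, size=100_000)
m1, m2 = x.mean(), (x ** 2).mean()
print(m1, m2 - m1 ** 2)        # close to 1.0 and 4.0

# For Cauchy samples, the running mean never converges: <X> does not exist,
# since the integral of |x| P(x) diverges.
c = rng.standard_cauchy(size=100_000)
running_mean = np.cumsum(c) / np.arange(1, c.size + 1)
print(running_mean[[99, 9_999, 99_999]])   # keeps wandering with sample size
```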
The joint distribution of a set of $s$ variables, while attributing fixed values to the remaining $(r - s)$ variables, is called the conditional probability distribution of $X_{i_1}, \ldots, X_{i_s}$, conditional on $X_{j_1}, \ldots, X_{j_{r-s}}$ having the prescribed values $x_{j_1}, \ldots, x_{j_{r-s}}$, and is denoted $P(x_{i_1}, \ldots, x_{i_s} | x_{j_1}, \ldots, x_{j_{r-s}})$. Following the definitions of the marginal and the conditional probability distribution, Bayes' rule is then expressed by
\[ P(x_1, \ldots, x_r) = P(x_{j_1}, \ldots, x_{j_{r-s}})\, P(x_{i_1}, \ldots, x_{i_s} | x_{j_1}, \ldots, x_{j_{r-s}}) \tag{1.22} \]
or
\[ P(x_{i_1}, \ldots, x_{i_s} | x_{j_1}, \ldots, x_{j_{r-s}}) = \frac{P(x_1, \ldots, x_r)}{P(x_{j_1}, \ldots, x_{j_{r-s}})}. \tag{1.23} \]
This leads to the concept of statistical independence of two sets. If the set of $r$ variables can be subdivided into two sets $(X_{i_1}, \ldots, X_{i_s})$ and $(X_{j_1}, \ldots, X_{j_{r-s}})$ so that $P_r$ factorizes,
\[ P(x_1, \ldots, x_r) = P_s(x_{i_1}, \ldots, x_{i_s})\, P_{r-s}(x_{j_1}, \ldots, x_{j_{r-s}}), \tag{1.24} \]
then the two sets are called statistically independent, and the factor $P_s$ (respectively $P_{r-s}$) identifies with the conditional probability density $P_{s|r-s}$ (respectively $P_{r-s|s}$). Basically, this means that fixing the values of one set brings no extra information about the possible values in the other set, as they do not affect each other. This will be particularly useful to devise the factorized Metropolis filter that leads to rejection-free irreversible Markov chains, as discussed in Section 3.2.
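Statistical independence is easy to probe numerically: the joint must equal the product of its marginals, as in Eq. 1.24. A minimal discrete check (toy distributions, unrelated to the factorized filter of Section 3.2):

```python
import numpy as np

rng = np.random.default_rng(3)
P_a = rng.random(3); P_a /= P_a.sum()   # marginal of the first set
P_b = rng.random(4); P_b /= P_b.sum()   # marginal of the second set

# An independent joint factorizes into the outer product of the marginals.
P_joint = np.outer(P_a, P_b)
marg_a = P_joint.sum(axis=1, keepdims=True)
marg_b = P_joint.sum(axis=0, keepdims=True)
assert np.allclose(P_joint, marg_a * marg_b)   # Eq. 1.24 holds exactly
```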
As in Eq. 1.20, one can define the moments of a multivariate distribution,
\[ \langle X_1^{m_1} \cdots X_r^{m_r} \rangle = \int_\Omega x_1^{m_1} \cdots x_r^{m_r}\, P(x_1, \ldots, x_r)\, dx_1 \cdots dx_r. \tag{1.25} \]
We are interested in stochastic processes $Y = \{X_{t_1}, X_{t_2}, \ldots, X_{t_n}\}$ that are ordered according to the time $t$, as illustrated in Fig. 1.1.3. A realization of the process $y(t)$ is an ordered collection $\{x_t\}$, where the $x_t$ are successive realizations of $X$. In statistical mechanics, the stochastic process $Y$ itself is understood as the ensemble of these realizations $y(t)$, like an ensemble of a large number of copies of an experiment. The ensemble average $\langle\,\cdot\,\rangle$ is
\[ \langle Y(t_i) \rangle = \int y(t_i)\, P(y)\, dy = \int_\Omega x\, P_1(x, t_i)\, dx. \tag{1.26} \]
Here the functional $P$ is the probability distribution of $y$, with $\int P(y)\, dy = 1$. It can be rewritten as the joint probability distribution $P_n$ of observing a certain realization $\{x_{t_1}, x_{t_2}, \ldots, x_{t_n}\}$ as
\[ P_n(x_1, t_1; x_2, t_2; \ldots; x_n, t_n) = \langle \delta(Y(t_1) - x_1)\, \delta(Y(t_2) - x_2) \cdots \delta(Y(t_n) - x_n) \rangle. \tag{1.27} \]
A stochastic process is completely defined by the complete set of $P_n$, as shown by the Kolmogorov existence theorem [45, 46]. Also, $P_n$ does not change on interchanging two pairs $(x_k, t_k)$ and $(x_l, t_l)$. Considering the atomic fluid, the ensemble average of the density $\langle \rho_i(r) \rangle$ is obtained by considering all the possible realizations $\rho_i(r, t)$, leading to Eq. 1.14.
The $n$th moment, from $n$ values $t_1, \ldots, t_n$ of the time $t$, can also be defined,
\[ \langle Y(t_1) \cdots Y(t_n) \rangle = \int_\Omega x_1 \cdots x_n\, P_n(x_1, t_1; \ldots; x_n, t_n)\, dx_1 \cdots dx_n. \tag{1.28} \]
The normalized autocorrelation function $C_Y$ will be particularly useful,
\[ C_Y(t_1, t_2) = \frac{\langle [Y(t_1) - \langle Y(t_1) \rangle][Y(t_2) - \langle Y(t_2) \rangle] \rangle}{\sqrt{\mathrm{var}(Y(t_1))\, \mathrm{var}(Y(t_2))}} = \frac{\langle Y(t_1)\, Y(t_2) \rangle - \langle Y(t_1) \rangle \langle Y(t_2) \rangle}{\sqrt{\mathrm{var}(Y(t_1))\, \mathrm{var}(Y(t_2))}}. \tag{1.29} \]
From Eq. 1.28, one can define the stationarity property. The statistical properties of a stationary process do not change with time, i.e. all the $n$th moments of the stochastic process are unaffected by any shift $\tau$ in time,
\[ \langle Y(t_1 + \tau) \cdots Y(t_n + \tau) \rangle = \langle Y(t_1) \cdots Y(t_n) \rangle. \tag{1.30} \]
A stationary stochastic process therefore has a time-independent average, and its autocorrelation function $C_Y(t_1, t_2)$ of Eq. 1.29 depends only on the time difference $|t_1 - t_2|$, in particular $C_Y(t_1, t_2) = C_Y(t_2, t_1)$,
\[ C_Y(\tau) = \frac{\langle Y(t)\, Y(t + \tau) \rangle - \langle Y \rangle^2}{\langle Y^2 \rangle - \langle Y \rangle^2}. \tag{1.31} \]
This leads to the definition of an autocorrelation time $\tau_{\exp}$, such that $C_Y(|t_1 - t_2| > \tau_{\exp})$ is negligible, see Section 1.3.2.4 and Section 2.3.3. In practice, stationary processes are only an approximation; they arise by considering analytically the right time scale, so that the evolution of the microscopic variables is averaged out, whereas the macroscopic variation is still negligible.
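In practice, $C_Y$ is estimated from a single long realization of the stationary process. A hedged sketch (the AR(1) test series is an invented example; estimators of this kind reappear in Section 2.3.3):

```python
import numpy as np

def autocorrelation(y, max_lag):
    """Estimate C_Y(t) of Eq. 1.29 for a stationary series from its samples."""
    y = np.asarray(y, dtype=float)
    mean, var = y.mean(), y.var()
    return np.array([np.mean((y[:y.size - t] - mean) * (y[t:] - mean)) / var
                     for t in range(max_lag)])

# Test on an AR(1) process y_{t+1} = a y_t + noise, whose exact normalized
# autocorrelation is a^t, i.e. tau_exp = -1 / ln(a).
rng = np.random.default_rng(4)
a, n = 0.9, 100_000
y = np.zeros(n)
for t in range(n - 1):
    y[t + 1] = a * y[t] + rng.normal()
print(autocorrelation(y, 5))   # approx [1.0, 0.9, 0.81, 0.729, 0.656]
```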
We discussed how the equivalence of time and ensemble averages allows for a reformulation of the problem in terms of a stochastic process. From the moment the system exhibits a self-averaging property, which is the case in particular for a stationary system, the irregularly varying function can be cut into a collection of long time intervals. These intervals form an ensemble of realizations of the stochastic process. It is necessary that no interval influences the next one. If this is the case, the process is called ergodic, see Section 1.3.1.
Markov processes and master equation
Before discussing the Markov processes developed within Markov-chain Monte Carlo methods in Section 2.2, we address more generally in this Section the Markovian description of physical systems. As seen in Section 1.1.2, if the separation of scales between microscopic and macroscopic variables is good enough, the underlying stochastic process has a memory of only its immediate past and is said to be Markovian. Experimentally, the Markov property is only approximate. For a Brownian particle, for instance, the autocorrelation time of the velocity, even if small, is not strictly zero, so a large displacement is slightly more likely just after a large one. But, as the experimental sampling of positions is made on a coarse-grained time scale longer than the autocorrelation time of the velocity, see Section 1.1.2, the process appears Markovian.
The Markovian nature depends on the number of variables used to describe the process. For example, if a stochastic process described by $r$ variables is Markovian, the stochastic process constituted by a subset of $s$ of those variables is not necessarily Markovian. The information carried by the $s$ variables may not be sufficient to predict the future configuration, not even the configuration restricted to these $s$ variables.
However, if a process is not Markovian, it is sometimes possible to extend the configuration space by additional variables and create a Markovian process. These additional variables describe explicitly the information that was previously implicit and hidden in the past values of the variables. This extension of the configuration space will be one of the key elements of the rejection-free Monte Carlo scheme, see Section 3.1; a toy illustration follows below.
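A toy illustration of this extension (a hypothetical second-order recursion, not the thesis's lifting scheme itself): the process is non-Markovian in $x$ alone, but the pair $(x_t, x_{t-1})$ evolves Markovianly, because the added variable stores the relevant past.

```python
import numpy as np

rng = np.random.default_rng(5)
a, b = 0.5, 0.3   # invented coefficients of the second-order recursion

def step(state):
    """One update of the extended state (x_t, x_{t-1}); it depends only on
    the present extended state, so the extended process is Markovian."""
    x, x_prev = state
    return (a * x + b * x_prev + rng.normal(), x)

state = (0.0, 0.0)
for _ in range(5):
    state = step(state)   # knowing x alone would not suffice to predict this
    print(state)
```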
As discussed in Section 1.1.2, the goal is to find a small set of variables preserving the Markovian nature. Experimentally, it is possible to do so for most many-body systems, but it remains an approximate description restricted to a macroscopic, coarse-grained level. This reduction is usually called contraction or projection, and the justification of this approximation is still debated at the most fundamental level of statistical mechanics.
General properties of a Markov process
Markov processes (or Markov chains) exhibit the Markov property: the conditional probability distribution of future states depends only upon the present state. For any set of $n$ successive times (i.e. $t_1 < t_2 < \ldots < t_n$),
\[ P(x_n, t_n | x_1, t_1; x_2, t_2; \ldots; x_{n-1}, t_{n-1}) = P(x_n, t_n | x_{n-1}, t_{n-1}). \tag{1.32} \]
A Markov process is fully determined by the initial probability distribution $P_1(x_1, t_1)$ and the transition probability $P(x_2, t_2 | x_1, t_1) = p((x_1, t_1) \to (x_2, t_2))$. Using only these two functions, the complete hierarchy can be retrieved, e.g.
\[ P_3(x_1, t_1; x_2, t_2; x_3, t_3) = P_1(x_1, t_1)\, P(x_2, t_2 | x_1, t_1)\, P(x_3, t_3 | x_2, t_2). \tag{1.33} \]
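For a discrete chain, this reconstruction is a few array products; the sketch below (a hypothetical three-state chain) builds the three-time joint distribution of Eq. 1.33 from $P_1$ and the transition matrix alone:

```python
import numpy as np

P1 = np.array([0.5, 0.3, 0.2])      # initial distribution P1(x1, t1)
T = np.array([[0.7, 0.2, 0.1],      # T[x, x'] = P(x', t + 1 | x, t)
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# P3[x1, x2, x3] = P1(x1) P(x2|x1) P(x3|x2), Eq. 1.33 at consecutive times.
P3 = P1[:, None, None] * T[:, :, None] * T[None, :, :]
assert np.isclose(P3.sum(), 1.0)    # a properly normalized joint distribution
```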
Eq. 1.33 written for two times $t_1$ and $t_2$ is simply Bayes' rule, Eq. 1.23,
\[ P_2(x_1, t_1; x_2, t_2) = P_1(x_1, t_1)\, P(x_2, t_2 | x_1, t_1). \tag{1.34} \]
Integrating Eq. 1.34 over $x_2$ and dividing by $P_1(x_1, t_1)$ leads to the conservation condition, for all $x_1$, $t_1$, $t_2$,
\[ 1 = \int P(x_2, t_2 | x_1, t_1)\, dx_2, \tag{1.35} \]
equivalent to
\[ P_1(x_1, t_1) = \int P_1(x_1, t_1)\, P(x_2, t_2 | x_1, t_1)\, dx_2. \tag{1.36} \]
Integrating now Eq. 1.34 over $x_1$ leads to the necessary condition, for all $x_2$, $t_1$, $t_2$,
\[ P_1(x_2, t_2) = \int P(x_2, t_2 | x_1, t_1)\, P_1(x_1, t_1)\, dx_1. \tag{1.37} \]
Eq. 1.37 is known as the global-balance condition in the Markov-chain Monte Carlo method, see Chapter 2, Section 2.2. By analogy with the mass flow in hydrodynamics, the instantaneous probability flow $\phi_{x_1 \to x_2}(t_1, t_2)$ between two configurations is defined as
\[ \phi_{x_1 \to x_2}(t_1, t_2) = P_1(x_1, t_1)\, P(x_2, t_2 | x_1, t_1). \tag{1.38} \]
As for the mass flow, the probability flow is composed of a velocity term $P(x_2, t_2 | x_1, t_1)$ and a mass term $P_1(x_1, t_1)$, and obeys a continuity condition, Eq. 1.36. Following this analogy, the global-balance condition Eq. 1.37 and the conservation condition Eq. 1.36, when combined, yield the incompressibility of the probability flows,
\[ \sum_{x' \in \Omega} P_1(x', t_1)\, P(x, t_2 | x', t_1) = \sum_{x' \in \Omega} P_1(x, t_1)\, P(x', t_2 | x, t_1) \quad \text{for all } x \in \Omega, \]
\[ \underbrace{\sum_{x' \in \Omega} \phi_{x' \to x}(t_1, t_2)}_{\text{All flows into configuration } x} = \underbrace{\sum_{x' \in \Omega} \phi_{x \to x'}(t_1, t_2)}_{\text{All flows out of configuration } x}. \tag{1.39} \]
Finally, a relationship between the transition probabilities can be obtained by integrating Eq. 1.33 over $x_2$. Considering that the initial probability distribution is arbitrary, we obtain, for any $t_1 < t_2 < t_3$,
\[ P(x_3, t_3 | x_1, t_1) = \int P(x_3, t_3 | x_2, t_2)\, P(x_2, t_2 | x_1, t_1)\, dx_2. \tag{1.40} \]
Eq. 1.40 is known as the Chapman-Kolmogorov equation. Any two nonnegative functions $P_1(x_1, t_1)$ and $P(x_2, t_2 | x_1, t_1)$ obeying the set of equations (Eq. 1.37, Eq. 1.40), or equivalently the set (Eq. 1.37, Eq. 1.35), uniquely define a Markov process. Even if a Markov process is specified only by $P_1$ and $P_2$, one however needs all the $P_n$ to demonstrate the Markovian nature of a process.
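For the same kind of discrete chain, the Chapman-Kolmogorov equation is an explicit sum over the intermediate configuration, which is nothing but a matrix product (anticipating Eq. 1.43); a quick numerical check under these toy assumptions:

```python
import numpy as np

T = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

# Two-step transition probability: sum over the intermediate state x2 ...
ck = np.einsum('ij,jk->ik', T, T)          # Eq. 1.40 in discrete form
# ... which equals the matrix power T^2.
assert np.allclose(ck, np.linalg.matrix_power(T, 2))
```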
Stationary and homogeneous Markov processes
Markov processes that are stationary, as defined in Eq. 1.30, are of special interest, in particular for describing a system in equilibrium and the fluctuations within it. Consider a system described by a Markov process $Y(t)$. When the system reaches equilibrium, $Y(t)$ becomes stationary. In particular, $P_1(x, t) = \pi(x)$ is independent of time and is the equilibrium distribution of the quantity $Y$, as described by statistical mechanics, Section 1.1.1. For simplicity, we consider here discrete-time Markov chains. Continuous-time Markov chains will be presented in Section 1.2.3.
Transfer Matrix
The transition probabilities of a stationary Markov process depend only on the time difference. Once the initial probability distribution is known, the Markov chain is then completely characterized by a stochastic matrix $T$, called the transfer matrix. The matrix $T$ has as elements all the transition probabilities,
\[ P(X_{t+1} = x' | X_t = x) = T(x, x'), \tag{1.41} \]
\[ P(X_{t+\tau} = x' | X_t = x) = T^{\tau}(x, x'), \]
with $(x, x') \in \Omega^2$ and $\tau = t_2 - t_1$. The element $T^{\tau}(x, x')$ is the probability that a chain in a state $x$ reaches $x'$ in $\tau$ steps. The matrix $T^{\tau}$ is simply the $\tau$-fold matrix product of $T$. We can also define the stationary probability flow between two states $x$, $x'$,
\[ \phi_{x \to x'} = \pi(x)\, T(x, x'), \tag{1.42} \]
\[ \phi^{\tau}_{x \to x'} = \pi(x)\, T^{\tau}(x, x'). \]
The Chapman-Kolmogorov equation Eq. 1.40 becomes a simple matrix product, with $\tau, \tau' > 0$,
\[ T^{\tau + \tau'} = T^{\tau}\, T^{\tau'}. \tag{1.43} \]
Eq. 1.41 is not restricted to positive $\tau$ [40]. As the joint probability $P(X_{t+\tau} = x', X_t = x)$ is symmetric, i.e. $P(X_{t+\tau} = x', X_t = x) = P(X_t = x, X_{t+\tau} = x')$, from Eq. 1.41 we can write,
\[ \pi(x)\, T^{\tau}(x, x') = \pi(x')\, T^{-\tau}(x', x), \tag{1.44} \]
\[ \phi^{\tau}_{x \to x'} = \phi^{-\tau}_{x' \to x}. \]
Eq. 1.44 applies to any stationary Markov process, among them physical systems at equilibrium, without additional assumption. It simply expresses time reversal.
For a stationary process, the conservation condition and the global-balance condition simplify to:
Conservation condition:
\[ \int T(x, x')\, dx' = 1 \quad \text{for all } x \in \Omega, \tag{1.45} \]
Global-balance condition:
\[ \int \pi(x')\, T(x', x)\, dx' = \pi(x) \quad \text{for all } x \in \Omega. \tag{1.46} \]
From Eq. 1.45, the matrix $T$ has the right eigenvector $\mathbf{1}$ with eigenvalue 1 and, from Eq. 1.46, $T$ has the stationary distribution $\pi$ as a left eigenvector with eigenvalue 1. The spectrum of the transfer matrix contains important information about the convergence and relaxation properties of the Markov chain, as will be discussed further in Section 1.3.2.
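These spectral statements can be verified directly on a toy transfer matrix (the same invented three-state chain as above); the subleading eigenvalues govern the relaxation discussed in Section 1.3.2:

```python
import numpy as np

T = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

assert np.allclose(T @ np.ones(3), np.ones(3))   # Eq. 1.45: right eigenvector 1

w, v = np.linalg.eig(T.T)                        # left eigenvectors of T
pi = np.real(v[:, np.argmin(np.abs(w - 1.0))])
pi /= pi.sum()                                   # stationary distribution
assert np.allclose(pi @ T, pi)                   # Eq. 1.46: pi is invariant

gap = 1.0 - np.sort(np.abs(w))[-2]               # spectral gap: relaxation rate
print(pi, gap)
```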
The global balance in Eq. 1.46 is automatically fulfilled when the stronger detailed-balance condition Eq. 1.47, also called the reversibility condition, is satisfied:
Detailed-balance condition:
\[ \underbrace{\pi(x)\, T(x, x')}_{\text{Flow } x \to x'} = \underbrace{\pi(x')\, T(x', x)}_{\text{Flow } x' \to x} \quad \text{for all } x \text{ and } x' \in \Omega. \tag{1.47} \]
Note that Eq. 1.44 is not the detailed-balance condition, which has $+\tau$ on the right-hand side. Enforcing the detailed-balance condition requires an additional physical justification. Eq. 1.47 is equivalent to the joint probability being symmetric, i.e. $P_2(X_{t-1} = x, X_t = x') = P_2(X_{t-1} = x', X_t = x)$. Hence the name reversible Markov chain is used for any chain that obeys detailed balance. From now on, we only consider $\tau > 0$ to avoid confusion between time reversal and time reversibility.
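Detailed balance is the symmetry of the flow matrix $\pi(x)T(x, x')$, which the following sketch checks directly (a symmetric toy matrix, for which $\pi$ is uniform, serves as the reversible example):

```python
import numpy as np

def is_reversible(T, pi):
    """True if pi(x) T(x, x') = pi(x') T(x', x) for all x, x' (Eq. 1.47)."""
    flow = pi[:, None] * T
    return np.allclose(flow, flow.T)

# Any symmetric stochastic matrix has uniform pi and obeys Eq. 1.47.
T_sym = np.array([[0.5, 0.3, 0.2],
                  [0.3, 0.4, 0.3],
                  [0.2, 0.3, 0.5]])
print(is_reversible(T_sym, np.ones(3) / 3))   # True: a reversible chain
```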
Extraction of a subensemble
Stationary Markov processes will prove particularly useful in Chapter 2, regarding the extraction of a particular subensemble. A subensemble of a stochastic process $Y$ is the subset of all realizations $y(t)$ that obey a given condition, for instance that they all take the value $x_0$ at time $t_0$. This defines a new, non-stationary Markov process $Y^*$ for $t > t_0$,
\[ P_1^*(x_1, t_1) = T^{t_1 - t_0}(x_0, x_1), \tag{1.48} \]
\[ P^*(x_2, t_2 | x_1, t_1) = T^{t_2 - t_1}(x_1, x_2). \]
More generally, one could extract a subensemble where, at a given time $t_0$, the values of the realizations $y(t)$ are distributed according to a given probability distribution $p^*(x_0)$,
\[ P_1^*(x_1, t_1) = \int dx_0\, p^*(x_0)\, T^{t_1 - t_0}(x_0, x_1), \tag{1.49} \]
\[ P^*(x_2, t_2 | x_1, t_1) = T^{t_2 - t_1}(x_1, x_2). \]
Although they are extracted from a stationary process, these processes are non-stationary, given $p^* \neq \pi$, as a time $t_0$ is singled out. The probability flows near $t_0$ are not the stationary ones. However, the transition probabilities still depend on the time interval alone, as they are the same as those of the underlying stationary process. Such non-stationary Markov processes whose transition probabilities depend only on the time difference are called homogeneous processes, or Markov processes with stationary transition probabilities.
For a stationary process, extracting such a subensemble physically means that one prepares the system in a certain non-equilibrium state $x_0$ and expects that after a long time the system returns to equilibrium, i.e.,
\[ P_1^*(x_1, t_1) \xrightarrow{t_1 \to \infty} \pi(x_1). \tag{1.50} \]
Moreover, as the initial state $x_0$, or more generally the initial probability distribution $p^*(x_0)$, is arbitrary, it is expected that
\[ T^{t_1 - t_0}(x_0, x_1) \xrightarrow{t_1 \to \infty} \pi(x_1). \tag{1.51} \]
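For the toy chain used above, this convergence is visible by simply taking matrix powers: every row of $T^t$ approaches $\pi$, whatever the initial state. A minimal check under those assumptions:

```python
import numpy as np

T = np.array([[0.7, 0.2, 0.1],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])

for t in (1, 5, 20):
    print(t, np.linalg.matrix_power(T, t)[0])
# The rows of T^t become identical: T^t(x0, .) -> pi(.) for any x0 (Eq. 1.51).
```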
Quantifying the asymptotic behavior in Eq. 1.50 and Eq. 1.51 is a major problem in dealing with Markov processes and is discussed in Section 1.3. The convergence of Markov processes is at the heart of the Monte Carlo methods, see Chapter 2 and Chapter 3, which rely on Markov processes to simulate ensemble averages, as in Eq. 1.14.
Master equation
The Chapman-Kolmogorov equation Eq. 1.43 can be rewritten, in the limit of a vanishing time difference $\tau'$, i.e. continuous time, as a differential equation: the master equation. The master equation has the advantage of being more convenient for mathematical operations but, most of all, it has a more direct physical interpretation.
Table of contents
1 Markov processes
1.1 Fluctuations and statistical ensembles
1.1.1 Maxwell-Boltzmann distribution
1.1.2 Classical statistical mechanics and ensemble averages
1.1.3 Stochastic processes
1.2 Markov processes and master equation
1.2.1 General properties of a Markov process
1.2.2 Stationary and homogeneous Markov processes
1.2.3 Master equation
1.3 Convergence in Markov processes
1.3.1 Ergodic and mixing properties
1.3.2 Mixing in Markov chains
2 Monte Carlo Method
2.1 Direct sampling for high-dimensional integration
2.1.1 Weak and strong laws of large numbers and the Monte Carlo method
2.1.2 Inversion sampling
2.1.3 Acceptance-rejection method
2.1.4 Importance sampling
2.2 Markov-chain Monte Carlo method
2.2.1 Principles of the Markov-chain Monte Carlo method
2.2.2 Metropolis algorithm
2.2.3 A priori probabilities
2.2.4 Heat-bath algorithm
2.3 Convergence and error
2.3.1 Initialization bias
2.3.2 Uncorrelated samples
2.3.3 Correlated samples and autocorrelation times
2.3.4 Scaling of autocorrelation times around a phase transition
2.4 Cluster algorithms
2.4.1 Fortuin-Kasteleyn transformation
2.4.2 Swendsen-Wang algorithm
2.4.3 Wolff algorithm
2.4.4 Cluster algorithms for spin glasses
2.5 Non-local algorithms
2.5.1 Geometric cluster algorithm
2.5.2 Jaster algorithm
3 Irreversible factorized Metropolis algorithm
3.1 The Lifting Framework
3.1.1 Lifting for the unidimensional random walk
3.1.2 General case
3.1.3 Lifting for two particles
3.2 The Factorized Metropolis filter
3.2.1 Definition of the Factorized Metropolis filter
3.2.2 Implementation
3.2.3 The factorized Metropolis filter in the lifting framework, beyond replicas
3.3 Continuous-time scheme
3.3.1 Maximal global-balance by infinitesimal steps
3.3.2 Event-driven approach: Event sampling
3.3.3 Observable averaging
3.4 Infinite chains
4 Applications of the irreversible factorized Metropolis algorithm
4.1 Soft-sphere systems
4.1.1 Melting in soft spheres
4.1.2 Performance
4.2 Pressure computation
4.3 Continuous classical spin systems
4.3.1 Planar rotator spin systems
4.3.2 Heisenberg spin systems on a three-dimensional lattice
General conclusion
Bibliography