Entropy in statistical physics

This page presents the main concepts explaining the nature of entropy in physics. Prerequisites:

Physics and information

The information-theoretic understanding of entropy suffices to explain most physical properties of entropy in a first approach, but it leaves some details unclear because reality is a bit different: the exact understanding comes from the framework of quantum physics, which replaces classical information by quantum information. The present introduction, expressed in terms of classical information, may be criticized for its lack of rigor, or even its incoherence under careful analysis; what really matters is that the main intuitive ideas provided here reflect the situation in quantum physics well enough.
Quantum physics formalizes any list of N clearly distinct states as N pairwise orthogonal unit vectors in a Hilbert space (the vectors of an orthonormal basis of the subspace they span); this subspace represents all states "somewhere among them". Evolution then acts on the Hilbert space as a unitary transformation (a concept similar to rotations), mapping these states to another family of states with the same property of being "clearly distinct" (orthogonal). Thus, any "pack of N possible states", formalized as an N-dimensional Hilbert space (a subspace of a bigger Hilbert space, in which we may choose a basis to represent N clearly distinct states), cannot "evolve into a smaller pack", i.e. into a subspace of smaller dimension.
In the limit of classical physics, where a system is formalized by a 2n-dimensional phase space, these "packs of N possible states" correspond to regions of phase space with volume N.hⁿ, where the unit of volume hⁿ is provided by the Planck constant (h = 2πℏ). In this classical limit, the preservation of the "size of packs" for isolated systems corresponds to the conservation of volumes in phase space expressed by Liouville's theorem. In a long-time approximation for classical systems, the effective region of phase space containing given possibilities can expand by dilution (mixing with its outside), but not shrink.

In particular, 2 clearly distinct states A and B of an isolated microscopic system at a given time cannot reliably evolve into the same state A' at the same later time: if A is determined to evolve into A', then B cannot evolve into A'. In other words, the evolution of isolated microscopic systems cannot erase any information in the sense of multiplicity of possible states (regardless of whether it is truly definite information, where only one possibility is realized, or a persisting indetermination of a state between multiple possibilities). Thus, the physically possible evolutions of isolated microscopic systems can be understood as information-preserving operations (acting bijectively), such as, algorithmically speaking, lossless file compression, or operations that measure the state of a microscopic system into digital information (i.e. states of computer memory) and restore it from there (except that actually used processors involve heavier physical processes, which are irreversible).

Nature of entropy in physics

The entropy of a physical system can be understood as the entropy (indefiniteness) of the information of what the exact microscopic state of the system may be. Usually, this data mainly consists of all positions and velocities (and sometimes spins) of the atoms contained in the system. Of course, this information would usually be impossible to determine, both because it would be very impractical to measure even for systems of one or a few molecules, and because usual amounts of entropy are much bigger than any amount of data usually considered in computer science: a few bytes per molecule, multiplied by the huge number of molecules in a system (of the order of the Avogadro constant). However, the point is not the practicality of the measure, but the fact that the entropy of this information is what theoretically defines entropy in physics.

So, entropy roughly measures (in logarithmic units) the size of the main pack of possible elementary states where a system is likely to be. Physical processes are reversible when they preserve the information of this pack (distinguishing it from its outside), thus when they preserve entropy. Large isolated systems may create entropy by diluting this pack, mixing states in and out and forgetting which is which. This explains the conservation of entropy in fundamental laws, and its possible creation (but non-elimination) in irreversible macroscopic processes. In non-isolated systems, processes can locally seem to shrink the pack of distinct microscopic states (the volume of phase space) by evacuating their multiplicity to the environment (final states may be identical here but distinct elsewhere).

Entropy was defined in information theory by the formula S = ∑_i p_i S_i where S_i = −ln p_i, assuming the unit of entropy expressed by the natural logarithm (where 1 bit = ln 2). In physics, the usual convention assumes another unit, related to this one by the Boltzmann constant k. In this convention, we should write S_i = −k.ln p_i. But to simplify, let us set k = 1 in the present page.
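As a minimal sketch of this definition (the function names and the use of Python are our own choices, not the page's), the entropy of a finite list of probabilities can be computed in natural units, converted to bits, or expressed in J/K by passing the Boltzmann constant as k:

```python
import math

# Boltzmann constant in J/K (exact value in the SI since 2019).
K_B = 1.380649e-23


def entropy(p, k=1.0):
    """S = sum_i p_i S_i with S_i = -k ln p_i (states with p_i = 0 contribute nothing)."""
    return sum(-k * pi * math.log(pi) for pi in p if pi > 0)


def entropy_bits(p):
    """Same entropy measured in bits, using 1 bit = ln 2 in natural units."""
    return entropy(p) / math.log(2)


p = [0.5, 0.25, 0.25]
print(entropy(p))          # natural units (k = 1): 1.5 ln 2
print(entropy_bits(p))     # about 1.5 bits
print(entropy(p, k=K_B))   # the same entropy in J/K
```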

Probabilistic evolution in isolated systems

Rigorously speaking, in quantum physics, isolated systems behave reversibly (without creating entropy), while entropy creation is a subtle emerging process, most clearly expressed in the case of non-isolated systems. However, some cases of entropy creation, though they also involve non-isolated systems, can be approximately expressed for an isolated (discrete) system in terms of a classical probabilistic evolution (instead of a deterministic one), which acts by mixing the probabilities of different states. This probabilistic evolution is expressed by a square matrix of probabilities, with nonnegative coefficients m_ij, where each row i gives the probabilistic state into which the elementary initial state i would evolve. It transforms each probabilistic state (p_i) into the probabilistic state p'_j = ∑_i p_i m_ij.

Still, this process has a sort of non-shrinking property (that will lead to the non-decrease of entropy), that looks like a time symmetry in the property of probabilities (not a real symmetry):

Of course, for all i, ∑_j m_ij = 1 : from a given initial state i the sum of probabilities of all possible final states j is 1.
But also, for all j, ∑_i m_ij = 1 : for any final state j the sum of probabilities for j to be reached by evolution from each possible initial state i, is also 1.

This says that the mixing of probabilities by evolution must be fair, so that the probabilistic state becomes closer to equiprobability, never going away from it.
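The two conditions above say that (m_ij) is a doubly stochastic matrix. The small sketch below (helper names and example matrix are illustrative assumptions) checks them and applies p'_j = ∑_i p_i m_ij. Since each p'_j is an average of the p_i with weights m_ij summing to 1 over i, the largest probability cannot grow, the smallest cannot shrink, and the uniform distribution is left unchanged. The last matrix satisfies only the first condition: it merges two states into one, which is exactly the shrinking of a pack that evolution must not do.

```python
def is_doubly_stochastic(m, tol=1e-12):
    """Check m_ij >= 0, sum_j m_ij = 1 for every i, and sum_i m_ij = 1 for every j."""
    n = len(m)
    nonneg = all(x >= 0 for row in m for x in row)
    rows = all(abs(sum(row) - 1) < tol for row in m)
    cols = all(abs(sum(m[i][j] for i in range(n)) - 1) < tol for j in range(n))
    return nonneg and rows and cols


def evolve(p, m):
    """Probabilistic evolution p'_j = sum_i p_i m_ij."""
    n = len(p)
    return [sum(p[i] * m[i][j] for i in range(n)) for j in range(n)]


# A fair mixing matrix (illustrative numbers): rows and columns both sum to 1.
M = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]

p = [0.7, 0.2, 0.1]
q = evolve(p, M)
print(is_doubly_stochastic(M), q)   # probabilities move toward 1/3 each
print(evolve([1/3, 1/3, 1/3], M))   # uniform state stays (numerically) uniform

# Only the first condition holds here: both states are sent to state 0.
merge = [[1.0, 0.0],
         [1.0, 0.0]]
print(is_doubly_stochastic(merge), evolve([0.5, 0.5], merge))  # False [1.0, 0.0]
```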

In its quantum form (the phenomenon of decoherence, or implicit measurement), the idea is to measure the final state in a basis of orthogonal states, so that the effect of evolution from the initial list of states (whose probabilities defined the quantum state) to the final list is not a mere bijection: it is a more general rotation with nontrivial angles.
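A standard way to connect this with the matrix above (a known fact, not spelled out in the page): if U_ij is the amplitude of the final basis state j in the evolved initial state i, for a unitary U, then the measurement probabilities m_ij = |U_ij|² automatically satisfy both conditions, because the rows and the columns of a unitary matrix are unit vectors. A minimal pure-Python sketch, where the random unitary is built by Gram-Schmidt orthonormalization (our choice of construction, for illustration only):

```python
import math
import random


def random_unitary(n, seed=0):
    """Orthonormalize n random complex vectors (Gram-Schmidt); the rows form a unitary matrix."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
        for u in rows:
            # Remove the component of v along u: <u, v> = sum_k conj(u_k) v_k.
            c = sum(uk.conjugate() * vk for uk, vk in zip(u, v))
            v = [vk - c * uk for uk, vk in zip(u, v)]
        norm = math.sqrt(sum(abs(vk) ** 2 for vk in v))
        rows.append([vk / norm for vk in v])
    return rows


def outcome_probabilities(u):
    """m_ij = |U_ij|^2: probability of measuring final basis state j from initial state i."""
    return [[abs(x) ** 2 for x in row] for row in u]


U = random_unitary(4, seed=42)
m = outcome_probabilities(U)
row_sums = [sum(row) for row in m]
col_sums = [sum(m[i][j] for i in range(4)) for j in range(4)]
print(row_sums)  # all close to 1
print(col_sums)  # all close to 1 as well
```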

Entropy creation in isolated systems

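To verify the entropy creation, let us use ∑_j m_ij = 1 to write the initial entropy as

S = ∑_{i,j} m_ij (−p_i ln p_i).

Since ∑_i m_ij = 1 for each j, the concavity of (−x ln x) gives, for all j, ∑_i m_ij (−p_i ln p_i) ≤ −p'_j ln p'_j (Jensen's inequality with weights m_ij, as p'_j = ∑_i m_ij p_i).
Summing over j, we conclude S ≤ S'.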
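The two steps of this proof can be tested numerically. In the sketch below (our own construction, for illustration), random doubly stochastic matrices are built as convex combinations of permutation matrices, which satisfy both conditions by construction; for random initial states it records the largest value found of the per-j Jensen gap and of S − S', both of which should never be positive beyond rounding errors:

```python
import math
import random


def f(x):
    """The concave function -x ln x (with f(0) = 0)."""
    return -x * math.log(x) if x > 0 else 0.0


def random_doubly_stochastic(n, rng, terms=5):
    """Convex combination of random permutation matrices, doubly stochastic by construction."""
    weights = [rng.random() for _ in range(terms)]
    total = sum(weights)
    m = [[0.0] * n for _ in range(n)]
    for w in weights:
        perm = list(range(n))
        rng.shuffle(perm)
        for i in range(n):
            m[i][perm[i]] += w / total
    return m


def worst_violation(trials=2000, seed=1):
    """Largest value found of the per-j Jensen gap and of S - S' (both should be <= 0)."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        n = rng.randint(2, 6)
        raw = [rng.random() for _ in range(n)]
        p = [x / sum(raw) for x in raw]
        m = random_doubly_stochastic(n, rng)
        p_new = [sum(p[i] * m[i][j] for i in range(n)) for j in range(n)]
        # Per-j step of the proof: sum_i m_ij f(p_i) <= f(p'_j).
        for j in range(n):
            worst = max(worst, sum(m[i][j] * f(p[i]) for i in range(n)) - f(p_new[j]))
        # Summed over j: S - S' <= 0.
        worst = max(worst, sum(map(f, p)) - sum(map(f, p_new)))
    return worst


print(worst_violation())  # a value <= 0, up to rounding errors
```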
Another form of entropy creation in isolated systems comes from how things behave in practice rather than how they ideally are in theory: while the theory logically implies precise probabilities for the final state, real systems usually do not come with integrated supercomputers giving exact predictions of these probabilities (and if we added one, it would likely produce more entropy in its operations than the amount it was meant to avoid). Without precise predictions, we can only make rough approximations of the probabilities, and thus handle effectively accessible probabilities with more entropy. The idea of such approximations is expressed by the concept of received entropy, discussed below.

Entropy creation by chaotic interactions in a many-body system

When two objects meet by chance (in a world with many objects), interact then go apart, the sum of their entropies cannot decrease. Indeed:

As they meet by chance (without coordination between their states), they were initially uncorrelated: S₁+S₂ = S.
Entropy cannot decrease on the way: S ≤ S'.
Their interaction may make them correlated when going apart: S' ≤ S'₁+S'₂.

Thus, S₁+S₂ ≤ S'₁+S'₂: the entropy of a large system appears to increase when counted as the sum of entropies of subsystems.
The above inequalities present this increase as made of 2 contributions, but quantum physics does not distinguish them. The entropy creation described above for an isolated system (S ≤ S') requires a sort of measurement, i.e. an interaction, to fully happen; but an interaction is precisely the cause of the other contribution. Indeed, we are not looking at the entropy of a fixed isolated system (the global system of 2 interacting objects) before and after its evolution (which would give S = S'), but at that of non-isolated ones (each object before and after interacting with the other), separately counting and adding up the individual entropies S'₁ + S'₂, which form a larger amount than the global entropy S' because of the correlation. This is the only real contribution to the entropy growth from S₁+S₂ to S'₁+S'₂, but it is as good as the two classically described contributions above, because quantum physics allows the gap S'₁+S'₂ − S' to be larger than classical probabilities allow, letting it somehow play both roles.
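To make the last point concrete (a standard textbook example, not taken from the page): for two perfectly correlated classical bits, S' = S'₁ = S'₂ = ln 2, so the gap S'₁+S'₂ − S' is ln 2, and classically it can never exceed the smaller of S'₁ and S'₂ since S' is at least as large as each of them. For two qubits in the entangled pure state (|00⟩+|11⟩)/√2, the global state is pure (S' = 0) while each qubit alone has entropy ln 2, so the gap reaches 2 ln 2. The sketch computes each qubit's entropy from the eigenvalues of its reduced density matrix; the helper names are ours.

```python
import math


def H(probs):
    """Entropy -sum p ln p; tiny values coming from rounding are ignored."""
    return sum(-x * math.log(x) for x in probs if x > 1e-15)


def eigenvalues_2x2_hermitian(r):
    """Eigenvalues of a 2x2 Hermitian matrix [[a, b], [conj(b), d]]."""
    a, d, b = r[0][0].real, r[1][1].real, r[0][1]
    mean = (a + d) / 2
    radius = math.sqrt(((a - d) / 2) ** 2 + abs(b) ** 2)
    return [mean + radius, mean - radius]


def reduced_entropy(c):
    """Entropy of qubit 1 for the pure state sum_ab c[a][b] |a>|b>,
    from its reduced density matrix rho_1 = C C^dagger."""
    rho = [[sum(c[a][b] * c[a2][b].conjugate() for b in range(2)) for a2 in range(2)]
           for a in range(2)]
    return H(eigenvalues_2x2_hermitian(rho))


def quantum_entropies(c):
    """(S', S'_1, S'_2) for a pure two-qubit state; a pure state has global entropy 0."""
    c_transposed = [[c[a][b] for a in range(2)] for b in range(2)]
    return 0.0, reduced_entropy(c), reduced_entropy(c_transposed)


s = 1 / math.sqrt(2)
bell = [[complex(s), 0j], [0j, complex(s)]]   # (|00> + |11>)/sqrt(2)
q_total, q1, q2 = quantum_entropies(bell)

# Classical analogue: two bits always equal, each 0 or 1 with probability 1/2.
joint = [[0.5, 0.0], [0.0, 0.5]]
c_total = H([x for row in joint for x in row])
c1 = H([sum(row) for row in joint])
c2 = H([sum(col) for col in zip(*joint)])

print("quantum gap  :", (q1 + q2 - q_total) / math.log(2), "x ln 2")   # about 2
print("classical gap:", (c1 + c2 - c_total) / math.log(2), "x ln 2")   # about 1
```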
The creation of entropy is due to the fact that the initial lack of entropy, carried after the interaction by the correlation, becomes ineffective as the correlated objects go apart and have too little chance to meet again in conditions that would make this correlation observable. Instead, the next interacting objects will rather be other uncorrelated pairs, while existing correlations become progressively dispersed among more and more molecules, which would have to be compared all together; this is harder and harder to decipher, and even impossible when some objects involved in a correlation escape the analysis.
Moreover, in quantum physics, the lack of entropy of the global system cannot always be fully described by separately measuring the states of components and analyzing them as classical information; only a physical recombination (treatment as unobserved quantum information) might do it, but it is even harder to achieve.

Stable probabilistic states in a stable environment

The general stability conditions for probabilistic states of an object and of the environment with which it interacts depend on the list of quantities conserved in isolated systems. A conserved quantity, defined as a function of the elementary state of any system, is called extensive when its value on a system of several objects is the sum of its values for each object; it can then vary in a non-isolated system by transfer with the environment.

Stable probabilistic states satisfy the following properties:
1. The probability is a function of conserved quantities.
2. It is the exponential of an affine function of conserved extensive quantities, when the other conserved quantities are fixed; the linear part of this affine function is independent of the other conserved quantities.

Proof of 1. A probabilistic state is stable (not creating entropy) when evolution only mixes elementary states with the same probability. The conservation of a quantity (a function of elementary states) in an isolated system prevents its evolution from mixing elementary states with different values of this quantity, so that they can keep different probabilities in a stable way. A probability function defined as a function of conserved quantities is therefore stable for an isolated system; conversely, if it is stable, then it is itself a conserved quantity.

Proof of 2. Consider a system of 2 objects A and B which are uncorrelated (especially if they are far away from each other), where a conserved quantity E takes values E₁, E₂ on two elementary states of A with probabilities p₁, p₂, and values E'₁, E'₂ on 2 elementary states of B with probabilities p'₁, p'₂ (other conserved quantities staying fixed), such that E₂−E₁ = E'₂−E'₁. According to 1., the combined states (1,2) and (2,1), having the same value of the conserved quantity E₁+E'₂ = E₂+E'₁, also have the same probability: p₁p'₂ = p₂p'₁. Thus p₂/p₁ = p'₂/p'₁, i.e. p₂/p₁ only depends on E₂−E₁ throughout the environment, independently of the particular object or states (with equal values of other conserved quantities). Thus (ln p) must be an affine function of conserved quantities.

The most famous conserved quantities are energy and momentum, which are components of a single 4-dimensional object (a linear form on relativistic space-time). As ln p is an affine function of energy-momentum, its direction (differential, or linear part) can be identified with a time-like vector in space-time, which defines "the reference frame of the environment" (its average direction; for example, on Earth it is the ground's reference frame). When analyzed in this frame, the probability appears independent of momentum, and is thus a mere function of energy (momentum no longer seems conserved, as it can be freely exchanged with the environment).
Other important conserved quantities are the numbers of atomic nuclei of each type (as long as no nuclear reaction is happening). The component of (ln p) along the conserved quantity "number of hydrogen nuclei" defines the pH.

Free energy

Let T be a fixed temperature (namely, the temperature of the environment).
Assuming each considered elementary state i to have a definite energy E_i, let us define its free energy as

F_i = E_i − TS_i = E_i + T.ln p_i

Then, the Helmholtz free energy F, in its standard macroscopic definition, coincides with the average of these free energies over all states:

F = E − TS = ∑_i p_i F_i

where E is the average energy: E = ∑_i p_i E_i.
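A quick numerical check of this identity (the energies, probabilities and temperature are arbitrary illustrative values, in units where k = 1): the average of the F_i = E_i + T.ln p_i equals E − TS.

```python
import math

T = 0.8                      # temperature of the environment (k = 1)
E_states = [0.0, 1.0, 2.5]   # energies E_i of the elementary states
p = [0.5, 0.3, 0.2]          # any probabilistic state, not necessarily at equilibrium

F_states = [e + T * math.log(pi) for e, pi in zip(E_states, p)]   # F_i = E_i + T ln p_i
E_avg = sum(pi * e for pi, e in zip(p, E_states))                  # E = sum_i p_i E_i
S = -sum(pi * math.log(pi) for pi in p)                            # S = -sum_i p_i ln p_i
F_avg = sum(pi * fi for pi, fi in zip(p, F_states))                # sum_i p_i F_i

print(F_avg, E_avg - T * S)   # the two values agree up to rounding
```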

Thermal equilibrium

With fixed E_i and variable p_i, the free energy F reaches its minimum when all F_i are equal (thus F_i=F). The unique such probabilistic state, called the Boltzmann distribution (or thermal equilibrium) at temperature T, is defined by

p_i = e^{(F−E_i)/T} where F = −T.ln ∑_i e^{−E_i/T}.

Proof.

Since F = ∑_i p_i F_i, we have dF = ∑_i F_i dp_i + ∑_i p_i dF_i. With fixed E_i, dF_i = T dp_i/p_i, so ∑_i p_i dF_i = T ∑_i dp_i = 0 (as ∑_i p_i = 1); hence dF = ∑_i F_i dp_i.
Thus the equilibrium condition (dF = 0 for all variations of p) is that all F_i are equal.
When F_i > F_j and dp_i = −dp_j > 0 while other variations of p cancel (thus going away from equilibrium, because each F_i is an increasing function of p_i), we get dF > 0; thus the equilibrium is a minimum.
Then F = F_i = E_i + T ln p_i gives p_i = e^{(F−E_i)/T}, and the value of F comes from ∑_i p_i = 1.
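These formulas translate directly into code. In this sketch (function names and sample values are ours), the lowest energy is subtracted before exponentiating, a common trick to avoid overflow that does not change the result; all F_i then come out equal to F, and random probabilistic states never have a lower free energy, as the proof asserts:

```python
import math
import random


def boltzmann(E, T):
    """Return (p, F) with F = -T ln sum_i exp(-E_i/T) and p_i = exp((F - E_i)/T)."""
    e0 = min(E)  # shift energies for numerical safety; F is shifted back by e0
    F = e0 - T * math.log(sum(math.exp(-(e - e0) / T) for e in E))
    p = [math.exp((F - e) / T) for e in E]
    return p, F


def free_energy(E, p, T):
    """F = E - TS = sum_i p_i (E_i + T ln p_i)."""
    return sum(pi * (e + T * math.log(pi)) for e, pi in zip(E, p) if pi > 0)


E = [0.0, 0.4, 1.0, 2.2]
T = 0.7
p, F = boltzmann(E, T)
print(sum(p), F, free_energy(E, p, T))   # sum close to 1; both values of F agree

# At equilibrium, all F_i = E_i + T ln p_i are equal to F.
print([e + T * math.log(pi) for e, pi in zip(E, p)])

# Any other probabilistic state has a larger free energy.
rng = random.Random(0)
for _ in range(5):
    raw = [rng.random() for _ in E]
    q = [x / sum(raw) for x in raw]
    print(free_energy(E, q, T) >= F)     # True
```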

According to the above description of stable environments (not creating more entropy), any stable probabilistic state of an object, as well as of its environment, follows the Boltzmann distribution of some temperature in some reference frame (for which this state is the one with minimal free energy), unless the conservation of another quantity is at stake (which may be handled by looking at the configuration space of an object with a fixed value of that other conserved quantity, describing the case of an object that is isolated with respect to that quantity).

Entropy of correlated systems, again

Energies of elementary states contribute to the expression of F as an arbitrary affine function of probabilities (the average energy E), in addition to the contribution of S. Thus, saying that, for any data of energies (thus any added affine function), the equilibrium state is a minimum of F, means that F is a convex function of probabilities, thus S is a concave function. We already used this fact to show the inequality about information entropies of correlated systems (S ≤ S_A + S_B). Let us restate the argument in other words.
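By choosing energy laws for A and B such that their probability laws (considered separately) are the respective Boltzmann distributions at a given temperature, the transition from the correlated state (with entropy S) to the uncorrelated one with the same separate laws (with entropy S_A + S_B) preserves the average energy, which only depends on the separate laws of A and B. But it increases the entropy (or leaves it unchanged), because it comes down to the thermal equilibrium of the whole system, whose Boltzmann distribution is the product of those of A and B and minimizes F = E − TS.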
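The argument can be followed numerically on a correlated joint table (illustrative numbers of our choosing). Energies are chosen as E_A(a) = −T.ln p_A(a) and E_B(b) = −T.ln p_B(b), so that each separate law is its own Boltzmann distribution; the uncorrelated product of these laws then has the same average energy as the correlated state, the larger entropy S_A + S_B, and hence the lower free energy:

```python
import math

T = 1.0
joint = [[0.3, 0.0],
         [0.3, 0.4]]                       # correlated state of A and B: joint[a][b]
pA = [sum(row) for row in joint]           # separate law of A
pB = [sum(col) for col in zip(*joint)]     # separate law of B

# Energy laws making each separate law a Boltzmann distribution at temperature T
# (with free energy 0 for each object): E(x) = -T ln p(x).
EA = [-T * math.log(x) for x in pA]
EB = [-T * math.log(x) for x in pB]


def stats(dist):
    """Average energy, entropy and free energy E - TS of a joint law of A and B."""
    E = sum(dist[a][b] * (EA[a] + EB[b]) for a in range(2) for b in range(2))
    S = -sum(x * math.log(x) for row in dist for x in row if x > 0)
    return E, S, E - T * S


product = [[pA[a] * pB[b] for b in range(2)] for a in range(2)]   # uncorrelated state

print("correlated:  ", stats(joint))
print("uncorrelated:", stats(product))   # same E, higher S = S_A + S_B, lower F
```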

Comment on the role of free energy

For example, consider 2 states A and B with energies E₁ and E₂ such that E₂−E₁ = T.ln(2): state A is twice as probable as B.
But if we have another state B' with the same energy as B, then the undetermined state (B or B' with the same probability) will be as probable as state A. So, the state (B or B'), where B and B' are equiprobable, has more energy than A (the difference is T.ln(2)), but its entropy (when it is known to occur) is 1 bit = ln 2, while A is a single state, whose entropy thus vanishes when it is known to occur. Thus, (B or B' with the same probability) gets the same probability as A when these alternatives are put in a larger list, because they have the same free energy (when seen separately, but then also when seen in the list). The lists of elementary states contributing to the free energy of a state may be seen as grouped into sub-lists, where each sub-list plays the role of a single state whose energy is given by the free energy of the sub-list seen separately.
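The example can be checked numerically, writing E_A and E_B for E₁ and E₂ (T = 1 and E_A = 0 are arbitrary choices): the Boltzmann weights give probability 1/2 to A and 1/4 to each of B and B', and the free energy of the sub-list (B or B'), computed like the F of a separate system, equals the energy of A.

```python
import math

T = 1.0
E_A = 0.0
E_B = E_A + T * math.log(2)          # E_2 - E_1 = T ln 2
energies = {"A": E_A, "B": E_B, "B'": E_B}

Z = sum(math.exp(-e / T) for e in energies.values())
prob = {name: math.exp(-e / T) / Z for name, e in energies.items()}
print(prob)   # about 1/2 for A and 1/4 for each of B and B'

# Free energy of the sub-list (B or B') seen separately: -T ln(e^{-E_B/T} + e^{-E_B/T}).
F_group = -T * math.log(2 * math.exp(-E_B / T))
print(F_group, E_A)   # equal: E_B - T ln 2 = E_A, so the group weighs as much as A
```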

Evolution in an environment satisfying the Boltzmann distribution

In an interaction as described above (two objects meeting by chance), the total energy (the sum of the energies of both objects) is conserved: E₁+E₂ = E'₁+E'₂.
As the sum of entropies cannot decrease (S₁+S₂ ≤ S'₁+S'₂), the sum of free energies F = E − TS for any fixed temperature T cannot increase: F'₁+F'₂ ≤ F₁+F₂. In particular, when the second object is a piece of environment initially in thermal equilibrium at T, its free energy F₂ was minimal, so it cannot decrease any further: F₂ ≤ F'₂.
We conclude F'₁ ≤ F₁: interaction with an environment in thermal equilibrium at temperature T can only decrease the free energy of an object, thus bring it closer to thermal equilibrium at this temperature. If the object was already there, then the union of both systems is in thermal equilibrium too: Boltzmann's law is the stable probability distribution of a system interacting with an environment at temperature T.
Let us express the evolution of an object in interaction with its environment, by a matrix as above (p'_j = ∑_i p_i m_ij where again, for all i, ∑_j m_ij = 1).
If the object starts in thermal equilibrium (p_i = e^{(F−E_i)/T}), then it must stay there: for all j,

p'_j = e^{(F−E_j)/T} = ∑_i e^{(F−E_i)/T} m_ij.

We conclude ∑_i m_ij e^{(E_j−E_i)/T} = 1. This is to be compared with the previous formula satisfied by evolution matrices of isolated systems (∑_i m_ij = 1), where T was absent as if we took the limit of this one for an infinite T, but in fact for another reason: evolution in an isolated system must preserve energy, so that the only possible mixtures were those between elementary states with equal energy E_i = E_j, which makes temperature irrelevant. A direct deduction of the new formula from an application of the old one to the whole system (object + environment) is left as an exercise to the reader.
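A concrete family of matrices satisfying this condition (our choice for illustration; the page does not specify any dynamics) is the Metropolis rule, widely used in Monte Carlo simulations: from state i, propose another state j uniformly and accept the jump with probability min(1, e^{−(E_j−E_i)/T}). The sketch checks that rows sum to 1, that ∑_i m_ij e^{(E_j−E_i)/T} = 1 for every j, and hence that the Boltzmann distribution is left unchanged, while the columns alone no longer sum to 1:

```python
import math


def metropolis_matrix(E, T):
    """m[i][j] = probability that state i evolves into state j (hypothetical Metropolis dynamics)."""
    n = len(E)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j != i:
                m[i][j] = min(1.0, math.exp(-(E[j] - E[i]) / T)) / (n - 1)
        m[i][i] = 1.0 - sum(m[i])   # probability of staying in state i
    return m


E = [0.0, 0.7, 1.5, 3.0]
T = 1.2
n = len(E)
m = metropolis_matrix(E, T)

row_sums = [sum(m[i]) for i in range(n)]
condition = [sum(m[i][j] * math.exp((E[j] - E[i]) / T) for i in range(n)) for j in range(n)]
col_sums = [sum(m[i][j] for i in range(n)) for j in range(n)]

Z = sum(math.exp(-e / T) for e in E)
p = [math.exp(-e / T) / Z for e in E]
p_next = [sum(p[i] * m[i][j] for i in range(n)) for j in range(n)]

print(row_sums)    # all (numerically) 1
print(condition)   # all (numerically) 1, as required for an environment at temperature T
print(col_sums)    # not all 1: the isolated-system condition no longer holds
print(max(abs(a - b) for a, b in zip(p, p_next)))   # about 0: the Boltzmann state is stable
```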

Physical role of received entropy

Let us explain how the concept of "received entropy" that we introduced for information theory, can make sense for physics, as a possible form or interpretation of the entropy creation process.

Consider a system in a probabilistic state p, with an energy function (E_i) (a priori unrelated: the system need not be in any thermal equilibrium), a third, also unrelated, function p' (positive with sum 1) that can be interpreted as an "expected probability" (whether its use below results from design or happens by chance in nature), and an environment with temperature T.
Consider an adiabatic transformation of the system, modifying the energies of elementary states while preserving their probabilities (we use the same labels for these states through their evolution), from E_i to E'_i = K − T.ln p'_i, where K is any constant energy (conversely, for any data of the E'_i there is a unique function p', positive with sum 1, and a unique value of K satisfying these formulas, which are identical to the definition of the Boltzmann distribution). Then the system is left to rest in the environment, and approaches thermal equilibrium (minimal free energy) after some time, in case it was not there at first.
Final value of the free energy: F' = −T.ln ∑_i exp(−E'_i/T) = K
Actual average mechanical energy spent to reach it: W = ∑_i p_i (E'_i − E_i) = ∑_i p_i (K − T.ln p'_i − E_i)
Effectively saved free energy in the process: K − W = ∑_i p_i (T.ln p'_i + E_i) = E − TS'
where E = ∑_ip_i E_i is the initial real average energy, and S' = -∑_i p_i ln p'_i is what we called the "received entropy".
This scenario is the one that would preserve all existing free energy (by not creating any entropy) if the system really was in the probabilistic state p' (as the first step would already reach thermal equilibrium), but fails otherwise.
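The bookkeeping of this scenario can be checked numerically. The function below (a hypothetical helper; its name and the sample numbers are ours) follows the steps literally: it sets E'_i = K − T.ln p'_i, computes F', W and the saved free energy K − W, which can be compared with E − TS' and with the initial free energy E − TS. Since the received entropy S' = −∑_i p_i ln p'_i is never smaller than S (Gibbs' inequality), the saved amount never exceeds E − TS, with equality when p' = p:

```python
import math


def received_entropy_scenario(p, E, p_expected, T, K=0.0):
    """Adiabatic step E_i -> E'_i = K - T ln p'_i, then rest in the environment at T.
    Returns (F', W, saved): final free energy, average work spent, and K - W."""
    E_new = [K - T * math.log(q) for q in p_expected]
    F_final = -T * math.log(sum(math.exp(-e / T) for e in E_new))      # equals K
    W = sum(pi * (en - e) for pi, en, e in zip(p, E_new, E))
    return F_final, W, K - W


def entropy(p):
    """S = -sum_i p_i ln p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def received_entropy(p, p_expected):
    """S' = -sum_i p_i ln p'_i."""
    return -sum(pi * math.log(q) for pi, q in zip(p, p_expected) if pi > 0)


T, K = 1.0, 2.0
E = [0.0, 1.0, 2.0]
p = [0.5, 0.3, 0.2]            # actual probabilistic state
E_avg = sum(pi * e for pi, e in zip(p, E))

for p_expected in ([0.2, 0.5, 0.3], p):
    F_final, W, saved = received_entropy_scenario(p, E, p_expected, T, K)
    print(round(F_final, 12), round(saved, 12),
          round(E_avg - T * received_entropy(p, p_expected), 12),   # same as saved
          round(E_avg - T * entropy(p), 12))                        # initial free energy
```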

In practice, systems evolve by transformations which are not adiabatic, but which can be analyzed in similar terms: the final states into which the initial elementary states evolve are probabilistic combinations, whose free energies take over the role of energies, by the substitution of roles commented above. Looking at a non-isolated system, the minimal amount of entropy creation among all possible initial probabilistic states may be nonzero as well.
