Entropy in statistical physics
This page presents the main concepts explaining the nature of entropy in physics.
Physics and information
The understanding of entropy from information theory suffices to explain most physical properties of entropy in a first approach, but it leaves some details unclear because reality is a bit different: the exact understanding comes from the framework of quantum physics, where classical information is replaced by quantum information. The present introduction, expressed in terms of classical information, may be criticized for its lack of rigor or even its incoherence under careful analysis; what really matters is that the main intuitive ideas provided here reflect well enough the situation in quantum physics.
Quantum physics formalizes any list of N clearly distinct states as N pairwise orthogonal unit vectors in a Hilbert space (the vectors of an orthonormal basis of some subspace); the subspace they generate represents all states "somewhere among them". Evolution then operates in the Hilbert space as a unitary transformation (a concept similar to rotation), mapping these states to another family of states with the same property of being "clearly distinct" (orthogonal). Thus, any "pack of N possible states", formalized as an N-dimensional Hilbert space (a subspace of a bigger Hilbert space, in which we might choose a basis representing N clearly distinct states), cannot "evolve into a smaller pack", i.e. a subspace of smaller dimension.
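To make this concrete, here is a minimal Python (NumPy) sketch, where a random unitary matrix (built by QR decomposition, an arbitrary choice for illustration) maps a family of orthogonal unit vectors to another family with the same property:

import numpy as np

rng = np.random.default_rng(0)

# A random unitary matrix U obtained from the QR decomposition of a
# complex Gaussian matrix: U represents one possible evolution.
d = 5                                    # dimension of the Hilbert space
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(A)

# N clearly distinct states = N pairwise orthogonal unit vectors,
# here the first N vectors of the standard basis.
N = 3
states = np.eye(d)[:, :N]

evolved = U @ states                     # evolve each state

# The evolved family is still orthonormal: the "pack of N states" kept its size.
gram = evolved.conj().T @ evolved
print(np.allclose(gram, np.eye(N)))      # True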
In the limit of classical physics, where a system is formalized by some 2n-dimensional phase space, these "packs of N possible states" correspond to regions of phase space with volume N·h^n. The unit of volume h^n is provided by the Planck constant (h = 2πℏ). In the classical limit, the preservation of the "size of packs" for isolated systems corresponds to the conservation of volumes in phase space expressed by Liouville's theorem. In some long-time approximation for classical systems, the effective region of phase space containing given possibilities can expand by dilution (mixing with its outside), but not shrink.
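Here is a minimal numerical sketch of this volume conservation, under assumptions chosen only for illustration: a pendulum Hamiltonian, integrated by a symplectic (leapfrog) step; the Jacobian determinant of the resulting flow map, estimated by finite differences, stays equal to 1, which is the local expression of Liouville's theorem:

import numpy as np

# One leapfrog step for the pendulum H(q, p) = p**2/2 - cos(q).
# Leapfrog is symplectic, so the map (q, p) -> (q', p') preserves
# phase-space area exactly.
def step(q, p, dt=0.1):
    p = p - 0.5 * dt * np.sin(q)
    q = q + dt * p
    p = p - 0.5 * dt * np.sin(q)
    return q, p

def flow(q, p, n=100):
    for _ in range(n):
        q, p = step(q, p)
    return q, p

# Numerical Jacobian of the time-n flow map at an arbitrary point (q0, p0):
q0, p0, eps = 0.3, 0.7, 1e-6
q1, p1 = flow(q0, p0)
qq, pq = flow(q0 + eps, p0)
qp, pp = flow(q0, p0 + eps)
J = np.array([[(qq - q1) / eps, (qp - q1) / eps],
              [(pq - p1) / eps, (pp - p1) / eps]])
print(np.linalg.det(J))   # ~1: the volume of any small "pack" is conserved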
In particular, 2 clearly distinct states A and B of an isolated microscopic system at a given time cannot reliably evolve into the same state A' at the same later time: if A is determined to evolve into A', then B cannot evolve into A'. In other words, the evolution of an isolated microscopic system cannot erase any information in the sense of a multiplicity of possible states (regardless of whether it is really definite information where only one possibility is realized, or a persisting indetermination of a state between multiple possibilities).
Thus, the physically possible evolutions of isolated microscopic systems can be understood as information-preserving operations (acting bijectively), such as, algorithmically speaking, file compressions, or operations that measure and restore the state of a microscopic system to and from digital information (i.e. states of computer memory, except that actually used processors involve heavier physical processes which are irreversible).
Nature of entropy in physics
The entropy of a physical system can be understood as the entropy (indefiniteness) of the information of what the exact microscopic state of the system may be. Usually, this data mainly consists of all the positions and velocities (and sometimes spins) of the atoms contained in the system. Of course, this information would usually be impossible to determine, both because it would already be very impractical to measure for systems of one or a few molecules, and because usual amounts of entropy are much bigger than any amount of data usually considered in computer science: a few bytes per molecule, multiplied by the huge number of molecules in a system (on the order of the Avogadro constant). However, the point is not the practicality of the measurement, but the fact that the entropy of this information is what theoretically defines entropy in physics.
So, entropy roughly measures (in logarithmic units) the size of
the main pack of possible elementary states where a system is
likely to be. Physical processes are reversible when they preserve
the information of this pack (distinguishing it from its outside),
thus when they preserve entropy. Large isolated systems may create
entropy by diluting this pack, mixing states in and out and
forgetting which is which. This explains the conservation of
entropy in fundamental laws, and its possible creation (but
non-elimination) in irreversible macroscopic processes. In non-isolated systems, processes can locally seem to shrink the pack of distinct microscopic states (the volume of phase space) by evacuating their multiplicity to the environment (final states may be identical here but distinct elsewhere).
Entropy was defined in information theory by the formula S = ∑_i p_i S_i where S_i = −ln p_i, assuming the choice of unit of entropy expressed by the natural logarithm (where 1 bit = ln 2). In physics, the usual convention assumes another unit, related to this one by the Boltzmann constant k. In this convention, we should write S_i = −k·ln p_i. But to simplify, let us set k = 1 in the present page.
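For concreteness, here is a tiny Python sketch of this definition with k = 1; the function name and the sample distribution are only illustrative:

import numpy as np

def entropy(p, k=1.0):
    # S = sum_i p_i * S_i with S_i = -k*ln(p_i); with k = 1, 1 bit = ln 2
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                            # states of probability 0 contribute nothing
    return float(np.sum(p * (-k * np.log(p))))

print(entropy([0.5, 0.5]))                  # ln 2, i.e. exactly 1 bit
print(entropy([0.5, 0.5]) / np.log(2))      # 1.0 (the same entropy counted in bits)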
Probabilistic evolution in isolated systems
Rigorously speaking, in quantum physics, isolated systems behave reversibly (without creating entropy), while entropy creation is a subtle emergent process, most clearly expressed in the case of non-isolated systems. However, some cases of entropy creation, while also involving non-isolated systems, can be approximately expressed for an isolated (discrete) system in terms of a classical probabilistic evolution (instead of a deterministic one), acting by mixing the probabilities of different states. This probabilistic evolution is expressed by a square matrix of probabilities, with nonnegative coefficients m_ij, representing the probabilistic states into which each elementary initial state would evolve.
This transforms each probabilistic state (p_i) into the probabilistic state p'_j = ∑_i p_i m_ij.
Still, this process has a sort of non-shrinking property (which will lead to the non-decrease of entropy) that looks like a time symmetry in the properties of the probabilities (not a real symmetry):
- Of course, for all i, ∑_j m_ij = 1: from a given initial state i, the sum of the probabilities of all possible final states j is 1.
- But also, for all j, ∑_i m_ij = 1: for any final state j, the sum of the probabilities for j to be reached by evolution from each possible initial state i is also 1.
This says that the mixing of probabilities by evolution must be fair, so that the probabilistic state becomes closer to equiprobability, never moving away from it.
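To illustrate this fairness numerically, here is a small Python sketch (the matrix is built, as an arbitrary choice, as a convex combination of permutation matrices, one way to obtain a doubly stochastic matrix); applying it repeatedly pushes a probabilistic state toward equiprobability without ever decreasing its entropy:

import numpy as np

rng = np.random.default_rng(1)

# A doubly stochastic matrix: all rows AND all columns sum to 1.
n = 4
perms = [np.eye(n)[rng.permutation(n)] for _ in range(6)]
weights = rng.dirichlet(np.ones(6))
M = sum(w * P for w, P in zip(weights, perms))
print(M.sum(axis=0), M.sum(axis=1))    # both are vectors of ones

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Repeatedly applying p' = p M can only raise the entropy,
# driving p toward the equiprobable distribution (1/n, ..., 1/n).
p = np.array([0.7, 0.2, 0.05, 0.05])
for _ in range(5):
    print(entropy(p))
    p = p @ M
print(entropy(p), np.log(n))           # the entropy approaches its maximum ln n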
In its quantum form (the phenomenon of decoherence, or implicit measurement), the idea is that the final state is measured according to a basis of orthogonal states such that the effect of evolution from the initial list of states (whose probabilities defined the quantum state) to the final list is not a mere bijection: it is a more general rotation with nontrivial angles.
Entropy creation in isolated systems
To verify the entropy creation, let us write the initial entropy, using ∑_j m_ij = 1, as
S = ∑_i (−p_i ln p_i) = ∑_i,j m_ij (−p_i ln p_i).
The concavity of (−x ln x), together with ∑_i m_ij = 1, gives for all j:
∑_i m_ij (−p_i ln p_i) ≤ −p'_j ln p'_j.
Summing over j, we conclude S ≤ S'.
Another form of entropy creation in isolated systems comes from how things behave in practice rather than how they ideally are in theory: while the theory logically implies precise probabilities for the final state, real systems usually do not come with integrated super-computers giving exact predictions of these probabilities (and if we added one, it would likely produce more entropy in its operations than the amount it was meant to avoid). Without precise predictions, we can only make gross approximations of the probabilities, and thus effectively handle accessible probabilities with more entropy. The idea of such approximations is expressed by the concept of received entropy, discussed below.
Entropy creation by chaotic interactions in a many-body system
When two objects meet by chance (in a world with many objects), interact, then go apart, the sum of their entropies cannot decrease. Indeed:
- As they meet by chance (without coordination between their states), they are initially uncorrelated: S_1 + S_2 = S
- Entropy cannot decrease on the way: S ≤ S'.
- Their interaction may make them correlated as they go apart: S' ≤ S'_1 + S'_2.
Thus, S_1 + S_2 ≤ S'_1 + S'_2: the entropy of a large system appears to increase when counted as the sum of the entropies of its subsystems.
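A quick numerical check of the last inequality in the chain (S' ≤ S'_1 + S'_2), with an arbitrary joint distribution chosen for illustration:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Joint probabilistic state of two objects after an interaction that
# correlated them (rows: states of object 1, columns: states of object 2).
joint = np.array([[0.4, 0.1],
                  [0.1, 0.4]])

S  = entropy(joint)                 # global entropy S'
S1 = entropy(joint.sum(axis=1))     # entropy of object 1 alone (S'_1)
S2 = entropy(joint.sum(axis=0))     # entropy of object 2 alone (S'_2)
print(S, S1 + S2, S <= S1 + S2)     # the sum of the parts exceeds the whole: True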
The above inequalities present this increase as made of 2 contributions. But quantum physics does not distinguish them: the above described entropy creation for an isolated system (S ≤ S') requires a sort of measurement, i.e. an interaction, to fully happen; but an interaction is precisely the cause of the other contribution: that we are not looking at the entropy of a fixed isolated system (the global system of 2 interacting objects) before and after its evolution (S = S'), but at that of non-isolated ones (an object before and after interacting with another), separately counting and adding up the individual entropies S'_1 + S'_2, which form a larger amount than the global entropy S' because of the correlation. This is the only real contribution to the entropy growth from S_1 + S_2 to S'_1 + S'_2, but it is as good as the two classically described contributions above, because quantum physics gives the gap in the inequality S' ≤ S'_1 + S'_2 a larger amplitude than classical probabilities allow, letting it somehow play both roles.
The creation of entropy is due to the fact that the initial lack of entropy, carried by the correlation after interaction, becomes ineffective, as the correlated objects go apart and have too little chance to meet again in conditions that would make this correlation observable. Instead, the next interacting objects will rather be other uncorrelated pairs, while existing correlations become progressively dispersed among more and more molecules, which would have to be compared all together; this is harder and harder to decipher, and even impossible when some objects involved in a correlation escape the analysis.
Moreover, in quantum physics, the lack of entropy of the global system cannot always be fully described by separately measuring the states of the components and analyzing them as classical information; only a physical recombination (treatment as unobserved quantum information) might do it, but it is even harder to perform.
Stable probabilistic states in a stable environment
The general stability conditions for the probabilistic states of an object and of the environment with which it interacts depend on the list of conserved quantities for isolated systems. A conserved quantity, defined as a function of the elementary state of any system, is called extensive when its value on a system of several objects is the sum of its values for each object; it can then vary in a non-isolated system by transfer with the environment.
1. The probability is a function of the conserved quantities.
2. It is the exponential of an affine function of the conserved extensive quantities when the rest of the conserved quantities are fixed; the linear part of this affine function is independent of the other conserved quantities.
Proof of 1. A probabilistic state is stable (not creating entropy) when evolution only mixes elementary states with the same probability. The conservation of a quantity (a function of elementary states) on an isolated system prevents its evolution from mixing elementary states with different values of this quantity, so that they can keep different probabilities in a stable way. A probability function defined as a function of conserved quantities is therefore stable for an isolated system; conversely, if it is stable then it is itself a conserved quantity.
Proof of 2. Consider a system of 2 objects A and B which are uncorrelated (especially if they are far away from each other), where a conserved quantity E takes values E_1, E_2 on two elementary states of A, and values E'_1, E'_2 on 2 elementary states of B (other conserved quantities staying fixed), such that E_2 − E_1 = E'_2 − E'_1. According to 1., both states (1,2) and (2,1), having the same value of the conserved quantity E_1 + E'_2 = E_2 + E'_1, also have the same probability: p_1 p'_2 = p_2 p'_1. Thus, p_2/p_1 = p'_2/p'_1, i.e. p_2/p_1 only depends on E_2 − E_1 throughout the environment, independently of the particular object or states (with equal values of the other conserved quantities). Thus, (ln p) must be an affine function of the conserved quantities.
The most famous conserved quantities are energy and momentum, which are the components of a single 4-dimensional object (a linear form on relativistic space-time). As ln p is an affine function of energy-momentum, its direction (differential, or linear part) can be identified as a time-like vector in space-time which defines "the reference frame of the environment" (its average direction; for example, on Earth it is the ground's reference frame). When analyzed in this frame, the probability appears to be a mere function of energy, independent of momentum (which no longer seems conserved, as it can be freely exchanged with the environment).
Other important conserved quantities are the numbers of atomic
nuclei of each type (as long as no nuclear reaction is happening).
The component of (ln p) along the conserved quantity
"number of hydrogen nuclei" defines the pH.
Free energy
Let T be a fixed temperature (namely, the temperature of the environment).
Assuming each considered elementary state i to have a definite energy E_i, let us define its free energy as
F_i = E_i − T S_i = E_i + T·ln p_i
Then the Helmholtz free energy F, in its standard macroscopic definition, coincides with the average of these free energies over all states:
F = E − TS = ∑_i p_i F_i
where E is the average energy: E = ∑_i p_i E_i.
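A quick numerical check of this identity, with arbitrary sample energies, probabilities and temperature (all the values are purely illustrative):

import numpy as np

T = 2.0                                    # temperature (k = 1)
E = np.array([0.0, 1.0, 3.0])              # energies of the elementary states
p = np.array([0.5, 0.3, 0.2])              # any probabilistic state (not necessarily thermal)

F_i   = E + T * np.log(p)                  # F_i = E_i - T*S_i = E_i + T*ln(p_i)
E_avg = np.sum(p * E)                      # average energy
S     = -np.sum(p * np.log(p))             # entropy
print(np.sum(p * F_i), E_avg - T * S)      # both give the same F = E - T*S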
Thermal equilibrium
With fixed E_i and variable p_i, the free energy F reaches its minimum when all F_i are equal (thus F_i = F). The unique such probabilistic state, called the Boltzmann distribution (or thermal equilibrium) at temperature T, is defined by
p_i = e^((F−E_i)/T)
where F = −T·ln ∑_i e^(−E_i/T).
Proof.
As ∑_i p_i dF_i = ∑_i p_i T dp_i/p_i = T(∑_i dp_i) = 0, we get dF = ∑_i F_i dp_i.
Thus the equilibrium condition (dF = 0 for all variations of p) is that all F_i are equal.
When F_i > F_j and dp_i = −dp_j > 0 while the other variations of p vanish (thus moving away from equilibrium, because each F_i is an increasing function of p_i), we get dF > 0; thus the equilibrium is a minimum.
Then F = F_i = E_i + T ln p_i gives p_i = e^((F−E_i)/T), and the value of F comes from ∑_i p_i = 1.
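The following Python sketch checks these formulas and the minimality claim on an arbitrary list of energies (the random comparison states are only an illustration, not a proof):

import numpy as np

T = 1.5
E = np.array([0.0, 1.0, 2.0, 5.0])

def free_energy(p):
    return np.sum(p * E) + T * np.sum(p * np.log(p))   # F = E_avg - T*S

# Boltzmann distribution p_i = exp((F - E_i)/T), with F = -T ln sum_i exp(-E_i/T)
Z = np.sum(np.exp(-E / T))
p_eq = np.exp(-E / T) / Z
F_eq = -T * np.log(Z)
print(free_energy(p_eq), F_eq)             # both agree

# Any other probabilistic state has a larger free energy:
rng = np.random.default_rng(2)
for _ in range(5):
    p = rng.dirichlet(np.ones(len(E)))
    print(free_energy(p) >= F_eq)          # True every time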
According to the above description of stable environments (not creating more entropy), any stable probabilistic state of an object, as well as of its environment, will follow the Boltzmann distribution of some temperature in some reference frame (for which this state is the one with minimal free energy), unless the conservation of another quantity is at stake (which might be ignored by looking at the configuration space of an object taken with a fixed value of that other conserved quantity, describing the case of an object that is isolated for that quantity).
Entropy of correlated systems, again
The energies of elementary states contribute to the expression of F as an arbitrary affine function of the probabilities (the average energy E), in addition to the contribution of S. Thus, saying that, for any data of energies (thus any added affine function), the equilibrium state is a minimum of F, means that F is a convex function of the probabilities, thus S is a concave function. We already used this fact to show the inequality about information entropies of correlated systems (S ≤ S_A + S_B).
Let us restate the argument in other words.
By choosing energy laws for A and B such that their probability laws (considered separately) are the respective Boltzmann distributions at a given temperature, the transition from the correlated state (with entropy S) to the non-correlated one (with entropy S_A + S_B) preserves the energy but increases the entropy, because it comes down to reaching thermal equilibrium.
Comment on the role of free energy
For example, between 2 states A and B with energies E_1 and E_2 such that E_2 − E_1 = T·ln 2, the state A is twice as probable as B. But if we have another state B' with the same energy as B, then the undetermined state (B or B' with the same probability) will be as probable as state A. So, the state (B or B'), where B and B' are equiprobable, has more energy than A (the difference is T·ln 2), but its entropy (when it is known to occur) is 1 bit = ln 2, while A is a single state, whose entropy thus vanishes when it is known to occur. Thus, (B or B' with the same probability) gets the same probability as A when these alternatives are put in a larger list, because they have the same free energy (when seen separately, but then also when seen in the list). The list of elementary states contributing to the free energy of a state may thus be seen as grouped into sub-lists, where each sub-list plays the role of a single state whose energy is given by the free energy of that sub-list seen separately.
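Here is a small numerical check of this grouping rule in Python, reusing the example above (A, B, B' with E_B − E_A = T·ln 2); the values are only illustrative:

import numpy as np

T = 1.0
E_A, E_B = 0.0, T * np.log(2)          # E_B - E_A = T*ln 2, as in the example above

# Full 3-state system {A, B, B'} at thermal equilibrium:
E_full = np.array([E_A, E_B, E_B])
w = np.exp(-E_full / T)
p_full = w / w.sum()
print(p_full)                           # 0.5, 0.25, 0.25 : A is as probable as (B or B') together

# Replace the sub-list {B, B'} by one effective state whose "energy" is the
# free energy of that sub-list seen separately: F_sub = -T ln(2 e^(-E_B/T)) = E_B - T ln 2.
F_sub = -T * np.log(2 * np.exp(-E_B / T))
E_grouped = np.array([E_A, F_sub])
w2 = np.exp(-E_grouped / T)
p_grouped = w2 / w2.sum()
print(p_grouped)                        # 0.5, 0.5 : same probabilities for A and for the group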
Evolution in an environment satisfying the Boltzmann
distribution
In an interaction as above, the total energy (equal to the sum of energies) is conserved (E_1 + E_2 = E'_1 + E'_2). Thus, the sum of free energies for any fixed temperature T cannot increase (F'_1 + F'_2 ≤ F_1 + F_2). In particular, when the second object is a piece of environment initially in thermal equilibrium at T, its free energy F_2 was minimal, thus it cannot decrease any further: F_2 ≤ F'_2.
We conclude F'_1 ≤ F_1: interaction with an environment at thermal equilibrium at a temperature T can only decrease the free energy of an object, thus bring it closer to the thermal equilibrium at this temperature. If it was already there, then the union of both systems is at thermal equilibrium too: Boltzmann's law is the stable probability distribution when the system interacts with an environment at temperature T.
Let us express the evolution of an object in interaction with its environment by a matrix as above (p'_j = ∑_i p_i m_ij where again, for all i, ∑_j m_ij = 1).
If the object starts in thermal equilibrium (p_i = e^((F−E_i)/T)), then it must stay there: for all j,
p'_j = e^((F−E_j)/T) = ∑_i e^((F−E_i)/T) m_ij.
We conclude ∑_i m_ij e^((E_j−E_i)/T) = 1, to be compared with the previous formula satisfied by the evolution matrices of isolated systems (where T was absent, as if we took the limit of this one for infinite T, but in fact for another reason: the evolution of an isolated system must preserve energy, so that the only possible mixtures were those between elementary states with equal energies E_i = E_j, making temperature irrelevant); a direct deduction of the new formula from an application of the old one to the whole system (object + environment) is left as an exercise to the reader.
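As an illustration (not a derivation), here is a Python sketch where the evolution matrix of the object is built with the Metropolis rule, an arbitrary choice of dynamics compatible with an environment at temperature T; it leaves the Boltzmann distribution unchanged and satisfies the stated formula exactly:

import numpy as np

T = 1.0
E = np.array([0.0, 0.5, 1.5])              # energies of the object's elementary states
n = len(E)

# Transition matrix built with the Metropolis rule (an illustrative choice):
# propose a uniformly random other state, accept with probability min(1, e^(-(E_j-E_i)/T)).
m = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if i != j:
            m[i, j] = (1.0 / (n - 1)) * min(1.0, np.exp(-(E[j] - E[i]) / T))
    m[i, i] = 1.0 - m[i].sum()             # remaining probability: stay in state i

print(m.sum(axis=1))                        # each row sums to 1 (it is a probability matrix)

# The Boltzmann distribution is left unchanged by this evolution:
p_eq = np.exp(-E / T) / np.sum(np.exp(-E / T))
print(np.allclose(p_eq @ m, p_eq))          # True

# ...which is equivalent to the identity sum_i m_ij e^((E_j - E_i)/T) = 1:
for j in range(n):
    print(np.sum(m[:, j] * np.exp((E[j] - E) / T)))   # each is 1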
Physical role of received entropy
Let us explain how the concept of "received entropy" that we introduced for information theory can make sense in physics, as a possible form or interpretation of the entropy creation process.
Consider a system in a probabilistic state p, with an energy function (E_i) (a priori unrelated: the system may not be in any thermal equilibrium), a third (also unrelated) function p' (positive with sum 1) that can be interpreted as an "expected probability" (no matter whether its use below is a fruit of design or happens by chance in nature), and an environment with temperature T.
Consider an adiabatic transformation of the system, modifying the energies of the elementary states while preserving their probabilities (we keep the same labels for these states through their evolution), from E_i to
E'_i = K − T·ln p'_i
where K is any constant energy (conversely, for any data of the E'_i there is a unique positive function p' with sum 1 and a unique value of K satisfying these formulas, which are identical to the definition of the Boltzmann distribution). Then, the system will rest in the environment, and approach thermal equilibrium (minimal free energy) after some time, in case it wasn't there at first.
Final value of the free energy: F' = −T·ln ∑_i exp(−E'_i/T) = K
Actual average mechanical energy spent to reach it: W = ∑_i p_i (E'_i − E_i) = ∑_i p_i (K − T·ln p'_i − E_i)
Effectively saved free energy in the process: K − W = ∑_i p_i (T·ln p'_i + E_i) = E − TS'
where E = ∑_i p_i E_i is the initial real average energy, and S' = −∑_i p_i ln p'_i is what we called the "received entropy".
This scenario is the one that would preserve all the existing free energy (by not creating any entropy) if the system really was in the probabilistic state p' (as the first step would already reach thermal equilibrium), but it fails otherwise.
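A numerical check of this bookkeeping in Python, with arbitrary random choices for p, p', the initial energies and K:

import numpy as np

rng = np.random.default_rng(3)
T = 1.0
n = 4

p  = rng.dirichlet(np.ones(n))              # actual probabilistic state of the system
pp = rng.dirichlet(np.ones(n))              # "expected probability" p' (unrelated to p)
E  = rng.normal(size=n)                      # initial energies (also unrelated)
K  = 2.0                                     # arbitrary constant energy

E_new = K - T * np.log(pp)                   # adiabatic shift of the energy levels

# Final free energy after resting in the environment equals K:
F_final = -T * np.log(np.sum(np.exp(-E_new / T)))
print(F_final, K)                            # equal

# Work spent, and the "effectively saved free energy" K - W:
W = np.sum(p * (E_new - E))
S_received = -np.sum(p * np.log(pp))         # received entropy S'
E_avg = np.sum(p * E)
print(K - W, E_avg - T * S_received)         # equal: K - W = E - T*S'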
In practice, systems evolve by transformations which are not adiabatic but can be analyzed in terms where the final states into which the initial elementary states evolve are probabilistic combinations, with their respective free energies playing the role of energies, by the substitution of roles commented on above. Looking at a non-isolated system, the minimal amount of entropy creation over all possible initial probabilistic states may be nonzero as well.
Links on entropy
Next page : The simplest proof of the ideal
gas law
Table of contents : Foundations
of physics