Imagine the following storage method.

The states of successive systems are recorded as successive files in memory. Each possible state of an elementary system is encoded as a "file", that is, a finite sequence of bits taken from a given list of "possible files", where the possible files bijectively correspond to the possible states of a system. For the sake of compression, these files are simply placed side by side, with neither any unused space between them nor any directory of files to record where one file ends and the next begins. Since the lengths of files may differ, the separation between files will instead be determined by the following method, which prevents any ambiguity.

The state of the first given elementary system will be encoded by the first file (the first few bits) in memory, whose content is one element of the given list of possible files.

The condition to avoid any risk of ambiguity is that, for any possible memory content, only one element of the list can occur as its initial segment: no possible file may be an initial segment of another one.

So, the size of the first file can be inferred from a given memory content, by reading it from its first bits until they are found to form a sequence identical to one of the files in the list.
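As a sketch, this decoding rule can be implemented directly. The code table below is a hypothetical example of such a list of possible files (none is an initial segment of another):

```python
# Decode a memory content written with a prefix-free list of files:
# read bits from the start until they match one of the possible files.
# This code table is a made-up example, not one from the text.
CODE = {"0": "A", "10": "B", "110": "C", "111": "D"}  # file -> state

def decode(memory):
    """Split a bit string into files and return the decoded states."""
    states, current = [], ""
    for bit in memory:
        current += bit
        if current in CODE:           # a complete file has been read
            states.append(CODE[current])
            current = ""              # start reading the next file
    return states

print(decode("010110111"))  # -> ['A', 'B', 'C', 'D']
```

Because no file is an initial segment of another, the split is unambiguous: the first match found is the only possible one.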

Re-interpreting all possible memory contents as the binary expansions of real numbers between 0 and 1, each possible state *i* corresponds to an interval of length 2^{-*l*_{i}}, where *l*_{i} is the length of its file. These intervals being disjoint, the sum of their lengths cannot exceed 1:

∑_{i} 2^{-*l*_{i}} ≤ 1
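A quick numerical check of this inequality, for a hypothetical prefix-free code with file lengths 1, 2, 3, 3 (e.g. the files "0", "10", "110", "111"):

```python
# Kraft inequality: disjoint binary intervals of lengths 2^-l_i
# cannot cover more than the whole interval [0, 1).
lengths = [1, 2, 3, 3]                 # lengths of a sample prefix-free code
total = sum(2 ** -l for l in lengths)  # 1/2 + 1/4 + 1/8 + 1/8
print(total)  # -> 1.0 : this particular code fills the interval exactly
```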

Here we chose binary digits to represent things. Why not use digits in another base instead of 2? For example, a digit with 8 values gives as much information as 3 binary digits, so that the size of files is counted by packs of 3 = log_{2} 8 binary digits.

So, the "objective size" of a file with length *l*_{i} in base *b*, measured in natural units independent of the base, is *S*_{i} = *l*_{i} ln *b*; the above inequality then takes the base-independent form

∑_{i} e^{-*S*_{i}} ≤ 1.

The quality of a compression is its ability to reduce the average file size. How may it depend on the choice of encoding?

We have the average file size ∑_{i} *p*_{i}*S*_{i}, where *p*_{i} is the probability for the system to be in state *i*.

In the particular case when all probabilities *p*_{i} are negative powers of *e* (resp. of 2, for a binary encoding), we can choose files with sizes *S*_{i} = -ln *p*_{i}, which saturate the constraint: ∑_{i} e^{-*S*_{i}} = ∑_{i} *p*_{i} = 1.

Then the formula for the average size of the file becomes the general definition of the entropy of a probabilistic state:

*S* = - ∑_{i} *p*_{i} ln *p*_{i}
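As a small illustration (the distribution below is a made-up example), this definition can be computed directly:

```python
import math

def entropy(p):
    """Shannon entropy S = -sum_i p_i ln p_i, in natural units (nats)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

# Uniform distribution over 4 states: every file has size 2 bits = ln 4 nats
print(entropy([0.25, 0.25, 0.25, 0.25]))  # -> ln 4 ≈ 1.386
```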

We can think of each term -ln *p*_{i} as the size of the information given by the occurrence of the state *i*.

In particular for our above construction, -log_{2} *p*_{i} need not be a whole number of bits, but rounding it up to *l*_{i} = ⌈-log_{2} *p*_{i}⌉ still satisfies ∑_{i} 2^{-*l*_{i}} ≤ 1, giving an average size that exceeds the entropy by less than one bit (ln 2). More generally, the entropy is a lower bound on the average file size of any encoding.

For this, let us fix the probabilities *p*_{i} and look for the sizes minimizing the average size, writing *q*_{i} = e^{-*S*_{i}} (so that ∑_{i} *q*_{i} ≤ 1) and thus *S'* = -∑_{i} *p*_{i} ln *q*_{i}. Its differential is

d*S'* = - ∑_{i} (*p*_{i}/*q*_{i}) d*q*_{i}

Under the constraint ∑_{i} d*q*_{i} = 0, this vanishes only when all the ratios *p*_{i}/*q*_{i} are equal, that is when *q*_{i} = *p*_{i} for all *i*. So the minimum of the average size is the entropy, reached when *S*_{i} = -ln *p*_{i}.

In particular, for any binary encoding of a probabilistic state whose probabilities are not powers of 2, the average file size will always be higher than the entropy.
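This can be checked numerically on a made-up distribution whose probabilities are not powers of 2, taking file sizes rounded up to whole bits, *l*_{i} = ⌈-log_{2} *p*_{i}⌉ (an assumption of this sketch):

```python
import math

p = [0.5, 0.3, 0.2]                              # not all powers of 2
sizes = [math.ceil(-math.log2(pi)) for pi in p]  # l_i = ceil(-log2 p_i)
assert sum(2 ** -l for l in sizes) <= 1          # Kraft inequality holds

avg_size = sum(pi * l for pi, l in zip(p, sizes)) * math.log(2)  # in nats
S = -sum(pi * math.log(pi) for pi in p)          # entropy in nats
print(avg_size > S)                  # -> True: strictly above the entropy
print(avg_size - S < math.log(2))    # -> True: but by less than one bit
```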

Consider now a compound system made of two independent subsystems, with respective probability distributions (*a*_{i}) and (*b*_{j}), so that the joint probabilities are *p*_{i,j} = *a*_{i}*b*_{j}. The sizes of information then add up:

-ln *p*_{i,j} = (-ln *a*_{i}) + (-ln *b*_{j})

This can be written completely formally, as

∑_{i,j} *p*_{i,j} ln *p*_{i,j} = ∑_{i,j} *a*_{i}*b*_{j} (ln *a*_{i} + ln *b*_{j})

= (∑_{i} *a*_{i} ln *a*_{i})(∑_{j} *b*_{j}) + (∑_{i} *a*_{i})(∑_{j} *b*_{j} ln *b*_{j})

= ∑_{i} *a*_{i} ln *a*_{i} + ∑_{j} *b*_{j} ln *b*_{j}

so that the entropy of a pair of independent systems is the sum of their entropies.
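A numerical check of this additivity, with two made-up independent distributions:

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

a = [0.5, 0.5]                               # first subsystem
b = [0.7, 0.2, 0.1]                          # second subsystem
joint = [ai * bj for ai in a for bj in b]    # independence: p_ij = a_i b_j
# Entropy of the pair equals the sum of the entropies
print(abs(entropy(joint) - (entropy(a) + entropy(b))) < 1e-12)  # -> True
```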

Thus, even the one bit margin of extra entropy (+ln(2)) in the above
compression method, can be almost eliminated (made to approach 0 per
file), by gathering a large number of files representing
uncorrelated systems, together as a big one: by looking for a
compression, not of the state of one system, but of a large
succession of independent systems. When treating the global system
as one like above, we can find a compression with average size less than

(entropy of the global system) + ln(2) = (∑ entropies of individual systems) + ln(2)

Thus, the ln(2) margin comes only once for all instead of being repeated for each subsystem.

In conclusion, by such methods, even when probabilities are not powers of 2, the entropy keeps essentially representing the "size of information in its most compressed form".
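This amortization can be illustrated numerically (with a made-up distribution): compressing blocks of n independent copies as single files, the overhead per system stays below ln(2)/n:

```python
import math
from itertools import product

p = [0.5, 0.3, 0.2]                       # one system, not powers of 2
S = -sum(x * math.log(x) for x in p)      # its entropy in nats

for n in (1, 2, 4, 8):
    # Joint distribution of n independent copies, encoded as one file
    joint = [math.prod(t) for t in product(p, repeat=n)]
    avg_bits = sum(q * math.ceil(-math.log2(q)) for q in joint)
    per_system = avg_bits * math.log(2) / n
    overhead = per_system - S             # excess over the entropy, per system
    print(n, overhead < math.log(2) / n)  # -> always True; bound shrinks
```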

Let now the two subsystems be correlated: write *a*_{i} for the probabilities of the states of the first subsystem A, and *b*_{i,j} for the conditional probability of the state *j* of the second subsystem B when A is in state *i*, so that *p*_{i,j} = *a*_{i}*b*_{i,j}. The same computation as above gives

*S* = - ∑_{i,j} *a*_{i}*b*_{i,j} (ln *a*_{i} + ln *b*_{i,j}) = *S*_{A} + ∑_{i} *a*_{i}*S*_{Bi}

where *S*_{Bi} = -∑_{j} *b*_{i,j} ln *b*_{i,j} is the entropy of B conditioned on the state *i* of A.

Entropy is a concave function of the probabilistic state (a sum of the concave functions *p* ↦ -*p* ln *p* of each probability), so that the average of the conditional entropies of B is bounded by the entropy of its average (marginal) distribution *b*_{j} = ∑_{i} *a*_{i}*b*_{i,j}:

∑_{i} *a*_{i}*S*_{Bi} ≤ *S*_{B} = -∑_{j} *b*_{j} ln *b*_{j}

We conclude *S* ≤ *S*_{A} + *S*_{B}: correlations can only decrease the total entropy.
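A numerical check on a made-up correlated pair:

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

a = [0.6, 0.4]                    # probabilities of the states of A
b_cond = [[0.9, 0.1],             # distribution of B when A is in state 0
          [0.2, 0.8]]             # distribution of B when A is in state 1
joint = [a[i] * b_cond[i][j] for i in range(2) for j in range(2)]
b = [sum(a[i] * b_cond[i][j] for i in range(2)) for j in range(2)]  # marginal

S, S_A, S_B = entropy(joint), entropy(a), entropy(b)
print(S <= S_A + S_B)  # -> True: correlation can only lower the entropy
```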

(*) I had to invent the expression "received entropy" as I did not find this quantity defined and named elsewhere.

Links on entropy

Next page: Entropy in statistical physics

Table of contents : Foundations of physics