
Entropy, information, and mixed states

Off-topic: Internet Explorer users are urged to promptly install Microsoft's patch fixing a security issue recently found in Europe.
People around physics - and increasingly often, many physicists themselves - are getting confused about ever more elementary concepts in physics.

A few years ago, it became obvious that most people in the broader physics community are inherently incapable of understanding string theory and why it is inevitably the only mathematically possible unifying theory of gravity and quantum mechanics. Well, it's a difficult subject and a PhD - not even a PhD in theoretical physics - guarantees that you will be able to master it.



String theorist Anne Hathaway

But things have been getting worse recently. Today, even fundamental pillars of quantum mechanics, relativity, and statistical physics are controversial once again. They're controversial because people - and often people with physics PhDs - just don't understand them. Or, to say the least, they don't understand them well. It has become very fashionable to do things that would have been inseparably connected with unquestionable crackpots just a few years ago.

A few years ago, everyone would have agreed that these things were silly and physically impossible - but it's obvious today that many people only agreed because everyone else did, not because they actually understood the issues.

History of entropy

One of these abused theoretical constructs is the concept of entropy. While this concept became popular and important in various disciplines at various later times, all of its universal logical and physical properties have been fully understood since the late 19th century, i.e. for more than 100 years.

The history of the notion may still be divided into the thermodynamic era and the statistical era.

The thermodynamic era, covering approximately the first half of the 19th century, only cared about the macroscopic, directly observable features of the physical phenomena that have something to do with entropy. The statistical era, covering approximately the second half of the 19th century, was concerned with the explanation of these facts using more elementary, microscopic building blocks. In this era, thermodynamics - a "principled" theory that was only constraining the macroscopic phenomena - was suddenly understood in terms of statistical physics - a "constructive" theory that creates a model of the world in terms of atoms (or other basic degrees of freedom).




This separation of physical theories into "principled" theories (based on general principles that should always apply) and "constructive" theories (building upon very specific elementary laws) is due to Einstein: he would include relativity among the "principled" theories.

In 1824, Sadi Carnot noticed that heat flows from warmer objects to colder ones and that there is a degree of irreversibility you can't get rid of. In 1850, Rudolf Clausius formulated the second law of thermodynamics - that the entropy can never decrease - essentially in its modern form.

You must realize that already in 1845, Joule had understood the equivalence of heat and energy. In those early industrial times, people would play with heat engines all the time and they probably understood them better than most of us do today.

However, in 1872, Ludwig Boltzmann explained all those insights in terms of statistics applied to atoms. He also derived the second law of thermodynamics from the constructive approach: his proven version of the second law is known as the H-theorem. Of course, other people were important in thermodynamics and statistical physics, too: Maxwell, Gibbs, and many others.

Macroscopic systems at nonzero temperature are inevitably described by classical physics pretty well - so thermodynamics is inevitably classical, in a sense. The microscopic explanations don't have to be classical but Boltzmann's proof was only created in the framework of classical (non-quantum) physics, of course. However, when quantum mechanics was born in the 1920s, all these old proofs and concepts were instantly promoted to the quantum language. There was nothing hard about it at all and I will show why.

In fact, Ludwig Boltzmann was already "thinking" in the quantum fashion and he could predict some changes that the quantum revolution would later bring. For example, he kind of knew that there should be a natural "unit of phase space" (a power of Planck's constant). Those prescient expectations make Boltzmann one of the early forefathers of quantum mechanics.

Information: bits and e-bits

If you read random bits "0", "1" from a source, you are getting information. If both "0" and "1" occur with probability 50%, and they're not correlated with the previous digits, each binary digit carries one bit of information.

"N" bits are able to distinguish "2^N" equally probable possibilities. They carry the information "log_2(2^N) = N" bits. The logarithm's base is two. Of course, in physics and mathematics, it's way more natural to use natural logarithms whose base is "e=2.718...". While base-two logarithms are important for computers which are man-made, the natural exponentials and logarithms are those that appear in Nature. (Some "discrete physicists" are still unable to understand this simple point.) Physics wants to deal with Nature so we would say that "N" bits carry the information "ln(2^N) = N ln(2)" where "ln(2)=0.693" or so.

One digit that has "K" equally likely values (a base-K digit) carries the information of "ln(K)", and so on. It's also useful to learn Shannon's generalization of this information measure to unequally likely outcomes.
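As a minimal numerical sketch of these counting rules (assuming Python's standard math module; the helper name is my own), here is the information carried by one base-K digit in bits and in nats:

```python
import math

def digit_information(K):
    """Information carried by one digit with K equally likely values."""
    bits = math.log2(K)   # base-2 logarithm: information in bits
    nats = math.log(K)    # natural logarithm: information in nats (the physicist's choice)
    return bits, nats

print(digit_information(2))    # (1.0, 0.693...): one bit equals ln(2) nats
print(digit_information(10))   # (3.32..., 2.30...): one decimal digit
```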

Imagine that you have a noisy picture with many (N) pixels, each being either "0" or "1"; the pixels are uncorrelated with one another, but the probability of having "0" is "p" while the probability of "1" is "1-p". How much information is there in the picture? Can you substantially compress the computer file if "p" differs from 1/2? You bet.

I said that if you could distinguish "2^N" choices, the information carried by the sequence of bits - that actually tells you which choice you got - equals "ln(2^N)". You can explain it in another way, too: it's "ln(2^N)" because the probability of getting a particular sequence was equal to "1/2^N", and the information carried by the sequence was thus "-ln(1/2^N)".

In the very same way, if you have bits such that "0" appears with probability "p" while "1" appears with probability "1-p", you can say that if you get another "0", you got "-ln(p)" of information, while if you get another "1" digit, you received "-ln(1-p)" of information. Relatively to the previous paragraph, I just replaced "1/2^N" by "p" or "1-p" because that's the right map here.

For example, if the digits "1" are very unlikely, i.e. if "p" is very close to "1" while "1-p" is very close to zero, you get a lot of information (relatively) if you're lucky and you just obtain one pixel "1". On the other hand, another "0" doesn't tell you too much because you expected "0" to be probably there.

What is the average information carried by one bit or pixel if "0" occurs with probability "p" and "1" occurs with probability "1-p"? Well, that's easy. You must compute the expectation value. With probability "p", the information is "-ln(p)", because the digit turns out to be "0". And with probability "1-p", the information carried by the pixel is "-ln(1-p)" because the digit is "1". So the statistical average is
Information = - p ln(p) - (1-p) ln(1-p)
We only have two terms because we had two possibilities. It's obvious how to generalize it to the case of more options than two:
Information = - Σ_k p_k ln(p_k)
where the probabilities must sum to one, i.e. "Σ_k p_k = 1". This is the formula for the entropy, too.
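Here is a minimal sketch of the formula (Python assumed; "shannon_entropy" is just an illustrative helper name, and the probabilities are assumed to sum to one):

```python
import math

def shannon_entropy(probabilities):
    """Return -sum_k p_k ln(p_k) in nats; terms with p_k = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

print(shannon_entropy([0.5, 0.5]))   # ~0.693 = ln(2): a fair bit
print(shannon_entropy([0.9, 0.1]))   # ~0.325: a biased pixel carries less than ln(2)
```

The second value is why the noisy picture with "p" far from 1/2 can be substantially compressed.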

If there are "continuously infinitely many" possibilities, you must replace the summation by an integral, and the probability "p_k" by the density "rho". In that case, "p_k" becomes a dimensionful "rho(k)", assuming that the options are distinguished by a dimensionful "k" (i.e. positions and momenta of particles), and in order for the logarithm's argument to be dimensionless, which it should be, you should first multiply "rho(k)" by a natural unit of the "phase space volume". This unit turns out to naturally be equal to a power of Planck's constant "h" in quantum mechanics.

But if you used an incorrect value of "h", the resulting error would be just an additive shift of the information by "- integral rho ln(h)", which equals "-ln(h)" because of the normalization condition for "rho" (whose integral equals one). Note that "-ln(h)" is independent of the particular configuration "k". It's a universal additive shift.
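A small numerical sketch of this point (numpy assumed; the Gaussian density and the grid are my illustrative choices, not from the text): changing the unit "h" shifts the continuous information "- integral rho ln(rho h) dk" by a constant, independently of "rho".

```python
import numpy as np

# A normalized Gaussian density on a fine grid (illustrative choice)
dk = 0.001
k = np.arange(-10.0, 10.0, dk)
rho = np.exp(-k**2 / 2) / np.sqrt(2 * np.pi)

def continuous_information(rho, dk, h):
    """Approximate -integral rho ln(rho * h) dk by a Riemann sum."""
    return -np.sum(rho * np.log(rho * h)) * dk

S1 = continuous_information(rho, dk, h=1.0)
S2 = continuous_information(rho, dk, h=2.0)
print(S1 - S2, np.log(2.0))   # doubling h shifts the information by ln(2), whatever rho is
```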

In classical thermodynamics, it was never possible to determine the overall additive shift to the entropy - only the entropy differences were objectively physical. In quantum physics, things become kind of more unique and well-defined because the entropy itself, not just its changes, can be calculated, including the additive shift, because we know what the "right" (or at least "natural") value of "h" is that should be inserted into the logarithms: it's Planck's constant or its power.

An important consequence is that in classical physics, the third law of thermodynamics would only say that the entropy of a system was approaching a universal constant at "T=0". In quantum mechanics, we can say that the constant is "S=0". I will discuss quantum mechanics later.

Indistinguishable microstates

A formula for the information or entropy is nice and important but we mustn't get lost in the formulae. We must also know what they mean. In particular, we must know whether the information we're talking about is actually known to us or unknown to us.

The very motivation of using entropy in physics (or elsewhere) is that we effectively "don't distinguish" or "can't distinguish" or "don't want to distinguish" many configurations - or sequences of bits - from one another. They look "macroscopically identical" because we're not able to measure the detailed properties of all individual atoms. If that's so, the same formula I derived above,
Entropy = - Σ_k p_k ln(p_k)
informs us about the inherent amount of disorder that is hiding in the microscopic information - the degrees of freedom that are hard to measure macroscopically. The right hand side of the equation above is dimensionless. In order to compare it with the entropy in thermodynamics, which has units of heat per temperature, you should multiply the formula above by Boltzmann's constant "k":
S = Entropy . k, k = 1.38 x 10^{-23} J/K
I used two symbols, "Entropy" and "S", for the two versions of the same thing. They only differ by "k" - a constant often set to one by the adult physicists who conveniently use the adult units.

Such a proportionality constant had to occur because people had introduced two units, the Joule and the Kelvin, for energy and temperature, not realizing that these quantities are "secretly" the same thing, because temperature is just the average energy carried by a quadratic degree of freedom (more precisely, by two of them, because each quadratic degree of freedom carries E=kT/2). So it's natural to treat them as the same quantity and Boltzmann's constant "k" is the conversion factor from Kelvin degrees to Joules.
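A tiny sketch of this conversion (Python assumed; the example entropy value is made up for illustration):

```python
K_BOLTZMANN = 1.38e-23   # J/K, the conversion factor from the formula above

def thermodynamic_entropy(dimensionless_entropy):
    """Convert the dimensionless (nat-counting) entropy to S in J/K."""
    return dimensionless_entropy * K_BOLTZMANN

# Roughly a mole's worth of fair bits, 6.022e23 * ln(2) nats (illustrative number):
print(thermodynamic_entropy(6.022e23 * 0.693))   # ~5.8 J/K

# Average energy of one quadratic degree of freedom at room temperature, E = kT/2:
print(K_BOLTZMANN * 300 / 2)   # ~2.1e-21 J
```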

The second law

One law of physics has been associated with the entropy since the very beginning, and it was actually the primary reason why the physicists invented the concept of entropy in the first place: it never decreases, at least not by macroscopic amounts.

That's easy to see. The probabilistic distribution for various microstates corresponds to a "cloud" in the phase space - the points where the cloud is present have "rho" substantially greater than zero. If you evolve this "cloud" according to the equations of motion, it will spread, like all clouds. The resulting cloud will look chaotic and its snippets will cover most of the phase space - or its part that respects the conservation laws.

If you take the final cloud and create a "fuzzy" version of it, by adding all states that are macroscopically indistinguishable from the points included in the cloud (a kind of convolution with another, ball-shaped cloud - whose size and shape depends on what you mean by "indistinguishable", an issue that always depends on some conventions), you will inevitably obtain a cloud with a higher entropy. Why? Because it will dilute the numbers "p" or "rho" so that they're more uniform than before. And the key formula for the "Entropy" simply increases if things get more uniform. For example, note that "-p ln(p) - (1-p) ln(1-p)" is maximized at "p=1/2" - the uniform distribution between "0" and "1" as the values of the bit.
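A quick numerical check of that last claim (numpy assumed): the two-outcome entropy "-p ln(p) - (1-p) ln(1-p)" is indeed maximized by the uniform distribution.

```python
import numpy as np

p = np.linspace(0.001, 0.999, 999)
entropy = -p * np.log(p) - (1 - p) * np.log(1 - p)

print(p[np.argmax(entropy)])          # ~0.5: the uniform distribution wins
print(entropy.max(), np.log(2.0))     # and the maximum equals ln(2)
```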

Boltzmann's proof of the H-theorem is just a mathematical version of the simple argument above. It is de facto completely trivial and it holds completely universally - regardless of the "interpretation" or "visualization" of the particular configurations (or sequences of bits).

The upgrade to quantum mechanics

We said that the formula for the entropy could have been rewritten from a sum to the integral if the degrees of freedom were continuous. But it actually turns out that quantum mechanics - because it's quantum - allows us to return to the original, "simpler" formula involving a sum rather than an integral. We have
Entropy = - Σ_k p_k ln(p_k)
In this case, "p_k" is the probability of a particular microstate - an element of an orthonormal basis of the Hilbert space. Because such sums can be rewritten as traces, we also have
Entropy = - Tr ρ ln ρ
where "ρ" (rho) is the density matrix. In an appropriate orthonormal basis, the Hermitean matrix ρ can always be diagonalized. And the eigenvalues are nothing else than the probabilities "p_k" of the individual microstates (basis vectors) that we were discussing previously.
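A minimal sketch of this trace formula (numpy assumed; the two density matrices are illustrative): diagonalize the Hermitean matrix ρ and sum "-p_k ln(p_k)" over its eigenvalues.

```python
import numpy as np

def von_neumann_entropy(rho):
    """-Tr(rho ln rho), computed from the eigenvalues p_k of the density matrix rho."""
    p = np.linalg.eigvalsh(rho)     # eigenvalues = probabilities of the basis microstates
    p = p[p > 1e-15]                # zero eigenvalues contribute nothing to the sum
    return -np.sum(p * np.log(p))

pure = np.array([[1.0, 0.0], [0.0, 0.0]])   # a pure state: zero entropy
print(von_neumann_entropy(pure))            # 0.0

mixed = np.eye(2) / 2                       # a maximally mixed qubit
print(von_neumann_entropy(mixed))           # ~0.693 = ln(2)
```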

Note that the trace can also be rewritten as an integral, if you insert the completeness relation for a continuous basis. However, in quantum mechanics, you could only integrate over "x" or over "p", but not both (because of the uncertainty principle). Alternatively, you may "cover" the phase space by another set of microstates that are quasi-localized, and you know that there will be 1 microstate per volume "h = 2.pi.hbar" of the phase space.

The proof that the entropy doesn't decrease - more precisely, that the probability that it decreases by "Δ Entropy" is as small as "exp(-Δ Entropy)" which is de facto zero for macroscopic changes - is pretty much identical in quantum mechanics as it has been in classical physics. In fact, it's simpler because we have a basis-independent formula involving the trace and because the probability densities appearing in the logarithms are nicely and automatically made dimensionless.
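To appreciate how "de facto zero" this probability is for macroscopic changes, here is a back-of-the-envelope sketch (Python assumed; the 1 J/K decrease is an illustrative value):

```python
import math

K_BOLTZMANN = 1.38e-23   # J/K

def log10_probability_of_decrease(delta_S_in_J_per_K):
    """log10 of exp(-Delta Entropy) for an entropy decrease given in J/K."""
    delta_entropy = delta_S_in_J_per_K / K_BOLTZMANN   # convert to dimensionless nats
    return -delta_entropy / math.log(10)

# A macroscopically modest decrease of 1 J/K:
print(log10_probability_of_decrease(1.0))   # ~ -3.1e22, i.e. the probability is ~ 10^(-3e22)
```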

Also, the specific properties of quantum mechanics - such as interference and the ability to violate Bell's inequalities via entanglement - become irrelevant in statistical physics because the very goal of statistical physics is to consider the effect of having very many states that differ from each other in details and that decohere: whenever you have such large ensembles of states, the classical reasoning is becoming a good approximation. Decoherence guarantees that interference disappears. Consequently, the overall logic of quantum statistical physics is the same as it was in classical statistical physics.

For example, if the density matrix is mixed (i.e. more than one of its eigenvalues is nonzero), you shouldn't imagine that the precise form of the density matrix is an "objective state of the world". Instead, you may always imagine that the density matrix just encodes your ignorance about the actual state but the actual state may always be imagined to be a pure state. The same thing was actually true in classical physics: the probabilistic distributions on the phase spaces were just a technical tool to deal with our ignorance while the underlying physics was thought to have a well-defined pure state (a point in the phase space). But if you don't know what the state is, you're forced to calculate with the probabilities, anyway.

(The only new feature of quantum mechanics in this respect is that the pure states can be complex linear combinations of other pure states, which is also why even pure states lead to probabilistic predictions in QM, as they have to be rewritten as combinations of other pure states, namely eigenstates of the measured observables. But this fact, linked to the quantum superpositions, is getting less important if you talk about whole ensembles of macroscopically indistinguishable states because the density matrix is de facto proportional to the identity matrix acting on these subspaces of the Hilbert space, so the ability of quantum mechanics to produce vectors in "interesting new directions" of the Hilbert space can't be exploited here.)

I want to stress that all these things have been understood for 80 years - and most of them (those unrelated to quantum physics) for more than 130 years. The arguments above, or their obvious refinements or specializations that use the notation appropriate for a given dynamical system (such as quantum field theory), tell us everything that is universally true about the concept of entropy and its behavior. In particular situations, we may calculate how much the entropy is changing and what it means - but these things depend on the "Hamiltonian" or "dynamics", of course.

The only insights that don't depend on them are the general consequences of statistics, logic, and properties of the logarithm - and most of them have actually been explained in the text above.

Irreversibility

The notion of entropy is critically linked to irreversibility, as we have mentioned, and irreversibility means that the physical phenomena don't look identical if you revert them in time - if you run the "movie" backwards. The most omnipresent fact that can't be changed is that the entropy increases from the past to the future. It never decreases. This is true even if the underlying laws are time-reversal symmetric - and the "practically relevant" ones are approximately, almost exactly, time-reversal symmetric, while all of the known ones must actually be exactly CPT-symmetric, which is morally similar to T-symmetry.

In the previous sentences about the asymmetry, "the past" and "the future" are defined according to the logical arrow of time which must always exist in any argument that studies incomplete knowledge. "The past" is what we can remember or what we can know, "the future" is what we can't know, what we can attempt to predict, change, expect, hope for, or be afraid of. For Nature Herself, "the future" is what she must evolve given the existing data from "the past".

All these things make the future and the past asymmetric. There's no way to define logic in such a way that the future and the past would play symmetrical roles. The only thing you could do would be to "rename" the future and the past, i.e. to pretend that the past is the future and vice versa. But that's just a vacuous linguistic exercise: the actual future and the actual past differ by striking logical properties and it makes no sense to obscure which is which.

Gaps in people's knowledge

I claim that all the stuff above is de facto rudimentary undergraduate physics material. And I claim that every single person who has recently claimed that "entropy" is deeply mysterious, or that it can be used in ways that are "completely new", or that the second law has become a "hot" question has misunderstood some basic segment of the knowledge above, which is too bad.

The whole notion of entropy was designed, and is still critically useful, for understanding of the irreversibility in the world because the increasing character of the entropy is its basic property. Erik Verlinde doesn't seem to be getting this point: when a theory talks about changes of entropy, it is inevitably a theory of irreversible phenomena. These phenomena are qualitatively different from the fundamental T-symmetric (and thus reversible) or at least CPT-symmetric laws of mechanics (including gravity) or field theory (including general relativity).

Other people such as Sean Carroll don't understand that the thermodynamic arrow of time is fully explained by the statistical considerations above and by the existence of a logical arrow of time. The latter is inherently T-asymmetric and it always has to exist to allow us any kind of reasoning of the type "A probably implies B" as long as "A" and "B" are propositions about events that occur in time or spacetime. One can't look for any "additional" explanation of the asymmetry because it would clearly be either equivalent to the logic or incompatible with it. In particular, thermodynamics in a lab has nothing whatsoever to do with cosmology.

Media-driven deterioration of physics

After string theory, it was also quantum mechanics, relativity, and now entropy that have become very controversial because of the fog produced by tons of incompetent science journalists - a fog that is increasingly expanding into "professional science", too. Junk and manifestly wrong papers and crackpots are being increasingly highlighted while quality physicists are losing influence. This is just a sociological process but the existence of meaningful research does depend on the existence of a broader society that makes it possible.

I wonder what will be the next pillar of physics to come under attack. Will apples start to fall to the skies because the new "revolutionary" physics dudes will be eager to compete with Isaac Newton?