Gibbs and Boltzmann Entropy in Classical and Quantum Mechanics
Abstract
The Gibbs entropy of a macroscopic classical system is a function of a probability distribution over phase space, i.e., of an ensemble. In contrast, the Boltzmann entropy is a function on phase space, and is thus defined for an individual system. Our aim is to discuss and compare these two notions of entropy, along with the associated ensemblist and individualist views of thermal equilibrium. Using the Gibbsian ensembles for the computation of the Gibbs entropy, the two notions yield the same (leading order) values for the entropy of a macroscopic system in thermal equilibrium. The two approaches do not, however, necessarily agree for nonequilibrium systems. For those, we argue that the Boltzmann entropy is the one that corresponds to thermodynamic entropy, in particular in connection with the second law of thermodynamics. Moreover, we describe the quantum analog of the Boltzmann entropy, and we argue that the individualist (Boltzmannian) concept of equilibrium is supported by the recent works on thermalization of closed quantum systems.
Key words: statistical mechanics; second law of thermodynamics; thermal equilibrium.
1 Introduction
Disagreement among scientists is often downplayed, and science often presented as an accumulation of discoveries, of universally accepted contributions to our common body of knowledge. But in fact, there is substantial disagreement among physicists, not only concerning questions that we have too little information about to settle them, such as the nature of dark matter, but also concerning conceptual questions about which all facts have long been in the literature, such as the interpretation of quantum mechanics. Another question of the latter type concerns the definition of entropy and some related concepts. In particular, two different formulations are often given in the literature for how to define the thermodynamic entropy (in equilibrium and nonequilibrium states) of a macroscopic physical system in terms of a microscopic, mechanical description (classical or quantum).
1.1 Two Definitions of Entropy in Classical Statistical Mechanics
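In classical mechanics, the Gibbs entropy of a physical system with phase space $\mathcal{X}$, for example $\mathcal{X} = \mathbb{R}^{6N}$ for $N$ point particles in $\mathbb{R}^3$ with positions $q_j$ and velocities $v_j$, is defined as
$$S_{\mathrm{G}}(\rho) = -k \int_{\mathcal{X}} dx \, \rho(x) \log \rho(x) \tag{1}$$
where $k$ is the Boltzmann constant, $dx = N!^{-1} \, d^3q_1 \, d^3v_1 \cdots d^3q_N \, d^3v_N$ the (symmetrized) phase space volume measure, $\log$ the natural logarithm, and $\rho$ a probability density on $\mathcal{X}$. (Footnote 5: One actually takes the expression $\rho \log \rho$ to mean the continuous extension of the function $\rho \mapsto \rho \log \rho$ from $(0,\infty)$ to the domain $[0,\infty)$; put differently, we follow the convention to set $0 \log 0 = 0$.) (Footnote 6: Changing the unit of phase space volume will change $\rho$ by a constant factor and thus $S_{\mathrm{G}}(\rho)$ by addition of a constant, an issue that does not matter for the applications and disappears anyway in the quantum case.)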
The Boltzmann entropy of a macroscopic system is defined as
$$S_{\mathrm{B}}(X) = k \log \operatorname{vol} \Gamma(X) \tag{2}$$
where $X \in \mathcal{X}$ is the actual phase point of the system, $\operatorname{vol}$ means the volume in $\mathcal{X}$, and $\Gamma(X)$ is the set of all phase points that “look macroscopically the same” as $X$. Obviously, there is no unique precise definition for “looking macroscopically the same,” so we have a certain freedom to make a reasonable choice. It can be argued that for large numbers $N$ of particles (as appropriate for macroscopic physical systems), the arbitrariness in the choice of $\Gamma(X)$ shrinks and becomes less relevant. A natural choice is to partition the phase space into regions $\Gamma_\nu$ we call macro sets (see Figure 1),
$$\mathcal{X} = \bigcup_\nu \Gamma_\nu \tag{3}$$
and to take as $\Gamma(X)$ the $\Gamma_\nu$ containing $X$. That is,
$$S_{\mathrm{B}}(X) = k \log \operatorname{vol} \Gamma_\nu \quad \text{for } X \in \Gamma_\nu. \tag{4}$$
We will give more detail in Section 3. Boltzmann’s definition (2) is often abbreviated as “$S = k \log W$.” In every energy shell there is usually one macro set that corresponds to thermal equilibrium and takes up by far most (say, more than ) of the volume (see Section 3.1).
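A toy model may make the dominance of one macro set concrete. The following sketch rests on assumptions that are not in the text: each of $N$ particles sits in the left or the right half of a box, a microstate is the list of sides, and the macro state is the number of particles on the left, so that a macro set contains $\binom{N}{n}$ microstates. The function names are hypothetical, and counting microstates is only a discrete stand-in for phase space volume.

```python
import math

def boltzmann_entropy(n_particles, n_left, k=1.0):
    """Discrete analog of (2): k * log(size of the macro set).

    The macro set labelled by n_left contains C(n_particles, n_left)
    microstates in this toy model (an illustrative assumption).
    """
    return k * math.log(math.comb(n_particles, n_left))

def fraction_near_equipartition(n_particles, max_deviation):
    """Fraction of all 2^N microstates with |n_left - N/2| <= max_deviation."""
    total = 2 ** n_particles
    count = sum(
        math.comb(n_particles, n)
        for n in range(n_particles + 1)
        # Integer comparison avoids floating-point trouble at the boundary.
        if abs(2 * n - n_particles) <= 2 * max_deviation
    )
    return count / total

# Same relative tolerance (5% of N) for a small and a larger system.
small = fraction_near_equipartition(100, 5)
large = fraction_near_equipartition(1000, 50)
```

In this model the share of microstates close to equipartition grows with the particle number at fixed relative tolerance, which is the qualitative behavior the text attributes to the equilibrium macro set.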
An immediate problem with the Gibbs entropy is that while every classical system has a definite phase point $X$ (even if we observers do not know it), a system does not have a $\rho$; that is, it is not clear which distribution $\rho$ to use. For a system in thermal equilibrium, $\rho$ presumably means a Gibbsian equilibrium ensemble (microcanonical, canonical, or grandcanonical). It follows that, for thermal equilibrium states, $S_{\mathrm{B}}$ and $S_{\mathrm{G}}$ agree to leading order, see (31) below. In general, several possibilities for $\rho$ come to mind:

(a) ignorance: $\rho(x)$ expresses the strength of an observer’s belief that $X = x$.

(b) preparation procedure: A given procedure does not always reproduce the same phase point, but produces a random phase point with distribution $\rho$.

(c) coarse graining: Associate with every $X \in \mathcal{X}$ a distribution $\rho_X(x)$ on $\mathcal{X}$ that captures how macro-similar $X$ and $x$ are (or perhaps, how strongly an ideal observer seeing a system with phase point $X$ would believe that its phase point is $x$).
Correspondingly, there are several different notions of Gibbs entropy, which we will discuss in Sections 2 and 4. Here, maybe (c) could be regarded as a special case of (a), and thermal equilibrium ensembles as a special case of (b). In fact, it seems that Gibbs himself had in mind that any system in thermal equilibrium has a random phase point whose distribution should be used, which is consistent with option (b); in his words (Gibbs, 1902, p. 152):
[…] we shall find that [the] distinction [between interaction of a system with a system with determined phase point and one with distribution ] corresponds to the distinction in thermodynamics between mechanical and thermal action.
In our discussion we will also address the status of the Gibbsian ensembles (see also Goldstein, 2019). We will argue that $S_{\mathrm{B}}$ qualifies as a definition of thermodynamic entropy whereas version (a) of $S_{\mathrm{G}}$ does not; (b) is not correct in general; and (c) is acceptable to the extent that it is not regarded as a special case of (a).
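Another immediate problem with the Gibbs entropy is that it does not change with time,
$$\frac{d S_{\mathrm{G}}(\rho_t)}{dt} = 0 \tag{5}$$
if $\rho_t$ is taken to evolve in accord with the microscopic Hamiltonian dynamics, that is, according to the Liouville equation
$$\frac{\partial \rho}{\partial t} = -\nabla \cdot (\rho \, v) \tag{6}$$
where $v$ is the vector field on $\mathcal{X}$ that appears in the equation of motion
$$\frac{dX}{dt} = v(X(t)) \tag{7}$$
for the phase point $X(t)$, such as $v = J \nabla H$ with matrix $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$ in position and momentum coordinates and $H$ the Hamiltonian function. Generally, the Gibbs entropy does not change when $\rho$ gets transported by any volume-preserving bijection $\Phi: \mathcal{X} \to \mathcal{X}$,
$$S_{\mathrm{G}}(\rho \circ \Phi^{-1}) = S_{\mathrm{G}}(\rho). \tag{8}$$
In particular, by Liouville’s theorem of the conservation of phase space volume, this applies when $\Phi = \Phi_t$ is the Hamiltonian time evolution, with $\rho_t = \rho_0 \circ \Phi_t^{-1}$. The time independence of $S_{\mathrm{G}}(\rho_t)$ conflicts with the formulation of the second law given by Clausius, the person who coined the “laws of thermodynamics” as follows (Clausius, 1865, p. 365):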
1. The energy of the universe is constant.
2. The entropy of the universe tends to a maximum.
Among the authors who took entropy to be the Gibbs entropy, some (e.g., Khinchin, 1941) have argued that Clausius’s wording of the second law is inappropriate or exaggerated, others (e.g., Mackey, 1989) that the Liouville evolution (6) is not the relevant evolution here. We will come back to this point in Section 6. As we will explain in Section 3, Clausius’s statement is actually correct for the Boltzmann entropy.
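The invariance (8) has a simple discrete analog that can be checked directly. In the sketch below, which is an illustration and not part of the original argument, phase space is replaced by a finite set of equal-volume cells and a volume-preserving bijection by a permutation of the cells; the function names and the sample distribution are hypothetical.

```python
import math
import random

def gibbs_entropy(probabilities, k=1.0):
    """Discrete analog of (1): -k * sum p log p, with 0 log 0 taken as 0."""
    return -k * sum(p * math.log(p) for p in probabilities if p > 0.0)

def transport(probabilities, permutation):
    """Push the distribution forward: the weight of cell i moves to cell permutation[i]."""
    moved = [0.0] * len(probabilities)
    for i, p in enumerate(probabilities):
        moved[permutation[i]] = p
    return moved

random.seed(0)                        # fixed seed so the run is reproducible
rho = [0.5, 0.25, 0.125, 0.125, 0.0, 0.0]
phi = list(range(len(rho)))
random.shuffle(phi)                   # a random bijection of the cells
rho_moved = transport(rho, phi)

entropy_before = gibbs_entropy(rho)
entropy_after = gibbs_entropy(rho_moved)
```

The two entropy values coincide up to rounding, whatever the permutation does to the individual cells. That the value is insensitive to where the weight sits is the property that makes a time-independent Gibbs entropy under Liouville evolution unavoidable.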
Different views about the meaning of entropy and the second law have consequences about the explanatory and predictive power of the second law that we will consider in Section 2. They also have practical consequences in the formulation of hydrodynamic equations, e.g., Navier-Stokes equations, for macroscopic variables (Goldstein and Lebowitz, 2004): such macroscopic equations can be compatible with a microscopic Hamiltonian evolution only if they make sure that the Boltzmann entropy increases.
1.2 Boltzmann’s $H$ and $k \log W$
We should address right away in the introduction a certain confusion about Gibbs and Boltzmann entropy that has come from the fact that Boltzmann used, in connection with the Boltzmann equation, the definition
$$S = -k \int_{\mathcal{X}_1} d^3q \, d^3v \, f(q,v) \log f(q,v) \tag{9}$$
for entropy. Boltzmann used the notation $H$ (not to be confused with the Hamiltonian function) for the integral in (9), i.e., for $-S/k$; this functional gave name to the $H$-theorem, which asserts that $H$ always decreases, except in thermal equilibrium, when $H$ is constant in time.
Here, $\mathcal{X}_1$ is the 1-particle phase space (for definiteness, $\mathcal{X}_1 = \mathbb{R}^6$) and $f$ a distribution density in $\mathcal{X}_1$. The formula (9) obviously looks much like the Gibbs entropy (and has presumably inspired Gibbs’s definition, which was published after (9)). In fact, (9) is also the Gibbs entropy of $f$. But here it is relevant that $f$ means the empirical distribution of $N$ points in $\mathcal{X}_1$ (after smoothing, or in the limit $N \to \infty$), and is thus computed from $X \in \mathcal{X}$ (see Section 3 below for more detail). So, the $H$ functional or the $S$ in (9) is indeed a function on $\mathcal{X}$, and in fact a special case of (2) corresponding to a particular choice of dividing $\mathcal{X}$ into regions of macroscopic equivalence (see Section 3). (Footnote 7: without symmetrization factor $N!$, or with that factor, up to a function of $N$ alone.)
1.3 The Quantum Case
Consider a macroscopic quantum system (e.g., a gas of atoms in a box) with Hilbert space $\mathcal{H}$. We write $\mathbb{S}(\mathcal{H})$ for the unit sphere in the Hilbert space $\mathcal{H}$. The natural analog of the Gibbs entropy is the quantum Gibbs entropy or von Neumann entropy of a given density matrix $\hat\rho$ on $\mathcal{H}$ (von Neumann, 1927),
$$S_{\mathrm{vN}}(\hat\rho) = -k \operatorname{tr}(\hat\rho \log \hat\rho). \tag{10}$$
Von Neumann (1929) himself thought that this formula was not the most fundamental one but only applicable in certain circumstances; we will discuss his proposal for the fundamental definition in Section 7 below. In a sense, the density matrix $\hat\rho$ plays the role analogous to the classical distribution density $\rho$, and again, the question arises as to what exactly $\hat\rho$ refers to: an observer’s ignorance or what? Our discussion of options (a)–(c) above for the Gibbs entropy will apply equally to the von Neumann entropy. In addition, there is also a further possibility:

(d) reduced density matrix: The system is entangled with another system, and $\hat\rho$ is obtained from the (perhaps even pure) state of the composite of both through a partial trace over the other system.
As we will discuss in Section 7.3, $S_{\mathrm{vN}}$ with option (d) does not yield a good notion of thermodynamic entropy, either.
In practice, systems are never isolated. But even if a macroscopic system were perfectly isolated, heat would flow in it from the hotter to the cooler parts, and, as we will explain in this section, there is a natural sense in which entropy can be defined and increases. The idealization of an isolated system helps us focus on this sense. In Section 1.5, we will point out why the results also cover nonisolated systems.
The closest quantum analog of the Boltzmann entropy is the following. A macro state $\nu$ should correspond to, instead of a subset $\Gamma_\nu$ of phase space, a subspace $\mathcal{H}_\nu$ of Hilbert space $\mathcal{H}$, called a macro space in the following; for different macro states $\nu' \neq \nu$, the macro spaces should be mutually orthogonal, thus yielding a decomposition of Hilbert space into an orthogonal sum (von Neumann, 1929; Goldstein et al., 2010a, b),
$$\mathcal{H} = \bigoplus_\nu \mathcal{H}_\nu \tag{11}$$
instead of the partition (3). Now the dimension of a subspace of $\mathcal{H}$ plays a role analogous to the volume of a subset of $\mathcal{X}$, and correspondingly we define the quantum Boltzmann entropy of a macro state $\nu$ by (Griffiths, 1994; Lebowitz, 2008; Goldstein et al., 2010b)
$$S_{\mathrm{qB}}(\nu) = k \log \dim \mathcal{H}_\nu. \tag{12}$$
In fact, already Einstein (1914, Eq. (4a)) argued that the entropy of a macro state should be proportional to the log of the “number of elementary quantum states” compatible with that macro state; it seems that (12) fits very well with this description. To a system with wave function $\psi \in \mathcal{H}_\nu$ (or, for that matter, a density matrix concentrated in $\mathcal{H}_\nu$) we attribute the entropy value $S_{\mathrm{qB}}(\nu)$.
It seems convincing that $S_{\mathrm{qB}}$ yields the correct value of thermodynamic entropy. For one thing, it is an extensive, or additive quantity: If we consider two systems $S_1$, $S_2$ with negligible interaction, then the Hilbert space of both together is the tensor product, $\mathcal{H} = \mathcal{H}_1 \otimes \mathcal{H}_2$, and it seems plausible that the macro states $\nu$ of the composite correspond to specifying the macro state of $S_1$ and $S_2$, resp., i.e., $\mathcal{H}_\nu = \mathcal{H}_{\nu_1} \otimes \mathcal{H}_{\nu_2}$ with $\nu = (\nu_1, \nu_2)$. As a consequence, the dimensions multiply, so
$$S_{\mathrm{qB}}(\nu) = S_{\mathrm{qB}}(\nu_1) + S_{\mathrm{qB}}(\nu_2). \tag{13}$$
For another thing, it is plausible that, analogously to the classical case, in every energy shell $\mathcal{H}_{\mathrm{mc}}$ (i.e., the subspace corresponding to an energy interval $[E, E + \Delta E]$ with $\Delta E$ the resolution of macroscopic energy measurements) there is a dominant macro space $\mathcal{H}_{\tilde\nu}$ whose dimension is near (say, greater than of) the dimension of the energy shell, and this macro state corresponds to thermal equilibrium (e.g., Tasaki, 2016), $\tilde\nu = \mathrm{eq}$. As a consequence,
$$S_{\mathrm{qB}}(\mathrm{eq}) \approx k \log \dim \mathcal{H}_{\mathrm{mc}} \tag{14}$$
and the right-hand side is well known to yield appropriate values of thermodynamic entropy in thermal equilibrium. In fact, the right-hand side agrees with $S_{\mathrm{vN}}(\hat\rho_{\mathrm{mc}})$, the von Neumann entropy associated with the microcanonical density matrix $\hat\rho_{\mathrm{mc}}$ (i.e., the normalized projection to $\mathcal{H}_{\mathrm{mc}}$) and with thermal equilibrium at energy $E$.
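The two properties just described, additivity and the closeness of the equilibrium entropy to the log of the energy-shell dimension, amount to elementary arithmetic with logarithms. The sketch below illustrates this with made-up numbers: the dimensions `dim_1`, `dim_2` and the fraction `0.999` are assumptions chosen for illustration, not values from the text, and the function names are hypothetical.

```python
import math

def quantum_boltzmann_entropy(dimension, k=1.0):
    """k * log(dimension of the macro space), as in the quantum Boltzmann entropy."""
    return k * math.log(dimension)

# Hypothetical macro-space dimensions of two weakly interacting systems.
dim_1 = 2 ** 40
dim_2 = 3 ** 25

# For a tensor product the dimensions multiply, so the entropies add.
s_1 = quantum_boltzmann_entropy(dim_1)
s_2 = quantum_boltzmann_entropy(dim_2)
s_joint = quantum_boltzmann_entropy(dim_1 * dim_2)

def entropy_gap_to_shell(fraction, k=1.0):
    """k log(dim of shell) - k log(dim of equilibrium macro space),
    if the equilibrium macro space fills the given fraction of the shell."""
    return -k * math.log(fraction)

gap = entropy_gap_to_shell(0.999)   # of order 1e-3 in units of k
```

Since the entropy of a macroscopic system is a very large multiple of $k$, a gap of order $k$ times a small number is negligible, which is why replacing the equilibrium macro space by the whole energy shell does not change the leading-order value.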
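Of course, a general pure quantum state $\psi$ will be a nontrivial superposition of contributions from different $\mathcal{H}_\nu$’s, $\psi = \sum_\nu P_\nu \psi$, where $P_\nu$ is the projection to $\mathcal{H}_\nu$. One can say that $\psi$ is a superposition of different entropy values, and in analogy to other observables one can define the self-adjoint operator
$$\hat{S} = \sum_\nu S_{\mathrm{qB}}(\nu) \, P_\nu \tag{15}$$
whose eigenvalues are the $S_{\mathrm{qB}}(\nu)$ and eigenspaces the $\mathcal{H}_\nu$.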
Here, the question arises as to which entropy value we should attribute to a system in state . At this point, the foundations of statistical mechanics touch the foundations of quantum mechanics, as the problem of Schrödinger’s cat (or the measurement problem of quantum mechanics) concerns precisely the status of wave functions that are superpositions of macroscopically different contributions, given that our intuition leads us to expect a definite macroscopic situation. The standard “Copenhagen” formulation of quantum mechanics does not have much useful to say here, but several proposed “quantum theories without observers” have long solved the issue in clean and clear ways (Goldstein, 1998):

Bohmian mechanics (in our view the most convincing proposal) admits further variables besides $\psi$ by assuming that quantum particles have definite positions, too. Since Schrödinger’s cat then has an actual configuration, there is a fact about whether it is dead or alive, even though the wave function is still a superposition. In the same way, the configuration often selects one of the macroscopically different contributions $P_\nu \psi$; in fact, this happens when
the significantly nonzero $P_\nu \psi$ do not overlap much in configuration space. (16)
Artificial examples can be constructed for which this condition is violated, but it seems that (16) is usually fulfilled in practice. In that case, it seems natural to regard the entropy of that one contribution selected by the Bohmian configuration as the actual entropy value.

Collapse theories modify the Schrödinger equation in a nonlinear and stochastic way so that, for macroscopic systems, the evolution of avoids macroscopic superpositions and drives towards (a random) one of the . Then the question about the entropy of a macroscopic superposition does not arise.

Maybe (Allori et al., 2011) a many-worlds view, in which each of the contributions $P_\nu \psi$ corresponds to a part of reality, can be viable. Then the system’s entropy has different values in different worlds.
We have thus defined the quantum Boltzmann entropy of a macroscopic system,
(17) 
in each of these theories.
1.4 Second Law in Quantum Mechanics
Let us turn to the question of what the second law asserts in quantum mechanics. Since $S_{\mathrm{vN}}(\hat\rho_t)$ remains constant under the unitary time evolution in $\mathcal{H}$, the discussion given in Section 6 for $S_{\mathrm{G}}$ applies equally to $S_{\mathrm{vN}}$. About $S_{\mathrm{qB}}$, we expect the following.
Conjecture 1.
In Bohmian mechanics and collapse theories, for macroscopic systems with reasonable Hamiltonian $H$ and decomposition $\mathcal{H} = \bigoplus_\nu \mathcal{H}_\nu$, every $\nu$ and most $\psi \in \mathbb{S}(\mathcal{H}_\nu)$ are such that with probability close to 1, $S_{\mathrm{qB}}$ is nondecreasing with time, except for infrequent, shallow, and short-lived valleys, until it reaches the thermal equilibrium value. Moreover, after reaching that value, $S_{\mathrm{qB}}$ stays at that value for a very long time.
Careful studies of this question would be of interest but are presently missing. We provide a bit more discussion in Section 7.
1.5 Open Systems
For open (nonisolated) systems, the quantum Boltzmann entropy can still be considered and should still tend to increase—not for the reasons that make the von Neumann entropy of the reduced density matrix increase (evolution from pure to mixed states), but for the reasons that make the quantum Boltzmann entropy of an isolated system in a pure state increase (evolution from small to large macro spaces). Let us elaborate.
In fact, the first question about an open system (interacting with its environment ) would be how to define its state. A common answer is the reduced density matrix
(18) 
(if is in the pure state ); another possible answer is the conditional density matrix (Dürr et al., 2005)
(19) 
where denotes the partial trace over the nonpositional degrees of freedom of (such as spin), the scalar products are partial scalar products involving only the position degrees of freedom of , and is the Bohmian configuration of . (To illustrate the difference between and , if is Schrödinger’s cat and its environment, then is a mixture of a live and a dead state, whereas is either a live or a dead state, each with the appropriate probabilities.)
Second, it seems that the following analog of (16) is usually fulfilled in practice for both the reduced density matrix and the conditional density matrix :
For those for which is significantly nonzero, the functions on configuration space given by do not overlap much.  (20) 
As with (16), if (20) holds, then the Bohmian configuration of selects the actual macro state of . Moreover, should usually be the same for as for . Then this macro state determines the entropy,
(21) 
In short, the concept of quantum Boltzmann entropy carries over from isolated systems in pure states to open systems.
1.6 Quantum Thermalization
In the 21st century, there has been a wave of works on the thermalization of closed quantum systems, often connected with the key words “eigenstate thermalization hypothesis” (ETH) and “canonical typicality”; see, e.g., (Gemmer et al., 2004; Goldstein et al., 2006; Popescu et al., 2006; Goldstein et al., 2010a; Gogolin and Eisert, 2016; Goldstein et al., 2015; Kaufman et al., 2016) and the references therein. The common theme of these works is that an individual, closed, macroscopic quantum system in a pure state that evolves unitarily will, under conditions usually satisfied and after a sufficient waiting time, behave very much as one would expect a system in thermal equilibrium to behave; more precisely, on such a system with in an energy shell, relevant observables yield their thermal equilibrium values up to small deviations with probabilities close to 1. For example, this happens simultaneously for all observables referring to a small subsystem of that interacts weakly with the remainder , with the further consequence (“canonical typicality”) that the reduced density matrix of ,
(22) 
is close to a canonical one (Goldstein et al., 2006),
(23) 
for suitable and normalizing constant ; here, is the Hamiltonian of . For another example, every initial wave function of in an energy shell will, after a sufficient waiting time and for most of the time in the long run, be close to the thermal equilibrium macro space (Goldstein et al., 2010a), provided that the Hamiltonian is nondegenerate and (ETH) all eigenvectors of are very close to .
These works support the idea that the approach to thermal equilibrium need not have anything to do with an observer’s ignorance. In fact, the system always remains in a pure state, and thus has von Neumann entropy $S_{\mathrm{vN}} = 0$ at all times. This fact illustrates that the kind of thermalization relevant here involves neither an increase in von Neumann entropy nor a stationary density matrix of the system. Rather, the wave function $\psi_t$ reaches, after a sufficient waiting time, the neighborhood of the macro space corresponding to thermal equilibrium in the energy shell (Goldstein et al., 2010a). That is the “individualist” or “Boltzmannian” version of approach to thermal equilibrium in quantum mechanics.
In fact, there are two individualist notions of thermal equilibrium in quantum mechanics, which have been called “macroscopic” and “microscopic thermal equilibrium” (Goldstein et al., 2017b). Boltzmann’s approach requires only that macro observables assume their equilibrium values (Goldstein et al., 2010a), whereas a stronger statement is actually true after a long waiting time: that all micro observables assume their equilibrium values (Goldstein et al., 2015). This is true not only for macroscopic systems, but also for small systems (Goldstein et al., 2017a), and has in fact been verified experimentally for a system with as few as 6 degrees of freedom (Kaufman et al., 2016).
We have emphasized earlier in this subsection that thermalization does not require mixed quantum states. We should add that this does not mean that pure quantum states are fully analogous to phase points in classical mechanics. In Bohmian mechanics, for example, the analog of a phase point would be the pair $(\psi, Q)$ comprising the system’s wave function $\psi$ and its configuration $Q$.
The remainder of this article is organized as follows. In Section 2, we discuss and criticize option (a), the idea that entropy is about subjective knowledge. In Section 3, we explain why Boltzmann entropy indeed tends to increase with time and discuss doubts and objections to this statement. In Section 4, we discuss an individualist understanding of Gibbs entropy as a generalization of Boltzmann entropy. In Section 5, we discuss the status of Gibbs’s ensembles. In Section 6, we comment on a few proposals for how entropy increase should work for Gibbs entropy. In Section 7, we turn to the quantum case. In Section 8, we conclude.
2 Subjective Entropy Is Not Enough
By “subjective entropy” we mean Gibbs entropy under option (a): the view that the Gibbs entropy is the thermodynamic entropy, and that the distribution $\rho$ in the Gibbs entropy represents an observer’s subjective perspective and limited knowledge (e.g., Jaynes, 1965; Krylov, 1979; Mackey, 1989; Garibyan and Tegmark, 2014). We would like to point to three major problems with this view.
2.1 Cases of Wrong Values
The first problem is that in some situations, the subjective entropy does not appropriately reproduce the thermodynamic entropy. For example, suppose an isolated room contains a battery-powered heater, and we do not know whether it is on or off. If it is on, then after ten minutes the air will be hot, the battery empty, and the entropy of the room has a high value $S_2$. Not so if the heater is off; then the entropy has the low initial value $S_1 < S_2$. In view of our ignorance, we may attribute a subjective probability of 50 percent to each of “on” and “off.” After ten minutes, our subjective distribution $\rho$ over phase space will be spread over two regions with macroscopically different phase points, and its Gibbs entropy will have a value $S_3$ between $S_1$ and $S_2$ (in fact, slightly above the average of $S_1$ and $S_2$). (Footnote 8: Generally, if several distribution functions $\rho_i$ have mutually disjoint supports, and we choose one of them randomly with probability $p_i$, then the resulting distribution $\rho = \sum_i p_i \rho_i$ has Gibbs entropy $S_{\mathrm{G}}(\rho) = \sum_i p_i S_{\mathrm{G}}(\rho_i) - k \sum_i p_i \log p_i$. In our example, $S_3 = (S_1 + S_2)/2 + k \log 2$.) But the correct thermodynamic value is not $S_3$, it is either $S_1$ (if the heater was off) or $S_2$ (if the heater was on). So subjective entropy yields the wrong value.
The same problem arises with option (b), which concerns a system prepared by some procedure that, when repeated over and over, will lead to a distribution $\rho$ of the system’s phase point $X$. Suppose that the isolated room also contains a mechanism that tosses a coin or generates a random bit $Y$ in some other mechanical way; after that, the mechanism turns on the heater or does not, depending on $Y$. We would normally say that the entropy $S$ after ten minutes is random, that $S = S_1$ with probability 1/2 and $S = S_2$ with probability 1/2. But the distribution $\rho$ created by the procedure (and the canonical distribution for each given value of $Y$) is the same as in the previous paragraph, and has Gibbs entropy $S_3$, the wrong value.
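The heater example can be mimicked numerically. The sketch below is a discrete illustration with made-up numbers, not a model of an actual room: the "off" and "on" situations are represented by flat distributions over disjoint sets of 100 and 1000 equal-volume cells, and the 50/50 mixture stands for the subjective (or preparation) distribution. All names and sizes are hypothetical.

```python
import math

def gibbs_entropy(probabilities, k=1.0):
    """Discrete analog of the Gibbs entropy, with 0 log 0 taken as 0."""
    return -k * sum(p * math.log(p) for p in probabilities if p > 0.0)

def flat(cells, support):
    """Flat distribution over the cell indices in `support`, zero elsewhere."""
    return [1.0 / len(support) if i in support else 0.0 for i in range(cells)]

def mix(weights, distributions):
    """Convex combination sum_i weights[i] * distributions[i]."""
    size = len(distributions[0])
    return [sum(w * d[i] for w, d in zip(weights, distributions)) for i in range(size)]

cells = 1100
rho_off = flat(cells, range(0, 100))       # heater off: small region
rho_on = flat(cells, range(100, 1100))     # heater on: large, disjoint region
rho_mix = mix([0.5, 0.5], [rho_off, rho_on])

s_off = gibbs_entropy(rho_off)
s_on = gibbs_entropy(rho_on)
s_mix = gibbs_entropy(rho_mix)
```

The mixture's entropy equals the average of the two values plus $k \log 2$ and lies strictly between them, so it matches neither of the two values that the room can actually have.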
2.2 Explanatory and Predictive Power
The second major problem with subjective entropy concerns its lack of explanatory and predictive power. Consider for example the phenomenon that by thermal contact, heat always flows from the hotter to the cooler body, not the other way around. The usual explanation of this phenomenon is that entropy decreases when heat flows to the hotter body, and the second law excludes that. Now that explanation would not get off the ground if entropy meant subjective entropy: In the absence of observers, does heat flow from the cooler to the hotter? In distant stars, does heat flow from the cooler to the hotter? In the days before humans existed, did heat flow from the cooler to the hotter? After the human race becomes extinct, will heat flow from the cooler to the hotter? If not, why would observers be relevant at all to the explanation of the phenomenon?
And as with explanation, so with prediction: Can we predict that heat will flow from the hotter to the cooler also in the absence of observers? If so, why would observers be relevant to the prediction?
So, subjective entropy does not seem relevant to either explaining or predicting heat flow. That leaves us with the question, what is subjective entropy good for? The study of subjective entropy is a subfield of psychology, not of physics. It is all about beliefs.
Some ensemblists may be inclined to say that the explanation of heat flow is that it occurs the same way as if an observer observed it. But the fact remains that observers actually have nothing to do with it.
Once the problem of explanatory power is appreciated, it seems obvious that subjective entropy is inappropriate: How could an objective physical phenomenon such as heat flow from the hotter to the cooler depend on subjective belief? In fact, since different observers may have different, incompatible subjective beliefs, how could coherent consequences such as physical laws be drawn from them? And what if the subjects made mistakes, what if they computed the time-evolved distribution incorrectly, what if their beliefs were irrational: would that end the validity of subjective entropy? Somebody may be inclined to say that subjective entropy is valid only if it is rational (e.g., Bricmont, 2018), but that means basically to back off from the thought that entropy is subjective. It means that it does not play much of a role whether anybody’s actual beliefs follow that particular $\rho$, but rather that there is a correct $\rho$ that should be used; we will come back to this view at the end of the next subsection.
Another drawback of the subjective entropy, not unrelated to the problem of explanatory power, is that it draws the attention away from the fact that the universe must have very special initial conditions in order to yield a history with a thermodynamic arrow of time. While the Boltzmann entropy draws attention to the special properties of the initial state of the universe, the subjective entropy hides any such objective properties under talk about knowledge.
2.3 Phase Points Play No Role
The third problem with subjective entropy is that $S_{\mathrm{G}}(\rho)$ has nothing to do with the properties of the phase points $x$ at which $\rho$ is significantly nonzero. $S_{\mathrm{G}}$ measures essentially the width of the distribution $\rho$, much like the standard deviation of a probability distribution, except that the standard deviation yields the radius of the set over which $\rho$ is effectively distributed, whereas the Gibbs entropy yields the log of its volume (see (26) below). The problem is reflected in the fact, mentioned around (8), that any volume-preserving transformation $\Phi$ will leave the Gibbs entropy unchanged. It does not matter to the Gibbs entropy how the $x$’s on which $\rho$ is concentrated behave physically, although this behavior is crucial to thermodynamic entropy and the second law.
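Some ensemblists may be inclined to say that the kind of $\rho$ that occurs in practice is not any old density function, but is approximately concentrated on phase points that look macroscopically similar. This idea is essentially option (c) of Section 1.1, which was to take $\rho$ as a kind of coarse graining of the actual phase point $X$. Specifically, if $\Gamma(X)$ denotes again the set of phase points that look macroscopically similar to $X$, then we may want to take $\rho = \rho_X$ to be the flat distribution over $\Gamma(X)$,
$$\rho_X(x) = \frac{1_{\Gamma(X)}(x)}{\operatorname{vol} \Gamma(X)}, \tag{24}$$
where $1_A$ denotes the indicator function of a set $A$. With this choice we obtain exact agreement between the Gibbs and Boltzmann entropies,
$$S_{\mathrm{G}}(\rho_X) = S_{\mathrm{B}}(X). \tag{25}$$
Indeed, whenever $\rho$ is the flat distribution over any subset $A$ of phase space $\mathcal{X}$, $\rho = 1_A / \operatorname{vol} A$, then
$$S_{\mathrm{G}}(\rho) = k \log \operatorname{vol} A. \tag{26}$$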
(This fact also illustrates the mathematical meaning of the Gibbs entropy of any distribution as the log of the volume over which is effectively distributed.)
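The reading of the Gibbs entropy as the log of an effective volume can be checked numerically in one dimension. The following sketch is an illustration under simplifying assumptions (a one-dimensional "phase space", a midpoint-rule integral, $k = 1$); the function names are hypothetical. It evaluates the entropy integral for a flat density on an interval and for a Gaussian density, for which the exponential of the entropy is the standard value $\sqrt{2\pi e}\,\sigma$.

```python
import math

def gibbs_entropy_1d(density, lower, upper, cells=20000, k=1.0):
    """Midpoint-rule approximation of -k * integral of rho log rho on [lower, upper]."""
    width = (upper - lower) / cells
    total = 0.0
    for i in range(cells):
        rho = density(lower + (i + 0.5) * width)
        if rho > 0.0:                       # convention 0 log 0 = 0
            total += rho * math.log(rho) * width
    return -k * total

def flat_density(length):
    """Flat probability density on the interval [0, length]."""
    return lambda x: 1.0 / length if 0.0 <= x <= length else 0.0

def gaussian_density(sigma):
    """Centered Gaussian probability density with standard deviation sigma."""
    norm = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return lambda x: norm * math.exp(-0.5 * (x / sigma) ** 2)

s_flat = gibbs_entropy_1d(flat_density(3.0), 0.0, 3.0)          # log of the length
s_gauss = gibbs_entropy_1d(gaussian_density(2.0), -20.0, 20.0)
effective_width = math.exp(s_gauss)                             # about 4.13 * sigma
```

For the flat density the entropy is exactly the log of the interval length; for the Gaussian, the exponential of the entropy is a fixed multiple of the standard deviation, which is the sense in which entropy measures a volume where the standard deviation measures a radius.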
Of course, if we associate an entropy value with every $X$ in this way, then the use of Gibbs’s definition (1) seems like an unnecessary detour. In fact, we have associated with every $X$ an entropy value $S(X)$, and talk about the knowledge of observers is not crucial to the definition of the function $S$, as is obvious from the fact that the function $S$ is nothing but the Boltzmann entropy, which was introduced without mentioning observers.
This brings us once more to the idea that the $\rho$ in $S_{\mathrm{G}}(\rho)$ is the subjective belief of a rational observer. One could always use the Boltzmann entropy and add a narrative about observers and their beliefs, such as: Whenever $X \in \Gamma_\nu$, a rational observer should use the flat distribution over $\Gamma_\nu$, and the Gibbs entropy of that observer’s belief is what entropy really means. One could say such words. But they are also irrelevant, as observers’ knowledge is irrelevant to which way heat flows, and the resulting entropy value agrees with $S_{\mathrm{B}}(X)$ anyway.
2.4 What is Attractive About Subjective Entropy
Let us turn to factors that may seem attractive about the subjective entropy view: First, it seems like the obvious interpretation of the density that comes up in all ensembles. But the Boltzmannian individualist view offers an alternative interpretation, as we explain in Section 3.
Second, it is simple and elegant. That may be true but does not do away with the problems mentioned.
Third, the subjective view mixes well with certain interpretations of quantum mechanics such as Copenhagen and quantum Bayesianism, which claim that quantum mechanics is all about information or that one should not talk about reality. These interpretations are problematical as well, mainly because all information must ultimately be information about things that exist, and it does not help to leave vague and unspecified which things actually exist (Goldstein, 1998).
Fourth, the subjective view may seem to mix well with the work of Shannon (1948), as the Shannon entropy is a discrete version of Gibbs entropy and often regarded as quantifying the information content of a probability distribution. But actually, there is not a strong link, as Shannon regarded the probabilities in his study of optimal coding of data for transmission across a noisy channel as objective and did not claim any connection with thermodynamics. (By the way, it is dangerous to loosely speak of the “amount of information” in the same way as one speaks of, e.g., the amount of sand; after all, the sand grains are equal to each other, and one does not care about whether one gets this or that grain, whereas different pieces of information are not equivalent to each other.)
Fifth and finally, a strong pull towards subjective entropy comes from the belief that “objective entropy” either does not work or is ugly—a wrong belief, as we will explain in Section 3.
2.5 Remarks
Further critiques of subjective entropy can be found in (Callender, 1999; Lebowitz and Maes, 2003; Goldstein et al., 2017a; Goldstein, 2019).
We would like to comment on another quote. Jaynes (1965), a defender of subjective entropy, reported a dictum of Wigner’s:
Entropy is an anthropomorphic concept.
Of course, this phrase can be interpreted in very different ways. Jaynes took it to express that entropy refers to the knowledge of human observers—the subjective view that we have criticized. But we do admit that there is a trait in entropy that depends partly on human nature, and that is linked to a certain (though limited and usually unproblematical) degree of arbitrariness in the definition of “looking macroscopically the same.” This point will come up again in the next section.
3 Boltzmann’s Vision Works
Many authors expressed disbelief that Boltzmann’s understanding of entropy and the second law could possibly work. Von Neumann (1929, Sec. 0.6) wrote:
As in classical mechanics, also here [in the quantum case] there is no way that entropy could always increase, or even have a predominantly positive sign of its [time] derivative (or difference quotient): the time reversal objection as well as the recurrence objection are valid in quantum mechanics as well as in classical mechanics.
Khinchin (1941, §33, p. 139):
[One often] states that because of thermal interaction of material bodies the entropy of the universe is constantly increasing. It is also stated that the entropy of a system “which is left to itself” must always increase; taking into account the probabilistic foundation of thermodynamics, one often ascribes to this statement a statistical rather than absolute character. This formulation is wrong if only because the entropy of an isolated system is a thermodynamic function, not a phase function, which means that it cannot be considered as a random quantity; if $E$ and all [external parameters] remain constant the entropy cannot change its value whereas by changing these parameters in an appropriate way we can make the entropy increase or decrease at will. Some authors (footnote: Comp. Borel, Mécanique statistique classique, Paris 1925.) try to generalize the notion of entropy by considering it as being a phase function which, depending on the phase, can assume different values for the same set of thermodynamical parameters, and try to prove that entropy so defined must increase, with overwhelming probability. However, such a proof has not yet been given, and it is not at all clear how such an artificial generalization of the notion of entropy could be useful to the science of thermodynamics.
Jaynes (1965):
[T]he Boltzmann $H$ theorem does not constitute a demonstration of the second law for dilute gases[.]
Even Boltzmann himself was at times unassured. In a letter to Felix Klein in 1889, he wrote:
Just when I received your dear letter I had another neurasthenic attack, as I often do in Vienna, although I was spared them altogether in Munich. With it came the fear that the whole $H$-curve was nonsense.
But actually, the $H$ curve (i.e., the time evolution of entropy) makes complete sense, Boltzmann’s vision does work, and von Neumann, Khinchin, and Jaynes were all mistaken, so it is perhaps worth elucidating this point. Many other, deeper discussions can be found in the literature, e.g., qualitative, popular accounts in (Penrose, 1989; Lebowitz and Maes, 2003; Carroll, 2010), overviews in (Goldstein, 2001; Lebowitz, 2008; Goldstein, 2019), more technical and detailed discussions in (Boltzmann, 1898; Ehrenfest and Ehrenfest, 1911; Lanford, 1976; Garrido et al., 2004; Goldstein et al., 2017a; Lazarovici, 2018). So we now give a summary of Boltzmann’s explanation of the second law.
3.1 Macro States
We start with a partition of phase space $\mathcal{X}$ into macro sets $\Gamma_\nu$ as in Figure 1. A natural way of obtaining such a partition would be to consider several functions $M_j:\mathcal{X}\to\mathbb{R}$ ($j=1,\ldots,K$) that we would regard as “macro variables.” Since macro measurements have limited resolution (say, $\Delta M_j>0$), we want to think of the $M_j$ as suitably coarse-grained with a discrete set of values, say, $\{n\,\Delta M_j: n\in\mathbb{Z}\}$. Then two phase points $X_1,X_2\in\mathcal{X}$ will look macroscopically the same if and only if $M_j(X_1)=M_j(X_2)$ for all $j$, corresponding to
(27) $\Gamma_\nu=\{X\in\mathcal{X}: M_j(X)=\nu_j\ \forall j\}$,
one for every macro state $\nu=(\nu_1,\ldots,\nu_K)$ described by the list of values of all $M_j$. We will discuss a concrete example due to Boltzmann in Section 3.4. Since coarse-grained energy should be one of the macro variables, say
(28) $M_1(X)=[H(X)/\Delta E]\,\Delta E$
with $H$ the Hamiltonian function and $[x]$ the nearest integer to $x$, every $\Gamma_\nu$ is contained in one microcanonical energy shell
(29) $\mathcal{X}_{\mathrm{mc}}=\{X\in\mathcal{X}: E-\Delta E/2\le H(X)<E+\Delta E/2\}$.
Of course, this description still leaves quite some freedom of choice and thus arbitrariness in the partition, as different physicists may make different choices of macro variables and of the way and scale to coarse-grain them; this realization makes an “anthropomorphic” element in $S_{\mathrm{B}}$ explicit. Wallace (2018) complained that this element makes the Boltzmann entropy “subjective” as well, but that complaint does not seem valid: rather, $S_{\mathrm{B}}$ and its increase provide an objective answer to a question that is of interest from the human perspective. Moreover, as mentioned already, this anthropomorphic element becomes less relevant for larger $N$. It is usually not problematical and not subject to the same problems as the subjective entropy.
Usually in macroscopic systems, there is, for every energy shell $\mathcal{X}_{\mathrm{mc}}$ (or, if there are further macroscopic conserved quantities besides energy, in the set where their values have been fixed as well), one macro set $\Gamma_\nu$ that contains most (say, more than $99.99\%$) of the phase space volume of $\mathcal{X}_{\mathrm{mc}}$ (see, e.g., Boltzmann, 1898; Lanford, 1973; Lazarovici, 2018) (footnote 9: There are exceptions, in which none of the macro sets dominates; for example, in the ferromagnetic Ising model with vanishing external magnetic field and not-too-high temperature, there are two macro states (the first having a majority of spins up, the second having a majority of spins down) that together dominate but have equal volume; see also (Lazarovici, 2018). But that does not change much about the discussion.); in fact (Goldstein et al., 2017b, Eq. (6)),
(30) $\dfrac{\mathrm{vol}\,\Gamma_{\mathrm{eq}}}{\mathrm{vol}\,\mathcal{X}_{\mathrm{mc}}}\approx 1-\exp(-cN)$
with positive constant $c$. The existence of this dominant macro state means that all macro observables are nearly-constant functions on $\mathcal{X}_{\mathrm{mc}}$, in the sense that the set where they deviate (beyond tolerances) from their dominant values has tiny volume. This macro state is the thermal equilibrium state, $\Gamma_{\mathrm{eq}}$, see Figure 1, and the dominant values of the macro observables are their thermal equilibrium values. That fits with thermal equilibrium having maximal entropy, and it has the consequence that
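(31) $S_{\mathrm{B}}(\mathrm{eq})=k\log\mathrm{vol}\,\Gamma_{\mathrm{eq}}\approx k\log\mathrm{vol}\,\mathcal{X}_{\mathrm{mc}}=S_{\mathrm{G}}(\rho_{\mathrm{mc}})$,
where $\rho_{\mathrm{mc}}$ is the microcanonical distribution, and the (relative or absolute) error in the approximation tends to zero as $N\to\infty$.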
Moreover, different macro sets have vastly different volumes. In fact, usually the small macro sets of an energy shell taken together are still much smaller than the next bigger one,
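(32) $\mathrm{vol}\,\Gamma_{<\nu}\ll\mathrm{vol}\,\Gamma_\nu$
with
(33) $\Gamma_{<\nu}:=\bigcup_{\nu':\,\mathrm{vol}\,\Gamma_{\nu'}<\mathrm{vol}\,\Gamma_\nu}\Gamma_{\nu'}$.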
(There are exceptions to this rule of thumb; in particular, symmetries sometimes imply that two or a few macro sets must have approximately the same volume.)
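As a rough numerical illustration of how strongly one macro set can dominate, consider a toy model that is not taken from the text above: $N$ labelled particles, each sitting in the left or right half of a box, with the fraction in the left half as the only macro variable. The sketch below counts the configurations whose left-half fraction deviates from 1/2 by more than a tolerance; the model, the function name, and the tolerance of 0.05 are illustrative assumptions.

```python
from math import comb, log


def nonequilibrium_fraction(n_particles: int, tolerance: float = 0.05) -> float:
    """Fraction of the 2**N left/right configurations whose left-half
    occupation n/N deviates from 1/2 by more than `tolerance`.

    Toy stand-in (an assumption, not the model of the text) for the ratio
    vol(non-equilibrium set) / vol(energy shell).
    """
    max_deviation = tolerance * n_particles
    outside = sum(
        comb(n_particles, n)
        for n in range(n_particles + 1)
        if abs(n - n_particles / 2) > max_deviation
    )
    # Ratio of two exact big integers, converted to a float only once.
    return outside / 2 ** n_particles


if __name__ == "__main__":
    for n_particles in (100, 1000, 4000):
        frac = nonequilibrium_fraction(n_particles)
        # -log(frac)/N gives a crude estimate of a decay constant c,
        # in the spirit of a bound of the form exp(-c N).
        print(n_particles, frac, -log(frac) / n_particles)
```

In this toy model the non-equilibrium fraction shrinks roughly exponentially with $N$, so for macroscopic $N$ the equilibrium macro set contains all but a negligible part of the volume.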
3.2 Entropy Increase
Now increase of Boltzmann entropy means that the phase point $X_t$ moves to bigger and bigger macro sets $\Gamma_\nu$. In this framework, the second law can be stated as follows.
Mathematical second law. Given $\nu$, for most phase points $X_0$ in $\Gamma_\nu$, $X_t$ moves to bigger and bigger macro sets as $t$ increases until it reaches $\Gamma_{\mathrm{eq}}$, except possibly for entropy valleys that are infrequent, shallow, and short-lived; once $X_t$ reaches $\Gamma_{\mathrm{eq}}$, it stays in there for an extraordinarily long time, except possibly for infrequent, shallow, and short-lived entropy valleys.  (34)
The described behavior is depicted in Figure 2. Entropy valleys (i.e., periods of entropy decrease and return to the previous level) are also called fluctuations. The physical second law then asserts that the actual phase point of a real-world closed system behaves the way described in (34) for most phase points.
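As an illustration of (34) and as a step towards making it plausible, let us consider two times, 0 and $t$. Let $\Phi_t$ denote the Hamiltonian flow, so that $X_t=\Phi_t(X_0)$. By Liouville’s theorem, $\mathrm{vol}\,\Phi_t^{-1}(\Gamma_{<\nu})=\mathrm{vol}\,\Gamma_{<\nu}$, and thus, by (32),
(35) $\mathrm{vol}\{X_0\in\Gamma_\nu: X_t\in\Gamma_{<\nu}\}\le\mathrm{vol}\,\Gamma_{<\nu}\ll\mathrm{vol}\,\Gamma_\nu$.
That is, only a small minority of points in $\Gamma_\nu$ will have entropy smaller than $S_{\mathrm{B}}(\Gamma_\nu)=k\log\mathrm{vol}\,\Gamma_\nu$ at time $t$. That is, for most points $X_0\in\Gamma_\nu$,
(36) $S_{\mathrm{B}}(X_t)\ge S_{\mathrm{B}}(X_0)$.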
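Another simple special case is the one in which the macro evolution is deterministic (Garrido et al., 2004; Goldstein and Lebowitz, 2004; De Roeck et al., 2006). For the sake of concreteness, assume that in a time step of a certain size $\tau$, with $\Phi_\tau$ the corresponding Hamiltonian flow map, $\Gamma_{\nu_1}$ gets mapped into $\Gamma_{\nu_2}$, which in turn gets mapped into $\Gamma_{\nu_3}$, and so on up to $\Gamma_{\nu_r}$:
(37) $\Phi_\tau(\Gamma_{\nu_i})\subseteq\Gamma_{\nu_{i+1}}$ for $i=1,\ldots,r-1$.
Then, by Liouville’s theorem, $\mathrm{vol}\,\Gamma_{\nu_{i+1}}\ge\mathrm{vol}\,\Phi_\tau(\Gamma_{\nu_i})=\mathrm{vol}\,\Gamma_{\nu_i}$, so
(38) $S_{\mathrm{B}}(\Gamma_{\nu_{i+1}})\ge S_{\mathrm{B}}(\Gamma_{\nu_i})$
for all $i$, so entropy does not decrease. Of course, in realistic cases, the macro evolution becomes deterministic only in the limit $N\to\infty$, and as long as $N$ is finite, there are a minority of points in $\Gamma_{\nu_i}$ that do not evolve to $\Gamma_{\nu_{i+1}}$.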
Generally, if the Hamiltonian motion is not specially designed for the given partition of $\mathcal{X}$, then it is quite intuitive that the motion of the phase point should tend to lead to larger macro sets, and not to smaller ones. It is also quite intuitive that the phase point would stay in $\Gamma_{\mathrm{eq}}$ for a very, very long time: If the nonequilibrium set $\mathcal{X}_{\mathrm{mc}}\setminus\Gamma_{\mathrm{eq}}$ has only the fraction $\exp(-cN)$ of the volume of the energy shell, cf. (30), then only a tiny fraction of $\Gamma_{\mathrm{eq}}$ should be able to evolve into the nonequilibrium set in a short time; and if most points in $\mathcal{X}_{\mathrm{mc}}\setminus\Gamma_{\mathrm{eq}}$ spend a substantial amount of time there, then it will take very, very long until a substantial fraction of $\Gamma_{\mathrm{eq}}$ has visited $\mathcal{X}_{\mathrm{mc}}\setminus\Gamma_{\mathrm{eq}}$. The statement that points in $\Gamma_{\mathrm{eq}}$ stay there for a long time fits well with the observed stationarity of thermal equilibrium, which is why it is called “equilibrium.”
Let us briefly address two classic objections to the idea that entropy increases:

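Time reversal (Loschmidt’s objection) shows that entropy increase cannot hold for all phase points in $\Gamma_\nu$. Concretely, for relevant Hamiltonians the time reversal mapping $R$, defined by
(39) $R(q_1,v_1,\ldots,q_N,v_N)=(q_1,-v_1,\ldots,q_N,-v_N)$ with $q_j$ the position and $v_j$ the velocity of particle $j$, has the property
(40) $\Phi_t\circ R=R\circ\Phi_{-t}$, where $\Phi_t$ denotes the Hamiltonian flow, $X_t=\Phi_t(X_0)$. Usually, $R$ maps $\Gamma_\nu$ onto some $\Gamma_{\nu'}$ (where $\nu'$ may or may not equal $\nu$), so $\mathrm{vol}\,\Gamma_{\nu'}=\mathrm{vol}\,\Gamma_\nu$. So if some $X_0\in\Gamma_\nu$ evolves to $X_t$ with $S_{\mathrm{B}}(X_t)>S_{\mathrm{B}}(X_0)$, then $R(X_t)$ evolves to $R(X_0)$, and its entropy decreases.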

Recurrence (Zermelo’s objection) shows that $S_{\mathrm{B}}(X_t)$ cannot forever be nondecreasing; thus, $X_t$ cannot stay forever in $\Gamma_{\mathrm{eq}}$ once it reaches $\Gamma_{\mathrm{eq}}$. (The Poincaré recurrence theorem states that under conditions usually satisfied in $\mathcal{X}_{\mathrm{mc}}$, every trajectory $X_t$, except for a set of measure zero of $X_0$s, returns arbitrarily close to $X_0$ at some arbitrarily late time.)
Contrary to von Neumann’s statement quoted in the beginning of Section 3, the second law as formulated in (34) is not refuted by either objection.
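A small deterministic toy model can make the interplay of typical entropy increase, time reversal, and recurrence tangible. The sketch below uses the Kac ring, which is not discussed in the text above and is not a Hamiltonian system; it is chosen here only because its dynamics is reversible and exactly recurrent. Balls on a ring of $N$ sites move one site per step and flip color when crossing a marked edge; the macro variable is the number of black balls, and the entropy (with $k=1$) is the log of the number of color configurations with that number. All names and parameter values are illustrative assumptions.

```python
import random
from math import lgamma, log


def boltzmann_entropy(n_black, n_sites):
    """Log of the number of color configurations with n_black black balls (k = 1)."""
    return lgamma(n_sites + 1) - lgamma(n_black + 1) - lgamma(n_sites - n_black + 1)


def step(colors, markers):
    """Forward step: the ball at site i-1 moves to site i and flips its
    color (1 = black, 0 = white) if the edge i-1 it crosses is marked."""
    return [colors[i - 1] ^ markers[i - 1] for i in range(len(colors))]


def step_back(colors, markers):
    """Reversed motion: the ball at site i+1 moves to site i, crossing edge i."""
    n = len(colors)
    return [colors[(i + 1) % n] ^ markers[i] for i in range(n)]


def entropy_curve(colors, markers, n_steps, update=step):
    """Entropy of the macro state along a run; returns (curve, final micro state)."""
    curve = [boltzmann_entropy(sum(colors), len(colors))]
    for _ in range(n_steps):
        colors = update(colors, markers)
        curve.append(boltzmann_entropy(sum(colors), len(colors)))
    return curve, colors


rng = random.Random(0)
N = 500
markers = [1 if rng.random() < 0.1 else 0 for _ in range(N)]
start = [1] * N  # all balls black: the smallest macro set, entropy 0

forward, state = entropy_curve(start, markers, 100)
# Loschmidt-type reversal: running the reversed motion from the final state
# retraces the history, so entropy decreases back to its initial value.
backward, returned = entropy_curve(state, markers, 100, update=step_back)
```

Starting from the all-black state, the entropy climbs to near its maximal value $N\log 2$ within a number of steps much smaller than $N$. The reversed run shows that the special final state leads to an entropy decrease, and after $2N$ forward steps every ball has crossed every marker twice, so the initial state recurs exactly.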
3.3 Non-Equilibrium
The term “nonequilibrium” is sometimes understood (Gallavotti, 2003) as referring to so-called nonequilibrium steady states (NESS), which concern, for example, a system coupled to two infinite reservoirs of different temperature; such a system is an open system heated on one side and cooled on another, and it will tend to assume a macroscopically stationary (“steady”) state with a temperature gradient, a nonzero heat current, and a positive rate of entropy production (Onsager, 1931; Bergmann and Lebowitz, 1955; Derrida, 2007; Goldstein et al., 2017a). In contrast, in this Section 3 we are considering a closed system (i.e., not interacting with the outside), and “nonequilibrium” refers to any phase point in $\mathcal{X}_{\mathrm{mc}}\setminus\Gamma_{\mathrm{eq}}$. Examples of nonequilibrium macro states include, but are not limited to, states in local thermal equilibrium but not in (global) thermal equilibrium (such as systems hotter in one place than in another). Other examples arise from removing a constraint or wall; such macro states may have been in thermal equilibrium before the constraint was removed but are no longer so afterwards; for example, think of a macro state in which all particles are in the left half of a box; for another example, suppose we could turn on or off the interaction between two kinds of particles (say, “red ones” and “blue ones”), and think of a macro state that is a thermal equilibrium state when the interaction is off (so that the red energy and the blue energy are separately conserved) but not when it is on, such as when both gases are in the same volume but at different temperatures.
3.4 Concrete Example: The Boltzmann Equation
Here is a concrete example of a partition of phase space due to Boltzmann. Divide the 1-particle phase space $\mathcal{X}_1$ into cells $C_i$ (say, small cubes of equal volume) and count (with a given tolerance) the particles in each cell. The macro state is described by the list of all these occupation numbers; for convenience, we will normalize them:
(41) $\nu_i=\left[\dfrac{\#\{j:(q_j,v_j)\in C_i\}}{N\,\Delta f}\right]\Delta f$
with $\Delta f$ the tolerance in counting and $[\cdot]$ again the nearest integer. This example of a partition is good for dilute, weakly interacting gases but not in general (Garrido et al., 2004; Goldstein and Lebowitz, 2004) (see also Section 3.8).
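The following minimal sketch shows how such a coarse-grained macro state can be computed from a phase point. It assumes a toy 1-particle phase space with one position and one velocity coordinate, square cells of side `cell_size`, and rounding of the normalized occupation numbers to the nearest multiple of a tolerance; the function name and all numerical values are illustrative assumptions.

```python
from collections import Counter


def macro_state(points, cell_size, tolerance):
    """Macro state of a phase point given as a list of (q, v) pairs.

    For each cell, the normalized occupation number n_i / N is rounded to
    the nearest integer multiple of `tolerance`; the returned dict stores
    that integer multiple (cells that round to zero are omitted).
    """
    n_total = len(points)
    counts = Counter(
        (int(q // cell_size), int(v // cell_size)) for q, v in points
    )
    state = {}
    for cell, count in counts.items():
        multiple = round(count / (n_total * tolerance))
        if multiple > 0:
            state[cell] = multiple
    return state


# Two different phase points with slightly different cell counts ...
phase_point_a = [(0.5, 0.5)] * 600 + [(1.5, 0.5)] * 400
phase_point_b = [(0.2, 0.7)] * 604 + [(1.9, 0.1)] * 396
# ... and one whose counts differ by more than the tolerance.
phase_point_c = [(0.5, 0.5)] * 700 + [(1.5, 0.5)] * 300

state_a = macro_state(phase_point_a, cell_size=1.0, tolerance=0.01)
state_b = macro_state(phase_point_b, cell_size=1.0, tolerance=0.01)
state_c = macro_state(phase_point_c, cell_size=1.0, tolerance=0.01)
```

Here `phase_point_a` and `phase_point_b` differ microscopically but belong to the same macro set, whereas `phase_point_c` belongs to a different one.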
Boltzmann considered $N$ billiard balls of radius $a$ in a container $\Lambda\subset\mathbb{R}^3$, so $\mathcal{X}_1=\Lambda\times\mathbb{R}^3$. In a suitable limit in which $N\to\infty$, $a\to0$, and the cells become small, the normalized occupation numbers become a continuous density $f(q,v)$. He argued (convincingly) that for most $X\in\Gamma_f$ this density, essentially the empirical distribution of the $N$ particles in $\mathcal{X}_1$, will change in time according to the Boltzmann equation, an integro-differential equation (Boltzmann, 1872, 1898; Ehrenfest and Ehrenfest, 1911; Lanford, 1976). It reads, in the version appropriate for the hard sphere gas without external forces,
(42) $\left(\dfrac{\partial}{\partial t}+v\cdot\nabla_q\right)f_t(q,v)=Q_t(q,v)$
with the “collision term”
(43) $Q_t(q,v)=\lambda\displaystyle\int_{\mathbb{R}^3}d^3v_*\int_{\mathbb{S}^2}d^2\omega\;1_{\omega\cdot(v-v_*)>0}\;\omega\cdot(v-v_*)\,\bigl[f_t(q,v')\,f_t(q,v'_*)-f_t(q,v)\,f_t(q,v_*)\bigr]$
involving a constant $\lambda>0$ and the abbreviations
(44) $v'=v-[(v-v_*)\cdot\omega]\,\omega$
(45) $v'_*=v_*+[(v-v_*)\cdot\omega]\,\omega$
for the outgoing velocities of a collision between two balls with incoming velocities $v$ and $v_*$ and direction $\omega$ between the centers of the two balls. The Boltzmann equation is considered for $v\in\mathbb{R}^3$ and $q\in\Lambda$ along with a boundary condition representing that balls hitting the boundary of $\Lambda$ get reflected there. A function $f$ is a stationary solution if and only if it is independent of $q$ and a Maxwellian (i.e., Gaussian) in $v$, that is, if and only if it represents thermal equilibrium. Correspondingly, nonequilibrium macro states correspond to any density function $f$ that is not a global (i.e., $q$-independent) Maxwellian.
The entropy $S_{\mathrm{B}}(X)$ (footnote 10: See Footnote 7.) turns out to be, neglecting terms of lower order in $N$,
(46) 
In the limit of small cells, this becomes (9), i.e., $-kH$ in terms of the $H$ functional. Boltzmann further proved the $H$ theorem, which asserts that for any solution $f_t$ of the Boltzmann equation,
(47) $\dfrac{d}{dt}H(f_t)\le 0$
with equality only if $f_t$ is a local Maxwellian. The $H$ theorem amounts to a derivation of the second law relative to the partition under consideration.
3.5 Rigorous Result
Some authors suspected that Boltzmann’s vision, and the Boltzmann equation in particular, was not valid. For example, Khinchin (1941, §33, p. 142) complained about individualist accounts of entropy:
All existing attempts to give a general proof of this postulate [i.e., ,] must be considered as an aggregate of logical and mathematical errors superimposed on a general confusion in the definition of the basic quantities.
But actually, the Boltzmann equation (and with it the increase of entropy) is rigorously valid for most phase points in $\Gamma_{f_0}$, at least for a short time, as proved by Lanford (1975; 1976). Here is a summary statement of Lanford’s theorem, leaving out some technical details:
Theorem 1.
Let $\bar t>0$ and $\lambda>0$ (the constant in the Boltzmann equation) be constants. For a very large number $N$ of billiard balls of (very small) radius $a$ with $4Na^2=\lambda$, for every $t\in[0,\bar t/5]$, for any nice density $f_0$ in $\mathcal{X}_1$ with mean free time $\ge\bar t$, and for a coarse graining of $\mathcal{X}_1$ into cells $C_i$ that are small but not too small, most phase points $X$ with empirical distribution $f_0$ (relative to $\{C_i\}$ and within small tolerances) evolve in such a way that the empirical distribution of $X_t$ (relative to $\{C_i\}$) is close to $f_t$, where $t\mapsto f_t$ is the solution of the Boltzmann equation with initial datum $f_0$.
It is believed but not proven that the Boltzmann equation is valid for a much longer duration, maybe of the order of recurrence times. The method of proof fails after $\bar t/5$, but it does not give reason to think that the actual behavior changes at $\bar t/5$.
Where is the famous Stosszahlansatz, or hypothesis of molecular chaos, in this discussion? This hypothesis was stated by Boltzmann as specifying the approximate number of collisions with parameter $\omega$ between particles from cells $C_i$ and $C_j$ within a small time interval. In our discussion it is hidden in the assumption that the initial phase point be typical in $\Gamma_{f_0}$: Both Theorem 1 and the wording (34) of the second law talk merely about most phase points in $\Gamma_\nu$, and for most phase points in $\Gamma_\nu$ (with $\nu=f$) it is presumably true that the number of upcoming collisions is, within reasonable tolerances, given by the hypothesis of molecular chaos, not just at the initial time but at all relevant times. We will discuss molecular chaos further in Section 3.7.
3.6 Empirical vs. Marginal Distribution
Many mathematicians (e.g., Kac (1956); Cercignani et al. (1994) but also Tolman (1938)) considered the Boltzmann equation in a somewhat different context with a different meaning, viz., with $f$ not the empirical distribution but the marginal distribution. This means the following.

The empirical distribution, for a given phase point $X=(q_1,v_1,\ldots,q_N,v_N)$, is the distribution on $\mathcal{X}_1$ with density
(48) $f_{\mathrm{emp}}^{X}(q,v)=\dfrac{1}{N}\displaystyle\sum_{j=1}^{N}\delta^3(q-q_j)\,\delta^3(v-v_j)$. As such, it is not a continuous distribution but becomes roughly continuous-looking only after coarse graining with cells $C_i$ in $\mathcal{X}_1$ that are not too small (so that the occupation numbers are large enough), and it becomes a really continuous distribution only after taking a limit in which the cells shrink to size 0 while $N\to\infty$ fast enough for the occupation numbers to become very large.

The marginal distribution starts from a distribution $\rho$ on phase space $\mathcal{X}$ and is obtained by integrating $\rho$ over the positions and velocities of $N-1$ particles (and perhaps averaging over the number $j$ of the particle not integrated out, if $\rho$ was not permutation invariant to begin with). The marginal distribution can also be thought of as the average of the (exact) empirical distribution: the empirical distribution associated with $X$ becomes a continuous function when $X$ is averaged over using a continuous $\rho$.
For example, Kac (1956, first page) wrote:
$f(\vec r,\vec v,t)\,d\vec r\,d\vec v$ is the average number of molecules in $d\vec r\,d\vec v$,
whereas Boltzmann (1898, §3 p. 36) wrote:
let $f(\xi,\eta,\zeta,t)\,d\xi\,d\eta\,d\zeta$ […] be the number of molecules whose velocity components in the three coordinate directions lie between the limits $\xi$ and $\xi+d\xi$, $\eta$ and $\eta+d\eta$, and $\zeta$ and $\zeta+d\zeta$[.]
Note that Kac wrote “average number” and Boltzmann wrote “number”: For Kac, $f$ was the marginal and for Boltzmann the empirical distribution. (Footnote 11: That is also why Boltzmann normalized $f$ so that $\int f=N$, not $\int f=1$: a marginal of a probability distribution would automatically be normalized to 1, not $N$, but if $f$ means the empirical density then it is natural to take it to mean the density of particles in $\mathcal{X}_1$, which is normalized to $N$, not 1.)
Of course, the (coarse-grained) empirical distribution is a function of the phase point $X$, and so is any functional of it, such as $H$; thus, the empirical distribution can serve the role of the macro state $\nu$, and $-kH$ that of Boltzmann entropy. This is not possible for the marginal distribution.
So why would anybody want the marginal distribution? Kac aimed at a rigorous derivation of the Boltzmann equation in whatever context long before Lanford’s theorem, and saw better chances for a rigorous proof if he assumed collisions to occur at random times at exactly the rate given by Boltzmann’s hypothesis of molecular chaos. This setup replaces the time evolution in phase space (or rather, since Kac dropped the positions, in $3N$-dimensional velocity space) by a stochastic process, in fact a Markov jump process. (By the way, as a consequence, any density $\rho$ on this space tends to get wider over time, and its Gibbs entropy increases, contrary to the Hamiltonian evolution.) So the mathematician’s aim of finding statements that are easier to prove leads in a different direction than the aim of discussing the mechanism of entropy increase in nature.
Another thought that may lead authors to the marginal distribution is that $f(q,v)\,d^3q\,d^3v$ certainly cannot be an integer but must be an infinitesimal, so it cannot be the number of particles in $d^3q\,d^3v$ but must be the average number of particles. Of course, this thought neglects the idea that as long as $N$ is finite, also the cells $C_i$ should be kept of finite size and not too small, and the correct statement is that $f(q,v)\,\mathrm{vol}\,C_i$ is the number of particles in $C_i$ (or, depending on the normalization of $f$, $N^{-1}$ times the number of particles); when followers of Boltzmann express the volume of $C_i$ as $d^3q\,d^3v$, they merely express themselves loosely.
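The distinction between the two distributions can be made concrete numerically. The sketch below, which keeps only a single velocity coordinate and uses an arbitrary Gaussian ensemble as a stand-in for $\rho$, computes the coarse-grained empirical histogram of one phase point and compares it with the ensemble average of such histograms, which estimates the coarse-grained marginal. The cell edges, particle numbers, and function names are illustrative assumptions.

```python
import random
from bisect import bisect_right

EDGES = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # velocity cells (toy choice)


def empirical_histogram(velocities, edges=EDGES):
    """Fraction of particles in each velocity cell for ONE phase point."""
    counts = [0] * (len(edges) - 1)
    for v in velocities:
        i = bisect_right(edges, v) - 1
        if 0 <= i < len(counts):  # particles outside all cells are ignored
            counts[i] += 1
    return [c / len(velocities) for c in counts]


def sample_phase_point(n_particles, rng):
    """Velocities drawn independently from a unit Gaussian, standing in for
    a typical phase point of some ensemble rho (an assumption)."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_particles)]


def ensemble_average(n_particles, n_samples, rng):
    """Average of empirical histograms over many phase points; this
    estimates the coarse-grained marginal distribution of rho."""
    total = [0.0] * (len(EDGES) - 1)
    for _ in range(n_samples):
        hist = empirical_histogram(sample_phase_point(n_particles, rng))
        total = [t + h for t, h in zip(total, hist)]
    return [t / n_samples for t in total]


rng = random.Random(1)
marginal = ensemble_average(5000, 20, rng)
one_system = empirical_histogram(sample_phase_point(5000, rng))
few_particles = empirical_histogram(sample_phase_point(10, rng))
max_gap = max(abs(a - b) for a, b in zip(one_system, marginal))
```

For large particle number the histogram of a single typical phase point lies close to the averaged one, while for ten particles it is visibly coarse. Only the first is a function of the individual phase point; the second is a property of the ensemble.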
3.7 The Past Hypothesis
Lanford’s theorem has implications also for negative $t$: For most phase points in $\Gamma_{f_0}$, the Boltzmann equation also applies in the other time direction, so that entropy increases in both time directions! (See Figure 3.) That is, before time 0 the Boltzmann equation applies with the sign of $t$ reversed.
It is generally the case, not just for a dilute gas of hard spheres, that entropy increases in both time directions for most phase points in $\Gamma_\nu$. This fact leads to the following worry: If Lanford’s theorem, or the statement (34), persuaded us to expect that entropy increases after the time we chose to call $t=0$, should it not persuade us just as much to expect that entropy decreases before $t=0$? But this decrease does not happen. Does that mean we were unjustified in expecting from Lanford’s theorem or (34) that entropy increases after $t=0$?
Here is a variant of this worry in terms of explanation. Statement (34) may suggest that the explanation for the increase of entropy after $t=0$ is that this happens for typical phase points $X_0$. But if we know that entropy was lower before $t=0$ than at $t=0$, then $X_0$ was not typical. This undercuts the explanation considered for the behavior after $t=0$.
Here is the resolution of this worry. The assumption really made by Boltzmann’s followers is not that the phase point at the beginning of an experiment is typical in $\Gamma_\nu$ but the following:
Past hypothesis. The phase point of the universe at the initial time $T_0$ of the universe (presumably the big bang) is typical in its macro set $\Gamma_{\nu_0}$, where $\nu_0$ has very low entropy.  (49)
Here, “typical” means that in relevant ways it behaves like most points in $\Gamma_{\nu_0}$. The ways relevant here are features of the macro history of the universe shared by most points in $\Gamma_{\nu_0}$.
Given that entropy keeps increasing, the initial macro state $\nu_0$ must be one of extremely low entropy; one estimate (Penrose, 1989) yields $10^{100}$ Joule/Kelvin less than the thermal equilibrium entropy at the same energy; thus, $\Gamma_{\nu_0}$ must have extremely small volume compared to the relevant energy shell. All the same, we do not know very clearly what $\nu_0$ actually is; one proposal is known as the Weyl curvature hypothesis (Penrose, 1979).
A related worry at this point may arise from the observation (for any macroscopic system or specifically a hard sphere gas as considered for the Boltzmann equation) that if the time evolution $\Phi_t(\Gamma_\nu)$ of $\Gamma_\nu$ lies (say) in a macro set $\Gamma_{\nu'}$ of much greater volume, then phase points coming from $\Gamma_\nu$ would be atypical in $\Gamma_{\nu'}$. So if the prediction of entropy increase after $t$ was based on the assumption that $X_t$ be typical in $\Gamma_{\nu'}$, then it could not be applied to $\Phi_t(\Gamma_\nu)$. So why should entropy still increase at $t$? Because Lanford’s theorem says so, at least until $\bar t/5$. But even after that time, it is plausible that the Boltzmann equation continues to be valid (and therefore entropy continues to increase) because it is plausible that the number of upcoming collisions of each type agrees with the value specified by the hypothesis of molecular chaos (see, e.g., Lebowitz, 2008). That is because it is plausible that, for typical $X_0\in\Gamma_\nu$, $X_t$ contains very special correlations in the exact positions and velocities of all particles concerning the collisions before $t$, but not concerning those after $t$. Likewise, we would expect for a general macroscopic system, unless the dynamics (as given by the Hamiltonian) is specially contrived for the partition $\{\Gamma_\nu\}$, that $X_t$ coming from a typical $X_0\in\Gamma_\nu$ behaves towards the future, but of course not towards the past, like a typical point in $\Gamma_{\nu'}$, as stated by the second law (34). The whole reasoning does not change significantly if $\Phi_t(\Gamma_\nu)$ is distributed over several $\Gamma_{\nu'}$ of much greater volume, instead of being contained in one $\Gamma_{\nu'}$.
Putting this consideration together with the past hypothesis, we are led to expect that the Boltzmann entropy of the universe keeps increasing (except for occasional insignificant entropy valleys) to this day, and further until the universe reaches thermal equilibrium. As a consequence for our present-day experiments:
Development conjecture. Given the past hypothesis, an isolated system that, at a time $t_0$ before thermal equilibrium of the universe, has macro state $\nu$ appears macroscopically in the future, but not the past, of $t_0$ like a system that at time $t_0$ is in a typical micro state compatible with $\nu$.  (50)
This statement follows from Lanford’s theorem for times up to $\bar t/5$, but otherwise has not been proven mathematically; it summarizes the presumed implications of the past hypothesis (i.e., of the low entropy initial state of the universe) to applications of statistical mechanics. For a dilute gas, it predicts that its macro state will evolve according to the Boltzmann equation in the future, but not the past, of $t_0$, as long as it is isolated. It also predicts that heat will not flow from the cooler body to the hotter, and that a given macroscopic object will not spontaneously fly into the air although the laws of mechanics would allow all the momenta of the thermal motion to point upwards at some time.
By the way, the development conjecture allows us to make sense of Tolman’s (1938, §23, p. 60) “hypothesis of equal a priori probabilities,” which asserts
that the phase point for a given system is just as likely to be in one region of the phase space as in any other region of the same extent which corresponds equally well with what knowledge we do have as to the condition of the system.
That sounds like $X_t$ is always uniformly distributed over the $\Gamma_\nu$ containing $X_t$, but that statement is blatantly inconsistent, as it cannot be true at two times, given that $\Phi_t(\Gamma_\nu)\neq\Gamma_{\nu'}$ for any $\nu'$. But the subtly different statement (50) is consistent, and (50) is what Tolman should have written.
The past hypothesis brings out clearly that the origin of the thermodynamic arrow of time lies in a special property (of lying in $\Gamma_{\nu_0}$) of the physical state of the matter at the time $T_0$ of the big bang, and not in the observers’ knowledge or their way of considering the world. The past hypothesis is the one crucial assumption we make in addition to the dynamical laws of classical mechanics. The past hypothesis may well have the status of a law of physics, not a dynamical law but a law selecting a set of admissible histories among the solutions of the dynamical laws. As Feynman (1965, p. 116) wrote:
Therefore I think it is necessary to add to the physical laws the hypothesis that in the past the universe was more ordered, in the technical sense, than it is today—I think this is the additional statement that is needed to make sense, and to make an understanding of the irreversibility.
Making the past hypothesis explicit in the form (49) or a similar one also enables us to understand the question whether the past hypothesis could be explained. Boltzmann (1898, §90) suggested tentatively that the explanation might be a giant fluctuation out of thermal equilibrium, an explanation later refuted by Eddington (1931) and Feynman (1965; 1995). (Another criticism of this hypothesis put forward by Popper (1976, §35) is without merit.) Some explanations of the past hypothesis have actually been proposed in highly idealized models (Carroll, 2010; Barbour et al., 2013, 2014, 2015; Goldstein et al., 2016); it remains to be seen whether they extend to more adequate cosmological models.
3.8 Boltzmann Entropy Is Not Always $H$
Jaynes (1965) wrote that
the Boltzmann $H$ yields an “entropy” that is in error by a nonnegligible amount whenever interparticle forces affect thermodynamic properties.
It is correct that the $H$ functional represents (up to a factor $-k$) the Boltzmann entropy (2) only for a gas of noninteracting or weakly interacting molecules. Let us briefly explain why; a more extensive discussion is given in (Goldstein and Lebowitz, 2004).
As pointed out in Section 3.1, the coarse-grained energy (28) should be one of the macro variables, so that the partition into macro sets $\Gamma_\nu$ provides a partition of the energy shell. If interaction cannot be ignored, then the $H$ functional does not correspond to the Boltzmann entropy, since restriction to the energy shell is not taken into account by $H$. When interaction can be ignored there is only kinetic energy, so the Boltzmann macro states based on the empirical distribution alone determine the energy and hence the $H$ functional corresponds to the Boltzmann entropy.
4 Gibbs Entropy as Fuzzy Boltzmann Entropy
There is another connection between Gibbs and Boltzmann entropy that is not usually made explicit in the literature; it involves interpreting the Gibbs entropy in an individualist spirit as a variant of the Boltzmann entropy for “fuzzy” macro sets. By this we mean using functions $\gamma_\nu(X)\ge0$ instead of sets $\Gamma_\nu$ as expressions of a macro state $\nu$: some phase points look a lot like $\nu$, others less so, and $\gamma_\nu(X)$ quantifies how much. The point here is to get rid of the sharp boundaries between the sets $\Gamma_\nu$ shown in Figure 1, as the boundaries are artificial and somewhat arbitrary anyway. A partition into sets $\Gamma_\nu$ is still contained in the new framework as a special case by taking $\gamma_\nu$ to be the indicator function of $\Gamma_\nu$, $\gamma_\nu=1_{\Gamma_\nu}$, but we now also allow continuous functions $\gamma_\nu$. One advantage may be to obtain simpler expressions for the characterization of $\nu$ since we avoid drawing boundaries. The condition that the $\Gamma_\nu$ form a partition of $\mathcal{X}$ can be replaced by the condition that
(51) $\displaystyle\sum_\nu\gamma_\nu(X)=1\quad\forall X\in\mathcal{X}$.
Another advantage of this framework is that we can allow without further ado that $\nu$ is a continuous variable, by replacing (51) with its continuum version
(52) $\displaystyle\int d\nu\,\gamma_\nu(X)=1\quad\forall X\in\mathcal{X}$.
It will sometimes be desirable to normalize the function $\gamma_\nu$ so its integral becomes 1; we write $\hat\gamma_\nu$ for the normalized function,
(53) $\hat\gamma_\nu(X)=\dfrac{\gamma_\nu(X)}{\|\gamma_\nu\|_1}$ with $\|\gamma_\nu\|_1=\displaystyle\int_{\mathcal{X}}dX\,\gamma_\nu(X)$.
So what would be the appropriate generalization of the Boltzmann entropy to a fuzzy macro state? It should be $k$ times the log of the volume over which $\gamma_\nu$ is effectively distributed, in other words, the Gibbs entropy,
(54) $S_{\mathrm{B}}(\nu):=S_{\mathrm{G}}(\hat\gamma_\nu)$.
Now fix a phase point $X$. Since $X$ is now not uniquely associated with a macro state, it is not clear what $S_{\mathrm{B}}(X)$ should be. In view of (52), one might define $S_{\mathrm{B}}(X)$ to be the average $\int d\nu\,S_{\mathrm{B}}(\nu)\,\gamma_\nu(X)$. Be that as it may, if the choice of macro states is reasonable, one would expect that different $\nu$s for which $\gamma_\nu(X)$ is significantly nonzero have similar values of $S_{\mathrm{B}}(\nu)$, except perhaps for a small set of exceptional $X$s.
Another advantage of fuzzy macro states is that they sometimes factorize in a more convenient way. Here is an example. To begin with, sometimes, when we are only interested in thermal equilibrium macro states, we may want to drop nonequilibrium macro states and replace the equilibrium set $\Gamma_{\mathrm{eq}}$ in an energy shell by the full energy shell $\mathcal{X}_{\mathrm{mc}}$, thereby accepting that we attribute wildly inappropriate $\nu$s (and $S$s) to a few $X$s. As a second step, the canonical distribution
(55) $\rho_\beta(X)=\dfrac{1}{Z(\beta)}\,e^{-\beta H(X)}$
is strongly concentrated in a very narrow range of energies, a fact well known as “equivalence of ensembles” (between the microcanonical and the canonical ensemble). Let us take $\nu=\beta$ and $\hat\gamma_\beta=\rho_\beta$ as a continuous family of fuzzy macro states. As a third step, consider a system consisting of two noninteracting subsystems, $\mathcal{S}=\mathcal{S}_1\cup\mathcal{S}_2$, so $\mathcal{X}=\mathcal{X}_1\times\mathcal{X}_2$, and $H(X)=H_1(X_1)+H_2(X_2)$. Then $\rho_\beta$ factorizes,