From æther theory to Special Relativity

Preliminary version of the article written for Handbook of Spacetime, A. Ashtekar and V. Petkov (eds.), Springer-Verlag GmbH (Heidelberg), in press.

RAFAEL FERRARO (ferraro@iafe.uba.ar; member of Carrera del Investigador Científico of CONICET)
Instituto de Astronomía y Física del Espacio, C.C. 67, Sucursal 28, 1428 Buenos Aires, Argentina
and
Departamento de Física, Facultad de Ciencias Exactas y Naturales,
Universidad de Buenos Aires, Ciudad Universitaria, Pabellón I, 1428 Buenos Aires, Argentina
Abstract

At the end of the 19th century light was regarded as an electromagnetic wave propagating in a material medium called ether. The speed $c$ appearing in Maxwell’s wave equations was the speed of light with respect to the ether. Therefore, according to the Galilean addition of velocities, the speed of light in the laboratory would differ from $c$. The measurement of such a difference would reveal the motion of the laboratory (the Earth) relative to the ether (a sort of absolute motion). However, the Earth’s absolute motion was never detected.

Galileo addition of velocities is based on the assumption that lengths and time intervals are invariant (independent of the state of motion). This way of thinking about spacetime stems from our daily experience and lies at the heart of Newton’s Classical Mechanics. Nevertheless, in 1905 Einstein defied Galileo addition of velocities by postulating that light travels at the same speed in any inertial frame. In doing so, Einstein extended the principle of relativity to the electromagnetic phenomena described by Maxwell’s laws. In Einstein’s Special Relativity the ether does not exist and absolute motion is devoid of meaning. The invariance of the speed of light forced the replacement of Galileo transformations with Lorentz transformations. Thus, relativistic length contractions and time dilations entered our understanding of spacetime. Newtonian mechanics had to be reformulated, which led to the discovery of the mass-energy equivalence.

I Space and time in Classical Mechanics

Until 1915, when Einstein’s General Relativity radically changed our way of thinking, spacetime was regarded as the immutable scenery where physical phenomena take place. The laws of Mechanics, which describe the motion of a particle subject to interactions, were written to work in this immutable scenery. The form of these laws strongly depends on the properties attributed to spacetime. Classical Mechanics relies on the assumption that distances and time intervals are invariant. This assumption, which seems to be in agreement with our daily experience, leads to the Galilean addition of velocities, which rules out invariant velocities in Classical Mechanics.

I.A Invariance of distances and time intervals

Classical Mechanics (the science of mechanics founded by Newton) considered that space is properly described by Euclid’s plane geometry. Then there exist Cartesian coordinates $(x, y, z)$, so the distance $d$ between two points placed at $(x_1, y_1, z_1)$ and $(x_2, y_2, z_2)$ can be computed by means of the Pythagorean formula

$$d^2 = (x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 \tag{1}$$

In addition, Classical Mechanics regards distances and time intervals as invariant quantities. Let us explain the meaning of this property with an example from daily life concerning the invariance of time intervals. Mario frequently flies from Buenos Aires to Madrid; he knows that the journey lasts 12 hours as measured by his watch. This time, Mario wants his friend Manuel to pick him up at Madrid airport. When the flight is about to depart, Mario calls Manuel, who tells him that it is 9 am in Madrid. Then Mario asks Manuel to wait for him at 9 pm at Madrid airport, just when the plane will be landing. This way of arranging a meeting assumes that time elapses in the same way both in the plane and on the ground. Of course, it seems to be a good assumption because it effectively works in our daily life. We call invariant a magnitude having the same value in different frames in relative motion (like the plane and the earth in the previous example).

Classical Mechanics considers that not only time intervals but also distances are invariant. In particular, the length of a body is assumed to be independent of its state of motion. We can “verify” this assumption in our daily life. For instance, we can measure a train by laying a tape measure along the train. The length so obtained will seem to agree with a measurement performed along the rail while the train is traveling. Notice that measuring the length of a moving body requires some care; the length is the distance between simultaneous positions of the ends of the body. In the case of the train, we can imagine that the rail is provided with sensors detecting the stretch of rail the train takes up at each instant. We can then determine the length of such a stretch of rail by means of a tape measure identical to the one used on the train.

The invariance of distances and time intervals are properties supported by our daily experience. It could be said that space and time look to us like separate concepts, and this separation seems not to be affected by the choice of frame. This somewhat naive way of regarding space and time is a key piece in the construction of Classical Mechanics. However, to what extent should we trust our daily experience? Does our daily experience cover the entire range of phenomena, or is it rather limited? Let us use a familiar example to explain what we mean: we could well believe that the earth’s surface is flat if just a little portion of it were accessible to us. However, we realize that the earth’s surface is nearly spherical by considering it at larger scales. In this example, the scale should be comparable to the globe radius. In the case of the behavior of distances and time intervals under changes of frame, the scale in question is the relative velocity between the frames. How could we be sure that the invariance of distances and time intervals is not a mere appearance caused by the narrow range of relative velocities covered by our daily experience? As we will explain in Section IV, Einstein’s Special Relativity of 1905 abolished the invariance of distances and time intervals on the basis of new physics developed in the second half of the 19th century.

I.B Addition of velocities

Velocities are not invariant in Classical Mechanics. Let us consider the motion of a passenger along a train traveling on the rail at 100 m/s. The train and the earth are two possible frames to describe the motion of the passenger; they are in relative motion at 100 m/s. It is evident that the velocity of the passenger is different in each frame. For instance, the passenger could be at rest on the train, and thus moving at 100 m/s with respect to the earth. If the passenger walks forward at a velocity of 1 m/s, then he/she advances 1 meter on the train (as measured by a tape measure fixed to the train) every second (as measured by a clock fixed to the train). Now, how fast does he/she move with respect to the earth? The answer to this simple question depends on the properties of distances and time intervals under change of frame. Since Classical Mechanics assumes that distances and time intervals are invariant, we can state that the passenger advances 1 meter on the train every second as measured by a clock and a tape measure fixed to the earth (but otherwise identical to those fixed to the train). Besides, in this frame the train also advances at the rate of 100 meters every second. Then, the passenger is displaced 101 meters per second. Thus his/her velocity in the frame fixed to the earth is 101 m/s = 1 m/s + 100 m/s. This addition of velocities is a direct consequence of the classical invariance of distances and time intervals. It means that velocities are not invariant in Classical Mechanics; they always change by the addition of the relative velocity $V$ between the frames. On the contrary, Einstein’s Special Relativity will rebuild our way of regarding space and time by postulating an invariant velocity: the speed of light ($c$ = 299,792,458 m/s). The postulate of invariance of the speed of light implies the abandonment of our belief in the invariance of distances and time intervals so strongly rooted in our daily experience. Therefore, deep theoretical and experimental reasons must be put forward to propose such a drastic change of mind. In fact, the idea of invariance of the speed of light is theoretically linked to Maxwell’s electromagnetism and the principle of relativity, as will be analyzed in Section III. Besides, at the end of the 19th century there was enough experimental evidence about the invariance of $c$. However, those experimental results were not correctly interpreted until Special Relativity came on stage.

The existence of an invariant speed provides us with a scale of reference to understand why distances and time intervals seem to be invariant in our daily life: according to Special Relativity, distances and time intervals behave as if they were invariant when the compared frames (the train, the plane, the earth, etc.) move with a relative velocity $V \ll c$. So, it is just an appearance, like the earth’s surface, which seems to be flat if it is only explored over distances much smaller than the globe radius.

I.C Coordinate transformations

An event is a point in spacetime. It represents a place in space and an instant of time; it is a “here and now”. An event is characterized by 4 coordinates; we will use 3 Cartesian coordinates $x$, $y$, $z$ to localize the place of the event plus its corresponding time coordinate $t$. Cartesian coordinates are distances measured with rulers along the Cartesian axes of the frame. The coordinate $t$ is measured by clocks counting the time from an instant conventionally chosen as the time origin.

Figure 1 shows two frames $S$ and $S'$ in relative motion; the $x$ and $x'$ axes have the direction of the relative velocity $V$. By comparing distances in the frame $S$, we can state

$$d_{OP}|_S = d_{OO'}|_S + d_{O'P}|_S \tag{2}$$

In the frame $S$, the distance between $O$ (the coordinate origin of $S$) and the place $P$ is the $x$ coordinate of $P$: $d_{OP}|_S = x$. On the other hand, the distance between the origins $O$ and $O'$ increases with time; if $V$ is constant and the time $t$ in $S$ is chosen to be zero when both origins coincide, then $d_{OO'}|_S = Vt$. Thus

$$d_{O'P}|_S = x - Vt \tag{3}$$

Figure 1: Frames $S$ and $S'$ moving at the relative velocity $V$.

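We are not allowed to replace the left member of Eq. (3) with $x'$, since $x' = d_{O'P}|_{S'}$ is a distance measured in $S'$ rather than in $S$. Classical Mechanics, however, assumes that distances have the same value in all the frames. Thus, we obtain the Galileo transformations:

Galileo transformations

$$x' = x - Vt \tag{4a}$$
$$y' = y \tag{4b}$$
$$z' = z \tag{4c}$$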

We have added the transformations of the Cartesian coordinates $y$, $z$ transversal to the relative motion of the frames. These are distances between a given place and the straight line shared by the $x$ and $x'$ axes; according to the classical invariance of distances, they are equal in $S$ and $S'$.

The classical transformation of the coordinates of an event is completed by considering the invariance of time intervals; so we state that $t' = t$ (we are choosing a common time origin for $S$ and $S'$). Remarkably, the relation $t' = t$ also results from the transformation (4a) with the help of a simple physical argument: as frames $S$ and $S'$ are on an equal footing, the respective inverse transformation should look like (4a) except for the sign of $V$ (if $S'$ moves towards increasing values of $x$ in $S$, then $S$ moves towards decreasing values of $x'$ in $S'$; thus the relative velocity changes sign). Therefore

$$x = x' + Vt' \tag{5}$$

Then, by adding (4a) and (5) one obtains $0 = V(t' - t)$, i.e.

$$t' = t \tag{6}$$

Galileo addition of velocities

A moving particle traces a succession of events in spacetime. This world-line can be described by equations $x(t)$, $y(t)$, $z(t)$, which are summarized in a single vector equation for the position vector $\mathbf{r}(t)$. According to Galileo transformations (4), the position vector transforms as

$$\mathbf{r}' = \mathbf{r} - \mathbf{V}t \tag{7}$$

where the invariance of time, $t' = t$, has also been used. By differentiating Eq. (7), one obtains the Galileo addition of velocities, i.e. the relation between the velocities of the particle in two different frames due to the composition of its movement with the relative translation between both frames:

$$\mathbf{u}' = \mathbf{u} - \mathbf{V} \tag{8}$$

Velocities are not invariant under Galileo transformations. However, the relative velocity between two particles is invariant:

$$\mathbf{u}'_2 - \mathbf{u}'_1 = \mathbf{u}_2 - \mathbf{u}_1 \tag{9}$$

Galilean invariance of the acceleration

Since $\mathbf{V}$ is uniform, the differentiation of Eq. (8) yields the Galilean invariance of the acceleration:

$$\mathbf{a}' = \mathbf{a} \tag{10}$$
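As a numerical illustration of the Galileo transformations and of the addition of velocities, the following minimal sketch reworks the train example of Section I.B. The function names (`to_train_frame`, `to_earth_frame`, `mean_velocity`) and the sample events are illustrative assumptions, not part of the original text.

```python
# Minimal sketch of the Galileo transformations (4) and (6).
# S is the frame fixed to the earth, S' the frame fixed to the train.
# Function names and sample numbers are illustrative assumptions.

def to_train_frame(event, V):
    """Earth coordinates (x, y, z, t) -> train coordinates: x' = x - V t, t' = t."""
    x, y, z, t = event
    return (x - V * t, y, z, t)

def to_earth_frame(event_prime, V):
    """Inverse transformation, Eq. (5): same form with V -> -V."""
    xp, yp, zp, tp = event_prime
    return (xp + V * tp, yp, zp, tp)

def mean_velocity(e1, e2):
    """x-component of the mean velocity between two events of a world-line."""
    return (e2[0] - e1[0]) / (e2[3] - e1[3])

V = 100.0  # train speed relative to the earth, in m/s

# The passenger walks 1 m along the train in 1 s (coordinates in S').
start_train = (0.0, 0.0, 0.0, 0.0)
end_train = (1.0, 0.0, 0.0, 1.0)

# The same two events described in the frame fixed to the earth.
start_earth = to_earth_frame(start_train, V)
end_earth = to_earth_frame(end_train, V)

u_train = mean_velocity(start_train, end_train)  # velocity in S'
u_earth = mean_velocity(start_earth, end_earth)  # velocity in S
print(u_train, u_earth)
```

The two velocities differ by exactly the relative velocity of the frames, as Eq. (8) states; transforming the earth-frame events back with `to_train_frame` recovers the original train-frame events.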

II Relativity in Classical Mechanics

Mechanics describes the motion of interacting particles by means of equations governing the particle world-lines. These equations of motion, together with the initial conditions, yield the coordinates of particles as functions of time: $x(t)$, $y(t)$, $z(t)$. To write the equations of motion we combine the laws of dynamics with the laws of the interactions. Both types of laws must have the same form in all the inertial frames. This is the principle of relativity in Mechanics, which expresses that all the inertial frames are on an equal footing. However, whether or not a given law fulfills the principle of relativity depends on the properties attributed to space and time.

II.A Newton’s laws of dynamics

Newton constructed the dynamics on the basis of three laws [1]:


First law (principle of inertia): free particles move with constant velocity (they describe straight world-lines in spacetime).


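Second law: a particle acted on by a force $\mathbf{F}$ acquires an acceleration $\mathbf{a}$ that is proportional to the force:

$$\mathbf{F} = m\,\mathbf{a} \tag{11}$$

The proportionality constant $m$ is a property of the particle called mass. In terms of the momentum $\mathbf{p} = m\mathbf{u}$, the law reads $\mathbf{F} = d\mathbf{p}/dt$.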


Third law (action-reaction principle): two particles interact by simultaneously exerting equal and opposite forces on each other.

The first law is a particular case of the second law (the case $\mathbf{F} = 0$); it establishes the tendency to persist as the main feature of motion (as it was envisaged by Galileo [2], Gassendi [3] and Descartes [4], in opposition to the Aristotelian thought). On the other hand, the second law becomes the particle equation of motion once the force is given as a function of $\mathbf{r}$, $\mathbf{u}$, $t$, etc. Then, a law for the interaction involved is also required (which can be gravitational, electromagnetic, etc.). The third law implies the conservation of the total momentum of an isolated system of interacting particles. In fact, the reciprocal forces $\mathbf{F}_{12}$ and $\mathbf{F}_{21}$ between two particles 1 and 2 satisfy $\mathbf{F}_{12} + \mathbf{F}_{21} = 0$, since they are equal and opposite. If these are the only forces on each particle, we can use the second law to obtain $d(\mathbf{p}_1 + \mathbf{p}_2)/dt = 0$. Thus $\mathbf{p}_1 + \mathbf{p}_2$ is a conserved quantity. This argument can be extended to prove the conservation of the total momentum of any isolated system of particles.

Classical Mechanics allows for interaction forces at a distance. They are derived from potential energies depending on the distances between particles, which automatically provide interaction forces satisfying Newton’s third law.

II.B Newton’s absolute space

Newton’s fundamental laws of dynamics are not formulated to be used in an arbitrary frame. In fact, it is evident that the first law cannot be valid in every frame, since a constant velocity $\mathbf{u}$ in a frame $S$ does not imply a constant velocity $\mathbf{u}'$ in another frame $S'$. This can be easily understood by considering cases where $S'$ rotates or accelerates with respect to $S$. However, if $S'$ translates uniformly with respect to $S$, the particle has constant velocities $\mathbf{u}$, $\mathbf{u}'$ either in both frames or in neither of them. Galileo addition of velocities (8) is a particular example of this general statement. In fact, Galileo transformations (4) were obtained for two equally oriented moving frames; thus, they are in relative translation (absence of relative rotation). Besides, the translation is uniform, since the velocity $\mathbf{V}$ is constant. Thus $\mathbf{u}'$ is constant in (8) if and only if $\mathbf{u}$ is constant.

Although the principle of inertia cannot be valid in every frame, at least it is true that if it is valid in a frame $S$, then it will be valid in any other frame $S'$ uniformly translating with respect to $S$. Can we extend this statement to the second law? The second law involves the particle acceleration. In Galileo transformations, the acceleration is invariant. Besides, the forces in Classical Mechanics depend on distances (like gravitational and elastic forces) or relative velocities (like the viscous force on a particle moving in a fluid, which depends on the velocity of the particle relative to the fluid). Both the distances and the relative velocities are invariant under Galileo transformations. In this way, each side of the second law (11) is invariant under changes of frames in relative uniform translation. Therefore, the invariance of distances and time intervals, which leads to Galileo transformations, is a key piece in the Newtonian construction because it allows the second law to be valid in a family of frames in relative uniform translation. This is the family of inertial frames, and this is the content of the principle of relativity:


Principle of relativity

The fundamental laws of Physics have the same form in any inertial frame.


For instance, the same physical laws describe a freely falling body both in a plane and at the earth’s surface. The principle of relativity in Classical Mechanics tells us that the state of motion of the frame cannot be revealed by a mechanical experiment: the result of the experiment will not depend on the motion of the frame because it is ruled by the same laws in all the inertial frames.

But how can we recognize whether a frame is inertial or not? We could effectively recognize a particle in rectilinear uniform motion; if we were sure that the particle is free of forces, then we would conclude that the frame is inertial. However, Mechanics allows not only for contact forces but also for forces “at a distance”. So how can we be sure that a particle is free of forces? Newton was aware of this annoying weakness of his formulation; he then considered that the laws of Mechanics described the particle motion in absolute space. Thus, the inertial frames are those fixed or uniformly translating with respect to Newton’s absolute space.

While the inertial frames are defined by their states of motion with respect to Newton’s absolute space, this (absolute) motion is not detectable, since the principle of relativity puts all the inertial frames on an equal footing; actually, only relative motions are detectable. Absolute space in Classical Mechanics plays the essential role of selecting the privileged family of inertial frames where the fundamental laws of Physics are valid; but, surprisingly, it is not detectable. In some sense absolute space acts, because it determines the inertial trajectories of particles, but it does not receive any reaction because it is immutable. Leibniz [5] criticized this feature of the Newtonian construction, demanding that Mechanics should describe relations among particles instead of particle motions in absolute space. In practice, however, Newton’s mechanics is successful because we can choose frames where the non-inertial effects are weak or can be understood in terms of inertial forces that result from referring the frame motion to another “more inertial” frame.

As advanced in Section I.B, Special Relativity will abandon the invariance of distances and time intervals. Then, Galileo transformations will be abandoned too. This means that Newton’s second law (11) and the character of fundamental forces will suffer a relativistic reformulation. However the inertial frames will still keep their privileged status devoid of a sound physical basis; this issue will be only re-elaborated in General Relativity.

III The theory of light and the absolute motion

In the second half of the 19th century light was regarded as electromagnetic mechanical waves governed by Maxwell’s laws. These waves were perturbations of a medium called ether; they propagate at the speed $c$ relative to the ether. However, the ether could not be detected, neither directly nor indirectly. Several experiments did not succeed in revealing the Earth’s motion relative to the ether (a sort of absolute motion), and some forced hypotheses about the interaction between matter and ether were introduced to account for these null results.

III.A The finiteness of the speed of light

As mentioned in Section I.B, velocities are not invariant in Classical Mechanics. Actually, only an infinite velocity would remain invariant under Galileo addition of velocities (8). Are there infinite speeds in nature? Many philosophers (Aristotle among them) thought that the speed of light was infinite. The issue of whether the speed of light was finite or infinite has been the object of debate since ancient times. In the 17th century, the question was still open. While Kepler and Descartes argued in favor of an infinite speed of light, Galileo proposed a terrestrial test that, however, was not suitable to determine such a large speed. But at the end of the 17th century, contemporaneously with Newton’s development of Mechanics, an answer came from Astronomy.

In 1676 Rømer [6] noted that the time elapsed between the observations of successive eclipses of Io (the closest of Jupiter’s great moons) was larger when the Earth traveled its solar orbit moving away from Jupiter and shorter when the Earth moved towards Jupiter. Rømer realized that such deviations in this otherwise periodical phenomenon were the sign of a finite speed of light. In fact, if the Earth were at rest, then we would observe one eclipse every 42.5 hours (the orbital period of Io). However, if the Earth moves away from Jupiter, the time between successive observations of the emersions of Io from the shadow cone will be enlarged; this happens because the light coming from the second emersion travels a longer distance at a finite velocity to reach the Earth. This delay, together with the length traveled by the Earth in 42.5 hours, led to the first determination of the speed of light. By recording the cumulative delay of many successive eclipses, Rømer found that light traveled the diameter of the Earth’s orbit in 22 minutes (the actual value is 16 minutes) [7].
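The arithmetic behind these crossing times can be sketched as follows. This is only a rough estimate: the value of the astronomical unit used here is the modern one, an assumption that is not part of the text and was not available to Rømer with this accuracy; the function name is likewise illustrative.

```python
# Rough estimate of the speed of light from the time light needs to
# cross the diameter of the Earth's orbit.
# ASSUMPTION: modern value of the astronomical unit (not given in the text).
AU_METRES = 1.495978707e11

def speed_from_crossing_time(minutes):
    """Speed (m/s) of a signal crossing the orbit diameter (2 AU) in `minutes`."""
    return 2.0 * AU_METRES / (minutes * 60.0)

c_romer = speed_from_crossing_time(22.0)   # Rømer's 22 minutes
c_modern = speed_from_crossing_time(16.0)  # the 16 minutes quoted in the text
print(f"{c_romer:.3e} m/s, {c_modern:.3e} m/s")
```

With 22 minutes the estimate falls roughly 25% short of the modern value of the speed of light, while the rounded 16 minutes brings it within a few percent.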

Fifty years later, Bradley [8] discovered the aberration of starlight. Bradley observed that the light coming from a star suffers annual changes of direction in the frame translating with the Earth. The nature of these changes highly disturbed Bradley because they unexpectedly differed from the stellar parallax he was looking for (a tiny effect only measured one hundred years later). Eventually, Bradley concluded that the stellar aberration he had discovered was a consequence of the vector composition (8) between the speed of light and the Earth’s motion around the Sun at 30 km/s. By measuring the aberration angle, Bradley obtained the speed of light within an error of 1% [9]. In 1849 Fizeau [10] carried out the first terrestrial measurement of the speed of light. Like any finite velocity, the speed of light is not a Galilean invariant.
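The size of the aberration angle follows from the vector composition (8). The sketch below assumes the simplest geometry, starlight arriving perpendicular to the Earth’s velocity, so that the tangent of the angle is the ratio of the two speeds; the function names and this geometric simplification are assumptions made for illustration.

```python
import math

C = 299_792_458.0   # speed of light in m/s
V_EARTH = 30.0e3    # Earth's orbital speed in m/s (rounded value quoted in the text)

def aberration_angle(v, c=C):
    """Aberration angle (radians) for light arriving perpendicular to the
    observer's velocity, from the Galilean composition: tan(alpha) = v / c."""
    return math.atan(v / c)

def speed_of_light_from_aberration(alpha, v):
    """Invert the relation: the speed of light implied by a measured angle."""
    return v / math.tan(alpha)

alpha = aberration_angle(V_EARTH)
alpha_arcsec = math.degrees(alpha) * 3600.0
print(f"aberration angle: {alpha_arcsec:.1f} arcseconds")
```

The angle is of the order of $V/c \sim 10^{-4}$ rad, i.e. a few tens of arcseconds, which is why measuring it together with the known orbital speed yields the speed of light.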

III.B The wave equation

In the middle of the 19th century the dispute about the corpuscular or undulatory character of light seemed to be settled in favor of the wave theory of light. The corpuscular model sustained by Newton and many other scientists could not explain the totality of the luminous phenomena. In 1821 Fresnel [11] completed his wave theory of light, thus giving a finished mathematical form to the undulatory model proposed by Huygens in 1678 [12]. This theory included the concepts of amplitude and phase to describe interference and diffraction; besides, light was presented as a transversal wave to explain the phenomena concerning polarization. In 1850 Foucault [13] measured the speed of light in water, and verified the value $c/n$ ($n$ is the refractive index) as predicted by the wave theory in opposition to the corpuscular model.

At that time, the light waves were considered matter waves like sound or the waves on the water surface of a lake. Physics and Mechanics were synonymous; so, any phenomenon was regarded as a mechanical phenomenon, and light did not escape the rule. Matter waves propagate in a material medium; they are but medium oscillations carrying energy. In the simplest cases, they are governed by the wave equation

$$\frac{1}{c_w^2}\frac{\partial^2 \psi}{\partial t^2} - \nabla^2 \psi = 0 \tag{12}$$

where $\psi$ represents the perturbation of the medium (for instance the longitudinal oscillations of density and pressure when sound propagates in a gas, or the transversal displacement of a string in a musical instrument). Any function $\psi(x \pm c_w t)$ is a solution of the wave equation (12); it represents a perturbation that travels in the $x$-direction, without changing its form, at the constant speed $c_w$. The general solution is a combination of solutions traveling in all directions.

The wave equation (12) is not written to be used in any inertial frame. It only describes the wave propagation in a frame fixed to the medium. In fact, the wave equation changes form under Galileo transformations. Let us take the $(t, x)$-sector of the wave operator, with $c_w$ the propagation speed, and write:

$$\frac{1}{c_w^2}\frac{\partial^2}{\partial t^2} - \frac{\partial^2}{\partial x^2} = \left(\frac{1}{c_w}\frac{\partial}{\partial t} - \frac{\partial}{\partial x}\right)\left(\frac{1}{c_w}\frac{\partial}{\partial t} + \frac{\partial}{\partial x}\right) = -4\,\frac{\partial^2}{\partial \xi\,\partial \eta} \tag{13}$$

where $\xi = x + c_w t$, $\eta = x - c_w t$ (or $x = (\xi + \eta)/2$, $t = (\xi - \eta)/(2 c_w)$). This shows that the wave equation would keep its form in different inertial frames moving along the $x$-axis if $\partial^2/\partial\xi'\partial\eta'$ were proportional to $\partial^2/\partial\xi\,\partial\eta$; but this is not true in Galileo transformations (4), (6). The fact that the equation governing mechanical waves is fulfilled just in the frame where the medium is at rest does not imply the violation of the principle of relativity. The medium is a physical reason for privileging an inertial frame; furthermore, Eq. (12) will be fulfilled whatever the inertial frame where the medium is at rest. Actually, the wave equation for mechanical waves can be obtained from the fundamental laws of Mechanics, which certainly fulfill the principle of relativity, under some assumptions valid in the frame fixed to the medium. In this derivation, the propagation velocity results from the properties of the propagating medium.

III.C The æther theory

In Fresnel’s theory, light was a mechanical wave that propagates in a medium called the luminiferous ether, and $\psi$ was the “velocity of the ethereal molecules”. The speed of light was a property of the ether. To be the seat of transversal waves, the ether should be an elastic material; it was strange that no longitudinal waves existed in this elastic medium. Besides, to produce such an enormous propagation velocity, the ether should be extremely rigid. The ether should fill the universe, because light propagates everywhere. It was logical to think of the ether as being at rest in Newton’s absolute space; the ether became a sort of materialization of Newton’s absolute space.

But such an omnipresent substance should produce other mechanical effects, apart from the luminous phenomena. How can planets move through the ether without losing energy? Would the ether penetrate through the moving bodies without disturbing them, or would it be dragged by them? If air is pumped out of a bottle, then sound will cease to propagate inside the bottle; however, light will still propagate, meaning that the ether was not evacuated together with the air (why?). The ether looked like an elusive intangible substance without any other effect than being the seat of the luminous phenomena.

III.D Maxwell’s electromagnetism

In 1873 Maxwell [14] published his Treatise on Electricity and Magnetism, where electricity and magnetism appeared as two parts of a single entity: the electromagnetic field. Maxwell’s laws for the electromagnetic field contained as particular cases the well known electrostatic interactions between charges and magnetostatic interactions between steady currents. But Maxwell’s real achievement was to discover that variable electric and magnetic fields, $\mathbf{E}$ and $\mathbf{B}$, create each other. This mutual feedback between electricity and magnetism generates electromagnetic waves. In fact, Maxwell’s equations in the absence of charges lead to wave equations (12), with the Cartesian components of $\mathbf{E}$ and $\mathbf{B}$ playing the role of $\psi$. In the electromagnetic wave equations the propagation velocity is $c = (\mu_0 \varepsilon_0)^{-1/2}$. In SI units, $\mu_0$ is chosen to define the unit of electric current, and $\varepsilon_0$ is experimentally determined through electrostatic interactions; their values are $\mu_0 = 4\pi \times 10^{-7}$ N A$^{-2}$, $\varepsilon_0 = 8.854 \times 10^{-12}$ N$^{-1}$ A$^{2}$ m$^{-2}$ s$^{2}$. To Maxwell’s surprise, the value of $(\mu_0 \varepsilon_0)^{-1/2}$ coincided with the already measured speed of light; so Maxwell concluded that light was an electromagnetic wave.
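The coincidence noted by Maxwell can be checked with a short computation. The sketch below takes the vacuum permeability as $4\pi \times 10^{-7}$ N A$^{-2}$ and the vacuum permittivity as $8.854187817 \times 10^{-12}$ N$^{-1}$ A$^{2}$ m$^{-2}$ s$^{2}$; these conventional SI values and the variable names are assumptions of the example.

```python
import math

# ASSUMPTION: conventional SI values of the vacuum constants.
MU_0 = 4.0e-7 * math.pi        # vacuum permeability, N A^-2
EPSILON_0 = 8.854187817e-12    # vacuum permittivity, N^-1 A^2 m^-2 s^2

# Propagation velocity of the electromagnetic waves.
c_em = 1.0 / math.sqrt(MU_0 * EPSILON_0)

C_MEASURED = 299_792_458.0     # speed of light quoted in Section I.B, m/s
print(f"{c_em:.6e} m/s, relative difference {abs(c_em - C_MEASURED) / C_MEASURED:.1e}")
```

The propagation velocity obtained from purely electric and magnetic constants agrees with the speed of light to well within the precision of the constants used.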

Maxwell conceived the electromagnetic waves as a mechanical phenomenon in a propagating medium. Therefore, he believed that his equations were valid in a frame fixed to the medium. The recognition of light as an electromagnetic wave then identified the electromagnetic medium with the luminiferous ether. On the other hand, the action of the field on a charge $q$ (the Lorentz force $q(\mathbf{E} + \mathbf{u} \times \mathbf{B})$) depended on the velocity $\mathbf{u}$ of the charge. This velocity was regarded as the velocity of the charge with respect to the ether (the absolute velocity of the charge).

Differing from Classical Mechanics, Maxwell’s electromagnetism will fit the Special Relativity without changes. Einstein will defy the classical viewpoint by considering that Maxwell’s equations should be valid in any inertial frame. If so, the speed of light would be invariant (i.e., it would have the same value in any inertial frame). To sustain this idea, Galileo transformations should be replaced with transformations leaving invariant the speed of light; this implies the abandonment of the classical invariance of distances and time intervals. In Special Relativity, Maxwell’s electromagnetism will become a paradigmatic theory.

III.E The search for the absolute motion

Although the ether resisted a direct detection, at least it could be indirectly tested. In the second half of the 19th century, several experiments were aimed to test the Earth’s motion with respect to the ether (the Earth’s absolute motion). While $c$ was considered the speed of light in the frame fixed to the ether, the speed of light in the Earth’s frame should result from composing $c$ with the Earth’s absolute motion $V$, according to the Galilean addition of velocities (8). Therefore, some of these experiments were based on the time the light takes to travel a round-trip along a straight path (the light comes back after being reflected by a mirror). To exemplify the idea, we will choose the path to be parallel to the (unknown) Earth’s absolute motion. According to Galileo addition of velocities, the speed of light in the Earth’s frame is $c - V$ when light goes, and $c + V$ when light comes back. If $l$ is the length the light covers in each journey, then the total time of the round-trip is

$$t = \frac{l}{c - V} + \frac{l}{c + V} = \frac{2l}{c}\,\frac{1}{1 - V^2/c^2} \simeq \frac{2l}{c}\left(1 + \frac{V^2}{c^2}\right) \tag{14}$$

As can be seen, the Earth’s absolute motion enters the result as a correction of the second order in $V/c$. A correction of even order was, in fact, to be expected because the traveling time (14) does not change if the Earth’s motion is reversed. To be conclusive, the experiments should be able to detect at least a value $V^2/c^2 \sim 10^{-8}$. This is because the Earth orbits the Sun at 30 km/s $\sim 10^{-4}\,c$; then, even if the Earth were at rest in the ether when the experiment is performed, it would move at 60 km/s six months later. Therefore, any experimental array based on the traveling time (14) should reach a sensitivity of $10^{-8}$. Such a strong constraint could be circumvented by experimental arrays sensitive to the change $V \to -V$; if so, the result could be of the first order in $V/c$. This is the case of the experiment performed by Hoek [15] in 1868, where the symmetry is broken because one of the stretches of the round-trip was not in air but in water; in this stretch, the speed $c/n$ replaces $c$ in Eq. (14). However, Hoek’s interferometric device was not effective for determining the Earth’s absolute motion.
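To get a feeling for the size of this second-order effect, the round-trip time (14) can be evaluated numerically. The sketch below uses the Earth’s orbital speed of 30 km/s as a stand-in for the unknown absolute velocity; that choice, the unit path length and the function names are assumptions made for illustration.

```python
import math

C = 299_792_458.0  # speed of light in the ether frame, m/s

def round_trip_time(l, V, c=C):
    """Round-trip time of Eq. (14) under the Galilean addition of velocities:
    the light travels at c - V on the way out and at c + V on the way back."""
    return l / (c - V) + l / (c + V)

def fractional_excess(V, c=C):
    """Relative excess of the round-trip time over the value 2 l / c at rest."""
    l = 1.0  # the path length cancels out in the ratio
    return round_trip_time(l, V, c) / (2.0 * l / c) - 1.0

V_ORBIT = 30.0e3  # m/s, used here in place of the unknown absolute velocity
excess = fractional_excess(V_ORBIT)
beta_squared = (V_ORBIT / C) ** 2
print(f"excess = {excess:.3e}, (V/c)^2 = {beta_squared:.3e}")

# Reversing the motion leaves the round-trip time unchanged (even-order effect).
same_when_reversed = math.isclose(
    round_trip_time(1.0, V_ORBIT), round_trip_time(1.0, -V_ORBIT), rel_tol=1e-15
)
```

The fractional excess is of order $10^{-8}$ and coincides with $V^2/c^2$ to leading order, which is the sensitivity an experiment based on Eq. (14) would have to reach.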

There were also two experiments, sensitive to the first order in $V/c$, that involved Snell’s law. In 1871 Airy [16] measured Bradley’s stellar aberration with a vertical telescope filled with water. Bradley had measured the annual variation of the aberration angle produced by the Earth’s orbit around the Sun. This variation did not reveal the Earth’s absolute motion $\mathbf{V}$ but just the changes of $\mathbf{V}$. Airy’s experiment, instead, took into account that the aberration implied that the telescope was not oriented along the direction of the light ray in the ether frame. If Snell’s law were valid in the ether frame, then an additional refraction would take place when the light entered the water in the telescope. This additional refraction would change the view angle to the star by a quantity of the first order in $V/c$. Nevertheless, Airy’s experiment did not reveal the Earth’s absolute motion. Much earlier, in 1810, Arago [17] covered half of the objective of a telescope with a prism, to obtain a second image of the stars. To see the image through the prism, the telescope direction had to be corrected by an angle equal to the deviation angle of the prism. Arago believed that the light refraction in the prism could depend on the velocity of light relative to the prism, which results from the vector composition (8) of the speed of light with the absolute motion of the prism (i.e., the Earth’s absolute motion). This effect could be revealed by observing stars in several directions to get different vector compositions. However, Arago did not notice any change of the deviation angle.

Fresnel [18] searched for reasons for Arago’s null result. In the context of the ether theory, he found that the null result could be explained, at the first order in $V/c$, by advancing a curious hypothesis: a transparent substance in (absolute) motion partially drags the ether contained in its interior. The partial dragging is such that the phase velocity of light (the displacement per unit of time of the wave fronts), as measured in the frame fixed to the universal ether (rather than the ether inside the substance), is not $c/n$ but

$$\frac{c}{n}+\left(1-\frac{1}{n^{2}}\right)\mathbf{V}\cdot\hat{\mathbf{n}} \tag{15}$$

where $\hat{\mathbf{n}}$ is the propagation direction, $\mathbf{V}$ is the absolute motion of the transparent substance and $n$ is its refractive index. In practice, Fresnel’s dragging coefficient $1-n^{-2}$ caused the fulfillment of Snell’s law in the frame fixed to the transparent substance (at the first order in $V/c$). Fresnel’s hypothesis explained why Arago did not succeed in his endeavor: the deviation angle of the prism was always the one predicted by Snell’s law, irrespective of the absolute motion of the prism. Besides, it also explained the null result in Airy’s experiment, because no additional refraction is produced if Snell’s law is valid in the frame fixed to the telescope (in this frame the ray of light and the telescope are equally oriented). Moreover, the partial dragging (15) cancels out the first order effects in the time (14) when one of the stretches is not in air but in another transparent substance; so, it also explained Hoek’s null result (Hoek’s device was not sensitive enough to test second order effects).

Fresnel’s partial dragging of ether was measured by Fizeau [19] in 1851. Since Special Relativity will reject the existence of the ether, Fizeau’s measurement will require a relativistic interpretation. On the other hand, the fulfillment of Snell’s law in the frame fixed to the transparent substance is completely satisfactory in Special Relativity, because that is the only physically privileged frame. For a detailed analysis of the experiments pursuing the absolute motion in connection with Fresnel’s hypothesis, see Refs. [20, 21].

iii.6 Michelson-Morley experiment

In 1881 Michelson designed an interferometer aimed at detecting the Earth’s absolute motion. In Michelson’s interferometer the light traveled round trips completely in air. So, the challenge was to achieve a sensitivity of $10^{-8}$. Figure 2 shows the scheme of Michelson’s interferometer. The beam of light emitted by an extended source is split into two parts by a half-silvered glass plate. After traveling mutually perpendicular round trips, both parts join again to be collected by a telescope where interference fringes are observed (Fizeau’s fringes [22]). The fringes are caused by a slight misalignment of the mirrors; this implies that the images of the mirrors at the telescope form a wedge. The wedge makes rays 1 and 2 arrive at the telescope with a phase shift that changes according to the thickness of the wedge at the place where the rays bounced. So, the phase shift will be different for each one of the rays in the beam; therefore, bright and dark fringes will be observed at the telescope. Notice that the arm lengths $l_{1}$ and $l_{2}$ do not need to be equal, but $|l_{1}-l_{2}|$ should be smaller than the coherence length of light to preserve the interference pattern.

For each ray in the beam, the phase shift between parts 1 and 2 determines whether they produce a bright or a dark fringe. This phase shift results from the times $t_{1}$, $t_{2}$ that rays 1 and 2 employ to cover their respective round trips; these times depend on the distances $l_{1}$, $l_{2}$ and on the velocities of the rays in the laboratory. These velocities are the result of the vector composition (8) between the speed $c$ in the ether frame and the Earth’s absolute motion V; they are clearly different, since the vector composition depends on the direction of each ray. Moreover, if the interferometer were gradually rotated, then the velocities of the rays would gradually change. In this way, the rotation of the interferometer would affect the fringes: the position of the bright fringes would gradually shift. Instead, if the interferometer were at rest in the ether, then the fringes would not shift, because rays 1 and 2 would travel at the speed $c$ irrespective of the orientation of the interferometer. Thus, the displacement of the fringes would be the indication of the Earth’s absolute motion.

Figure 2: Scheme of Michelson’s interferometer.

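Let us compute the times $t_{1}$, $t_{2}$ when the arm $l_{1}$ is oriented along the still unknown absolute motion V. In such case, ray 1 has speeds $c-V$, $c+V$, and the time $t_{1}$ is given by Eq. (14). On the other hand, ray 2 is orthogonal to V in the laboratory frame; so the vector composition to obtain its speed is the one shown in Figure 3. As can be seen, ray 2 goes to the mirror and comes back with a speed $\sqrt{c^{2}-V^{2}}$. Then, the round trip along the arm $l_{2}$ takes a time

$$t_{2}=\frac{2\,l_{2}}{\sqrt{c^{2}-V^{2}}}=\frac{2\,l_{2}}{c}\,\frac{1}{\sqrt{1-V^{2}/c^{2}}} \tag{16}$$

The phase shift is ruled by the time difference

$$\Delta t\equiv t_{1}-t_{2}=\frac{2}{c}\left[\frac{l_{1}}{1-V^{2}/c^{2}}-\frac{l_{2}}{\sqrt{1-V^{2}/c^{2}}}\right] \tag{17}$$

Figure 3: Galilean composition of velocities for ray 2.

If the interferometer is rotated $90^{\circ}$, then the arm $l_{2}$, corresponding to ray 2, will be aligned with V; so the result will be

$$\Delta t_{90^{\circ}}=\frac{2}{c}\left[\frac{l_{1}}{\sqrt{1-V^{2}/c^{2}}}-\frac{l_{2}}{1-V^{2}/c^{2}}\right] \tag{18}$$

Although the Earth’s absolute motion V is unknown, a gradual rotation will make the interferometer pass through these two extreme values separated by a right angle. Thus a displacement of the fringes will be observed, in connection with the change of $\Delta t$ given by

$$\Delta t-\Delta t_{90^{\circ}}=\frac{2\,(l_{1}+l_{2})}{c}\left[\frac{1}{1-V^{2}/c^{2}}-\frac{1}{\sqrt{1-V^{2}/c^{2}}}\right]\simeq\frac{l_{1}+l_{2}}{c}\,\frac{V^{2}}{c^{2}} \tag{19}$$

This change is equivalent to the displacement of $(l_{1}+l_{2})\,V^{2}/(\lambda\,c^{2})$ fringes ($\lambda$ is the light wavelength).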
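As a rough numerical illustration of the estimate above, the following sketch evaluates the Galilean round-trip times for both orientations of the interferometer and converts the change of the time difference into a number of fringes. The arm length (11 m) and the orbital speed (30 km/s) are the values quoted in the text; the wavelength is an assumption (visible light, about 550 nm), and the function names are merely illustrative.

```python
# Expected fringe displacement in the ether theory (a sketch, not the
# original data reduction of Michelson and Morley).
C = 299_792_458.0      # speed of light, m/s
V_ORBIT = 30e3         # Earth's orbital speed quoted in the text, m/s
WAVELENGTH = 5.5e-7    # ASSUMPTION: visible light, m (not given in the text)


def time_difference(l1, l2, v, rotated=False):
    """t1 - t2 for arm 1 along V, or for arm 2 along V if rotated."""
    x = (v / C) ** 2
    along = 1.0 / (1.0 - x)            # factor for the arm parallel to V
    across = 1.0 / (1.0 - x) ** 0.5    # factor for the arm orthogonal to V
    if rotated:
        return 2.0 / C * (l1 * across - l2 * along)
    return 2.0 / C * (l1 * along - l2 * across)


def fringe_shift(l1, l2, v, wavelength=WAVELENGTH):
    """Number of fringes displaced when the device is turned by 90 degrees."""
    change = time_difference(l1, l2, v) - time_difference(l1, l2, v, rotated=True)
    return C * change / wavelength


if __name__ == "__main__":
    print(fringe_shift(11.0, 11.0, V_ORBIT))
```

With these inputs the result is close to the 0.4 fringes mentioned for the 1887 apparatus; a different assumed wavelength rescales it proportionally.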

After a failed attempt in 1881, Michelson joined Morley to improve the experimental sensitivity. In 1887 they possessed an interferometer whose arms were 11 m long (this was achieved by means of multiple reflections in a set of mirrors). Then, a displacement of at least 0.4 fringes was expected. However, no displacement of fringes was observed [23, 24, 25]. Michelson was convinced that the null result meant that the Earth carried a layer of ether stuck to its surface. If so, the experiment would have been performed at rest in the local ether, which would explain the null result. Lodge [26] tried to confirm this hypothesis by looking, unsuccessfully, for effects due to the ether stuck to a fast rotating wheel. In a revival of the corpuscular model, Ritz [27] then proposed that light propagates with speed $c$ relative to the source. This hypothesis, combined with other assumptions about the behavior of light when reflected by a mirror (emission theories), does explain the null result of Michelson-Morley’s experiment with a source at rest in the laboratory, but is refuted by a varied body of experimental evidence [28, 29, 30].

iii.7 FitzGerald-Lorentz length contraction

Lorentz thought that Michelson-Morley’s null result could be understood in a very different way. He considered that a body moving in the ether suffered a length contraction due to its interaction with the ether. The interaction would contract the body along the direction of its absolute motion V, but the transversal dimensions would not undergo any change. In fact, if the contraction factor $\sqrt{1-V^{2}/c^{2}}$ is applied to $l_{1}$ in Eq. (17) and $l_{2}$ in Eq. (18) (i.e., the dimensions along the absolute motion direction in each case), then both time differences turn out to be equal, and the expression (19) vanishes. Lorentz’s proposal of 1892 [31] had been independently advanced by FitzGerald [32] three years before. This proposal did not mean the abandonment of the belief in the invariance of lengths. The contraction was a dynamical effect; it depended on an objective phenomenon: the interaction between two material substances. The contraction should be observed in any frame, and all the frames should agree about the value of the contracted length.
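The cancellation claimed above can be checked numerically. The sketch below recomputes the Galilean round-trip times for both orientations of the interferometer, optionally shortening the arm parallel to the absolute motion by the factor $\sqrt{1-V^{2}/c^{2}}$. The function names and the sample values are illustrative assumptions.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def round_trip_times(l_parallel, l_transverse, v, contracted):
    """Ether-theory round-trip times for the arm along V and the arm across V."""
    g2 = 1.0 - (v / C) ** 2
    if contracted:
        # FitzGerald-Lorentz hypothesis: only the parallel arm shrinks.
        l_parallel = l_parallel * math.sqrt(g2)
    t_parallel = 2.0 * l_parallel / (C * g2)
    t_transverse = 2.0 * l_transverse / (C * math.sqrt(g2))
    return t_parallel, t_transverse


def change_of_time_difference(l1, l2, v, contracted):
    """(t1 - t2) with arm 1 along V minus (t1 - t2) with arm 2 along V."""
    t1_a, t2_a = round_trip_times(l1, l2, v, contracted)   # arm 1 parallel
    t2_b, t1_b = round_trip_times(l2, l1, v, contracted)   # arm 2 parallel
    return (t1_a - t2_a) - (t1_b - t2_b)


if __name__ == "__main__":
    print(change_of_time_difference(11.0, 11.0, 30e3, contracted=False))
    print(change_of_time_difference(11.0, 11.0, 30e3, contracted=True))
```

Without contraction the change is of order $(l_{1}+l_{2})V^{2}/c^{3}$; with contraction it vanishes up to floating-point rounding, even for unequal arms.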

The idea that light was a material wave (i.e., the idea that Maxwell’s laws were written to be used only in the ether frame) and the belief in the invariance of distances and time intervals led Physics to a blind alley. While complicated dynamical explanations were elaborated to interpret experimental results, like Fresnel’s partial dragging of ether and FitzGerald-Lorentz length contraction caused by the ether, the experimental results were not so complicated; they just said that the absolute motion cannot be detected. However, unless Physics gets rid of some classical misconception, such a reasonable conclusion will not fit within its theoretical body.

IV Einstein’s Special Relativity

In 1905 Einstein postulated that “the same laws of electrodynamics and optics will be valid for all frames of reference for which the equations of mechanics hold good” [33]. In this way, Einstein proclaimed that Maxwell’s electromagnetism does not possess a privileged system; Maxwell’s laws can be used in any inertial frame. Thus, Einstein raised Maxwell’s laws to the status of fundamental laws satisfying the principle of relativity (as stated in Section II.B). In doing so, Einstein closed the possibility of detecting the state of motion of an inertial frame by electromagnetic means. The ether does not exist; the electromagnetic waves are not material waves. The inertial frames are not endowed with a property V (its absolute motion or the “ether wind”); only the velocity describing the relative motion between inertial frames makes physical sense. Besides, Snell’s law is valid in the frame where the refracting substance is at rest, whatever this frame is.

An immediate consequence of the use of Maxwell’s laws in any inertial frame is that light in vacuum propagates at the speed $c$ in any inertial frame; $c$ is an invariant velocity (“light is always propagated in empty space with a definite velocity c which is independent of the state of motion of the emitting body” [33]). The existence of an invariant velocity implies that Galilean addition of velocities is a classical misconception to be got rid of; such a step entails the revision of the classical belief in the invariance of distances and time intervals.

iv.1 Relativistic length contractions and time dilations

We will re-elaborate the transformations of spacetime coordinates without prejudging the behavior of distances and time intervals, but subordinating them to the invariance of the speed of light. Figure 4 shows a particle traveling between the ends of a bar, as seen in the frame where the bar is fixed and the frame where the particle is fixed. The relative motion bar-particle is characterized by the sole velocity $V$. It is useful to call proper length $L_{o}$ the length of the bar at rest. Notice that, since all inertial frames are on an equal footing, the length of the bar will be $L_{o}$ in any inertial frame where the bar is at rest. Instead, we could expect a different length $L$ in a frame where the bar moves lengthways at a relative velocity $V$. For this reason, in Figure 4 the bar is represented with different lengths in each frame. In the frame fixed to the bar (proper frame of the bar) the particle takes a time $\Delta t$ to cover the length $L_{o}$; then, $V=L_{o}/\Delta t$. On the other hand, in the frame fixed to the particle, the ends of the bar take a time $\Delta\tau$ to pass in front of the particle; then $V=L/\Delta\tau$. We should not prejudge the nature of time; then, we are opening the possibility that the time interval between the same pair of events be different in each frame. It is also useful to call proper time $\Delta\tau$ the time between events as measured in the frame where the events occur at the same place (if such a frame exists). In our case, the events are the passing of each end of the bar in front of the particle; they occur at the same place in the frame where the particle is fixed. So, we have computed the same value of $V$ with lengths and times measured in two frames that move relative to each other at a velocity $V$. Thus, we conclude that

$$\frac{L}{L_{o}}=\frac{\Delta\tau}{\Delta t} \tag{20}$$

Figure 4: Relative motion bar-particle in the proper frames of the bar (left) and the particle (right).

Each side in Eq. (20) could only depend on the relative velocity $V$ between the considered frames. Then, Eq. (20) says that each side is the same function of $V$:

$$\frac{L(V)}{L_{o}}=\frac{\Delta\tau}{\Delta t(V)}=\gamma(V)^{-1} \tag{21}$$

In Classical Physics $\gamma$ is assumed to be 1. On the contrary, in Special Relativity the value of $\gamma$ will be subordinated to the invariance of the speed of light. It should be remarked that Eq. (21) is not devoid of assumptions about the nature of spacetime. In fact, the quotients $L/L_{o}$ and $\Delta\tau/\Delta t$ could also depend on the event of the spacetime where the measurements take place and on the orientation of the bar. Eq. (21) actually assumes that the spacetime is homogeneous and isotropic; these assumptions will be revised in General Relativity.

On one hand, Eq. (21) expresses the relation between the length $L$ of a bar moving at a velocity $V$ and its proper length $L_{o}$. On the other hand, Eq. (21) expresses the relation between the times elapsed between two events as measured in the frame where they occur at the same place (proper time $\Delta\tau$) and in another frame moving at a velocity $V$ relative to the former one ($\Delta t$). As Eq. (21) shows, both ratios are strongly interconnected.

The relations (21) are independent of the particular case examined in Figure 4. To obtain $\gamma(V)$ we will now study a case involving the speed of light, where the relations (21) will enter into play too. Figure 5 shows a bar of proper length $L_{o}$ supporting at its ends a source of light and a mirror. Let us consider the time elapsed between the emission of a pulse of light from the source and its return to the source. Both events occur at the same place in the proper frame of the bar; then, the proper time $\Delta\tau$ is the time the light takes to cover the distance $2L_{o}$ at the speed $c$:

$$\Delta\tau=\frac{2\,L_{o}}{c} \tag{22}$$

In another frame where the bar moves at a velocity $V$ (but light still propagates at the speed $c$), we will decompose the time between events as $\Delta t=\Delta t_{\rm going}+\Delta t_{\rm return}$. When light goes towards the mirror at the speed $c$ it covers the distance $L$ plus the displacement of the mirror $V\Delta t_{\rm going}$. Instead, when light returns to the source it covers the distance $L-V\Delta t_{\rm return}$ due to the displacement of the source. Therefore,

$$c\,\Delta t_{\rm going}=L+V\,\Delta t_{\rm going}\,,\qquad c\,\Delta t_{\rm return}=L-V\,\Delta t_{\rm return} \tag{23}$$

Figure 5: A light pulse traveling a round trip between the ends of a bar, as regarded in the frame where the bar moves with velocity V.

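Solving these equations for $\Delta t_{\rm going}$ and $\Delta t_{\rm return}$, one gets

$$\Delta t=\frac{L}{c-V}+\frac{L}{c+V}=\frac{2\,L}{c}\,\frac{1}{1-V^{2}/c^{2}} \tag{24}$$

We divide Eqs. (22) and (24), $\Delta\tau/\Delta t=(L_{o}/L)\,(1-V^{2}/c^{2})$, and use (21) to obtain the function $\gamma(V)$:

$$\gamma(V)=\frac{1}{\sqrt{1-V^{2}/c^{2}}} \tag{25}$$

Then, replacing in Eq. (21) we get the expressions for the relativistic length contraction and time dilation:

$$L(V)=\sqrt{1-\frac{V^{2}}{c^{2}}}\;L_{o}\,,\qquad \Delta t(V)=\frac{\Delta\tau}{\sqrt{1-V^{2}/c^{2}}} \tag{26}$$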

Noticeably, the relativistic length contraction has the same form proposed by FitzGerald and Lorentz to explain the null result of the Michelson-Morley experiment. However, its meaning is completely different. Lorentz considered that the contraction was a dynamical effect produced by the interaction between a body and the ether. For Lorentz, $V$ in Eq. (26) was the velocity of the body with respect to the ether, and the contraction was measured in all the frames. In Relativity, instead, the length contraction is a kinematical effect. The bar looks contracted in whatever frame it moves at the velocity $V$; besides, it has its proper length $L_{o}$ in whatever frame it is at rest.

Length contractions and time dilations are not perceptible in our daily life because we compare frames moving at relative velocities $V\ll c$. One of the first pieces of direct evidence of this phenomenon came from measuring the length traveled by decaying particles moving at a speed close to $c$, as compared to their half-life measured at rest [34].
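To get a feeling for the magnitudes involved, the following sketch evaluates the factor $\gamma(V)=(1-V^{2}/c^{2})^{-1/2}$ and checks the light-pulse argument numerically: the round-trip time in the frame where the bar moves, computed with the contracted length, should equal $\gamma$ times the proper round-trip time. The sample speeds and the helper names are assumptions chosen for illustration.

```python
import math

C = 299_792_458.0  # speed of light, m/s


def gamma(v):
    """Factor relating proper and non-proper lengths and times."""
    return 1.0 / math.sqrt(1.0 - (v / C) ** 2)


def contracted_length(proper_length, v):
    return proper_length / gamma(v)


def dilated_time(proper_time, v):
    return gamma(v) * proper_time


def round_trip_time_moving_bar(length, v):
    """Going plus returning time of a light pulse along a bar moving at v."""
    return length / (C - v) + length / (C + v)


if __name__ == "__main__":
    for v in (300.0, 30e3, 0.6 * C, 0.99 * C):   # airliner-like, Earth's orbit, fast
        print(f"v = {v:.3e} m/s   gamma - 1 = {gamma(v) - 1.0:.3e}")
```

At everyday speeds $\gamma-1$ is of order $10^{-12}$ or smaller, which is why neither effect is noticed in daily life.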

iv.2 Lengths transversal to the relative motion

The device of Figure 5 is also useful to explore the behavior of the dimensions transversal to the relative motion. Figure 6 shows the device put in a direction orthogonal to the relative motion. Eq. (22) is still valid in the proper frame of the bar. In a frame where the bar displaces transversally at the velocity $V$, the ray of light will travel along an oblique direction (this is nothing but the aberration due to the composition of motions). When the pulse of light goes towards the mirror, it covers in a time $\Delta t_{\rm going}$ the hypotenuse of a right triangle whose legs are the transversal length $L_{\perp}$ of the bar and $V\Delta t_{\rm going}$. Since the light travels at the speed $c$ in any frame, we get

$$c^{2}\,\Delta t_{\rm going}^{\,2}=L_{\perp}^{\,2}+V^{2}\,\Delta t_{\rm going}^{\,2} \tag{27}$$

We remark the use of Pythagoras’ theorem in this expression. This means that we assume the space is endowed with a flat geometry; this assumption will be revised in General Relativity. Due to the symmetry of the path traveled by the light, it is $\Delta t=2\,\Delta t_{\rm going}$; then

$$\Delta t=\frac{2\,L_{\perp}}{c}\,\frac{1}{\sqrt{1-V^{2}/c^{2}}} \tag{28}$$

We divide Eqs. (22) and (28), and use (21) to get that transversal lengths are invariant:

$$L_{\perp}=L_{o} \tag{29}$$

Figure 6: The round trip of light between the ends of a bar, as regarded in a frame where the bar displaces transversally at the velocity $V$.

iv.3 Lorentz transformations

We are now in a position to reanalyze the transformation of the Cartesian coordinates of an event. Let us come back to Eq. (3), where the relation between the distance measured in $S$ and the coordinate $x'$ is pending. By definition, the coordinate $x'$ is the distance measured by a ruler fixed to the frame $S'$. This ruler looks contracted in the frame $S$; according to Eq. (26), the distance $x-Vt$ measured in $S$ is $\sqrt{1-V^{2}/c^{2}}\;x'$. Therefore,

$$x'=\frac{x-V\,t}{\sqrt{1-V^{2}/c^{2}}} \tag{30}$$

is the transformation that replaces (4.a). We can now reproduce the argument of Section (I.C) to obtain the transformation of the time coordinate of an event. Since frames $S$ and $S'$ are on an equal footing, the inverse transformations have the same form, except for the change $V\rightarrow -V$. In particular, the inverse transformation of (30) is

$$x=\frac{x'+V\,t'}{\sqrt{1-V^{2}/c^{2}}} \tag{31}$$

Eq. (30) can be replaced in (31) to solve $t'$ as a function of $t$, $x$. Besides, due to the relativistic invariance of the transversal lengths (see Eq. (29)), the transformations (4.b), (4.c) remain valid. Finally, we obtain the Lorentz transformations:

$$c\,t'=\gamma\,(c\,t-\beta\,x) \tag{32a}$$
$$x'=\gamma\,(x-\beta\,c\,t) \tag{32b}$$
$$y'=y \tag{32c}$$
$$z'=z \tag{32d}$$

where $\beta=V/c$, $\gamma=(1-\beta^{2})^{-1/2}$. Lorentz transformations (32) express the relativistic transformation of the coordinates of an event, when the inertial frame $S$ is changed for an equally oriented inertial frame $S'$ that moves along the (shared) $x$-axis at the relative velocity $V$. Notice that, since the transformation (32) is homogeneous, the same event is the coordinate origin for $S$ and $S'$. Figure 7 shows the lines $ct'$ constant (i.e., $ct-\beta x=$ const) and $x'$ constant (i.e., $x-\beta ct=$ const) in the plane $ct$ vs. $x$. Figure 7 also displays a ray of light passing the coordinate origin and traveling in the $x$ direction; its world-line is a straight line at $45^{\circ}$ because $x=ct$. Galileo transformations (4) are the limit $c\rightarrow\infty$ of Lorentz transformations (32).

Figure 7: Coordinate lines of $S'$ in the plane $ct$ vs. $x$.
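The properties of the Lorentz transformations just described can be explored with a few lines of code. The sketch below implements the transformation of the pair $(ct, x)$ for a frame moving along the shared $x$-axis with $\beta=V/c$, and can be used to check that a light ray $x=ct$ remains a light ray, that the inverse transformation is obtained by changing $\beta$ for $-\beta$, and that the combination $(ct)^{2}-x^{2}$ keeps its value. The function names are illustrative assumptions.

```python
import math


def lorentz(ct, x, beta):
    """Transform (ct, x) to a frame moving along x with velocity beta * c."""
    if not -1.0 < beta < 1.0:
        raise ValueError("the transformation is only defined for |beta| < 1")
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * (ct - beta * x), g * (x - beta * ct)


def interval(ct, x):
    """(ct)^2 - x^2, which the transformation leaves unchanged."""
    return ct ** 2 - x ** 2


if __name__ == "__main__":
    ct, x = 3.0, 1.0
    ct_p, x_p = lorentz(ct, x, 0.6)
    print((ct_p, x_p), interval(ct, x), interval(ct_p, x_p))
    print(lorentz(2.0, 2.0, 0.6))      # a light ray x = ct stays on x' = ct'
    print(lorentz(ct_p, x_p, -0.6))    # changing beta for -beta undoes the change
```

Transversal coordinates are left out because they do not change under this transformation.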

The transformations (32) were independently obtained by Lorentz [35, 36] and Larmor [37] as the linear coordinate changes leaving invariant the form of Maxwell’s wave equations (see also Voigt [38]). In fact, the null coordinates $\xi=ct+x$, $\eta=ct-x$ transform as $\xi'=\gamma\,(1-\beta)\,\xi$, $\eta'=\gamma\,(1+\beta)\,\eta$, so leaving invariant the form of the wave equation (13) for waves propagating at the speed $c$. In other words, the d’Alembertian operator

$$\Box\equiv\frac{1}{c^{2}}\frac{\partial^{2}}{\partial t^{2}}-\frac{\partial^{2}}{\partial x^{2}}-\frac{\partial^{2}}{\partial y^{2}}-\frac{\partial^{2}}{\partial z^{2}} \tag{33}$$

is invariant under transformations (32). In 1905 Poincaré [39] underlined the group properties of relations (32) and called them Lorentz transformations.

In 1905 Einstein re-derived the Lorentz transformations and gave to $t'$ the rank of real time measured by clocks at rest in $S'$. In Einstein’s Special Relativity the physical equivalence of the inertial frames, which is the content of the principle of relativity, means that the fundamental laws of Physics keep their form under Lorentz transformations rather than Galileo transformations. Maxwell’s laws fulfill this relativistic version of the principle of relativity, once the transformations of the fields are properly defined. Actually, Maxwell’s electromagnetism is the paradigm of a relativistic theory. The electromagnetic Lorentz force is a typical relativistic force; its magnetic part depends on the charge velocity relative to the inertial frame. But which part of the field is electric and which one is magnetic depends on the frame as well; even if the force is entirely electric in a given frame, it will have a magnetic part in another frame. On the contrary, Classical Mechanics fulfilled the principle of relativity under Galileo transformations; then, Mechanics needed a reformulation to accommodate the relativistic meaning of the principle of relativity.

iv.4 Relativistic composition of motions

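The composition of motions that replaces the Galilean addition of velocities is obtained by differentiating Eqs. (32) and taking quotients. Notice that

$$dt'=\gamma(V)\left(dt-\frac{V}{c^{2}}\,dx\right)=\gamma(V)\left(1-\frac{V\,u_{x}}{c^{2}}\right)dt \tag{34}$$

Therefore,

$$u'_{x}=\frac{dx'}{dt'}=\frac{u_{x}-V}{1-V\,u_{x}/c^{2}} \tag{35a}$$
$$u'_{y}=\frac{dy'}{dt'}=\frac{\gamma(V)^{-1}\,u_{y}}{1-V\,u_{x}/c^{2}}\,,\qquad u'_{z}=\frac{dz'}{dt'}=\frac{\gamma(V)^{-1}\,u_{z}}{1-V\,u_{x}/c^{2}} \tag{35b}$$

The procedure can be repeated to transform the accelerations. In contrast with Galileo transformations, the acceleration is far from being invariant under Lorentz transformations.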

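Eqs. (35) can be combined to get $u'^{2}=u'^{2}_{x}+u'^{2}_{y}+u'^{2}_{z}$; it is easy to verify that

$$1-\frac{u'^{2}}{c^{2}}=\frac{\left(1-\beta^{2}\right)\left(1-u^{2}/c^{2}\right)}{\left(1-V\,u_{x}/c^{2}\right)^{2}} \tag{36}$$

Since $\beta<1$ (otherwise Lorentz transformations would be ill-defined), both sides of Eq. (36) have the same sign. Therefore $u'$ and $u$ are both lower than, equal to, or bigger than $c$; this is an invariant property of the speed.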
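The relativistic composition of motions and the invariant character of the speed of light can be checked numerically. The sketch below transforms a velocity, given in units of $c$, to a frame moving along the $x$-axis with $\beta=V/c$, and compares both sides of the identity relating $1-u'^{2}/c^{2}$ with $1-u^{2}/c^{2}$. The function names and the sample velocities are illustrative assumptions.

```python
import math


def transform_velocity(u, beta):
    """Velocity (ux, uy, uz) in units of c, seen from a frame moving at beta along x."""
    ux, uy, uz = u
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    d = 1.0 - beta * ux
    return ((ux - beta) / d, uy / (g * d), uz / (g * d))


def speed(u):
    return math.sqrt(sum(component ** 2 for component in u))


def identity_sides(u, beta):
    """Left and right sides of the relation between 1 - u'^2 and 1 - u^2."""
    u_prime = transform_velocity(u, beta)
    left = 1.0 - speed(u_prime) ** 2
    right = (1.0 - beta ** 2) * (1.0 - speed(u) ** 2) / (1.0 - beta * u[0]) ** 2
    return left, right


if __name__ == "__main__":
    print(speed(transform_velocity((0.6, 0.8, 0.0), 0.5)))   # a ray of light
    print(identity_sides((0.3, 0.4, 0.1), 0.5))              # a slower particle
```

A velocity of modulus $c$ keeps that modulus in the new frame, although its direction changes.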

As an application of transformations (35), let us compute the speed of light $u_{x}$ when light propagates in a transparent substance that moves at the velocity $V$ along the propagation direction; in the frame fixed to the substance it is $u'_{x}=c/n$, where $n$ is the refractive index. We will use the inverse transformations to get $u_{x}$ (i.e., we change $V$ for $-V$ in Eq. (35.a)):

$$u_{x}=\frac{c/n+V}{1+V/(n\,c)}=\frac{c}{n}+\left(1-\frac{1}{n^{2}}\right)V+\mathcal{O}\!\left(\frac{V^{2}}{c}\right) \tag{37}$$

This result has the same form as Fresnel’s partial dragging. However, $V$ in Eq. (37) is not the velocity of the transparent substance with respect to the ether; it is the motion of the transparent substance relative to an arbitrary inertial frame. What Fizeau measured in 1851 was a relativistic composition of motions.
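The agreement between the relativistic composition and Fresnel’s coefficient at first order can be illustrated numerically. In the sketch below the refractive index of water ($n\simeq 1.33$) and a flow speed of a few meters per second are assumptions chosen only to mimic a Fizeau-like setup; they are not taken from the text.

```python
C = 299_792_458.0  # speed of light, m/s


def light_speed_in_moving_medium(n, v):
    """Exact relativistic composition of c/n with the velocity v of the medium."""
    return (C / n + v) / (1.0 + v / (n * C))


def fresnel_first_order(n, v):
    """Fresnel's partial dragging: c/n + (1 - 1/n^2) v."""
    return C / n + (1.0 - 1.0 / n ** 2) * v


if __name__ == "__main__":
    n, v = 1.33, 7.0   # ASSUMPTION: water flowing at 7 m/s
    exact = light_speed_in_moving_medium(n, v)
    approx = fresnel_first_order(n, v)
    print(exact - C / n, approx - C / n)   # dragging contribution, m/s
```

For laboratory flow speeds the two expressions differ only at order $V^{2}/c$, far below what such an experiment could resolve; the difference becomes visible only for $V$ comparable to $c$.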

iv.5 Relativity of simultaneity. Causality

Two events 1 and 2 (two points in the spacetime) are simultaneous if they have the same time coordinate: $t_{1}=t_{2}$. In Classical Physics the time is invariant; so the simultaneity of events possesses an absolute meaning. However, in Special Relativity $\Delta t=0$ does not imply $\Delta t'=0$. Then the simultaneity acquires a relative meaning; it is frame-dependent. In fact, the pairs of events that are simultaneous in the frame $S$ lie on horizontal lines ($t$ constant) in Figure 7; these lines cross the $t'$ constant lines. Therefore the events simultaneous in $S$ have different time coordinates in $S'$.

To understand why the simultaneity is relative in Special Relativity, let us consider a bar of proper length $L_{o}$ which is equipped with a source of light at its center. In the proper frame of the bar, a pulse of light will arrive simultaneously at both ends of the bar, because it covers the same distance $L_{o}/2$ at the same speed $c$ in both directions. In another frame the bar is moving but light still propagates at the speed $c$ in any direction. Thus, the pulse will arrive first at the rear end of the bar, because this end moves towards the pulse of light. Then, the same pair of events (the arrivals of the light at the ends of the bar) is not simultaneous in a frame where the bar is moving. Moreover, since which end is at the rear depends on the direction of the motion (i.e., it depends on the frame), the temporal order of this kind of events can be inverted by changing the frame.

Figure 8: a) In the proper frame of the bar, the pulses of light arrive at the ends of the bar at the same time. b) In a frame where the bar is moving, the light arrives at the rear end before the front end. In both frames the speed of light is $c$ (the rays of light are lines at $45^{\circ}$).

Figure 8 shows the world-lines of the ends of the bar and the pulses of light both in the bar proper frame $S$ and a frame $S'$ where the bar moves to the left (then $S'$ moves to the right relative to $S$, so it is $V>0$). In Figure 8a the ends of the bar are described by vertical world-lines because the positions are fixed. In Figure 8b the world-lines have the slope corresponding to the velocity V the bar has in the frame $S'$. In both frames the light travels at the speed $c$. The two arrival events are simultaneous in the proper frame of the bar (Figure 8a), and they occur at a distance $L_{o}$. Then, $\Delta t=0$, $|\Delta x|=L_{o}$ ($\Delta t\equiv t_{2}-t_{1}$, etc.). The time elapsed between the two events in the frame $S'$ can be obtained by means of Lorentz transformations. Since Lorentz transformations are linear, they are equally valid for the differences of coordinates of a pair of events. So, Eq. (32.a) also means

$$\Delta t'=\gamma(V)\left(\Delta t-\frac{V}{c^{2}}\,\Delta x\right) \tag{38}$$

Then it is $|\Delta t'|=\gamma(V)\,V\,L_{o}/c^{2}$ in Figure 8.b. This result could also be achieved by applying elementary kinematics in the frame $S'$, and using the length contraction $L=\gamma(V)^{-1}L_{o}$.

In any case, Eq. (38) says that $\Delta t$ and $\Delta t'$ cannot both be zero (apart from the case where the events are coincident). Moreover, $\Delta t$ and $\Delta t'$ in Eq. (38) could even have opposite signs, which would amount to the inversion of the temporal order of the events. This alteration of the temporal order in Lorentz transformations would be acceptable only for pairs of events without causal relation; otherwise it would constitute a violation of causality. Remarkably, the violation of causality is prevented because the speed of light cannot be exceeded in Special Relativity. As will be shown in Section V, $c$ is an unreachable limit velocity for massive particles. Consistently, it is $\beta<1$ in Lorentz transformations. Therefore, those pairs of events such that $|\Delta\mathbf{r}|>c\,|\Delta t|$ cannot be in causal relation, because neither particles nor rays of light can connect them. For instance, in Figure 8 the two arrival events cannot be in causal relation because their spatial separation is larger than their temporal separation. This property does not depend on the chosen frame, as can be checked in the transformations (32) or inferred from Eq. (36). On the contrary, the pairs of events having $|\Delta\mathbf{r}|\leq c\,|\Delta t|$ can be causally connected. But in this case, it results that $|V\Delta x/c^{2}|<|\Delta t|$. Thus $|V\Delta x/c^{2}|$ is not large enough to invert the temporal order in Eq. (38); so causality is preserved.
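The different behavior of the two classes of pairs of events can be seen in a small numerical experiment. The sketch below applies the transformation of time differences, written as $c\Delta t'=\gamma\,(c\Delta t-\beta\,\Delta x)$, to a pair whose spatial separation exceeds the temporal one and to a pair where it does not. The helper name and the sample separations are illustrative assumptions.

```python
import math


def delta_ct_prime(delta_ct, delta_x, beta):
    """Temporal separation (times c) of a pair of events in the frame moving at beta."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * (delta_ct - beta * delta_x)


def order_can_be_inverted(delta_ct, delta_x, steps=199):
    """Scan -1 < beta < 1 looking for a frame where the temporal order flips."""
    betas = [-0.995 + 1.99 * i / (steps - 1) for i in range(steps)]
    return any(delta_ct * delta_ct_prime(delta_ct, delta_x, b) < 0.0 for b in betas)


if __name__ == "__main__":
    print(delta_ct_prime(0.0, 1.0, 0.6))       # simultaneous in S, not in S'
    print(order_can_be_inverted(1.0, 2.0))     # |dx| > c|dt|: order is frame-dependent
    print(order_can_be_inverted(2.0, 1.0))     # |dx| < c|dt|: order is preserved
```

The scan over a finite set of values of $\beta$ is only a numerical illustration; the general statement follows from $|\beta|<1$.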

The relativity of simultaneity is usually the explanation for some “paradoxes” in Special Relativity. For instance, let us consider two bars having the same length if compared at relative rest. Then, if they are in relative motion, each one will appear shorter when regarded from the proper frame of the other one. How could this make sense? It makes sense because the length of a bar results from comparing the simultaneous positions of its ends. Since the simultaneity is not absolute in Special Relativity, a length measurement performed in $S$ is not consistent in $S'$.

iv.6 Proper time of the particle

While those events having $|\Delta\mathbf{r}|>c\,|\Delta t|$ admit a frame where they occur at the same time (or, moreover, frames where their temporal order is inverted), those events having $|\Delta\mathbf{r}|<c\,|\Delta t|$ admit a frame where they occur at the same place. This is a consequence of the symmetric form of Eqs. (32.a) and (32.b). From a more physical standpoint, the events having $|\Delta\mathbf{r}|<c\,|\Delta t|$ can be joined by a uniformly moving particle. The proper frame of the particle effectively realizes the inertial frame where both events occur at the same place: the events occur at the (fixed) position of the particle. These observations show that the concept of proper time, as defined in Section IV.A, applies to pairs of events whose spatial separation is smaller than the temporal separation.

In general, any moving particle causally connects events. Figure 9 shows the world-line of a particle that moves non-uniformly. Since the world-line cannot exceed the angle of $45^{\circ}$ characterizing the speed of light, any pair of events on the world-line of the particle will satisfy $|\Delta\mathbf{r}|<c\,|\Delta t|$. Let us consider two infinitesimally close events, like those shown in Figure 9 corresponding to the times $t$ and $t+dt$. The frame where these two events occur at the same place is the proper frame of the particle moving at the speed $u(t)$. Let us rewrite Eq. (36) with the help of Eq. (34) to get

$$\sqrt{1-\frac{u'^{2}}{c^{2}}}\;dt'=\sqrt{1-\frac{u^{2}}{c^{2}}}\;dt \tag{39}$$

As is seen, this is a combination of speed and time of travel which has the same value in any frame: it is invariant. By comparing with Eq. (26) one realizes that the invariant (39) is nothing but the proper time elapsed between the infinitesimally close events. In other words, (39) is the time measured by a clock fixed to the particle; it is the proper time of the particle:

$$d\tau=\sqrt{1-\frac{u^{2}}{c^{2}}}\;dt=\gamma(u)^{-1}\,dt \tag{40}$$

Figure 9: Two infinitesimally close events belonging to the world-line of a non-uniformly moving particle. They are causally connected: $|d\mathbf{r}|<c\,dt$.

This expression can be integrated along the world-line to get the total time measured by a clock that moves between a given pair of causally-connectable events. Clearly, the integral depends on the world-line the clock uses to join the initial and final events (it depends on the function $\mathbf{u}(t)$). It is easy to prove that the total proper time is maximized along an inertial world-line. This result is related to the so-called twin paradox. The paradox refers to twin brothers who separate because one of them goes on a space voyage. When they meet again, the “inertial” brother who remained on the Earth is older than the astronaut. Actually this result is not paradoxical; the brothers are not on an equal footing because Special Relativity confers a privileged status on the inertial frames.
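The dependence of the total proper time on the world-line can be illustrated by integrating $d\tau=\sqrt{1-u^{2}/c^{2}}\,dt$ numerically. In the sketch below the speed history is sampled at equal steps of frame time and given in units of $c$; the voyage (outward and return legs at $0.8\,c$ lasting ten years of Earth time, with an idealized instantaneous turnaround) is an assumption chosen for illustration.

```python
import math


def proper_time(speeds, dt):
    """Sum of sqrt(1 - u^2) dt over a sampled speed history (u in units of c)."""
    return sum(math.sqrt(1.0 - u * u) * dt for u in speeds)


if __name__ == "__main__":
    steps, dt = 1000, 0.01                 # 10 years of Earth time in total
    stay_at_home = [0.0] * steps           # inertial world-line
    traveler = [0.8] * steps               # same speed on the way out and back
    print(proper_time(stay_at_home, dt))   # about 10 years
    print(proper_time(traveler, dt))       # about 6 years
```

Only the modulus of the velocity enters the integrand, so the outward and return legs contribute equally; any non-inertial history between the same two events yields less proper time than the inertial one.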

iv.7 Transformations of rays of light

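Let us consider a monochromatic plane solution of Eq. (12) for waves traveling at the speed of light:

$$\psi\propto\exp\left[\,i\,\frac{\omega}{c}\,\left(c\,t-\hat{\mathbf{n}}\cdot\mathbf{r}\right)\right] \tag{41}$$

where the unit vector $\hat{\mathbf{n}}$ is the propagation direction, and $\omega$ is the frequency. Let us use the inverse Lorentz transformations to rewrite the phase of the wave in terms of coordinates in $S'$:

$$\frac{\omega}{c}\,\left(c\,t-\hat{\mathbf{n}}\cdot\mathbf{r}\right)=\frac{\omega}{c}\,\left[\gamma\,(1-\beta\,n_{x})\,c\,t'-\gamma\,(n_{x}-\beta)\,x'-n_{y}\,y'-n_{z}\,z'\right] \tag{42}$$

Since the d’Alembertian operator (33) keeps the same form if rewritten in coordinates of $S'$, the result (42) should be reinterpreted as $(\omega'/c)\,(c\,t'-\hat{\mathbf{n}}'\cdot\mathbf{r}')$. Therefore, one obtains: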


Doppler effect for light

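The frequency in the frame $S'$ is

$$\omega'=\gamma(V)\,\left(1-\beta\,n_{x}\right)\,\omega=\gamma(V)\left(1-\frac{\hat{\mathbf{n}}\cdot\mathbf{V}}{c}\right)\omega \tag{43}$$

The factor $\gamma(V)$ is absent in the classical Doppler effect. It implies that the frequency shift exists even if the propagation direction is orthogonal to V (transversal Doppler effect), due to time dilation. The first verification of the relativistic Doppler frequency shift was made in 1938 [40].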
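As an illustration of the frequency transformation $\omega'=\gamma\,(1-\beta\cos\theta)\,\omega$, where $\theta$ is the angle between the ray and the relative velocity, the sketch below evaluates the ratio $\omega'/\omega$ for propagation along, against and across the motion. The function name and the sample value $\beta=0.6$ are illustrative assumptions.

```python
import math


def doppler_factor(beta, cos_theta):
    """Ratio omega'/omega for a ray forming an angle theta with the velocity V."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * (1.0 - beta * cos_theta)


if __name__ == "__main__":
    beta = 0.6
    print(doppler_factor(beta, 1.0))    # ray along V: frequency lowered
    print(doppler_factor(beta, -1.0))   # ray against V: frequency raised
    print(doppler_factor(beta, 0.0))    # transversal effect: pure gamma factor
```

For a ray along the motion the ratio reduces to $\sqrt{(1-\beta)/(1+\beta)}$, while the transversal case isolates the time-dilation factor that has no classical counterpart.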


Light aberration

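Besides, it is $\omega'\,n'_{x}=\gamma\,\omega\,(n_{x}-\beta)$. If $\theta$ is the angle between the ray of light and the $x$-axis, then it is $n_{x}=\cos\theta$. Thus, the propagation direction transforms as

$$\cos\theta'=\frac{\cos\theta-\beta}{1-\beta\,\cos\theta} \tag{44}$$

The aberration angle is $\alpha\equiv\theta'-\theta$; $\alpha$ is very small whenever it is $\beta\ll 1$. So, we can approximate $\cos\theta'\simeq\cos\theta-\alpha\,\sin\theta$. Besides, the right-hand side of Eq. (44) can be approximated by $\cos\theta-\beta\,\sin^{2}\theta$. Therefore,

$$\alpha\simeq\beta\,\sin\theta \tag{45}$$

which is the Galilean approximation Bradley used to obtain the speed of light from the annual variation of the starlight aberration.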
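The quality of the small-velocity approximation can be checked by comparing the exact transformation of the propagation direction with the estimate $\alpha\simeq\beta\sin\theta$. In the sketch below $\beta=10^{-4}$ corresponds to the Earth’s orbital speed quoted earlier in the text; the function names are illustrative assumptions.

```python
import math


def aberrated_angle(theta, beta):
    """Exact angle theta' between the ray and the x-axis in the moving frame."""
    return math.acos((math.cos(theta) - beta) / (1.0 - beta * math.cos(theta)))


def aberration_exact(theta, beta):
    return aberrated_angle(theta, beta) - theta


def aberration_first_order(theta, beta):
    return beta * math.sin(theta)


if __name__ == "__main__":
    beta = 1e-4   # Earth's orbital speed in units of c
    for degrees in (30.0, 60.0, 90.0):
        theta = math.radians(degrees)
        exact = aberration_exact(theta, beta)
        approx = aberration_first_order(theta, beta)
        print(degrees, math.degrees(exact) * 3600.0, math.degrees(approx) * 3600.0)
```

The printed values are in arcseconds; for a ray orthogonal to the motion the aberration is about 20 arcseconds, and the exact and first-order results differ only at order $\beta^{2}$.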

V Relativistic Mechanics

While the principle of inertia remains valid in Special Relativity, Newton’s second law has to be reformulated because it does not satisfy the principle of relativity under Lorentz transformations (forces and accelerations behave differently under Lorentz transformations). Relativistic Mechanics can be constructed from a Lorentz-invariant functional action that reproduces the Newtonian behavior at low velocities. In Special Relativity, energy and momentum are strongly related. The momentum is conserved in any frame if and only if the energy is conserved too. When particles collide, the conservation of the relativistic energy takes the role of the classical mass conservation. However, the relativistic energy is a combination of mass and kinetic energy; so, mass can be converted into kinetic energy (or other energies, like the electromagnetic energy associated with photons) and vice versa. Classical interactions at a distance are excluded because the relativity of simultaneity prevents non-local conservations of energy-momentum. Instead, the interactions “at a distance” are realized through mediating fields carrying energy-momentum that locally interact with the particles.

v.1 Momentum and energy of the particle

The fulfillment of the principle of relativity under Lorentz transformations can be achieved by starting from a Lorentz-invariant functional action. In this way, it is guaranteed that different inertial frames will agree about the stationarity of the action. Thus, the same set of Lagrange dynamical equations will be valid in all the inertial frames.

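Let us start by building the action of a free particle. This action not only has to be Lorentz invariant but must be equivalent to the classical action when $|\mathbf{u}|\ll c$. The (invariant) proper time along the particle world-line (40) is the right choice for the functional action of the free particle:

$$S[\mathbf{r}(t)]=-m\,c^{2}\int d\tau=-m\,c^{2}\int\sqrt{1-\frac{u^{2}}{c^{2}}}\;dt \tag{46}$$

When $|\mathbf{u}|\ll c$ the Lagrangian goes to $-m\,c^{2}+\frac{1}{2}\,m\,u^{2}$. By differentiating the Lagrangian with respect to $\mathbf{u}$ one gets the conjugate momentum $\partial L/\partial\mathbf{u}$ of a free particle. One then defines the momentum of the particle as

$$\mathbf{p}\equiv\frac{m\,\mathbf{u}}{\sqrt{1-u^{2}/c^{2}}}=m\,\frac{d\mathbf{r}}{d\tau} \tag{47}$$

(the last step results from Eq. (40)), which goes to the classical momentum $m\,\mathbf{u}$ when $|\mathbf{u}|\ll c$.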

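Since $d\tau$ is invariant (see Eq. (39)), the change of $\mathbf{p}$ under Lorentz transformations emanates from the behavior of $d\mathbf{r}$. A Lorentz transformation mixes $d\mathbf{r}$ with $c\,dt$. Then $\mathbf{p}$ will be mixed with $m\,c\,dt/d\tau$, which is a quantity intimately related to the energy. In fact, the Hamiltonian of the free particle is

$$H=\mathbf{p}\cdot\mathbf{u}-L=\frac{m\,c^{2}}{\sqrt{1-u^{2}/c^{2}}} \tag{48}$$

Then, we define the energy of the particle as

$$E\equiv\frac{m\,c^{2}}{\sqrt{1-u^{2}/c^{2}}}=m\,\gamma(u)\,c^{2} \tag{49}$$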

The energy is a combination of energy at rest mc and kinetic energy. In fact, by Taylor expanding (49) we obtain

(50)

where is the kinetic energy of the particle in Special Relativity (at low velocities, it coincides with the classical kinetic energy). Notice that the combination of (47) and (49) yields

(51)

which says that the momentum is a flux of energy (as in electromagnetism, where the density of momentum is proportional to the Poynting vector).
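
As a numerical illustration of the definitions above, the following Python sketch evaluates the momentum $m\gamma(u)u$ of Eq. (47), the energy $m\gamma(u)c^2$ of Eq. (49) and the kinetic energy $m(\gamma(u)-1)c^2$, and compares the latter with the classical value $mu^2/2$. The function names, the sample mass and the sample speeds are illustrative choices that do not appear in the text; the SI value of $c$ is assumed.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def gamma(u: float) -> float:
    """Lorentz factor for a speed u with |u| < c."""
    return 1.0 / math.sqrt(1.0 - (u / C) ** 2)

def momentum(m: float, u: float) -> float:
    """Relativistic momentum, Eq. (47): p = m * gamma(u) * u."""
    return m * gamma(u) * u

def energy(m: float, u: float) -> float:
    """Relativistic energy, Eq. (49): E = m * gamma(u) * c^2."""
    return m * gamma(u) * C ** 2

def kinetic_energy(m: float, u: float) -> float:
    """Kinetic energy T = E - m c^2 = m * (gamma(u) - 1) * c^2."""
    return energy(m, u) - m * C ** 2

m = 1.0  # kg, illustrative value
for beta in (0.01, 0.5, 0.9):
    u = beta * C
    ratio = kinetic_energy(m, u) / (0.5 * m * u ** 2)
    flux = energy(m, u) * u / C ** 2  # right-hand side of Eq. (51)
    print(f"u = {beta} c: T / (m u^2 / 2) = {ratio:.5f}, "
          f"p = {momentum(m, u):.6e}, E u / c^2 = {flux:.6e}")
```

The ratio printed in the first column stays close to 1 at low speed and grows as the speed approaches $c$, while the last two columns coincide at any speed, as Eq. (51) requires.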

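Eq. (40) can be used to replace $\gamma(u)$ in the energy (49); it yields

$$E = mc^2\,\frac{dt}{d\tau} \tag{52}$$

Then $E$ is proportional to the ratio of the time $dt$ measured by frame clocks to the respective proper time $d\tau$ of the particle. As stated above, the invariance of $d\tau$ in Eqs. (47) and (52) implies that $(E/c,\ \mathbf{p})$ transforms like $(c\,dt,\ d\mathbf{r})$ under Lorentz transformations, i.e., for a frame $S'$ moving with velocity $V$ along the $x$ axis of $S$:

$$E' = \gamma(V)\,\left(E - V\,p_x\right) \tag{53a}$$
$$p'_x = \gamma(V)\,\left(p_x - \frac{V}{c^2}\,E\right) \tag{53b}$$
$$p'_y = p_y \tag{53c}$$
$$p'_z = p_z \tag{53d}$$

$E$ and $\mathbf{p}$ combine to yield the square of the particle mass, an invariant result called the energy-momentum invariant:

$$E^2 - p^2c^2 = m^2c^4 \tag{54}$$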
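
The invariance of the combination $E^2 - p^2c^2$ can be checked numerically. The sketch below assumes the standard form of the transformations (53) for a boost along $x$, written in units where $c = 1$, and applies it to a particle of unit mass; the function names and the sample velocities are illustrative choices.

```python
import math

def boost(E, px, py, pz, beta):
    """Energy-momentum in a frame moving with velocity beta along x (c = 1).

    Assumed form of Eqs. (53): E' = g (E - beta px), px' = g (px - beta E),
    with the transverse components unchanged.
    """
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * (E - beta * px), g * (px - beta * E), py, pz

def invariant_mass_sq(E, px, py, pz):
    """Left-hand side of Eq. (54) with c = 1: E^2 - |p|^2."""
    return E ** 2 - (px ** 2 + py ** 2 + pz ** 2)

# Particle of mass 1 moving with velocity (0.6, 0.3, 0) in frame S
m, ux, uy = 1.0, 0.6, 0.3
g_u = 1.0 / math.sqrt(1.0 - (ux ** 2 + uy ** 2))
E, px, py, pz = m * g_u, m * g_u * ux, m * g_u * uy, 0.0

E2, px2, py2, pz2 = boost(E, px, py, pz, beta=0.8)
print("m^2 in S :", invariant_mass_sq(E, px, py, pz))
print("m^2 in S':", invariant_mass_sq(E2, px2, py2, pz2))
```

Both printed values agree with $m^2 = 1$ up to rounding, although the energy and the momentum separately take different values in the two frames.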

Let us differentiate Eq. (54) to obtain

$$E\,dE = c^2\,\mathbf{p}\cdot d\mathbf{p} \tag{55}$$

or, replacing $\mathbf{p}$ with Eq. (51):

$$dE = \mathbf{u}\cdot d\mathbf{p} = \frac{d\mathbf{p}}{dt}\cdot d\mathbf{r} \tag{56}$$

which suggests that the force is associated with $d\mathbf{p}/dt$. If so, Eq. (56) would express the equality between the work of the force and the variation of the energy. Notice that $\mathbf{F} \equiv d\mathbf{p}/dt$ implies that the force is not parallel to the acceleration in general, due to the term containing the derivative of $\gamma(u)$. Remarkably, if the work goes to infinity, then the energy diverges and the velocity in (49) goes to $c$. In this way, the speed of light is an unreachable limit for the particle.
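
The statement that the force is not parallel to the acceleration can be made concrete. Assuming that the force is the time derivative of the momentum $m\gamma(u)\mathbf{u}$, the product rule gives $\mathbf{F} = m\gamma\,\mathbf{a} + m\gamma^3(\mathbf{u}\cdot\mathbf{a}/c^2)\,\mathbf{u}$, where the second term comes from the derivative of $\gamma(u)$. The following sketch evaluates this expression in two dimensions with $c = 1$; the function names and the sample vectors are illustrative.

```python
import math

def force(m, u, a):
    """F = d(m * gamma * u)/dt for velocity u and acceleration a (2D, c = 1).

    Product rule: F = m*gamma*a + m*gamma**3*(u.a)*u, where the second
    term comes from the time derivative of gamma(u).
    """
    u2 = u[0] ** 2 + u[1] ** 2
    g = 1.0 / math.sqrt(1.0 - u2)
    u_dot_a = u[0] * a[0] + u[1] * a[1]
    return tuple(m * g * a[i] + m * g ** 3 * u_dot_a * u[i] for i in range(2))

def cross_z(v, w):
    """z component of the cross product; it vanishes if v and w are parallel."""
    return v[0] * w[1] - v[1] * w[0]

u = (0.6, 0.0)   # velocity in units of c
a = (1.0, 1.0)   # acceleration oblique to the velocity, arbitrary units
F = force(1.0, u, a)
print("F =", F)
print("F x a, oblique case :", cross_z(F, a))

a_par = (1.0, 0.0)  # acceleration along the velocity
print("F x a, parallel case:", cross_z(force(1.0, u, a_par), a_par))
```

The cross product is different from zero in the oblique case, so the force and the acceleration point in different directions; they are parallel only when the acceleration is parallel or perpendicular to the velocity.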

In electromagnetism, the interaction of a charge $q$ with a given external field is described by adding to the action (46) the term $-q\int(\phi - \mathbf{u}\cdot\mathbf{A})\,dt$, where $\phi$ and $\mathbf{A}$ are the scalar and vector potentials evaluated at the position of the charge. It can be proven that the interaction action is Lorentz-invariant, as required in Special Relativity. The variation of the action leads to the equation of motion

$$q\,(\mathbf{E} + \mathbf{u}\times\mathbf{B}) = \frac{d}{dt}\big(m\,\gamma(u)\,\mathbf{u}\big) \tag{57}$$

where $\mathbf{E} = -\nabla\phi - \partial\mathbf{A}/\partial t$ and $\mathbf{B} = \nabla\times\mathbf{A}$. In Eq. (57) we recognize the Lorentz force on the left-hand side, and the derivative of the relativistic momentum (47) on the right-hand side. In 1908 Bucherer [41] observed the movement of an electron in an electrostatic field, and obtained incontestable evidence of the validity of the relativistic dynamics expressed in Eq. (57). If the charge is initially at rest in a uniform static field $\mathbf{E}$, then we integrate Eq. (57) to get $m\,\gamma(u)\,\mathbf{u} = q\,\mathbf{E}\,t$. So, $u$ goes to $c$ when $t$ goes to infinity.
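
The approach to the limit speed can be illustrated numerically. For a charge released from rest in a uniform static electric field, the momentum grows linearly with time, $m\gamma(u)u = qEt$; solving for the speed gives $u = p/\sqrt{m^2 + p^2/c^2}$ with $p = qEt$. The sketch below uses an electron and a field strength chosen only for illustration; the constants are rounded SI values and the function name is hypothetical.

```python
import math

C = 299_792_458.0      # speed of light, m/s
Q_E = 1.602176634e-19  # elementary charge, C
M_E = 9.1093837e-31    # electron mass, kg (rounded)

def speed(t, field, q=Q_E, m=M_E):
    """Speed at time t of a charge released from rest in a uniform static
    electric field (V/m). Solves m * gamma(u) * u = q * field * t for u."""
    p = q * field * t  # relativistic momentum at time t
    return p / math.sqrt(m ** 2 + (p / C) ** 2)

field = 1.0e6  # V/m, illustrative value
for t in (1e-10, 1e-9, 1e-8, 1e-7):
    newtonian = Q_E * field * t / M_E  # classical prediction u = q E t / m
    print(f"t = {t:.0e} s: u/c = {speed(t, field) / C:.6f}, "
          f"Newtonian u/c = {newtonian / C:.3f}")
```

The Newtonian speed grows without bound, whereas the relativistic speed saturates below $c$.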

v.2 Photons

In 1905 Einstein [42] stated that the photoelectric effect could be better understood by proposing that light interacts with individual electrons by exchanging packets of energy $h\nu$ ($h$ is Planck’s constant and $\nu$ is the frequency of light). In this way, the understanding of light-matter interactions required a new concept where light shared characteristics of both wave and corpuscle. In 1917 Einstein [43] convinced himself that the quantum of light should also be endowed with directed momentum, like any particle. The reality of the photon was confirmed by Compton’s experiment in 1923 [44], where the energy-momentum exchange between a photon and a free electron was measured. The energy and momentum of photons traveling along the direction $\hat{\mathbf{n}}$,

$$E = h\nu, \qquad \mathbf{p} = \frac{h\nu}{c}\,\hat{\mathbf{n}} \tag{58}$$

are those of a particle having zero mass (cf. Eq. (54)) and the speed of light (cf. Eq. (51)). Lorentz transformations (53) for the energy and the momentum (58) become the transformations (43) and (44) for the frequency and the propagation direction of a ray of light [45].
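
The following sketch applies a boost along $x$ to the energy and momentum of a photon, $E = h\nu$ and $|\mathbf{p}| = h\nu/c$, in units where $h = c = 1$. It assumes the standard form of the transformations (53); the function name and the sample values are illustrative. The output shows the change of frequency and direction, and checks that the photon remains massless in the new frame.

```python
import math

def boost_photon(nu, theta, beta):
    """Transform a photon of frequency nu whose direction makes an angle
    theta with the x axis (h = c = 1). The primed frame moves with
    velocity beta along x. Returns (E', px', py')."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    E, px, py = nu, nu * math.cos(theta), nu * math.sin(theta)
    return g * (E - beta * px), g * (px - beta * E), py

nu, theta, beta = 1.0, math.radians(60.0), 0.6
E2, px2, py2 = boost_photon(nu, theta, beta)
print("frequency ratio nu'/nu    :", E2 / nu)
print("new direction (degrees)   :", math.degrees(math.atan2(py2, px2)))
print("E'^2 - p'^2 (remains zero):", E2 ** 2 - px2 ** 2 - py2 ** 2)
```

For a photon traveling along the boost direction the frequency ratio reduces to $\sqrt{(1-\beta)/(1+\beta)}$, the familiar longitudinal Doppler factor.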

v.3 Mass-energy equivalence

In Relativity, the conservation of momentum and the conservation of energy cannot be dissociated. While the conservation of momentum comes from the symmetry of the Lagrangian under spatial translations, the conservation of energy results from the symmetry under time translations. However, space and time are frame-dependent projections of spacetime: they intermingle under Lorentz transformations. Consequently, the conservation of momentum in all the inertial frames requires the conservation of energy and vice versa. This conclusion is evident in the transformations (53), where energy and momentum mix under a change of frame; so, the momentum would not be conserved in frame $S'$ if the energy were not conserved in $S$. In sum, the conserved quantity associated with the symmetry of the Lagrangian under spacetime translations is the total energy-momentum.

In Classical Mechanics, instead, the transformation of the momentum of the particle does not involve its energy. In fact, if Eq. (8) is multiplied by the mass, then the transformation $\mathbf{p}' = \mathbf{p} - m\mathbf{V}$ is obtained. Thus, an isolated system of interacting particles conserves the total momentum in all the inertial frames irrespective of what happens with the classical energy. Noticeably, $\sum\mathbf{p}'$ is conserved whenever $\sum\mathbf{p}$ is conserved because the total mass is assumed to be a conserved quantity (classical principle of conservation of mass). This is no longer true in Special Relativity. For instance, let us consider the plastic collision between two isolated particles of equal mass $m$. In the center-of-momentum frame the (conserved) total momentum vanishes; so the particles have equal and opposite velocities before the collision. In the collision, the masses stick together and remain at rest. If no energy is released, then the conservation of energy implies

$$2\,m\,\gamma(u)\,c^2 = M\,c^2 \tag{59}$$

where $M$ is the mass of the resulting body. Since $\gamma(u) > 1$, it follows that $M > 2m$; in fact, the resulting body contains the masses of the colliding particles and their kinetic energies. In Einstein’s words, “the mass of a body is a measure of its energy-content” [46].
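
The link between the mass of the resulting body and the conservation of momentum in every frame can be checked numerically. The sketch below describes the plastic collision in a second frame moving with velocity $V$ along the line of the collision, using the standard relativistic composition of collinear velocities (an assumption, since that formula is not restated here) and units where $c = 1$. The total momentum before and after the collision agrees only if the final mass is $2m\gamma(u)$; the naive value $2m$ fails. All names and sample values are illustrative.

```python
import math

def gamma(beta):
    """Lorentz factor for a speed beta (in units of c)."""
    return 1.0 / math.sqrt(1.0 - beta ** 2)

def add_velocities(u, V):
    """Velocity (c = 1) seen from a frame moving with velocity V along
    the same axis: standard relativistic composition of velocities."""
    return (u - V) / (1.0 - u * V)

def momentum(m, u):
    """Relativistic momentum m * gamma(u) * u with c = 1."""
    return m * gamma(u) * u

m, u, V = 1.0, 0.6, 0.5
M = 2.0 * m * gamma(u)  # mass of the resulting body according to Eq. (59)

# Total momentum in the moving frame, before and after the collision
p_before = momentum(m, add_velocities(u, V)) + momentum(m, add_velocities(-u, V))
p_after = momentum(M, add_velocities(0.0, V))
p_after_naive = momentum(2.0 * m, add_velocities(0.0, V))

print("before the collision   :", p_before)
print("after, M = 2 m gamma(u):", p_after)
print("after, M = 2 m (naive) :", p_after_naive)
```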

In general, the mass (energy at rest) of a composite system includes not only the masses of its constituents but also any other internal energy as measured in the center-of-momentum frame. For instance, a deuteron is constituted by a proton and a neutron. The deuteron mass is lower than the sum of the masses of a free proton and a free neutron; this evidences a negative binding energy among the constituents. The mass defect amounts to 2.22 MeV. In general, when light nuclides merge into a heavier nuclide (nuclear fusion) some energy has to be released to conserve the total energy. On the contrary, the mass of a heavy nucleus is larger than the sum of the masses of the lighter nuclei into which it can split. Therefore, energy is also released in the nuclear fission of heavy nuclei. This dissimilar behavior comes from the fact that the magnitude of the (negative) binding energy per nucleon increases with the mass number for light nuclei but decreases for heavy nuclei (the inversion of the slope happens at a mass number around 60).
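
The deuteron mass defect quoted above can be recovered from the rest energies of the particles involved. The values used in this sketch are rounded figures taken from standard tables, not from the text, so they should be regarded as assumptions.

```python
# Rest energies in MeV, rounded from standard tables (assumed, not from the text)
M_PROTON = 938.272
M_NEUTRON = 939.565
M_DEUTERON = 1875.613

# Mass defect: rest energy of the free constituents minus that of the bound state
mass_defect = M_PROTON + M_NEUTRON - M_DEUTERON
print(f"mass defect = {mass_defect:.3f} MeV")
```

The difference is the energy that has to be released when a free proton and a free neutron bind to form a deuteron.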

The kinetic energy can be used to create particles. For instance, a neutral pion $\pi^0$ can be created in a high-energy collision between protons $p$; the reaction is $p + p \to p + p + \pi^0$. This reaction can happen only if a threshold energy is reached to account for the created particle. The neutral pion has an energy at rest (mass) of 134.98 MeV; then, in the center-of-momentum frame the pion is created if each colliding proton reaches the kinetic energy of 67.49 MeV. In such a case, all the kinetic energy is used to create the pion; the products remain at rest, since no kinetic energy is left for them, and the total momentum is conserved. Therefore, the threshold energy of the reaction in the center-of-momentum frame is equal to the energy at rest of the products: 1876.54 MeV + 134.98 MeV. In this case, the energy balance is (the particles are approximately free before and after the reaction):

$$2\,m_p\,\gamma(u)\,c^2 = 2\,m_p c^2 + m_{\pi^0} c^2 \tag{60}$$

which means that the velocity of the colliding protons in the center-of-momentum frame is $u = 0.36\,c$. In another frame, the threshold energy is higher because the products must keep some kinetic energy to conserve the (non-null) total momentum. We can use Eqs. (53) for transforming the total energy-momentum of the system (since the transformations are linear, they can be used to transform a sum of energies and momenta). In the center-of-momentum frame the total momentum is zero; then Eq. (53a) says that $E' = \gamma(V)\,E$. For instance, in the “laboratory frame” where one of the colliding protons is at rest (i.e., $V = 0.36\,c$), it is $E' = 1.072\,E$; deducting the masses of projectile and target, we obtain that the reaction is feasible if the projectile reaches the kinetic energy of 279.67 MeV.
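
The figures of this example can be reproduced step by step. The sketch below takes the proton rest energy as 938.27 MeV (half of the 1876.54 MeV quoted above) and the pion rest energy as 134.98 MeV; the variable names are illustrative. It computes the Lorentz factor and the speed of the protons in the center-of-momentum frame, and then the kinetic energy of the projectile in the frame where the target proton is at rest, using the fact that a system with zero total momentum has its energy multiplied by the Lorentz factor of the boost.

```python
import math

M_P = 938.27    # proton rest energy in MeV (2 * 938.27 = 1876.54)
M_PI0 = 134.98  # neutral pion rest energy in MeV

# Threshold in the center-of-momentum frame: all products at rest
E_cm = 2.0 * M_P + M_PI0

# Energy balance 2 m_p gamma c^2 = 2 m_p c^2 + m_pi c^2 gives gamma and the speed
g = E_cm / (2.0 * M_P)
beta = math.sqrt(1.0 - 1.0 / g ** 2)

# Laboratory frame: boost with the same speed, total momentum is zero in the CM frame
E_lab = g * E_cm
T_projectile = E_lab - 2.0 * M_P  # deduct the rest energies of projectile and target

print(f"gamma = {g:.4f}, u/c = {beta:.4f}")
print(f"projectile kinetic energy at threshold = {T_projectile:.2f} MeV")
```

The projectile needs more than four times the total kinetic energy required in the center-of-momentum frame, because the products must carry away the momentum of the projectile.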

The previous example is a case of inelastic collision. A collision is called elastic if the particles keep their identities. Thus, the masses (energies at rest) before and after the collision are the same; so, the conservation of the energy of the colliding free particles is equivalent to the conservation of the total kinetic energy.

The interaction among charged particles can result in the release of electromagnetic radiation. In such cases the radiation enters the energy-momentum balance in the form of photons. For instance, an electron-positron pair annihilates to give two photons (the positron is the anti-particle of the electron; they have equal mass but opposite charge). In the center-of-momentum frame, the photons have equal frequency $\nu$ and opposite directions to conserve the total momentum (notice that at least two photons are needed to conserve the momentum). If $u$ is the speed of both particles in the center-of-momentum frame, then the energy balance is

$$2\,m_e\,\gamma(u)\,c^2 = 2\,h\nu \tag{61}$$

Conversely, two photons can create an electron-positron pair. In this case the threshold energy is equal to the mass of two electrons. So the minimum frequency to create the pair in the center-of-momentum frame is given by

$$\nu_{\min} = \frac{m_e c^2}{h} \simeq 1.24\times10^{20}\ {\rm s}^{-1} \tag{62}$$

which is a frequency in the gamma-ray range of the electromagnetic spectrum.
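
The frequencies involved in annihilation and pair creation follow from the energy balance in the center-of-momentum frame, where each photon carries the energy $m_e\gamma(u)c^2$ of one particle. The sketch below uses rounded values of Planck’s constant and of the electron rest energy (assumed from standard tables); the function name is hypothetical.

```python
import math

H_EV_S = 4.135667696e-15  # Planck constant in eV s
ME_C2_EV = 0.51099895e6   # electron rest energy in eV (rounded)

def photon_frequency(beta):
    """Frequency of each photon when electron and positron move with
    speed beta * c in the center-of-momentum frame:
    h nu = m_e * gamma(u) * c^2."""
    g = 1.0 / math.sqrt(1.0 - beta ** 2)
    return g * ME_C2_EV / H_EV_S

# With the particles at rest, the same relation gives the threshold
# frequency for the creation of a pair by two photons
nu_min = photon_frequency(0.0)
print(f"nu_min = {nu_min:.3e} Hz")
print(f"nu for u = 0.6 c: {photon_frequency(0.6):.3e} Hz")
```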


Compton effect

In 1923 Compton measured the scattering of X-rays by electrons in graphite. X-ray photons have energies much larger than the electron binding energies. So, the phenomenon can be studied as the elastic collision between a photon and a free electron. In the frame where the electron is initially at rest, its final momentum and energy are

$$\mathbf{p}_e = \frac{h\nu_i}{c}\,\hat{\mathbf{n}}_i - \frac{h\nu_f}{c}\,\hat{\mathbf{n}}_f, \qquad E_e = m_e c^2 + h\nu_i - h\nu_f \tag{63}$$

as results from compensating the changes of momentum and energy suffered by photon and electron (in Eq. (63) the labels $i$ and $f$ allude to the initial and final states of the photon). The replacement of these values in the electron energy-momentum invariant (54) yields:

$$\left(m_e c^2 + h\nu_i - h\nu_f\right)^2 - h^2\,\left|\nu_i\,\hat{\mathbf{n}}_i - \nu_f\,\hat{\mathbf{n}}_f\right|^2 = m_e^2 c^4 \tag{64}$$

Eq. (64) contains the relation between the ingoing and outgoing photons. Let us call $\theta$ the angle between the initial and final directions of propagation: