A few years ago, at a big physics conference, I was party to an argument about whether we should be teaching the Bohr model of the atom in lower-level physics classes. The argument in favor was that the Bohr model is easy to teach and gives a simple way to think about the structure of atoms. The argument against was that the Bohr model is completely outdated, conceptually inaccurate, and has long been superseded by a more correct theory. The major statement of the opposition argument was that it doesn’t do anyone much good to learn an idea that’s wrong.
How strongly I disagree with that statement!
I, personally, love the Bohr model. It’s founded on a cartoonishly simple way of thinking about quantum mechanical effects, but it can give you a surprisingly solid way of thinking about quantum problems for very little effort. In other words, even when the Bohr model doesn’t give you the exact right answer, it is very good at teaching you how to feel about a quantum system.
The purpose of this post is, more or less, to be a defense of the Bohr model. After outlining what the Bohr model is, I’ll show how the exact same logic gives a very quick and surprisingly accurate sketch of another major phenomenon in the quantum world: Landau quantization. Then, at the end, I’ll wax philosophical a bit about why it’s a mistake to try and teach only “true” ideas in science.
The Bohr model of the atom
The essence of the Bohr model approach is to start by thinking about the problem using only classical physics, and figure out what different states look like. Then, once you’re done, remember that quantum mechanics only allows certain particular ones of those states.
This way of thinking developed very naturally, because at the time the Bohr model was developed (1913) there was no quantum mechanics. So as people were puzzling about how to describe the hydrogen atom, they started with the eighteenth and nineteenth-century physics that they knew, and then tried to figure out how it might be modified by funny “new” stuff.
To see how this works, start by thinking about the hydrogen atom using only high school-level physics. You have a single electron running around a single proton, and the picture that emerges is that the electron should orbit around the proton in the same way that the earth orbits around the sun. Like this:
You can work out everything about this orbit (by balancing the attractive force between the charges with the centripetal acceleration), and what you’ll find is that orbits with any radius r are possible. Orbits with small r have a large momentum (high speed) and a deeply negative energy, while orbits with a large radius have a small momentum and small energy.
More specifically, the momentum is $p = \sqrt{m e^2/(4\pi\epsilon_0 r)}$ and the energy is $E = -e^2/(8\pi\epsilon_0 r)$. [Here, $e$ is the electron charge, $m$ is the electron mass, and $\epsilon_0$ is the electric constant.]
Once you’ve figured out everything that would happen for classical electrons, you can remember that quantum mechanics only allows certain kinds of trajectories to be stable. The key idea, which was developed only slowly and painfully during the first few decades of the 20th century, is that moving particles have a wavelength associated with them, called the “de Broglie wavelength” $\lambda$. A larger momentum $p$ implies a shorter wavelength: $\lambda = h/p$, where $h$ is Planck’s constant. [My way of thinking about the wavelength is that fast-moving particles make short, choppy waves in the quantum field, while slow-moving particles make gentle, long-wavelength ripples.] For a trajectory to be stable, the circumference of the electron’s orbit needs to contain an integer number of wavelengths (that is, $2\pi r = n\lambda$ for some whole number $n$). Otherwise, the trajectory gets unsettled by ripples in the quantum field. This stability is often demonstrated with pictures like this:
My own personal image for the stability/instability of different quantum trajectories comes from all the time I spent playing in the bathtub as a little kid. Like many kids, I imagine, I used to try and slide back and forth in the tub to get the water sloshing from side to side in big dramatic waves. Like this:
What I found is that making these big “tidal waves” requires you to rock back and forth with just the right frequency. If you continue to rock with that frequency, then you get one big wave moving back and forth, and you can slide around the tub while staying inside the biggest part of the wave as it shifts from one side to the other. But if you try to change your frequency, then suddenly you find yourself colliding with the tidal wave and water goes flying everywhere. This is something like what happens with electrons in the Bohr model. If they travel around their orbits at just the right speed, then they move together with the ripples in the quantum field. But moving at other speeds leads to some kind of unstable mess, and not a stable atom.
…I wonder whether I can go back and explain to my parents that all that water on the floor was really just an important part of my training for quantum mechanics.
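Anyway, applying the Bohr stability condition to the classical electron trajectories gives the result that the orbit radius can only have the following specific values:
$$r_n = n^2 a_B,$$
where $n = 1, 2, 3, \ldots$ (and $a_B = 4\pi\epsilon_0\hbar^2/(m e^2) \approx 0.5 \times 10^{-10}$ meters, with $\hbar = h/2\pi$). Correspondingly, the energy can only have the values
$$E_n = -\frac{e^2}{8\pi\epsilon_0 a_B} \cdot \frac{1}{n^2} \approx -\frac{13.6 \text{ eV}}{n^2}.$$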
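For anyone who wants to check these numbers, here is a minimal Python sketch of the calculation. The function names are my own, the constants are rounded SI values, and the whole thing assumes nothing beyond the circular-orbit picture above: classical momentum at radius $r$, plus the requirement that a whole number of de Broglie wavelengths fits around the orbit.

```python
import math

# Physical constants in SI units (rounded standard values)
E_CHARGE = 1.602176634e-19   # electron charge e, in coulombs
E_MASS = 9.1093837015e-31    # electron mass m, in kg
EPS0 = 8.8541878128e-12      # electric constant epsilon_0, in F/m
HBAR = 1.054571817e-34       # reduced Planck constant h/(2*pi), in J*s


def bohr_radius():
    """Radius of the smallest allowed orbit, a_B = 4*pi*eps0*hbar^2 / (m*e^2)."""
    return 4 * math.pi * EPS0 * HBAR**2 / (E_MASS * E_CHARGE**2)


def orbit_radius(n):
    """Radius of the n-th allowed orbit, r_n = n^2 * a_B."""
    return n**2 * bohr_radius()


def orbit_energy_ev(n):
    """Energy of the n-th orbit in electron-volts, E = -e^2 / (8*pi*eps0*r_n)."""
    joules = -E_CHARGE**2 / (8 * math.pi * EPS0 * orbit_radius(n))
    return joules / E_CHARGE


def wavelengths_in_orbit(n):
    """How many de Broglie wavelengths fit around the n-th orbit."""
    r = orbit_radius(n)
    p = math.sqrt(E_MASS * E_CHARGE**2 / (4 * math.pi * EPS0 * r))  # classical momentum
    wavelength = 2 * math.pi * HBAR / p                             # lambda = h / p
    return 2 * math.pi * r / wavelength


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, orbit_radius(n), orbit_energy_ev(n), wavelengths_in_orbit(n))
```

The last function is a consistency check rather than new physics: for the allowed radii it should return the integer $n$ itself, up to floating-point rounding.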
Now, the Bohr model is not a true representation of the inside of an atom. The movement of electrons around a nucleus is not nearly so simple as the circular orbits I drew above. The Bohr model also doesn’t tell you anything about how many electrons you can fit on different orbits. And you certainly couldn’t use the Bohr model to predict subtle effects like the Lamb shift. But the Bohr model very quickly tells you some important things: how big the atom is, how deep the energy levels are, and how the energy levels are arranged. And in this case, it happens to get those answers exactly right.
To a certain degree, Bohr was lucky that this line of very approximate reasoning got him the exact right answers. But I think it is often underappreciated how useful the Bohr model is as a paradigm for approaching quantum problems. To illustrate this point, let me use the same exact thinking on another problem that wasn’t figured out until decades after the Bohr model.
As it happens, there is another kind of problem where charges run around in circular orbits, which you are also likely to learn about in a first or second-year physics course. This is the problem of an electron in a magnetic field. As you might remember, a magnetic field pushes on moving charges, bending their trajectories into circles. Like this:
A magnetic field makes a force that is always perpendicular to the velocity of a moving charge, causing the charges inside the field to run in closed circles. In the classical world, those circles can have any size, but quantum mechanics should select only some of them to be stable.
So what kind of quantum states can electrons in a magnetic field have?
Following the Bohr model philosophy, we can approach this problem by first working everything out as if quantum mechanics did not exist. The physics of charges in a magnetic field is a few hundred years old, and fairly simple. You can use it to figure out that faster charges have bigger orbit radii, according to the relation $r = p/(eB)$, where $p = mv$ is the electron’s momentum and $B$ is the strength of the magnetic field. The kinetic energy of the electron is $E = p^2/(2m)$, which in terms of the radius means $E = e^2 B^2 r^2/(2m)$. So, there is a whole range of classical trajectories with different radii. Those trajectories with larger radius correspond to faster electron speed and larger energy.
Now we can examine this classical picture through the lens of Bohr’s stability criterion, which says that only trajectories with just the right radius can be stable. In particular, only trajectories whose length is an integer multiple of the de Broglie wavelength can survive as stable orbits (remember the “sloshing in the bathtub” analogy). Applying this condition gives:
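$$r_n = \sqrt{\frac{n\hbar}{eB}},$$
where, again, $n = 1, 2, 3, \ldots$ (and $\hbar = h/2\pi$). If you put this result for $r$ into the kinetic energy, you get:
$$E_n = \frac{e^2 B^2 r_n^2}{2m} = \frac{n \hbar e B}{2m}.$$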
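For readers who want the intermediate algebra, the step from the stability condition to the allowed radii and energies can be sketched as follows. This is a reconstruction that uses only the ingredients already named in the text (circular orbits, the de Broglie wavelength $\lambda = h/p$, and the classical relation $p = eBr$), with $\hbar = h/2\pi$ introduced as shorthand.

```latex
\begin{align*}
2\pi r &= n\lambda = \frac{n h}{p} = \frac{n h}{e B r}
  && \text{(whole number of wavelengths, with } p = eBr\text{)} \\
r_n^2 &= \frac{n\hbar}{eB}
  && \text{(solve for } r\text{, using } \hbar = h/2\pi\text{)} \\
E_n &= \frac{p^2}{2m} = \frac{e^2 B^2 r_n^2}{2m} = \frac{n\hbar e B}{2m}
  && \text{(kinetic energy on the allowed orbits)}
\end{align*}
```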
These discrete energies are called “Landau levels” (after the Soviet Union’s legendary alpha physicist). And while the quantization of magnetic trajectories is not quite as widely known as the hydrogen atom, it has been the source of just as many strange scientific observations, and has kept many scientists (myself included) gainfully employed for more than half a century.
Here the Bohr model approach again provides a quick and easy guide to these energy levels. First, one can see that different energy levels come with a uniform spacing in energy, $\Delta E = \hbar\omega_c/2$, where $\omega_c = eB/m$ is the “cyclotron frequency” (and $\hbar = h/2\pi$). Second, the radius of the corresponding trajectories grows with the square root of the energy. Finally, the smallest possible cyclotron trajectory has a radius $\ell_B = \sqrt{\hbar/(eB)}$, which is called the “magnetic length.”
These are important (and correct) results which can lead you quite far in conceptual thinking about what a magnetic field does to electronic states. And while they were really only appreciated in the second half of the twentieth century (after quantum mechanics had come into full bloom), they could have been mostly derived as early as 1913 using Bohr’s way of thinking.
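[As it happens, the formulas above are not exactly correct, the way the Bohr model’s formulas were for hydrogen. The correct result for the energy, written in terms of the cyclotron frequency $\omega_c = eB/m$, is
$$E_n = \hbar\omega_c\left(n + \tfrac{1}{2}\right), \quad n = 0, 1, 2, \ldots$$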
So the “Bohr model” type approach gives the exact right answer for the lowest energy level, but is wrong about the spacing between levels by a factor of two.
UPDATE: Here’s a simple addition you can make to get the answer exactly right.]
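To put numbers on the comparison, here is a small Python sketch that evaluates the Bohr-type levels $n\hbar\omega_c/2$ (with $n = 1, 2, \ldots$) next to the exact levels $\hbar\omega_c(n + 1/2)$ (with $n = 0, 1, \ldots$). The function names are mine, and I am assuming a free electron with its vacuum mass; in a real material the effective mass would usually be different.

```python
import math

E_CHARGE = 1.602176634e-19   # electron charge e, in coulombs
E_MASS = 9.1093837015e-31    # free-electron mass m, in kg (an assumption, see above)
HBAR = 1.054571817e-34       # reduced Planck constant, in J*s


def cyclotron_frequency(b_field):
    """Angular cyclotron frequency omega_c = e*B/m, for B in tesla."""
    return E_CHARGE * b_field / E_MASS


def magnetic_length(b_field):
    """Radius of the smallest Bohr-type orbit, sqrt(hbar / (e*B)), in meters."""
    return math.sqrt(HBAR / (E_CHARGE * b_field))


def bohr_type_energy(n, b_field):
    """Bohr-type estimate E_n = n*hbar*omega_c/2, with n = 1, 2, 3, ..."""
    return n * HBAR * cyclotron_frequency(b_field) / 2


def exact_energy(n, b_field):
    """Exact Landau level E_n = hbar*omega_c*(n + 1/2), with n = 0, 1, 2, ..."""
    return HBAR * cyclotron_frequency(b_field) * (n + 0.5)


if __name__ == "__main__":
    b = 1.0  # field strength in tesla, chosen only for illustration
    print("magnetic length (m):", magnetic_length(b))
    print("lowest level, Bohr-type vs exact (J):",
          bohr_type_energy(1, b), exact_energy(0, b))
    print("level spacing, Bohr-type vs exact (J):",
          bohr_type_energy(2, b) - bohr_type_energy(1, b),
          exact_energy(1, b) - exact_energy(0, b))
```

At one tesla the magnetic length comes out at roughly 26 nanometers, the two lowest-level energies agree, and the exact spacing is twice the Bohr-type one.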
What is “truth,” really?
I have no qualifications as a philosopher of science or of anything else. That much should be emphasized. But nonetheless I can’t resist making a larger comment here about the idea that certain scientific ideas shouldn’t be taught because they are “not true.”
Science, as I see it, is not really a business of figuring out what’s true. As a scientist, it is best to take the perspective that no scientific theory, model, or idea is really “true.” A theory is just a collection of ideas that can stick in the human mind as a useful way of imagining the natural world.
Given enough time, every scientific theory will ultimately be replaced by a more correct one. And often, the more correct theory feels entirely different philosophically from the one it replaces. But the ultimate arbiter of what makes good science is not whether the idea is true, but only whether it is useful for predicting the outcome of some future event. (It is, of course, that predictive power that allows us to build things, fix things, discover things, and generally improve the quality of human life.)
It is undeniable at this point that the Bohr model is decidedly not true. But, as I hope I have shown, it is also undoubtedly very useful for scientific thinking. And that alone justifies its presence in scientific curricula.
For most of my (still nascent) scientific career, I have worked on the physics of materials. This likely sounds pretty humdrum to you. To me, at least, the terms “materials” or “materials science” conjure up images of stodgy old nerds meticulously optimizing the chemical composition of some slurry to be used in one or another mind-numbingly specific manufacturing process.
[The likely source of this prejudice is my stodgy old materials science professor from college, who made us spend 4 months memorizing the iron-carbon phase diagram.]
But for a physicist, the study of materials can be something significantly more dramatic and imaginative. In short, every new material is like a new, synthetic universe. It has its own quantum fields that arise from the motions and interactions of the atoms in the material. And these fields give rise to their own kinds of particles, which may look a lot like the electrons, atoms, and photons we’re used to, or which may have completely different rules of engagement.
For example, the recently-discovered graphene is essentially a two-dimensional universe where electrons have no mass and the speed of light is 300 times smaller than its normal value. For a physicist, that’s a dramatic thing.
The downside to working in materials is that it’s often hard to see what’s going on, and to know whether the material you have is the same as the material you think you have. For this reason materials scientists become dependent on a barrage of characterization methods, each of which probes some slightly different aspect of the material’s properties.
In this second edition of Spare Me the Math, I want to describe one of the most crucial and ubiquitous of the material characterization tools: Raman scattering.
In Raman scattering, you shine light on a material and see some of it get reflected back with a different frequency (that is, a different color). Since light is made of individual photons, and the energy of these photons is proportional to their frequency, this shift in frequency means that the light is gaining or losing some of its energy inside the material.
As it turns out, that lost energy goes into exciting a vibration in the material. (Or, conversely, the light can gain energy by stealing some existing vibrational energy from the material). Thus, the shift of the light tells you something about the way the material vibrates, and therefore about what the material is made of.
But how, exactly, is the light frequency getting shifted? How does light energy get mixed up with vibrational energy?
Usually the process of Raman scattering is explained only in terms of some “electron-phonon scattering” and accompanied by opaque diagrams like this one:
or this one. But what is really going on?
In this post I want to try and demystify this process a little bit, by explaining how, exactly, the light frequency gets shifted. It turns out that pretty much everything can be understood by imagining that your material is made of stretchy metal balls.
Imagine, for a moment, a metal ball. Like this:
This metal ball will stand for a molecule; you should imagine that there are bazillions of little metal balls making up your material.
A metal ball is like a molecule in the sense that it can rearrange its electrons a bit to adapt to the presence of an electric field: the ball is polarizable. So when an electric field gets applied, the ball moves some positive charge to one side and some negative charge to the other side. Like this:
In polarizing this way, the ball creates its own electric field that partially counteracts some of the applied electric field.
When an oscillating electric field (a beam of light) is applied to the ball, the electric charge on the ball redistributes to point in the direction of the light’s electric field. Like this:
In this picture, the arrow above shows the direction and magnitude of the electric field coming from the light, and the colors show the induced charge density (red for positive, blue for negative).
It is common to say that by this process of polarizing back and forth, the metal ball “scatters” some of the incident light. But one can just as well say that in sloshing its charge back and forth, the ball creates its own light waves that emanate outward.
(I like this language a little better. When the sky is a beautiful vibrant blue, I like to think how all the little molecules in the air are getting excited by the sun and glowing bright blue light in my direction, like bioluminescent algae.)
In the picture above, however, the ball’s electric charge is oscillating in lock-step with the applied electric field, which means that its own radiated light is at exactly the same frequency as the incoming light. To get the frequency shift implied by Raman scattering, we have to make the ball a bit stretchy.
So imagine now that the metal ball is a bit squishy, and can get stretched out in different directions the same way that a real molecule can. Say, like this:
Crucially, you should notice that in its stretched-out state, the ball responds differently to an applied electric field. Basically, it puts positive and negative charge further apart, and in doing so it does a better job of screening out the applied field. Like this:
Since the ball is elastic, however, it can easily start wobbling back and forth when it gets stretched out. Like this:
The frequency of this wobbling, which I call $\omega_w$, depends on how stiff the ball is. Every molecule has its own characteristic wobbling frequency (or rather, its own set of frequencies, one for each different way it can be excited).
Now, the essence of Raman scattering can be understood by thinking about what happens if you try to apply an oscillating electric field while the ball is wobbling. Basically, the electric polarization frequency and the wobbling frequency both get involved in determining how light is radiated by the ball. The picture is something like this:
Here you can see how the light frequency and wobbling frequency get mixed up. The ball is sometimes at its fattest while the electric field is strongest (making an enhanced dipole) and sometimes at its thinnest while the electric field is strong (making a weakened dipole). As a result, the ball radiates some light at the original frequency $\omega_L$ and radiates some light at the shifted frequencies $\omega_L + \omega_w$ and $\omega_L - \omega_w$ (the light frequency plus and minus the wobbling frequency). It’s just like beating in sound waves: when something is the outcome of two frequencies simultaneously, you see (hear) the sum and the difference of the two frequencies also.
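If you would rather see the frequency mixing than take my word for it, here is a toy numerical version of the wobbling ball in Python. Everything in it is an illustrative assumption rather than a model of any real molecule: the polarizability is taken to be $1 + 0.3\cos(\omega_w t)$, the field is $\cos(\omega_L t)$, and the two frequencies (40 and 6 cycles per sampling window) are arbitrary. The script multiplies the two together to get the induced dipole and then picks out which frequencies are present using a plain discrete Fourier sum.

```python
import math


def dipole_signal(n_samples, light_bin, wobble_bin, modulation=0.3):
    """Induced dipole = wobbling polarizability times oscillating field.

    light_bin and wobble_bin are frequencies in cycles per sampling window.
    """
    signal = []
    for k in range(n_samples):
        t = 2 * math.pi * k / n_samples
        polarizability = 1.0 + modulation * math.cos(wobble_bin * t)
        field = math.cos(light_bin * t)
        signal.append(polarizability * field)
    return signal


def amplitude_spectrum(signal):
    """Amplitude at each whole-number frequency, from a direct Fourier sum."""
    n = len(signal)
    amplitudes = []
    for f in range(n // 2):
        re = sum(s * math.cos(2 * math.pi * f * k / n) for k, s in enumerate(signal))
        im = sum(s * math.sin(2 * math.pi * f * k / n) for k, s in enumerate(signal))
        amplitudes.append(2 * math.hypot(re, im) / n)
    return amplitudes


def peak_frequencies(amplitudes, threshold=0.05):
    """Frequencies whose amplitude stands clearly above numerical noise."""
    return [f for f, a in enumerate(amplitudes) if a > threshold]


signal = dipole_signal(256, light_bin=40, wobble_bin=6)
peaks = peak_frequencies(amplitude_spectrum(signal))

if __name__ == "__main__":
    print(peaks)
```

The three surviving frequencies are the original one and its two sidebands, at 40 minus 6 and 40 plus 6, with each sideband carrying half the modulation depth.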
And that’s how you can shine light on a material and see some of it come back to you at a different frequency. The point is that the molecules inside the material are like squishy metal balls: they polarize, and they wobble. And so the light that they radiate has information about the frequencies of both processes. By applying light with a known frequency, you can figure out how quickly the material’s constituent molecules wobble, and thereby say something about what they are made of.
You may well ask “what gets the ball wobbling in the first place?” It turns out there are two possible answers. First, the ball can start wobbling just by random kicks that it gets from its thermal environment. In this case the intensity of the scattered light is proportional to the temperature (since the energy stored in the thermal wobble grows in proportion to temperature) multiplied by the square of the incident electric field.
On the other hand, if the temperature is small or the intensity of the applied light is large, then the ball can start wobbling due to the stretching forces it feels from the light itself. (Briefly, when a molecule gets electrically polarized, it feels a force pushing it apart that is proportional to the strength of the applied field.) This is called “stimulated Raman scattering,” and it produces a scattered light intensity that is proportional to the fourth power of the incident electric field.
In case you hadn’t heard, the universe is governed by four fundamental forces. But when it comes to understanding nature at almost any level larger than a nucleus and smaller than a planet, only one of them really matters: the Coulomb interaction.
The Coulomb interaction (the pushing and pulling force between electric charges) is almost incomprehensibly strong. One common way to express this strength is by considering the forces that exist between two electrons. Two electrons in an otherwise empty space will feel pulled together by their mutual gravitational attraction and pushed apart by the Coulomb repulsion. The Coulomb repulsion, however, is stronger than gravity by 4,000,000,000,000,000,000,000,000,000,000,000,000,000,000 times. (For two protons, this ratio is a more pedestrian $10^{36}$ times.)
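These ratios are easy to check for yourself, because the distance between the two particles cancels out: both forces fall off as one over distance squared. Here is a minimal Python sketch, with rounded SI constants and a function name of my own choosing.

```python
COULOMB_K = 8.9875517923e9       # Coulomb constant, in N m^2 / C^2
GRAV_G = 6.67430e-11             # gravitational constant, in N m^2 / kg^2
E_CHARGE = 1.602176634e-19       # elementary charge, in coulombs
ELECTRON_MASS = 9.1093837015e-31 # kg
PROTON_MASS = 1.67262192369e-27  # kg


def coulomb_to_gravity_ratio(mass):
    """Ratio of electric repulsion to gravitational attraction for two
    identical particles of the given mass, each carrying one elementary charge.
    The separation cancels, so it does not appear."""
    return COULOMB_K * E_CHARGE**2 / (GRAV_G * mass**2)


if __name__ == "__main__":
    print("two electrons:", coulomb_to_gravity_ratio(ELECTRON_MASS))
    print("two protons:  ", coulomb_to_gravity_ratio(PROTON_MASS))
```

The proton ratio is smaller only because protons are about 1800 times heavier, which makes their gravitational attraction a few million times stronger.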
When I was a TA, I enjoyed demonstrating this point in the following way. Take a balloon, and rub it against the top of your head until your hair starts to stand on end. Then stick the balloon to the ceiling, where it stays without falling due to static electricity. Now consider the forces acting on the balloon. Pulling up on the balloon are electric forces between the relatively few electrons I just rubbed off from my hair and the opposite charge that they induce in the ceiling. Pulling down on the balloon are gravitational forces coming from the pull of the entire mass of the Earth. Apparently the electric force created by those few electrons is more than enough to counterbalance the gravitational pull coming from every proton, neutron and electron in the planet below it (roughly $5 \times 10^{51}$ of them).
So electric forces are strong. Why is it, then, that we can go about our daily lives without worrying about them buffeting us back and forth?
The short answer is that they do buffet us back and forth. Pretty much any time you feel yourself being pushed or pulled by something (say, the ground beneath your feet or the muscles tied to your skeleton), the electric repulsion between microscopic charges is ultimately to blame.
But a better answer is that the very strength of electric forces is responsible for their seeming quietude. Electric forces are so tremendously strong that nature will not abide having a large amount of electric charge collect in one place. And so electric forces, at the scale of people-sized objects, are largely neutralized.
But what if they weren’t?
When I was a TA I got to walk my students through the following morbid little problem, which helped them see why it is that electric forces don’t really appear on the human scale. Perhaps you will enjoy it. Like most good physics problems, it is thoroughly contrived and, for a new student of physics, at least, its message is completely memorable.
The problem goes like this:
What would happen if your body suddenly lost 1% of its electrons?
Now, 1% may not sound like a big deal. After all, there is almost no reason for excitement or concern when you lose 1% of your total mass. But losing 1% of your electrons, without at the same time losing an equal number of protons, means that suddenly, within your body, there is an enormous amount of positive, unneutralized electric charge. And nature will not abide its strongest force being so unrequited.
I’ll use my own body as an example. My body has a mass of about 80 kg, which means that it contains something like protons, and an almost exactly equal number of electrons. Losing 1% of those electrons would mean that my body acquires an electric charge of electron charges, or about Coulombs.
Now, 4 billion Coulombs is a silly amount of charge. It is about 300 million times more than what gets discharged by a lightning bolt, for example. So, in some sense, losing 1% of your electrons would be like getting hit by 300 million lightning bolts at the same time.
Things get even more dramatic if you start to think about the forces involved.
Suppose, for example, that in their rush to escape my body, those 4 billion Coulombs split in half and flowed to opposite extremities. Say, each hand suddenly acquired a charge of 2 billion Coulombs. The force between those two hands (spread apart, about 6 feet) would be about $10^{28}$ Newtons, which translates to about $2 \times 10^{27}$ pounds. Needless to say, my body would not retain its structural integrity.
Of course, in addition to the forces pushing the extremities of my body apart, there would also be a force similar in magnitude pulling me toward the ground. You may recall that when an electric charge is next to a grounded surface (like, say, the ground) it induces some opposite charge on that surface in a way that acts like an “image charge” of opposite sign. In my case, the earth would accumulate a huge amount of negative charge around my feet so as to create a force like that of an “image me.”
Because of my 4 billion Coulombs, the force between myself and my “image self” would be something like tons. To give that some perspective, consider that something with the same mass as the planet earth weighs only about tons. So the force pulling me toward the earth would be something like the force of a collision between the earth and the planet Saturn.
But my hypercharged self would not only crush the earth. It would also break open the vacuum itself. At the instant of losing those 1% of electrons, the electric potential at the edge of my body would be about 40 exavolts. This is much larger than the voltage required to rip apart the vacuum and create electron-positron pairs. So my erstwhile body would be the locus of a vacuum instability, in which electrons were sucked in while positrons were blasted out.
In short, if I lost 1% of my electrons, I would not be a person anymore. I would be a bomb. A Coulomb bomb, if you will, with an energy equivalent to that of ten trillion (modern) atomic bombs. Which would surely destroy the planet. All by removing just 1 out of every 100 of my electrons.
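For the curious, here is a rough Python sketch of where numbers like these come from. It takes the 4 billion Coulombs quoted above as its starting point and treats everything else as an assumption of mine, made only for illustration: the body is a conducting sphere one meter in radius, a lightning bolt carries about 15 Coulombs, and a “modern” bomb releases one megaton of TNT.

```python
COULOMB_K = 8.9875517923e9   # Coulomb constant, in N m^2 / C^2

TOTAL_CHARGE = 4e9           # coulombs, the figure quoted in the text
HAND_SEPARATION = 1.83       # meters, about 6 feet
BODY_RADIUS = 1.0            # meters; assumed "spherical person"
LIGHTNING_CHARGE = 15.0      # coulombs per bolt; assumed typical value
MEGATON_JOULES = 4.184e15    # joules in one megaton of TNT; assumed bomb size


def force_between_hands(charge=TOTAL_CHARGE, separation=HAND_SEPARATION):
    """Coulomb force (newtons) between two point charges of charge/2 each."""
    return COULOMB_K * (charge / 2) ** 2 / separation**2


def surface_potential(charge=TOTAL_CHARGE, radius=BODY_RADIUS):
    """Electric potential (volts) at the surface of a charged sphere."""
    return COULOMB_K * charge / radius


def stored_energy(charge=TOTAL_CHARGE, radius=BODY_RADIUS):
    """Electrostatic energy (joules) of a charged conducting sphere, k*Q^2/(2R)."""
    return COULOMB_K * charge**2 / (2 * radius)


if __name__ == "__main__":
    print("lightning bolts:", TOTAL_CHARGE / LIGHTNING_CHARGE)
    print("force between hands (N):", force_between_hands())
    print("surface potential (V):", surface_potential())
    print("energy in megaton bombs:", stored_energy() / MEGATON_JOULES)
```

Changing the assumed radius or bomb size by a factor of a few moves the results by the same factor, so only the orders of magnitude should be taken seriously.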
The moral of this story, of course, is that nothing of observable size will ever get 1% charged. The Coulomb interaction cannot be thus toyed with. All of chemistry and biology function by the interactions between just a few charges at a time, and their effects are plenty strong as they are.
As a PhD student, I worked on all sorts of problems that involved the Coulomb interaction, and occasionally my proposed solution would be very wrong. The worst kind of wrong was the one that made my advisor remark “What you just created is a Coulomb bomb,” which meant that I had proposed something that wasn’t neutral on the large scale.
It’s one thing to feel like you just solved a problem incorrectly. It’s another to feel like your proposed solution would destroy the planet.
Physics and math have a complicated relationship, and I mean that in almost exactly the same way that your Facebook friends mean it.
Allow me to elaborate.
One very legitimate way to view mathematics is as the exploration of a pristine and entirely non-physical universe of numbers and relationships. In this view, which is largely the view of the academic mathematician, the universe of math exists in parallel to our own, real, universe. Each mathematical theorem is a discovery of some feature of that universe, and its correctness does not depend in any way on the physical features of reality. [In the (paraphrased) words of G.H. Hardy, it is true that 2 + 3 = 5, regardless of whether you think “2” stands for “two apples” or “two pennies”, or anything else.] In other words, pure mathematics exists completely independently of the human brain and its interests, and mathematicians are merely its explorers.
Physics, on the other hand, is a much more blue-collar pursuit. The goal of physics is only to describe past observation so that we can predict the outcome of future observations. Of course, such predictions can bring a tremendous amount of practical power, and they provide the foundation for nearly all technological innovation. Physics can also be tremendously interesting, and even aesthetically pleasing (at least to suitably eccentric people like myself). Still, by design physics makes no claim about absolute or human-independent truth, and indeed, the idea of truth outside of observable reality is fundamentally abhorrent to the discipline of physics.
Given this difference in philosophy, it may seem odd that physics has been so hopelessly entangled with mathematics for hundreds of years. The reason for this extended liaison can be seen as a consequence of the remarkable parallels that keep emerging between our own real universe and the mathematical universe. The discoveries of mathematicians keep proving to be useful, for no particularly apparent reason, in creating descriptions of the real universe, and so we continue to exploit them. Still, to a physicist, there is nothing sacred about the use of mathematics. Math is a tool that is useful only insomuch as it can be used as a highly-accurate metaphor for physical reality. Math deals only with exact statements about a “fictitious” universe. But physics must make approximate statements about a “real” universe. If getting a useful descriptive/predictive statement requires abusing the purity and exactitude of mathematics along the way, then so be it.
In short, physicists view math in much the same way that politicians view philosophy. You use it earnestly when you can, and you twist it to suit your own purposes when you can’t.
Part of becoming a physicist is learning to get comfortable with this ethos of exploitation, to one degree or another. One has to get “familiar” with abuses of mathematics, and develop “intuition” as to how far it can be stretched before it yields, under duress, an answer that is wrong, or worse, not even wrong.
Lately I’ve been on a streak of talking about examples where “intuitive” manipulation of mathematics can lead to answers while straightforward calculation is difficult (namely, integrating sin(x) to infinity and deriving the Pythagorean theorem). In this post I thought I would share one more of my favorite abuses of mathematics. This is a derivation of the formula for the Fibonacci sequence.
[I apologize if what follows is a bit stream-of-consciousness-y, but I thought it might help to illustrate the sort of intuitive line of thinking that one (I, at least) would actually follow to get the answer.]
The Fibonacci sequence, in case you have never encountered it, is the sequence of numbers that results from writing first $1$ and $1$, and then adding the previous two numbers to get the next one in the sequence. The resulting sequence goes on like this:
$$1, 1, 2, 3, 5, 8, 13, 21, 34, 55, \ldots$$
Famously, as you go to high numbers in the sequence, the ratio of two successive numbers approaches the golden ratio
$$\varphi = \frac{1 + \sqrt{5}}{2} \approx 1.618.$$
What is perhaps less well-known is that you don’t have to count through the sequence one number at a time in order to figure it out. There is a simple formula, $F_n = \left[\varphi^n - (1 - \varphi)^n\right]/\sqrt{5}$, for the sequence, which can tell you any term that you want to know. For example, I can say without doing any tedious addition that the 91st term of the Fibonacci sequence is 4,660,046,610,375,530,309 (about 4 quintillion).
If you want to derive this sequence the way a physicist would, you should start with the following two steps:
1) Write down the exact relation that defines the sequence: $F_n = F_{n-1} + F_{n-2}$.
2) Squint at it until it starts to look like something you already know how to deal with.
In my own personal case, this “squinting” involves thinking about what happens when $n$ gets really large. Clearly, at large $n$ the sequence grows very quickly; by $n = 91$, $F_n$ is already at 4 quintillion! Usually, when something grows that quickly, it means there is some kind of exponential dependence. Exponential dependencies arise when your rate of growth is proportional to your size (just like logarithmic dependencies arise when your rate of growth is inversely proportional to your size). So now there is a lead to follow: is the rate of growth of $F_n$ in fact proportional to its own value?
In fact, it is, and the simple way to see this is by first replacing $n$ by $n+1$ in the definition of the sequence above and rearranging, so that you have
$$F_n = F_{n+1} - F_{n-1}.$$
Now, if you really consider $n$ to be large, then you can think of $n \pm 1$ as $n \pm \delta n$, where $\delta n$ is something small (compared to $n$). Then the right-hand side of the equation above really looks like a derivative: $F_{n+\delta n} - F_{n-\delta n} \propto dF/dn$. This is all the evidence that you need to confirm your suspicion that $F_n$ is indeed proportional to its own derivative, at least at large $n$, and so it should be exponential.
Now you can make an educated guess for $F_n$: it should be something exponential, like $F_n = A e^{an}$, where $A$ and $a$ are some unknown constants. This means $F_{n+1} = A e^{an} e^{a}$ and $F_{n-1} = A e^{an} e^{-a}$. Put these into the equation above, and what you’ll find is
$$1 = e^{a} - e^{-a}.$$
You can solve for $a$, but it’s actually more interesting (and easy) to solve directly for $e^a$. Multiplying through by $e^a$ gives $(e^a)^2 - e^a - 1 = 0$, so you can use the quadratic formula, and what you find is that there are two solutions:
$$e^a = \frac{1 \pm \sqrt{5}}{2}.$$
The first one of those two solutions (the plus sign) is the golden ratio, $\varphi$! I hope this gives you another feeling of being on the right track.
The fact that there are two solutions for $e^a$ (let’s call them $\varphi_+$ and $\varphi_-$) means that there are two different kinds of solutions for the governing equation $F_n = F_{n-1} + F_{n-2}$. Namely, these solutions are $F_n = A\varphi_+^n$ and $F_n = B\varphi_-^n$. Any combination of these two satisfies the same “Fibonacci relation”, so you can write
$$F_n = A\varphi_+^n + B\varphi_-^n.$$
Now all that’s left is to figure out the values of $A$ and $B$ by applying the conditions $F_1 = 1$ and $F_2 = 1$. This process gives the final result:
$$F_n = \frac{\varphi_+^n - \varphi_-^n}{\sqrt{5}} = \frac{1}{\sqrt{5}}\left[\left(\frac{1 + \sqrt{5}}{2}\right)^n - \left(\frac{1 - \sqrt{5}}{2}\right)^n\right].$$
So there you have it. Now you can impress your friends by telling them that the 1776th Fibonacci number is approximately $6.5 \times 10^{370}$.
You can also see why the ratio of two successive Fibonacci numbers at large gives you the golden ratio: since is smaller than , at large its contribution gets completely eliminated from the sequence, and all you’re left with is , so that .
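If you want to check the closed-form expression for yourself, here is a minimal Python sketch. It assumes the usual starting values $F_0 = 0$ and $F_1 = 1$, and the function names are my own invention for illustration. It compares the closed form against the add-the-previous-two rule, looks at the ratio of successive terms, and estimates the size of the 1776th term using exact integer arithmetic (the floating-point formula overflows long before $n = 1776$).

```python
import math

PHI_PLUS = (1 + math.sqrt(5)) / 2   # the golden ratio, about 1.618
PHI_MINUS = (1 - math.sqrt(5)) / 2  # the other root, about -0.618


def fib_recurrence(n):
    """F_n from the add-the-previous-two rule, assuming F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_closed_form(n):
    """F_n from the closed-form expression (floating point, so approximate)."""
    return (PHI_PLUS**n - PHI_MINUS**n) / math.sqrt(5)


if __name__ == "__main__":
    # The two ways of computing F_n should agree after rounding.
    for n in (1, 2, 10, 30, 45):
        print(n, fib_recurrence(n), round(fib_closed_form(n)))

    # The ratio of successive terms should sit very close to the golden ratio.
    print(fib_recurrence(61) / fib_recurrence(60), PHI_PLUS)

    # Number of decimal digits of the 1776th term, from exact integers.
    print(len(str(fib_recurrence(1776))))
```

The closed form is only trusted here for moderate $n$, because rounding errors in floating-point powers eventually exceed one half and spoil the rounding.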
1. You can make your own “pseudo-Fibonacci” sequence by starting with any two numbers of your choosing, rather than $0$ and $1$, and then following the rule of adding the previous two to get the next number. The same formula as above will hold, except that the coefficients $A$ and $B$ will be different. And the ratio of two subsequent pseudo-Fibonacci numbers will still approach $\varphi_+$ at large $n$ (regardless of whether you choose to start your sequence with positive, negative, or even imaginary numbers).
2. It is perhaps funny to notice that since $\varphi_-$ is negative, the quantity $\varphi_-^n$ only gives a real answer when $n$ is an integer. That means that if you think of $F_n$ as a continuous function of $n$, then its value lives in the complex plane and only crosses the real axis when $n$ is an integer. Like this.
3. The fact that $F_n$ is not real at non-integer $n$ means that if someone asks you “what’s the 1.5th term of the Fibonacci sequence?”, you can answer “about $0.92 + 0.22i$”.
4. You may have noticed that when I wrote $F_n = F_{n+1} - F_{n-1}$, the right-hand side looks like twice the derivative. This means that at large $n$ you should have $dF/dn \approx F_n/2$. In fact, if you work it out, the exact large-$n$ answer is $dF/dn = (\ln \varphi_+) F_n \approx 0.48\, F_n$.
5. If I were only slightly less mature, the second sentence of this post would have been:
Namely, Physics uses Math for .
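The numerical claims in footnotes 1, 3, and 4 can be checked with a short Python sketch. The function names and the particular starting values are my own choices for illustration, and the non-integer term is evaluated on the principal branch of the complex logarithm, which is an assumption (another branch would give a different complex number).

```python
import cmath
import math

PHI_PLUS = (1 + math.sqrt(5)) / 2
PHI_MINUS = (1 - math.sqrt(5)) / 2


def pseudo_fib_ratio(first, second, steps=60):
    """Ratio of successive terms after `steps` applications of the
    add-the-previous-two rule, for arbitrary starting numbers."""
    a, b = first, second
    for _ in range(steps):
        a, b = b, a + b
    return b / a


def fib_continuous(n):
    """Closed-form F_n for real n, using the principal branch for the
    power of the negative root (an assumed convention)."""
    minus_power = cmath.exp(n * cmath.log(PHI_MINUS))
    return (PHI_PLUS**n - minus_power) / math.sqrt(5)


if __name__ == "__main__":
    # Footnote 1: different starting pairs, same limiting ratio.
    for start in [(2, 7), (-5, 3), (2j, -3)]:
        print(start, pseudo_fib_ratio(*start))

    # Footnote 3: the "1.5th term" is a complex number.
    print(fib_continuous(1.5))

    # Footnote 4: the large-n growth rate per unit n, compared with 1/2.
    print(math.log(PHI_PLUS))
```

Starting pairs that happen to make the coefficient of $\varphi_+^n$ exactly zero are a special case that this sketch does not try to handle.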
Sometimes I am amazed by the permanence of mathematical discovery. Math, it seems to me, is quite unique among the creative intellectual pursuits (science, art, engineering) for the seemingly unlimited lifetime of its innovations.
For example, Aristotle was a brilliant natural philosopher, as much a genius as just about any modern scientist, and he advanced (what would become) physics tremendously during the 4th century BC. But by now his theory of the five elements is completely unnecessary for anyone to learn. While it produced an important advancement in our thinking, it has been replaced by more correct physical theories. Thus, Aristotle suffered the same fate that meets seemingly every scientist or inventor eventually: further discoveries made him obsolete.
Pythagoras, on the other hand, who lived roughly 200 years before Aristotle, is someone whose major contribution to mathematics is still used every day. I literally could not do my job without the Pythagorean theorem, and neither could just about any scientist or engineer. Unlike nearly all other kinds of innovations, it has very much not been replaced.
What’s important to notice is not just that Pythagoras’s result is still important, but that the type of reasoning that leads to his result is still important. Put simply, a good scientist or engineer needs to be capable of understanding and reproducing a derivation of the 2500-year-old Pythagorean theorem, not just because the theorem is important, but because that level of logical thinking is necessary for his/her job.
So in this post I think it’s worth sharing my own favorite derivation of the Pythagorean theorem. This derivation is the simplest one I know of, and it doesn’t require any tremendous geometric cleverness (like a tangram puzzle) or complicated diagrams. Instead, it relies only on a very basic use of scaling arguments.
Scaling arguments are among the simplest and most powerful tools in theoretical physics. They allow you to reach remarkably concrete conclusions about a problem even when you don’t know essentially any details about the system in question. The key idea is to imagine scaling the system up or down in size, and then saying something about how it should change as you do so.
For example, suppose you don’t know anything about triangles except that they have an area. Since area is measured in units of length squared, you can immediately say that if you take some triangle and make its lengths $\lambda$ times bigger, then its area must get $\lambda^2$ times larger.
So if some triangle has an area $A$, then the same triangle magnified two times must have an area $4A$.
Meanwhile, all the side lengths of the bigger triangle are exactly two times longer than for the smaller one.
What all this means is that, for a given triangle, the area is proportional to the square of any one of its side lengths. I know this because as I make the triangle $\lambda$ times bigger, the side lengths all get $\lambda$ times longer, and the area gets $\lambda^2$ times bigger. So if I want I can write
$$\text{area} = (\text{something}) \times (\text{side length})^2$$
The “something” in that equation depends on the angles in the triangle, but for now let’s assume that I am more or less completely ignorant about triangles and I can’t tell you what it is. Luckily enough for ignorant me, it turns out I don’t need to know what the “something” is in order to prove the Pythagorean theorem.
The key trick is to divide the large triangle into two smaller and completely equivalent triangles. That is, take this triangle:
and draw one line (an altitude through the right angle) so that it gets divided into two smaller triangles, like this:
You can tell that the two newly-created triangles are just scaled-down versions of the original one, because they have all the same angles. This means that the original triangle can be written as the sum of two smaller but otherwise completely identical triangles. Like this:
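Finally, to prove the Pythagorean theorem, we just have to invoke the one equation in this post, for each triangle, using the hypotenuse as the side length in each case. Call the hypotenuse of the original triangle $c$ and its two shorter sides $a$ and $b$; these shorter sides are the hypotenuses of the two smaller triangles. Since the two smaller areas add up to the original area, this gives:
$$(\text{something}) \times c^2 = (\text{something}) \times a^2 + (\text{something}) \times b^2$$
Since all the triangles are the same shape, all the “something”s are also the same, which means
$$c^2 = a^2 + b^2$$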
Not bad, eh?
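For the skeptical, here is a small numerical sanity check in Python. It is only a sketch, not a proof: the coordinate distance formula already assumes the Pythagorean theorem, and the function names and the choice of placing the right angle at the origin are my own. It drops the altitude from the right angle, then confirms that the two pieces add up to the original area and that area divided by hypotenuse squared (the “something”) is the same for all three triangles.

```python
import math


def triangle_area(p, q, r):
    """Area of a triangle from its vertex coordinates (shoelace formula)."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2


def split_right_triangle(a, b):
    """Split a right triangle with legs a and b by the altitude through
    the right angle. Returns (hypotenuse, area) for the whole triangle
    and for each of the two pieces."""
    corner, p, q = (0.0, 0.0), (a, 0.0), (0.0, b)
    # Foot of the altitude: project the right-angle corner onto the line p-q.
    dx, dy = q[0] - p[0], q[1] - p[1]
    t = ((corner[0] - p[0]) * dx + (corner[1] - p[1]) * dy) / (dx * dx + dy * dy)
    foot = (p[0] + t * dx, p[1] + t * dy)
    whole = (math.dist(p, q), triangle_area(corner, p, q))
    piece_a = (a, triangle_area(corner, p, foot))  # its hypotenuse is the leg of length a
    piece_b = (b, triangle_area(corner, q, foot))  # its hypotenuse is the leg of length b
    return whole, piece_a, piece_b


if __name__ == "__main__":
    whole, piece_a, piece_b = split_right_triangle(3.0, 4.0)
    for name, (hyp, area) in [("whole", whole), ("piece a", piece_a), ("piece b", piece_b)]:
        print(name, "hypotenuse:", hyp, "area:", area, "something:", area / hyp**2)
    print("areas add up:", math.isclose(piece_a[1] + piece_b[1], whole[1]))
```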
I don’t know whether you found the above proof “aesthetic,” but I certainly did. And it’s a pretty nice feeling to think that an insight had by someone more than 2,500 years ago can still feel beautiful to someone like me. And even more remarkably, that my life (and professional career) continue to profit from it.
UPDATE: A number of readers have pointed out that they learned this argument from Migdal’s wonderful book Qualitative Methods in Quantum Theory (which is probably where Levitov learned it also).
Take one moment and try to answer, for yourself, the following question:
How happy are you?
Try to rate it on a scale from 1 to 10. I’ll wait until you’re done.
Now let’s talk about how you came up with your answer.
On the face of it, the question “how happy are you?” is both difficult and almost impossibly ill-defined. Nonetheless, I bet that you were able to come up with a number that felt reasonably accurate for you. This number almost certainly didn’t come from any formula or numerical weighing of different factors, but rather from an instinctive overall feeling of satisfaction with your life.
But what determines this overall feeling? This question, it seems to me, is an important one. Our perception of our own lives has a very real effect on our happiness. So it’s worth trying to figure out what it is that we measure our lives against when we assess their quality.
One angle from which you can examine this issue is by looking for a correlation between wealth and self-reported happiness. After all, nearly all of us put a lot of effort into obtaining money, so presumably money should be a significant contributor to happiness.
And in fact, it’s fairly clear that there is a correlation between wealth and perceived happiness. For example, recent data collected by researchers at the University of Michigan characterizes the relationship like this:
This study looked at 13 different countries, but I should say first off that using the data to comment on the relative happiness levels of different countries is an almost entirely meaningless exercise; Steven Landsburg describes that pretty well here. What I do think is interesting, though, is the way happiness depends on income within a given country. For simplicity, during the remainder of this post I’ll focus on the USA.
Most people, including the authors of the study in question, will take as the primary conclusion of the above graph that more money equals more happiness, with no sign of satiation. For me, though, what’s more interesting (and more accurate to say) is that self-reported happiness grows logarithmically with income.
Here, for example, is the same data above for the USA extrapolated to cover a wider range of income:
I should emphasize, in case it’s unclear, that this is a very slow growth. For example, the difference between a $10,000/year income (in the US, this is the bottom 6%) and $100,000/year (the top 20%) is only about 1 point of “happiness.” The far left side of the plot is a $1,000/year household income, and the right side is $10 million/year.
Here is that same curve plotted in a normal (non-logarithmic) scale [UPDATE: These are the exact same lines, just shown with a non-distorted x-axis]:
[In lieu of a stern and much-needed warning about the danger of such extreme extrapolation, I’ll just post this:
Nonetheless I will continue to take the apparent logarithmic dependence seriously.]
One excellent, and not terribly surprising, feature that jumps out from the data above is that every income group rates itself as happier than average (). You have to extrapolate the curve all the way down to $700/year of household income in order to arrive at a hypothetical demographic group that would consider itself less happy than average. This, it seems to me, is a clear manifestation of the “Lake Wobegon Effect,” a psychological bias nearly everyone has toward considering themselves above average (named after the fictional town of Lake Wobegon, Minnesota, “where all the children are above average.”) But the scale from 1-10 is arbitrary anyway, so whether the scale effectively starts from zero or from 5 doesn’t really matter.
The real question is, what does this logarithmic growth of “life satisfaction” with income imply about how we assess our happiness?
In general, logarithmic growth occurs when something is measured relative to itself. For example, the plot above suggests that doubling someone’s income will have, on average, the same expected effect on their happiness, regardless of what the person’s salary was to start with. That is, a “poor” person who has their annual salary increased from $10,000 to $20,000 will gain as much in happiness as a “rich” person who has their salary increased from $100,000 to $200,000 (about 0.4 points in each case).
In other words, as the wealth of a person increases, their standards for what constitutes a “better life” seem to increase proportionally. And this is the fundamental reason why happiness increases only logarithmically with an improved standard of living.
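To make the “measured relative to itself” idea concrete, here is a minimal Python sketch of a purely logarithmic happiness curve. The 0.4 points per doubling is taken from the estimate above; the reference income, the reference score, and the function name are hypothetical values I made up for illustration, not numbers from the study.

```python
import math

POINTS_PER_DOUBLING = 0.4    # from the "about 0.4 points" estimate above
REFERENCE_INCOME = 10_000    # hypothetical reference point, in $/year
REFERENCE_HAPPINESS = 6.5    # hypothetical score at the reference income


def happiness(income):
    """Logarithmic model: every doubling of income adds the same amount."""
    return REFERENCE_HAPPINESS + POINTS_PER_DOUBLING * math.log2(income / REFERENCE_INCOME)


if __name__ == "__main__":
    # Doubling income gives the same gain, whatever the starting point.
    print(happiness(20_000) - happiness(10_000))
    print(happiness(200_000) - happiness(100_000))

    # A fixed raise of $10,000 is worth much less at a higher income.
    print(happiness(20_000) - happiness(10_000))
    print(happiness(110_000) - happiness(100_000))
```

In this model only ratios of incomes matter, which is why the two doubling lines print the same gain while the fixed $10,000 raise prints very different ones.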
If I were a very cynical or very idealistic person, who was inclined to interpret the world through a moral (or religious) lens, I would conclude here by making some ethical or spiritual point. But for me, the elusively shifting standard of human happiness is something interesting rather than depressing. It seems to me, for example, that an alien theorizing about human happiness would anticipate that, since money has a fixed purchasing power, a person should gain a constant amount of happiness from a constant amount of money. But a real person will not be surprised to learn that this is not at all the case. Humans, in some sense, are wired with a constant drive for accomplishment. With each accomplishment a person gains some happiness, and some ability. And as that person’s abilities and prior accomplishments grow, their standard for further accomplishment also grows.
This seems to be a beautiful design of evolution to keep our species alive and at the top of the food chain. And I think it deserves to be celebrated as much as it deserves to be decried. It is part of what it means to be human.
Most of all, our proportional measuring of happiness deserves to be recognized and to be understood, especially if we are to attempt to maximize our individual and collective well-being.
1. I personally think that our perception of the passage of time is also a logarithmic process, for similar reasons: psychologically, we weigh lengths of time against our own age.
2. I wonder whether there is something very biologically programmed about our ability to appreciate increases only in proportion. Our physical senses, for example, are subject to the Weber-Fechner law, which says that the smallest change we can notice grows in proportion to the magnitude of the sensory stimulus.
For example, you can hear a slight whisper during a silent scene in a movie theater, but in a loud rock concert you won’t be able to perceive anything quieter than a freight train. Similar relations hold for our sense of sight (think of trying to see a faint light in a dark room versus a bright afternoon), touch, smell, and taste.
3. It is perhaps instructive to compare the happiness-vs-income plot to the actual distribution of income in the US. In the same scale as the plot above, that distribution looks like this:
(Data from The US Census Bureau.)
The takeaway message from combining these two plots is this: If you live in the US, then there is a 90% chance that you belong to a demographic group whose average self-reported happiness is between 6.5 and 8.0.
Please note, by the way, that all income numbers in this post are total household income, and not the salaries of individual jobs.
4. Here is a fun fact related to the “Lake Wobegon Effect”: 93% of Americans consider themselves above-average drivers.