1 is more important than 9: Benford’s Law
Let’s start like this: think of some number that describes nature, or any object in it. It can be any mathematical or physical constant or measurement, in any system of units.
I predict, using my psychic powers, that you were much more likely to have thought of a number that begins with 1, 2, or 3 rather than a number that begins with 7, 8, or 9.
As it turns out, the probability is about four times higher. In fact, the probability of having a particular first digit decreases monotonically with the value of the digit (1 is a more common first digit than 2, 2 is more common than 3, and so on). And the odds of you having picked a number that starts with 1 are about seven times higher than the odds of you having picked a number that starts with 9.
This funny happenstance is part of a larger observation called Benford’s law. Broadly speaking, Benford’s law says that the lower counting numbers (like 1, 2, and 3) are disproportionately likely to be the first digit of naturally occurring numbers.
In this post I’ll talk a little bit about Benford’s law, its quantitative form, and how one can think about it.
But first, as a fun exercise, I decided to see whether Benford’s law holds for the numbers I personally tend to use and care about.
(Here I feel I must pause to acknowledge how deeply, ineluctably nerdy that last sentence reveals me to be.)
So I made a list of the physical constants that I tend to think about, or at least of the ones that occurred to me at the moment of making the list. These are presented below in no particular order, and with no particular theme or guarantee of completeness or non-redundancy (i.e., some of the constants on this list can be made by combining others).
After a quick look-over, it’s pretty clear that this table has a lot more numbers starting with 1 than numbers starting with 9. A histogram of first digits in this table looks like this:
Clearly, there are more small digits than large digits. (And somehow I managed to avoid any numbers that start with 4. This is perhaps revealing about me.)
As far as I can tell, there is no really satisfying proof of Benford’s law. But if you want to get some feeling for where it comes from, you can notice that those numbers on my table cover a really wide range of values: ranging in scale from about $10^{-35}$ (the Planck length, in meters) to about $10^{30}$ (the sun’s mass, in kilograms). (And no doubt they would cover a wider range if I were into astronomy.) So if you wanted to put all those physical constants on a single number line, you would have to do it in logarithmic scale. Like this:
The funny thing about a logarithmic scale, though, is that it distorts the real line, giving more length to numbers beginning with lower integers. For example, here is the same line from above, zoomed in to the interval between 1 and 10:
You can see in this picture that the interval from 1 to 2 is much longer than the interval from 9 to 10. (And, just to remind you, the general rule for logarithmic scales is that the same interval separates any two numbers with the same ratio. So, for example, 1 and 3 are as far from each other as 2 and 6, or 3 and 9, or 500 and 1500.) If you were to choose a set of numbers by randomly throwing darts at a logarithmic scale, you would naturally get more 1’s and 2’s than 8’s and 9’s.
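If you'd like to try the dart-throwing experiment yourself, here is a minimal Python sketch. It assumes that "throwing a dart" means picking an exponent uniformly at random and then raising 10 to that power. The exponent range, the seed, and the function names `first_digit` and `throw_darts` are my own illustrative choices, not anything canonical.

```python
import random
from collections import Counter


def first_digit(x: float) -> int:
    """Return the leading nonzero digit of a positive number."""
    # Scientific notation puts the leading digit first, e.g. '4.2000...e-04'.
    return int(f"{x:.17e}"[0])


def throw_darts(n_darts: int, low_exp: float = -35.0,
                high_exp: float = 30.0, seed: int = 0) -> dict:
    """Throw darts uniformly at a log scale; return first-digit fractions."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_darts):
        # A uniform position on the log axis is a uniformly random exponent.
        exponent = rng.uniform(low_exp, high_exp)
        counts[first_digit(10.0 ** exponent)] += 1
    return {d: counts[d] / n_darts for d in range(1, 10)}


fractions = throw_darts(100_000)
for digit, fraction in fractions.items():
    print(digit, f"{fraction:.3f}")
```

The printed fractions should fall steadily from digit 1 down to digit 9, which is just the statement that the 1-to-2 stretch of the axis catches more darts than the 9-to-10 stretch.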
What this implies is that if you want a quantitative form for Benford’s law, you can just compare the lengths of the different intervals on the logarithmic scale. This gives:

$$P(d) = \log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right),$$

where $d$ is the value of the first digit and $P(d)$ is the relative abundance of that digit.
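To put numbers on this, here is a short Python sketch that evaluates the standard form of Benford's law, P(d) = log10(1 + 1/d), for each first digit, and then checks the two ratios quoted at the top of the post. The function name `benford` is just an illustrative label.

```python
import math


def benford(d: int) -> float:
    """Relative abundance of first digit d: the length of [d, d+1) on a log10 axis."""
    return math.log10(1 + 1 / d)


probs = {d: benford(d) for d in range(1, 10)}
for d, p in probs.items():
    print(d, f"{p:.3f}")

# Numbers starting with 1, 2, or 3 versus numbers starting with 7, 8, or 9.
low = probs[1] + probs[2] + probs[3]
high = probs[7] + probs[8] + probs[9]
print(f"{low / high:.2f}")            # roughly 3.9, i.e. "about four times"

# Numbers starting with 1 versus numbers starting with 9.
print(f"{probs[1] / probs[9]:.2f}")   # roughly 6.6, i.e. "about seven times"
```

The nine probabilities sum to 1, as they must, since the nine intervals tile the whole stretch from 1 to 10 on the logarithmic scale.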
If you have a large enough data set, this quantitative form of Benford’s law tends to come through pretty clearly. For example, if you take all 335 entries from the list of physical constants provided by NIST, then you find that the abundance of different first digits is described by the formula above with pretty good quantitative accuracy:
Now, if you don’t like the image of choosing constants of nature by throwing darts at a logarithmic scale, let me suggest another way to see it: Benford’s law is what you’d get as the result of a random walk using multiplicative steps.
In the conventional random walk, the walker steps randomly to the right or left with steps of constant length, and after a long time ends up at a random position on the number line. But imagine instead that the random walker takes steps of constant multiplicative value: for example, at each “step” the walker could have his position multiplied by either 2/3 or 3/2. This would correspond to steps that appear to have constant length on the logarithmic scale. Consequently, after many steps the walker would have a random position on the logarithmic axis, and so would be more likely to end up in the wider 1–2 bin than in the shorter 8–9 bin.
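Here is a rough Python sketch of that multiplicative walk, in case you want to watch it happen. It assumes every walker starts at 1 and multiplies its position by 3/2 or 2/3 with equal probability at each step. The number of walkers, the number of steps, the seed, and the function names are arbitrary choices of mine.

```python
import random
from collections import Counter


def first_digit(x: float) -> int:
    """Return the leading nonzero digit of a positive number."""
    return int(f"{x:.17e}"[0])


def multiplicative_walk(n_steps: int, rng: random.Random,
                        start: float = 1.0) -> float:
    """Multiply the position by 3/2 or 2/3 (equal odds) at every step."""
    position = start
    for _ in range(n_steps):
        position *= 1.5 if rng.random() < 0.5 else 2.0 / 3.0
    return position


def digit_frequencies(n_walkers: int = 5000, n_steps: int = 400,
                      seed: int = 1) -> dict:
    """First-digit fractions of the final positions of many walkers."""
    rng = random.Random(seed)
    counts = Counter(
        first_digit(multiplicative_walk(n_steps, rng))
        for _ in range(n_walkers)
    )
    return {d: counts[d] / n_walkers for d in range(1, 10)}


freqs = digit_frequencies()
for digit, freq in freqs.items():
    print(digit, f"{freq:.3f}")
```

With a finite number of steps the agreement with Benford's law is only approximate, since every final position is a power of 3/2, but the lopsidedness toward low first digits should already be plain.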
The upshot is that one way to think about Benford’s law is that the numbers we have arise from a process of multiplying many other “randomly chosen” numbers together. This multiplication naturally skews our results toward numbers that begin with low digits.
By the way, for me the notion of “randomly multiplying numbers together” immediately brings to mind the process of doing homework as an undergraduate. This inspired me to grab a random physics book off my shelf (which happened to be Tipler and Llewellyn’s Modern Physics, 3rd edition) and check the solutions to the homework problems in the back.
So the next time you find yourself trying to randomly guess answers, remember Benford’s law.