The Historical Development and the Profound Meaning of Probability and Statistics
Clifford F. Thies
Clifford F. Thies is a professor of economics and finance at Shenandoah University, Winchester, VA. He can be reached at cthies@su.edu.
The fault, dear Brutus, is not in our stars,
But in ourselves, that we are underlings.
—William Shakespeare, Julius Caesar
The word “statistics” originally meant the collection and analysis of numerical data of concern to the state, hence the resemblance between the words “statistics” and “state.” Today, the word “statistics” simply refers to the collection and analysis of numerical data.
The first recognized work of statistics was John Graunt’s Natural and Political Observations made upon the Bills of Mortality (1662). Until the age of 42, Graunt was a merchant. He then became interested in the births and deaths in London, which were compiled weekly on forms called “Bills of Mortality.”
At the time, births and deaths were being recorded because of fear of the plague. Graunt’s book compiled these figures from 1604 to 1661 and included an extended commentary on them. The book received an avid following, and Graunt himself was named a member of the original Royal Society.
In the week of September 12-19, 1665, the number of people who died (“buried”) was 8,297, and the number who died of the plague was 7,165. (You may wonder why there are eight days in this “week,” i.e., September 12th to September 19th. Counting from mid-day to mid-day, a “week” spans parts of eight calendar days: a half day, followed by six whole days, followed by another half day. Noon, rather than midnight, was used to reckon when people died because people usually knew whether a death during the day came before or after noon, but often did not know whether a death during the night came before or after midnight.) Only four of the 130 parishes of London were free of the plague.
In addition, the cause of death (“diseases and casualties”) is given for many of those who died; e.g., five died of abortive pregnancies and forty-three of old age. Graunt found that, contrary to common opinion, few died of murder. People often believe that dramatic deaths occur more frequently than they actually do.
Graunt wondered, as statisticians often do, about some of the numbers. He speculated that the number of deaths attributed to venereal disease was small because coroners and other authorities were reluctant to attribute a death to venereal disease unless those who died were “hated persons,” or their condition was quite advanced (i.e., their “very Noses were eaten off”). Otherwise, the death would be recorded as due to sores or ulcers.
Graunt not only compiled a great many numbers; he also used them to make inferences. For example, he inferred the population of London. From his data, he estimated an average of 12,000 births per year. He then assumed that (a) a fecund woman bears a child every other year, (b) there are twice as many households as fecund women, and (c) there are on average eight persons to a household. Graunt thereby estimated that London had a population of 384,000 (12,000 times 2 times 2 times 8).
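Graunt’s chain of assumptions is simple enough to write out step by step. The short Python sketch below reproduces the arithmetic of the paragraph above; the function and parameter names are illustrative inventions, and the default values are just the birth estimate and the three assumptions quoted in the text.

```python
def graunt_population(births_per_year: int = 12_000,
                      years_between_births: int = 2,
                      households_per_fecund_woman: int = 2,
                      persons_per_household: int = 8) -> int:
    """Reproduce Graunt's chain of assumptions for the population of London."""
    # (a) If each fecund woman bears a child every other year, the number of
    #     fecund women equals births per year times the years between births.
    fecund_women = births_per_year * years_between_births
    # (b) There are assumed to be twice as many households as fecund women.
    households = fecund_women * households_per_fecund_woman
    # (c) Each household is assumed to hold eight persons on average.
    return households * persons_per_household


print(graunt_population())  # 384000
```

Because each assumption enters as a multiplier, an error in any one of them scales the final estimate by the same proportion: halving the assumed household size, for example, halves the estimated population.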
Combining his exposure to a great deal of numerical data with a huge dose of intuition, Graunt also inferred a table of life expectancy. Starting from estimates that about sixty-four percent of live births survived until age six, and that very few survived until seventy-five, he constructed the following table.
Table 1. Table of Life Expectancy, 17th century London, as estimated by John Graunt (Ian Hacking, The Emergence of Probability, 1975).
Age   Survival Rate from Live Birth (out of 100)
  6   64
 16   40
 26   25
 36   16
 46   10
 56    6
 66    3
 76    1
For comparison, Table 2 presents a recent table of life expectancy from the National Center for Health Statistics.
Table 2. U.S. Table of Life Expectancy, 1996 (National Center for Health Statistics, http://www.cdc.gov/nchs/data/nvs47_28.pdf).
      Survival Rate from Live Birth (out of 100)
Age   Males   Females
  5   99      99
 15   99      99
 25   98      99
 35   96      98
 45   93      96
 55   88      93
 65   76      86
 75   55      70
 85   25      42
Graunt estimated life expectancy at live birth in 17th-century London to be 16 years. In comparison, life expectancy at live birth in the United States today is 77 years (and is substantially higher for females than for males). It was precisely his table of life expectancy that set in motion the development of statistics.
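One standard way to turn a survival table into a life expectancy at birth is to measure the area under the survival curve and divide by the starting cohort, since each survivor contributes one person-year for every year lived. The Python sketch below applies that idea to Table 1 with the trapezoid rule. Two end points are assumptions added for illustration, not Graunt’s figures: 100 survivors at age 0 and none at age 86. The trapezoid rule also assumes that deaths are spread evenly within each interval, so the result (about 18 years) is only a rough check on the table and need not match the figure of 16 cited above.

```python
# Ages and survivors (out of 100 live births) from Table 1, with two assumed
# end points: 100 survivors at birth and none left at age 86.
AGES = [0, 6, 16, 26, 36, 46, 56, 66, 76, 86]
SURVIVORS = [100, 64, 40, 25, 16, 10, 6, 3, 1, 0]


def life_expectancy_at_birth(ages, survivors):
    """Approximate expected years of life at birth as the area under the
    survival curve (trapezoid rule) divided by the initial cohort size."""
    person_years = 0.0
    for i in range(len(ages) - 1):
        width = ages[i + 1] - ages[i]
        # Survivors are assumed to decline linearly within each interval.
        person_years += width * (survivors[i] + survivors[i + 1]) / 2
    return person_years / survivors[0]


print(round(life_expectancy_at_birth(AGES, SURVIVORS), 2))  # 18.22
```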
The life insurance industry was tremendously interested in an accurate estimate of life expectancy. With an accurate table of life expectancy, a life insurance company can precisely estimate the benefits it will pay at various times in the future, and can provide for these benefits in advance by investing the premiums it receives in financial securities. (We presume that the company insures many people, so that uncertainty about any particular person’s age at death is inconsequential.) Graunt’s table of life expectancy was indeed pathbreaking, but because the data he used were deficient, the accuracy of his table was open to question.
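The parenthetical point about insuring many people can be made precise with the binomial distribution. If each of n insured persons independently dies within the year with probability q, the number of deaths has mean nq and standard deviation sqrt(nq(1 - q)), so the spread relative to the expected number of deaths, sqrt((1 - q)/(nq)), shrinks as n grows. The Python sketch below assumes independent deaths and a hypothetical one-year death probability of 1 percent; neither assumption comes from the article.

```python
import math


def relative_spread_of_deaths(n_insured: int, death_prob: float) -> float:
    """Standard deviation of the number of deaths divided by its mean,
    assuming each insured person dies independently with probability death_prob."""
    mean = n_insured * death_prob
    std_dev = math.sqrt(n_insured * death_prob * (1 - death_prob))
    return std_dev / mean


# Hypothetical one-year death probability of 1 percent.
for n in (100, 10_000, 1_000_000):
    print(n, round(relative_spread_of_deaths(n, 0.01), 4))
```

For these pool sizes the relative spread falls from nearly 100 percent of the expected number of deaths with 100 policies, to about 10 percent with 10,000, to about 1 percent with 1,000,000. Real deaths are not perfectly independent (an epidemic such as the plague strikes many policyholders at once), so this is a best case.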
The next great step in the development of statistics was taken by Edmund Halley, an astronomer. Halley is, of course, best known for identifying “Halley’s Comet.” (In 1705, in his book on the orbits of comets, Halley identified 24 comets that had appeared between 1337 and 1698. Three of them, the comets of 1531, 1607 and 1682, were so similar that he thought they were the same comet making repeat appearances, and he therefore predicted that it would reappear in 1758, i.e., 76 years after 1682, which it did. The comet appeared again in 1835, 1910 and 1986, and is expected to reappear in 2061. I am told that the comet’s orbit is sometimes affected by the gravitational pull of the major planets and other complicating factors, and that, going back in history, it has not always appeared at intervals of about 76 years.)
Having promised to write an article for a newly established scientific journal, Halley thought it would be clever to analyze a body of social statistics, as opposed to astronomical statistics. He was aware of John Graunt’s work on life expectancy, and of the deficiencies in the data on which Graunt relied. And, because a fellow astronomer had been seeking to disprove certain superstitions concerning the moon, he was aware of a set of statistics sufficient to calculate a table of life expectancy precisely. These statistics were the birth and death records kept by the small German city of Breslaw (which, since its transfer to Poland after World War II, has been Wrocław, Poland). Most importantly for the purpose at hand, the city fathers of Breslaw recorded age at death, as well as the total numbers of births and deaths.
In 1693, Halley published an article that derived a table of life expectancy from the Breslaw data and, going further, derived the values of life annuities (i.e., series of payments that continue until the owner of the policy dies). In the insurance industry, an annuity is the mirror image of a life insurance policy (i.e., a policy that makes a single payment upon the death of the owner).
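The valuation idea can be sketched as follows: the value today of a life annuity is the sum, over future years, of each payment discounted for interest and weighted by the probability that the annuitant is still alive to receive it. The mirror-image life insurance policy instead weights each discounted payment by the probability of dying in that year. The Python below is a minimal sketch under assumptions of our own: a hypothetical survival table for a cohort of 100, payments of 1 at the end of each year, and a flat interest rate. It is not a reconstruction of Halley’s own tables or interest rate.

```python
def annuity_value(survivors, rate):
    """Present value of 1 paid at the end of each year the annuitant is alive.

    survivors[t] is the number still alive t years from now (survivors[0] is
    the starting group); rate is a flat annual interest rate.
    """
    v = 1 / (1 + rate)  # one-year discount factor
    start = survivors[0]
    return sum(v**t * survivors[t] / start for t in range(1, len(survivors)))


def insurance_value(survivors, rate):
    """Present value of 1 paid at the end of the year in which death occurs."""
    v = 1 / (1 + rate)
    start = survivors[0]
    return sum(
        v**t * (survivors[t - 1] - survivors[t]) / start
        for t in range(1, len(survivors))
    )


# Hypothetical cohort of 100, followed until everyone has died.
table = [100, 90, 70, 40, 0]
print(round(annuity_value(table, 0.06), 4), round(insurance_value(table, 0.06), 4))
```

With a zero interest rate, the annuity is worth the expected number of payments (here 2.0) and the insurance policy is worth exactly 1, since every member of the cohort eventually dies. When the table runs until everyone has died, the two values are tied by the identity A = 1 - d(1 + a), where a is the annuity value, A the insurance value and d = rate / (1 + rate), which is one way of seeing why the two products are mirror images.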
With the math done, “all” that was left was for real-world insurance companies to put the new science of statistics into practice. After the publication of Halley’s table of life expectancy, it would take about one hundred years for the first “actuarially based” insurance companies to get underway. (“Actuarial” means relating to statistical calculations, especially of life expectancy. Thus, an actuarially based life insurance company calculates its liability as the present value of the future benefits to be paid, with those future benefits estimated statistically. In addition, it “funds” its liability with its investment assets, i.e., stocks, bonds, mortgages, commercial real estate and so forth.)
This step, of course, required changes not only in business practice,
but in the understanding of insurance on the part of the customers of the
industry, as well as in the law and regulations that underpin the marketplace
in general and the insurance market in particular. And, back in the early
days of capitalism, changes were much slower to implement than they are
nowadays.
The important points are that:
* Statistics developed from the need to base decisions (in this case, the value of an insurance policy) on real numbers.
* Statistics involves both the collection of data and its analysis.
* While there are times when statistical analysis allows a precise calculation, there are other times when the available data are not sufficient. In these cases, the available data may still be informative, but a decision will require some judgment or intuition. (“Intuition” means the ability to form an opinion “without evident rational thought and inference.” Intuition is not necessarily irrational. Consider three sets of propositions: those we know, or at least reasonably suspect, to be true; those we know, or at least reasonably suspect, to be false; and those about which we are genuinely uncertain. As long as intuition is applied to the third set, and uses the available information to tentatively judge whether a proposition is true or false, weighing the risks and rewards involved, the use of intuition is not irrational.)
* The collection of data is sometimes itself a decision to be made, where the potential benefit of the additional data to be gathered has to be balanced against the cost of collecting it.
The Nature of Probability
In his book Against the Gods (1996), Peter L. Bernstein argues that one of the conditions that enabled the development of modern business and finance was a change in our understanding of probability.
For the ancient Greeks, the risks of life were due to the mischievousness of the gods. Indeed, in many pagan religions, the gods were inclined to wreak havoc on humans and needed to be placated in various ways (e.g., human sacrifices). But, according to Bernstein, this concept of risk was overthrown with the rise of the great monotheistic religions. With monotheism, the universe came to be understood as orderly and amenable to human understanding. Decision-making still involves risk, but risk can be quantified, which allows for prudent risk-taking.
To say that the universe is orderly is not to say that it is simple. Indeed, once we concern ourselves with the phenomena of social science, the complexities of cause and effect make prediction very difficult, especially regarding individual events.
To make an analogy, I suppose it is possible nowadays to design a robot hand that could toss a coin and always get “heads.” However, when a human being tosses a coin, you cannot predict the outcome, because the “causes” (e.g., the force imparted to the coin by the flick of the thumb) cannot be sufficiently controlled.
In the world of social science, there are usually many causes, only some of which are important enough to notice. These causes, both the major ones and the minor ones, often have non-linear and interactive effects, making the development of precise relationships very difficult. As a practical consequence, it becomes impossible to predict individual events, although it may be possible to predict tendencies.
The preceding paragraph describes the idea behind “chaos theory.” Chaos theory explains that even if everything is caused, individual events may be unpredictable. However, the word “chaos” here is not the dictionary word “chaos,” which means utter confusion. As Albert Einstein once said to a fellow scientist, “You believe in a God who plays with dice, and I in complete law and order in a world in which objectivity exists.” Randomness, therefore, is the extent to which individual events are unpredictable because of the complexity that lies deep in the nature of the universe in which we live, and thus because of the practical impossibility of identifying all the factors involved in cause and effect.
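A standard textbook illustration of this kind of deterministic unpredictability is the logistic map, x(t+1) = r * x(t) * (1 - x(t)). Every step is fully determined by the one before it, yet for r = 4 two starting values that differ by one part in ten billion soon produce completely different sequences, because the initial discrepancy roughly doubles at each step on average. The logistic map is our choice of example, not the article’s, and the starting value, perturbation and number of steps below are arbitrary.

```python
def logistic_orbit(x0: float, r: float = 4.0, steps: int = 100) -> list[float]:
    """Iterate the deterministic logistic map x -> r * x * (1 - x)."""
    orbit = [x0]
    for _ in range(steps):
        x = orbit[-1]
        orbit.append(r * x * (1 - x))
    return orbit


# Two starting points that differ by one part in ten billion.
a = logistic_orbit(0.3)
b = logistic_orbit(0.3 + 1e-10)

# The gap stays tiny for the first few dozen steps, then becomes as large
# as the values themselves, even though nothing random was involved.
for step in (0, 20, 40, 60, 80, 100):
    print(step, abs(a[step] - b[step]))
```

Running the same starting value twice gives identical sequences, so the map is perfectly “caused”; it is only our inability to know the starting point exactly that makes the distant future unpredictable.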
Thus, in this universe, both in its physical realm and even more so in its social realm, the collection and analysis of data allow a forecast of the probable range of outcomes of a decision, but cannot entirely eliminate risk. This makes the business person or entrepreneur who uses statistics to inform his decision-making a prudent risk-taker.
In the beginning, God brought order out of chaos, separating the light from the darkness, and the land from the water. But, by making the world complex, with many causes, many too small to even notice, He left a degree of wildness in the universe.