The Historical Development
And the Profound Meaning of
Probability and Statistics

Clifford F. Thies

      Clifford F. Thies is a professor of economics and finance at Shenandoah University, Winchester, VA. He can be reached at: cthies@su.edu.

The fault, dear Brutus, is not in the stars,
But in ourselves, that we are underlings.
—William Shakespeare, Julius Caesar

      The word “statistics” originally meant the collection and analysis of numerical data of concern to the state. Hence the word “state” within the word “statistics.” Today, the word “statistics” simply refers to the collection and analysis of numerical data.

      The first recognized work of statistics was John Graunt’s Natural and Political Observations made upon the Bills of Mortality (1662). Graunt was a merchant until the age of 42. He then became interested in the births and deaths in London, which were compiled weekly on forms (“Bills of Mortality”).

      At the time, because of fear of the plague, births and deaths were being recorded. Graunt’s book compiled these figures from 1604 to 1661, and included an extended commentary on them. His book received an avid following, and he himself was named a member of the original Royal Society.

      In the week of September 12-19, 1665, the number of people who died (“buried”) was 8,297, and the number who died due to the plague was 7,165. (You may wonder why there are eight days in this “week,” i.e., September 12th to September 19th. Counting from mid-day to mid-day, a “week” touches eight calendar days: a half day, followed by six whole days, followed by another half day. The reason noon, as opposed to midnight, is used to reckon when people die is that people usually know when, relative to noon, someone dies during the day, but often do not know when, relative to midnight, someone dies during the night.) Only four of the 130 parishes of London were free of the plague.

      In addition, for many who died, the cause of death (“diseases and casualties”) is given; e.g., five died due to abortive pregnancies, and forty-three to old age. Graunt found that, contrary to common opinion, few died due to murder. It is often the case that people believe that dramatic deaths occur more frequently than they actually occur.

      Graunt wondered, as statisticians often do, about some of the numbers. The small number of deaths due to venereal disease was, he speculated, because coroners and other authorities were reluctant to attribute death to venereal disease unless those who died were “hated persons,” or their condition quite advanced (i.e., their “very Noses were eaten off”). Otherwise, the death would be recorded as due to sores or ulcers.

      Graunt not only compiled a great many numbers; he also used the figures to make inferences. For example, he inferred the population of London. From his data, he estimated an average of 12,000 births per year. Then, he assumed that (a) a fecund woman bears a child every other year, (b) there are twice as many households as fecund women, and (c) there are on average eight persons to a household. Graunt thereby estimated that London had a population of 384,000 (12,000 times 2 times 2 times 8).
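
      The chain of assumptions (a) through (c) can be written out step by step. The following is only a sketch of the arithmetic quoted above; the variable names are illustrative and are not Graunt’s.

```python
# Graunt's population estimate for London, using the figures quoted in the text.
# Variable names are illustrative assumptions, not Graunt's own terms.
births_per_year = 12_000          # estimated average annual births
years_between_births = 2          # (a) a fecund woman bears a child every other year
households_per_fecund_woman = 2   # (b) twice as many households as fecund women
persons_per_household = 8         # (c) average persons per household

fecund_women = births_per_year * years_between_births
households = fecund_women * households_per_fecund_woman
population = households * persons_per_household

print(fecund_women, households, population)  # 24000 48000 384000
```

      Writing the steps separately makes plain how sensitive the result is to each assumption: changing the household size from eight to six, for instance, would lower the estimate by a quarter.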

      Combining his exposure to a great deal of numerical data with a huge dose of intuition, Graunt also inferred a table of life expectancy. Starting from estimates that about sixty-four percent of live births survived until age six, and that very few survived until seventy-five, he constructed the following table.

      Table 1. Table of Life Expectancy, 17th century London, as estimated by John Graunt (Ian Hacking, The Emergence of Probability, 1975).

 

Age       Survival Rate from Live Birth (out of 100)

6                      64
16                     40
26                     25
36                     16
46                     10
56                      6
66                      3
76                      1

      For comparison, Table 2 presents a recent table of life expectancy from the National Center for Health Statistics.

      Table 2. U.S. Table of Life Expectancy, 1996 (National Center for Health Statistics, http://www.cdc.gov/nchs/data/nvs47_28.pdf).

 

          Survival Rate from Live Birth (out of 100)

Age               Males               Females

5                  99                   99
15                 99                   99
25                 98                   99
35                 96                   98
45                 93                   96
55                 88                   93
65                 76                   86
75                 55                   70
85                 25                   42

      Graunt estimated life expectancy at live birth in 17th century London to be 16. In comparison, life expectancy at live birth in the United States today is 77 (and is substantially higher for females than it is for males). It was precisely his table of life expectancy that set into motion the development of statistics.
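
      To see how a survival table relates to a life expectancy figure, one can add up the years lived by 100 newborns and divide by 100. The sketch below does this for Table 1 under two assumptions that are not in the source: deaths are spread evenly within each age interval (a trapezoid rule), and years lived beyond age 76 are ignored. The result depends heavily on how deaths are assumed to fall within each interval, especially the first six years, so it should not be expected to reproduce the figure of 16 exactly.

```python
# Rough life expectancy at birth from a survival table (Table 1).
# Assumptions (not from the source): deaths are spread evenly within each
# age interval, and years lived after the last listed age are ignored.
ages = [0, 6, 16, 26, 36, 46, 56, 66, 76]
survivors = [100, 64, 40, 25, 16, 10, 6, 3, 1]  # out of 100 live births

def life_expectancy_at_birth(ages, survivors):
    """Person-years lived per newborn, using the trapezoid rule."""
    person_years = 0.0
    for i in range(len(ages) - 1):
        width = ages[i + 1] - ages[i]
        average_alive = (survivors[i] + survivors[i + 1]) / 2
        person_years += average_alive * width
    return person_years / survivors[0]

estimate = life_expectancy_at_birth(ages, survivors)
print(round(estimate, 2))
```

      Under these assumptions the estimate comes out somewhat above 16. Assuming instead that most deaths before age six occur in infancy would pull it lower.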

      The life insurance industry was tremendously interested in an accurate estimate of life expectancy. With an accurate table of life expectancy, a life insurance company can precisely estimate the benefits that would be paid at various times in the future, and can provide in advance for these benefits by investing the insurance premiums it receives in financial securities. (We presume that the life insurance company insures many people, so that uncertainty about any particular person’s age at death would be inconsequential.) Graunt’s table of life expectancy was indeed pathbreaking, but since the data he used were deficient, the accuracy of his table was subject to question.

      The next great step in the development of statistics was taken by Edmund Halley, an astronomer. Halley is, of course, best known for his identification of “Halley’s Comet.” (In 1705, in his book on the orbit of comets, Halley identified 24 comets that had appeared between 1337 and 1698. Three of them—the comets of 1531, 1607 and 1682—were so similar that he thought they were the same comet making repeat appearances, and he therefore predicted that it would reappear in 1758, i.e., in 76 years, which it did. The comet appeared again in 1835, in 1910, and in 1986, and is scheduled to reappear in 2061. I am told that the comet’s orbit is sometimes affected by the gravitational pull of major planets and other complicating factors, and that, going back in history, it has not always appeared in intervals of about 76 years.)

      Having promised to write an article for a newly established scientific journal, Halley thought it would be clever to perform some analysis on a body of social statistics, as opposed to astronomical statistics. He was aware of John Graunt’s work on life expectancies, and of the deficiencies in the data upon which Graunt relied. And, by reason of a fellow astronomer seeking to disprove certain superstitions concerning the moon, he was aware of a set of statistics sufficient to precisely calculate a table of life expectancies. The set of statistics was the birth and death records kept by the small German city of Breslaw (now Wrocław, Poland, following its transfer to Poland after WWII). Most importantly for the purpose at hand, the city fathers of Breslaw recorded age at death, as well as the total number of births and of deaths.

      In 1693, Halley published his article deriving a table of life expectancy from the Breslaw data, and not only that, he derived the values of life annuities (i.e., series of payments that continue until the owner of the policy dies). In the insurance industry, an annuity is the mirror image of a life insurance policy (i.e., a single payment made upon the death of the owner).
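
      The link between a table of life expectancy and the value of a life annuity can be sketched as follows. This is not Halley’s own calculation: the survival probabilities and the 6 percent interest rate below are hypothetical values chosen only for illustration, and the function name is made up. The idea is that each future payment is weighted by the probability that the owner is still alive to receive it, and then discounted back to the present.

```python
# Expected present value of a life annuity paying 1 at the end of each year
# in which the owner is still alive. All inputs below are hypothetical.
def life_annuity_value(survival_probs, rate):
    """survival_probs[t - 1] is the probability the owner is alive at the end of year t."""
    return sum(p / (1 + rate) ** t for t, p in enumerate(survival_probs, start=1))

# Hypothetical survival probabilities for the next nine years.
survival = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]

annuity = life_annuity_value(survival, rate=0.06)       # payments contingent on survival
certain = life_annuity_value([1.0] * 9, rate=0.06)      # same payments made for certain

print(round(annuity, 3), round(certain, 3))
```

      The life annuity is worth less than the same stream of payments made for certain, because some of the payments will probably never be made. A life insurance policy can be valued in the same spirit, weighting the single payment by the probability of death in each year.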

      With the math done, “all”  that was left was for real-world insurance companies to implement the new science of statistics. From the publication of Halley’s table of life expectancy, it would take about one hundred years for the first “actuarially based” insurance companies to get underway. (“Actuarial” means relating to statistical calculations, especially of life expectancy. Thus, an actuarially based life insurance company calculates its liability as the present value of future benefits to be paid, the future benefits being based on statistical calculations. In addition, it “funds” its liability with its investment assets, i.e., stocks, bonds, mortgages, commercial real estate and so forth.)  This step, of course, required changes not only in business practice, but in the understanding of insurance on the part of the customers of the industry, as well as in the law and regulations that underpin the marketplace in general and the insurance market in particular. And, back in the early days of capitalism, changes were much slower to implement than they are nowadays.

      The important points are that:

*    Statistics developed from the need to base decisions—in this case, the value of an insurance policy—on real numbers.

*    Statistics involves both the collection of data and its analysis.

*    While there are times that statistical analysis allows a precise calculation, there are other times when the available data are not sufficient. In these cases, the available data may still be informative, but a decision will require some judgment or intuition. (“Intuition” means the ability to form an opinion “without evident rational thought and inference.” Intuition is not necessarily irrational. Imagine the following three sets of propositions: propositions we know or at least reasonably suspect are true, propositions we know or at least reasonably suspect are false, and propositions about which we are genuinely uncertain. As long as our intuition is confined to the third set of propositions, and uses the information that is available to tentatively judge a proposition to be either true or false, weighing the risks and rewards involved, then the use of intuition is not irrational.)

*    The collection of data is sometimes itself a decision to be made, where the potential benefit of the additional data to be gathered has to be balanced against the cost of collecting it.

The Nature of Probability
      In his book Against the Gods (1996), Peter L. Bernstein makes the argument that one of the conditions that enabled the development of modern business and finance was a change in our understanding of probability.

      For the ancient Greeks, the risks of life were due to the mischievousness of the gods. Indeed, in many pagan religions, the gods were inclined to wreak havoc on humans, and needed to be placated in various ways (e.g., human sacrifices). But, according to Bernstein, this concept of risk was overthrown upon the rise of the great monotheistic religions. With monotheism, the universe was understood to be orderly and amenable to human understanding. While decision-making involves risk, risk can be quantified, and so allows for prudent risk-taking.

      To say that the universe is orderly is not to say that it is simple. Indeed, once we concern ourselves with the phenomena of social science, the complexities of cause and effect make prediction very difficult, especially regarding individual events.

      To make an analogy, I suppose it is possible nowadays to design a robot hand that could toss a coin and always get “heads.” However, with a human being tossing the coin, you cannot predict the outcome of a coin toss because the “causes” (e.g., the force imparted to the coin by the flick of the thumb) cannot be sufficiently controlled.

      In the world of social science, there are usually many causes, only some of which are important enough to notice. These causes, both the major ones and the minor ones, often have non-linear and interactive effects, making the development of precise relationships very difficult. As a practical consequence, it becomes impossible to predict individual events, although it may be possible to predict tendencies.

      The above paragraph describes “chaos theory.” Chaos theory explains that even if everything is caused, individual events may be unpredictable. However, this word “chaos” is not the dictionary word “chaos,” which means utter confusion. As Albert Einstein once said to a fellow scientist, “You believe in a God who plays with dice, and I in complete law and order in a world in which objectivity exists.” Randomness therefore means the extent to which individual events are unpredictable due to the complexity that is deep in the nature of the universe in which we live, and thus the practical impossibility of identifying all the factors involved in cause and effect.
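
      The claim that fully caused events can still be unpredictable can be illustrated with a standard textbook example that does not appear in the sources cited here: the logistic map, a simple non-linear rule with no randomness in it at all. In the sketch below, the starting values, the parameter r = 4 and the number of steps are all assumptions chosen for illustration. Two starting points that differ by one part in ten billion stay close for a few steps and then go their separate ways.

```python
# Sensitivity to initial conditions in the logistic map x -> r * x * (1 - x).
# The rule is completely deterministic; r = 4 and the starting values are
# illustrative assumptions.
def logistic_trajectory(x0, steps, r=4.0):
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs

a = logistic_trajectory(0.2, 100)
b = logistic_trajectory(0.2 + 1e-10, 100)   # almost the same "cause"

gaps = [abs(x - y) for x, y in zip(a, b)]
print("largest gap in first 5 steps:", max(gaps[:6]))
print("largest gap over 100 steps: ", max(gaps))
```

      Because no measurement of the starting point is ever exact, a forecast of where the system will be many steps ahead is of little use, even though every step follows strictly from the one before.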

      Thus, in this universe, both in its physical realm and even more so in its social realm, the collection and analysis of data allows a forecast of the probable range of outcomes of a decision, but cannot entirely eliminate risk. This makes the business person or entrepreneur who uses statistics to inform his decision-making a prudent risk-taker.
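
      The difference between predicting an individual event and predicting a tendency can be seen in a simulated run of coin tosses. The sketch below assumes a fair coin, an arbitrary seed and 10,000 tosses, and uses the usual normal approximation for the probable range. No single toss can be called in advance, yet the share of heads falls within a narrow range around one half.

```python
import random

# Simulated tosses of a fair coin. The seed and sample size are arbitrary
# assumptions; any values would serve the illustration.
rng = random.Random(2002)
n = 10_000
tosses = [rng.random() < 0.5 for _ in range(n)]   # True means heads
share_heads = sum(tosses) / n

# Approximate 95 percent range for the share of heads (normal approximation).
half_width = 1.96 * (0.25 / n) ** 0.5
low, high = 0.5 - half_width, 0.5 + half_width

print("first ten tosses:", ["H" if t else "T" for t in tosses[:10]])
print("share of heads:", share_heads)
print("probable range:", round(low, 4), "to", round(high, 4))
```

      The range narrows as the number of tosses grows, but it never shrinks to a single point, which is the sense in which data can bound risk without eliminating it.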

      In the beginning, God brought order out of chaos, separating the light from the darkness, and the land from the water. But, by making the world complex, with many causes, many too small to even notice, He left a degree of wildness in the universe.    

 

© Copyright St.Croix Review 2002