A Brief History of Data - Part 2

The Birth of Statistical Culture


Following this introduction to the history of data (Part 1), we will examine the genesis of statistical culture, in order to understand how we went from simple enumeration to descriptive statistics, then prescriptive statistics, and finally, today, "predictive" statistics.


The different forms of statistics


The development of statistics successively addressed several concerns. Initially, statistics, in the medieval tradition of the "mirror of princes", had a pedagogical vocation: instructing the regent by showing him the reflection of his greatness - a description of the kingdom's provinces, of its territory, of the amount of taxes he could collect. Statistical analysis then abandoned the perspective of royal power to focus on the condition of society itself and its inhabitants, for practical purposes: inventories of the prices of agricultural and industrial products, population enumerations, means of subsistence. Subsequently, times of famine, plague and war called for increasingly specialized and regular statistical studies: Abbé Terray's annual records of births, marriages and deaths from 1772 onwards, and Montyon's records of criminal convictions from 1775 onwards.


« Statistical analysis then abandoned the perspective of royal power to focus on the condition of society itself and its inhabitants for practical purposes »

The first public controversy over the use of statistics arose during the debate on inoculation against smallpox. The controversy began when, in 1774, smallpox carried off Louis XV, leading his successor Louis XVI to have the entire royal family inoculated. Inoculation was later refined into vaccination proper through the work of the English physician Edward Jenner, who observed that women working in contact with cows, especially those who milked them, did not catch smallpox. His work showed that cattle carried an infectious disease called vaccinia (cowpox). This gave him the idea to "vaccinate" - to infect humans with it as an antidote to smallpox. The question then arose as to whether public health would be better served by making the vaccine compulsory.


For the mathematician Daniel Bernoulli, the "chances of winning" argued in favor of an inoculation campaign. Nephew of Jacques Bernoulli - the inventor of the law of large numbers - he proposed to settle the political question of inoculation by applying a formula similar to those used in games of chance. He calculated that the expected gain - in this case a life expectancy roughly three years longer for inoculated individuals - established the benefits of the practice. The French philosopher and mathematician d'Alembert - and, a century later, the physician Claude Bernard - opposed this approach, which in their view led to a confusion between the "mean" and the norm. According to them, there should be no equivalence between the observed fact and the law derived from it. This debate, which history ultimately settled in favor of the proponents of inoculation, is one of the first instances of statistical calculation prevailing over law.
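Bernoulli's style of reasoning can be illustrated with a deliberately crude expected-value comparison. The sketch below is a modern simplification with assumed, illustrative numbers - not Bernoulli's actual 1760 model, which rested on a differential analysis of mortality tables:

```python
# Illustrative sketch of an expected-value argument for inoculation,
# in the spirit of Bernoulli's calculation. All numbers here are
# assumptions chosen for illustration, not historical data.

def life_expectancy_gain(p_die_smallpox, years_lost_smallpox,
                         p_die_inoculation, years_lost_inoculation):
    """Expected years gained by inoculating, under a crude two-outcome model."""
    expected_loss_without = p_die_smallpox * years_lost_smallpox
    expected_loss_with = p_die_inoculation * years_lost_inoculation
    return expected_loss_without - expected_loss_with

# Assumed figures: a 1-in-7 lifetime chance of dying of smallpox,
# costing 25 years of life on average, versus a 1-in-200 chance of
# dying from the inoculation itself, costing 40 years.
gain = life_expectancy_gain(1 / 7, 25, 1 / 200, 40)
print(f"Expected gain from inoculation: {gain:.2f} years")
```

Under these assumed figures the expected gain comes out at roughly three years - the order of magnitude Bernoulli reported. D'Alembert's objection was precisely that such a population-level average says nothing about the immediate, individual risk each person runs at the moment of the procedure.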


From the « artificial man » to « artificial intelligence »


Pre-industrial upheavals and the gradual emergence of capitalism led to the dissolution of the feudal political system, giving rise to unprecedented administrative, economic and social problems. A renewed scientific and ideological foundation was therefore needed to deal with the newly forming societies. This is the context in which statistical methodology emerged. Before statistics became an autonomous and unified discipline, it had two very different origins: German descriptive statistics (Staatenkunde) and English political arithmetic.


« The birth of statistics thus correlated with the creation of the modern state, which the English philosopher Hobbes referred to - and this is not trivial - as the "artificial man" »

The term "statistics" was first used by the economist Gottfried Achenwall in mid-18th-century Germany. It designated an activity of a purely descriptive nature, intended to present the characteristics of States - their territories (climate, natural resources, constitution of the land) and their populations - notably with the help of cross-tabulations. This method drew on the criteria for distinguishing nations developed earlier by Leibniz, in an effort to make comparisons between European countries possible. The classification of heterogeneous knowledge was meant to allow for a distinction between natural and material wealth, types of regimes and administrations.


It was a nomenclature with a holistic intention, designed to facilitate the memorization of facts and their teaching, for the good use of statesmen. Understanding the State in order to rule it more effectively became essential in the second half of the 17th century. After the Thirty Years' War, what was to become Germany was still divided into more than three hundred micro-States. This fragmentation explains the desire to establish a general framework for classifying, cataloguing and archiving information, in order to organize the collective memory and, on this basis, trade, justice and political decisions. The birth of statistics thus correlated with the creation of the modern state, which the English philosopher Hobbes referred to - and this is not trivial - as the "artificial man".


The statistical description of reality as a way of apprehending mass phenomena is found at the same time, in a completely different form, in England. There, government by science resorted to numbers - counts drawn from parish baptismal registers, construction of mortality tables, calculation of life expectancy - to drive out the arbitrary, giving rise in the second half of the 17th century to "political arithmetic". The British polymath William Petty, inventor and pioneer of this set of techniques, described it as an art of reasoning by figures upon questions of government: « The Method I take to do this, is not yet very usual; for instead of using only comparative and superlative Words, and intellectual Arguments, I have taken the course (as a Specimen of the Political Arithmetick I have long aimed at) to express myself in Terms of Number, Weight, or Measure ». The development of mathematical tools for quantification - averages, dispersion, correlation, sampling - was intended to apprehend a supposedly uncontrollable diversity.
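The "construction of mortality tables" and "calculation of life expectancy" mentioned above can be sketched in a few lines. The survivor counts below are illustrative, loosely modeled on John Graunt's famous 1662 table for London; the code is, of course, a modern reconstruction, not period arithmetic:

```python
# Sketch: life expectancy at birth from a Graunt-style survival table.
# The survivor counts are illustrative, loosely inspired by the
# 17th-century London bills of mortality, not actual data.

# survivors[i] = number (out of 100 births) still alive at age ages[i]
ages      = [0, 6, 16, 26, 36, 46, 56, 66, 76]
survivors = [100, 64, 40, 25, 16, 10, 6, 3, 1]

# Approximate total person-years lived by the cohort with the
# trapezoidal rule over the survival curve, then divide by the
# initial cohort size to get life expectancy at birth.
person_years = sum(
    (ages[i + 1] - ages[i]) * (survivors[i] + survivors[i + 1]) / 2
    for i in range(len(ages) - 1)
)
life_expectancy = person_years / survivors[0]
print(f"Life expectancy at birth: {life_expectancy:.1f} years")
```

With these figures the result lands below twenty years - a reminder of how heavily infant mortality, rather than adult lifespans, weighed on early modern averages.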


« Modern statistical thinking, by its mathematical vocation, inspires objectivity, and therefore the legitimacy of a new administration of beings and things »

The main difference between the two models stems from the fact that German descriptive statistics sought to give a global picture of the State without resorting specifically to quantification, whereas English political arithmetic - much closer to today's statistics - was entirely based on numerical censuses. This heterogeneity eventually converged. Modern statistical thinking, by its mathematical vocation, inspires objectivity, and therefore confers legitimacy on a new administration of beings and things. States thus created the first official statistical agencies, the forerunners of national statistical institutes such as INSEE and of private polling institutes, which are now themselves disrupted by the advent of predictive algorithms.


Unlike today, statistical surveys were then intended only for governments - sovereigns and their administrations - and never for civil society. This difference is significant, and will be discussed later, especially with regard to the movement of the last ten years towards the opening up of public data (open data), which is supposed to inform everyone on a multitude of subjects: air quality, the availability of electric cars, WiFi access points in one's neighborhood, budgets and decisions voted by local representatives.