The history of statistics is filled with long hesitations. The difficulty of reconciling statistical regularities with the singular situations of observed cases runs throughout the debates on how probabilities should be calculated and interpreted. Can we reduce societal interactions or behaviours to mathematical equations? What does it mean to be "normal"?
Two approaches have been in opposition for a long time: the descriptive approach and the prescriptive approach. It is particularly interesting to consider for a moment the reasons that led to the criticism of the prescriptive approach, in particular Adolphe Quételet's notion of the « average man ».
This article aims to bring a critical perspective to the substitution of correlation for causality, which consists in prescribing measures without any further concern for explaining the reasons for doing so. This analysis will allow us to understand why some of the prescriptive reasoning now applied to Big Data should in turn be decried, and what the underlying risks are of using statistics and predictive models to moralize our conduct.
The notion of the « average man »
In the 1830s, the Belgian astronomer Adolphe Quételet started from the observation that births and deaths, like crimes and suicides, occur seemingly at random, for reasons that are different and specific to each case. Yet, for a given country, they recur year after year with striking regularity. Moral behaviour and physical attributes, despite their apparent heterogeneity, would thus possess a certain unity. According to Quételet, this suggests the existence of constant causes. Regularity at the macroscopic level is no longer seen as a sign of a divine order, but as a statistical fatality. Above these singular cases, the « average man » would constitute the « norm », of which individual cases would only be imperfect imitations.
By measuring, for example, the distribution of heights in a given population, we observe a bell curve - the Gaussian curve - in which values are distributed around a central value (the mean). The mean of this law - which the British mathematician Karl Pearson would present from 1894 onwards under the name of « normal distribution » - is then considered the true value of the observed height, the other values scattered around it being mere errors. For proponents of social physics, deviations from the central value should not be taken into account, as they constituted imperfections.
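To make the reasoning concrete, here is a minimal sketch in Python. It rests on stated assumptions: the heights are simulated, not historical measurements, and the figures of 170 cm and 7 cm, like the function name `summarize`, are purely illustrative. It computes the mean that social physics would treat as the « true » value, and the deviations it would dismiss as errors.

```python
import random
import statistics


def summarize(heights):
    """Return the mean, the standard deviation, and each deviation from the mean."""
    mean = statistics.fmean(heights)
    spread = statistics.pstdev(heights)
    deviations = [h - mean for h in heights]
    return mean, spread, deviations


# Synthetic data (an assumption, not historical measurements):
# 10,000 simulated heights in cm, drawn around 170 with a spread of 7.
rng = random.Random(0)
heights = [rng.gauss(170.0, 7.0) for _ in range(10_000)]

mean, spread, deviations = summarize(heights)

# Share of individuals whose height lies within one standard deviation of the mean.
share_near_mean = sum(abs(d) <= spread for d in deviations) / len(deviations)

print(f"mean height: {mean:.1f} cm")
print(f"standard deviation: {spread:.1f} cm")
print(f"share within one standard deviation: {share_near_mean:.0%}")
```

Even in this idealized simulation, a substantial share of individuals lies more than one standard deviation from the mean. Treating every deviation as an error therefore means treating a large part of the population as imperfect.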
The shift from descriptive to prescriptive statistics is looming. Put differently, the number of smallpox patients is no longer simply measured, but is used to decide whether or not to vaccinate a given population, and whether or not to impose this preventive intervention. From a moral perspective, normality is equated with the good: « the individual who would sum up in himself, at a given time, all the qualities of the average man, would represent all that is great, beautiful and good at the same time » (Adolphe Quételet, Sur l'homme et le développement de ses facultés, ou essai d'une physique sociale, Editions Bachelier, 1835). In seeking to identify the « average man », statistics take on a moralizing role. The « average man » is thus supposed to become the standard of society. Belonging to the average no longer refers to mediocrity. It becomes the object of a new fascination.
Interpreting the diversity of human measurements as variation around the « average man » is not limited to physical attributes such as height or weight. According to Quételet, the average is an ideal. « Social physics » is therefore interested in social regularities, such as the occurrence of crimes and suicides. Based on these metrics, statisticians postulate the existence of objective causes explaining the measured regularities. Quételet said he was fascinated by « the frightening accuracy with which crimes are repeated ». The statistical results were all the more striking as they revealed a regularity, a social necessity, even for the most unpredictable acts. The statistical measurement of social phenomena thus held out the promise of being able to better explain them.
Moralizing statistics
This moralizing can be found in the very terminology of the underlying mathematical theorems. We speak of the « binomial distribution » with Bernoulli, then of the « normal distribution » with Pearson, to model natural phenomena resulting from various random elements. These laws postulate the existence of constant causes, likely to explain the observed regularities. It is then no longer a question of measuring the effects of an identified cause but, using an inductive method, of inferring from the constancy of certain measurements the presence of constant causes. In many ways, these aspects are similar to the current Big Data approach, which no longer focuses on explaining causes, but merely highlights the existence of statistical correlations.
The law has therefore attached normative effects to statistical measures. The first social laws, for example, were the result of measuring the particular physical risks to which workers were exposed when operating machines. Employment law was thus the first to be indexed to social distinctions revealed by quantification. We can already see in these probability laws the fantasy of substituting calculation for law, the ancestor of the contemporary dream of replacing the Legal Code with Computer Code and, why not tomorrow, judges with machines.
The reduction of social facts to mathematical formulae and indicators was already a concern for many philosophers, writers and scientists, who saw in it a risk of moralizing social life through algebra and calculations. This led Auguste Comte to break with « social physics » and start talking about « sociology ». This succession of trends and neologisms shows above all that the emergence of statistics contributed to the growing autonomy of society from political power. Gradually, statistics were no longer solely at the leaders' service, but came to represent a whole social reality with its own laws and regularities, which political powers in turn had to learn to know and measure.
The criticism of the « average man »
The historical opposition between the descriptive (« there is ») and prescriptive (« there must be ») statistical approaches could be summarized as follows. According to the descriptive approach, man is a finite being, unable to know the universe, and must therefore make bets. In probability, we refer to « reasons to believe » or « degrees of belief », which guide one's choices in a situation of uncertainty. Probability then appears as a tool for measuring man's ignorance, and aims to help him overcome it. By contrast, the prescriptive approach, known as the « frequentist » approach, focuses on the regularity of the measured phenomena, and uses these measurements to justify and recommend actions.
In his reference work The Politics of Large Numbers (La politique des grands nombres), the former INSEE administrator and specialist in the history of statistics, Alain Desrosières, perfectly summarizes the impact that the prescriptive approach has had on statistical culture and the abuses it may have caused: « [The prescriptive approach] forms the heart of statistical instrumentation in the public space. It was shown for the transformation of how industrial accidents were socially addressed in the 19th century, from individual responsibility as defined by the Civil Code, to the company's insurance responsibility, based on calculations of probability and average. Insurance and social protection systems are based on this transformation of individual hazards into stable collective objects, that can be publicly evaluated and debated. However, by paying attention not to unpredictable individuals, but to an average that can be used as a basis for controllable action, Quételet's reasoning does not provide a tool for debates on distributions and orders between individuals. Tending towards the reduction of heterogeneity, he is not interested in its objectivization, which is necessary if a debate is to focus precisely on it. This happens when a hereditary Darwinian problem of inequalities between individuals is imported from the animal world to the human world, by Galton. »
Indeed, from the 1870s onwards, eugenicists, including Galton and Pearson, took up Quételet's idea of the normal distribution of human attributes around a central value, but this time to classify individuals according to what they called a law of deviation. Rather than eliminating deviations from the average, they focused on them. Statisticians then became activists for a political cause. The technique of regression was applied to bring out the effects of heredity, to favor the birth rate of men categorized as the most able and to limit the reproduction of the poorest, deemed unfit: « This scientific-political war machine is directed, on the one hand, against the landed gentry and the clergy, who are hostile to modern science and Darwinism, and, on the other hand, against reformers for whom misery results from economic and social causes, more than biological ones, and who militate for the establishment of welfare systems. »
In an article published in the Journal of the Royal Statistical Society in 1910, the economist Keynes, defending an anti-frequentist approach, opposed Karl Pearson's statistical induction procedures. Keynes objected to a recent study by Pearson that sought to demonstrate that children's abilities were purely hereditary and not contingent on their parents' lifestyles. In this case, the purpose of the publication was to show that alcoholic parents - alcoholism being considered a way of life, free from any genetic cause - could produce children with undamaged physical and intellectual aptitudes. For Keynes, Pearson's study was dangerous, since nothing ruled out the possibility that causes other than alcoholism might be responsible for the results obtained.
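The general point behind this objection, that an unmeasured cause can be responsible for an observed statistical relationship, can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of Pearson's data or of Keynes's argument: the data are synthetic, and the names `hidden`, `trait_a`, `trait_b` and `pearson_r` are hypothetical. A hidden factor influences two measured traits that have no effect on each other.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Synthetic data (an assumption): a hidden binary factor shifts both traits,
# but neither trait has any influence on the other.
rng = random.Random(1)
hidden, trait_a, trait_b = [], [], []
for _ in range(5_000):
    z = rng.choice([0, 1])
    hidden.append(z)
    trait_a.append(2 * z + rng.gauss(0, 1))
    trait_b.append(2 * z + rng.gauss(0, 1))

# Correlation measured on the whole population, ignoring the hidden factor.
overall_r = pearson_r(trait_a, trait_b)


def r_within(group):
    """Correlation between the two traits among cases sharing the same hidden factor."""
    xs = [a for a, z in zip(trait_a, hidden) if z == group]
    ys = [b for b, z in zip(trait_b, hidden) if z == group]
    return pearson_r(xs, ys)


within_r = [r_within(0), r_within(1)]

print(f"correlation over the whole population: {overall_r:.2f}")
print(f"correlation within each hidden group: {within_r[0]:.2f}, {within_r[1]:.2f}")
```

The two traits appear clearly correlated over the whole population, yet the correlation all but vanishes once cases sharing the same hidden factor are compared with one another. A prescription based on the overall correlation alone would act on the wrong cause.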
The difference between the descriptive and prescriptive approaches has not failed to spark controversy among physicians as well. Illness was often seen as a singular event and not a generalizable one. The categorization of patients - between those who were part of the norm and those who were abnormal - was therefore difficult to accept. French philosopher and physician Georges Canguilhem observed: « The ambiguity of the term normal has often been noted, which sometimes designates a factual situation able to be described by a statistical census - the average of the measurements made on a trait presented by a species and a plurality of individuals presenting this trait according to the average or with a few deviations considered indifferent - or an ideal, a positive principle of appreciation, in the sense of a prototype or a perfect form. The fact that these two meanings are always linked, that the term normal is always confused, is what emerges from the very advice we are given to avoid any ambiguity. » Canguilhem questioned what it means to be « normal ». For every human being is doomed to step, even if only temporarily, outside the norm. For example, we sometimes come down with the flu or a fever in the winter. So there is a certain normality in being sick.
Should we therefore consider that there are several possible norms? The goal of 19th-century medicine was the restoration of normality, from which illness was presented as a deviation. But what if average people are sick: would that still be normal? We thus see that beneath the idea of the norm - in the scientific, medical sense - lies a moral ground, that is, a normative judgment according to which any difference, any variation from what has been declared the norm, must be considered a risk, a danger to be addressed and fought against. For Canguilhem, on the contrary, normality should not be understood as universal (the same standard for all), but as singular. According to him: « There is no normal or pathological fact in itself. The abnormality or mutation is not in itself pathological; it expresses other possible norms of life ». Being normal would then mean being able to deviate from the norm by inventing new norms. This critique of the rejection of deviations from the norm will need to be integrated into our approach to Big Data. Its biopolitical objective of identifying patterns in order to produce a healthy population should not result in locking us into a standard-setting society.
Lawyers, coders, statisticians and technologists will have to learn to work together to ensure an ethical use of statistical information, one that respects the many differences within the human species, as this is a condition for any democratic deliberation.