PREFACE: "The statistics" vs "Statistics"

Statisticians should fit the needs of the users, not the reverse! - J.W. Tukey

From the statistics to Statistics

There are statistics and statistics. "The statistics" - familiarly the "stats" - are the statistical data (averages, percentages, numbers of all sorts) ubiquitous in the media and found in every imaginable area: official statistics, surveys, etc. By "Statistics" - often written with a capital S (the "science of Statistics") - is meant the scientific discipline dealing with the methods for analyzing statistical data. My work has been concerned with "Statistics".
Before we proceed, two major facts that overshadow all developments must be mentioned. First, there is the overwhelming dominance (from 1945 to the present day) of Anglo-Saxon statistics, indeed a quasi-monopoly. Second, there is the phenomenon of hyperspecialization, which fragments a single field (such as statistics) into isolated subspecialties.

Academic Statistics and Statistics for researchers

The statistical discipline is a "metadiscipline", whose raw material lies outside the discipline. By its very nature, it lies at the junction of two lines of thinking, namely mathematics and empirical sciences. Among the founding fathers of statistics, the two lines were always present; whereas nowadays they are well separated, with academic statistics on the one hand, and statistics for researchers on the other hand.

One finds academic statistics in the mathematics departments of universities and in the "theoretical" teaching in institutions like (in France) INSEE (Economic Research) or INSERM (Medical Research). This discipline calls itself "mathematical statistics" and purports to be a deductive theory, like mathematical physics.
One finds statistics for researchers in laboratories and empirical studies, from natural sciences to social sciences. It is an essentially normative discipline, which aims at providing legitimate "scientific proof", controlled by the referees of scientific journals. Let us be clear: We do say "statistics for researchers", not "applied statistics", because even though the canons of academic statistics are recognized by statistics for researchers in principle, they are hardly applied in practice.
My conviction is that, while distinguishing the two lines, it is vital to maintain the unity of Statistics (on the unfortunate consequences of the current division, see Medical statistics on the carpet). What justifies Statistics is its role as an auxiliary science ("Hilfswissenschaft") to the empirical disciplines. Statistics for researchers should guide theoretical statistics. The ideal situation is one where the statistician participates in a large-scale empirical research project, with scientists specifying the questions, and works out the statistical procedures to help answer these questions. The key ideas that give sense to my work have all emerged in interaction with research problems; and my contributions have tended to construct an autonomous statistics for researchers.

The foundations of statistics; the history of statistics

The statistical discipline is a recent one, highly dependent on computational tools. Not surprisingly, it has faced persistent identity problems. Originally a branch of probability theory, it was then, in the heyday of Operations Research, nearly absorbed into the "science of decisions". Nowadays, it would rather tend to become a part of algorithmics (a field surely more creative).
Our key ideas do indeed concern the fundamentals of statistics. But speaking of the "foundations" of a discipline suggests a specialized area, on the margins of a discipline whose content is "well established." The status of the key ideas, in contrast, is to call for a restructuring of the traditional chapters of statistics.
The same goes for the history of statistics, to which I was introduced by G.Th. Guilbaud and B. Bru: cf. Rouanet & Bru (1994b). In the age of the Internet, browsing the "Electronic Journal of the history of probability and statistics" is for me a real pleasure. However, I must confess that epistemology is not my strong point. If history fascinates me, it is (to paraphrase Marc Ferro about history in general), "provided that its study provides an understanding of the problems of our time." Rather than scrutinize the forerunners of presently dominant trends, I try to (re)discover neglected paths that the tools of our time can make practicable.
It is clear that many theoretical constructions in the past were built in order to bypass the obstacle of computation: for example, the normal model. Other theories remained in sketch form: for example, classification procedures, or permutation modeling. Now that the computational obstacle is virtually removed, and the era of statistical tables is (or should be) over, one can and should prefer, I believe, a direct approach to tackle the real issues that justify using statistical methods. In fact, what were the problems that Binet, or Durkheim, were attempting to solve? What if they had had computers at their disposal, with their colossal databases and their fabulous means of calculation?

Statistics in Human Sciences

My work has focused on statistics in the human sciences, mainly psychology and social science, in other words, behavioral sciences, bordered by bio-medical statistics on one side, and econometrics on the other. As far as statistics is concerned, this constitutes a quite homogeneous field: There is "statistics for human sciences", not really "statistics for psychologists," "statistics for sociologists", and so on.
In my view, the role of statistics in a research paper should always conform to the following pattern:

Research problem --> Relevant data --> Statistical analysis --> Statistical results --> Research conclusions.

Relevant data must constitute a representative inventory of the area under study. This is the "completeness requirement" of Benzécri, close to the notion of "field" of Bourdieu. The statistical analysis should either bring an answer to the research questions, or else show that the available data are insufficient to meet them. Enforcing the foregoing scheme should facilitate the critical examination of a research report and enable one to pinpoint at which stage(s) errors may have been committed: 1) Relevant data have been omitted; 2) The statistical analysis carried out is inadequate; 3) The conclusions drawn exceed those authorized by the statistical results (over-interpretation).

In academic statistics, "real-life data" are often just invoked in order to illustrate techniques, while ignoring research problems. Blatant violations of the requirement of completeness abound. Suffice it to mention an article by Goodman (1991), which purports to seriously discuss the comparative merits of methods on a simple 4x5 array of social mobility, disconnected from any context. In his reply, D.R. Cox notes shrewdly: "A key question concerns how the models are to be adapted to address detailed substantive questions (etc.); for example, there may be further dimensions or concurrent comment on the individuals concerned..."

Two crucial distinctions

Beyond the diversity of disciplines, two distinctions are essential:

1) Between experimental data (factors of interest are controlled) and observational data (factors of interest are only observed).
2) Between descriptive procedures (the findings relate to the data) and inductive ones, that is, statistical inference (the conclusions go beyond the data); with, in the background, the perennial problem of the role of probabilities in statistics.

Texts and publications

The references to my texts and publications are given on the one hand in chronological order, on the other hand by themes (domains). Some texts are mathematically oriented and may appeal to mathematicians interested in applications. Other texts are case studies, where the statistical approach is presented "in situ", and are directly readable by researchers (not necessarily versed in mathematics).

Organization of the heading "Statistical Work" (travaux statistiques)

. Key ideas: Formalization, geometric, descriptive-inductive, specific, probability.

. Domains: Stochastic models, analysis of variance and structured data, Combinatorial inference, Bayesian inference, Geometric Data Analysis, Regression.

. Software, teaching, etc...

. Reading Notes.

Heading "Personalia"

. CV and Scientific trajectory.

Please note. The headings "Loisirs" and "Feuilles et Bons Mots" lie outside of my work.

Hyperspecialisation.

A quasi- monopoly.

Medical statistics on the carpet
