Twenty years ago, the late Leo Breiman sent a wake-up call to the
statistical community, criticizing the dominant use of ‘data
models’ (Breiman, 2001). In this talk, I will revisit his critiques in
light of the developments on algorithmic modeling, debiased machine
learning and targeted learning that have taken place over the past two
decades, largely within the causal inference literature (Vansteelandt,
2021). I will argue that these developments resolve Breiman’s critiques,
but are not ready for mainstream use by researchers without in-depth
training in causal inference. These methods focus almost exclusively on
evaluating the effects of dichotomous exposures; in even slightly more
complex settings, this restrictive focus encourages
poor practice (such as dichotomization of a continuous exposure) or
makes users revert to the traditional modeling culture. Moreover, while
there is enormous value in the ability to quantify the effects of
specific interventions, this focus is also artificial in the many
scientific studies where no specific interventions are
targeted.
I will address these concerns via a general
conceptual framework on assumption-lean regression, which I recently
introduced in a discussion paper that was read before the Royal
Statistical Society (Vansteelandt and Dukes, 2022). This framework
builds heavily on the debiased / targeted machine learning literature,
but intends to be as broadly useful as standard regression methods,
while continuing to resolve Breiman’s concerns and other typical
concerns about regression.
A large part of this talk will be
conceptual and is intended to be widely accessible; parts of the talk will
demonstrate in more detail how assumption-lean regression works in the
context of generalised linear models and Cox proportional hazards models
(Vansteelandt et al., 2022).
References:
Breiman, L. (2001). Statistical modeling: The two cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199-231.
Vansteelandt, S. (2021). Statistical Modelling in the Age of Data Science. Observational Studies, 7(1), 217-228.
Vansteelandt, S., & Dukes, O. (2022). Assumption-lean inference for generalised linear model parameters (with discussion). Journal of the Royal Statistical Society: Series B (Statistical Methodology), 84(3), 657-685.
Vansteelandt, S., Dukes, O., Van Lancker, K., & Martinussen, T. (2022). Assumption-lean Cox regression. Journal of the American Statistical Association, 1-10.
Using analysis of covariance to improve the efficiency of clinical
trials has a long tradition within drug development and is explicitly
recognised as valuable by regulatory guidelines.
Nevertheless, it continues to attract criticism and also raises
various issues. In this talk I shall look at some of them in
particular:
1. What the difference is between stratification and analysis of covariance.
2. How this relates to type I and type II sums of squares.
3. Whether propensity score adjustment is a valid alternative to analysis of covariance.
4. What problems arise in connection with hierarchical data.
5. What the Rothamsted approach teaches us and its relevance to Lord’s paradox.
6. What changes when we move from common two-parameter models, such as the Normal model, to single-parameter models such as the Poisson distribution.
7. Whether marginal or conditional estimates are generally to be preferred or if there is a role for both.
8. What care must be taken when considering covariate-by-treatment interaction.
I shall conclude that using covariates wisely does require care but is
valuable; that, despite general regulatory approval, it is underused;
and that it would make a much bigger contribution to design efficiency
than the currently fashionable topic of flexible designs.
Will appear later
You can find CSS next to the Botanical Garden, 5 minutes from Nørreport station.
Meeting room 5.2.46 is the library of the Biostatistics section, located in building 5, 2nd floor, room 46. See the map below for directions inside CSS.