Upcoming seminars

Monday, November 10, 15:00

Bart J. A. Mertens
Department of Biomedical Data Sciences, Leiden University Medical Centre, Leiden, The Netherlands
Prediction in the presence of missing values. Are there credible alternatives to imputation-based use of the predictive density?

Prediction in the presence of missing values is a complex and still poorly understood problem, particularly when future records also contain missing values. Mertens et al. (2020) demonstrate that, for non-linear models (such as logistic regression or Cox survival models) fitted on imputed data, averaging the predictions obtained from the distinct models fitted on each imputed data set should be preferred to the use of pooled models. However, imputation is often regarded as computationally cumbersome. It also tends to be poorly understood by applied researchers who use statistical methods. For these reasons, the method is often avoided. This raises the question of whether other approaches could reasonably be used to handle missing values in prediction problems.
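
The distinction between a pooled model and averaged predictions can be illustrated with a minimal sketch. It is not taken from Mertens et al. (2020): the coefficient values and the new record below are hypothetical, and the sketch assumes that the new record is fully observed, whereas in practice a future record with missing values would itself be imputed.

```python
import numpy as np

def sigmoid(z):
    """Inverse logit: maps a linear predictor to a risk in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients (intercept, slope) from M = 3 logistic models,
# each assumed to be fitted on a different imputed copy of the training data.
betas = np.array([
    [-1.0, 0.5],
    [-0.2, 1.5],
    [-1.8, 2.5],
])

# One hypothetical new record, with a leading 1 for the intercept.
x_new = np.array([1.0, 2.0])

# Pooled model: average the coefficients first, then predict once.
pooled_risk = sigmoid(x_new @ betas.mean(axis=0))

# Predictive averaging: predict with each model, then average the risks.
averaged_risk = sigmoid(betas @ x_new).mean()

print(f"pooled model risk:   {pooled_risk:.3f}")
print(f"averaged prediction: {averaged_risk:.3f}")
```

Because the inverse logit is non-linear, the two quantities generally differ. With a linear model they would coincide, which is why the choice only matters for models such as logistic regression or Cox survival models.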

In this talk we contrast predictive averaging with some potential alternatives, such as complete-case-based model calibration (CC) and the missing-indicator (IDX) and Pattern Submodel (PS) approaches. Connections between these methods are discussed. We focus on the problem of risk prediction. Simulations are used so that the true risk is known when comparing prediction performance between methods. We demonstrate that only predictive averaging guarantees the required coverage levels in prediction. However, measures such as Brier scores or AUC seem to strongly favour the IDX and PS methods. We argue that this is due to the biased nature of these methods, which Brier scores and AUC cannot correct for. Beyond potential concerns about the CC, IDX and PS methods, this raises broader concerns about how the performance of prediction methods should be measured in the presence of missing values.

Map of CSS

You can find CSS next to the Botanical Garden, 5 minutes from Nørreport station.


Meeting room 5.2.46 is the library of the Biostatistics section, located in building 5, 2nd floor, room 46. See the map below for directions inside CSS.