Prediction in the presence of missing values is a complex and still
poorly understood problem, particularly when future records also contain
missing values. Mertens et al. (2020) demonstrate that, for non-linear
models (such as logistic regression or Cox survival models) combined
with imputation, averaging the predictions obtained from distinct
models fitted on the imputed datasets should be preferred to the use of
pooled models. Imputation is often regarded as computationally
cumbersome, however. It also tends to be poorly understood by applied
researchers who use statistical methods. For these reasons, the method
is often avoided. This raises the question of whether other approaches
could reasonably be used to handle missing values in prediction
problems.
In this talk, we contrast predictive averaging with some potential
alternatives: complete-case-based model calibration (CC), and the
missing-indicator (IDX) and pattern submodel (PS) approaches.
Connections between these methods are discussed. We focus on
the problem of risk prediction. Simulations are used so that the true
risk is known when comparing prediction performance between methods.
We demonstrate that only predictive averaging guarantees the required
coverage levels in prediction. Measures such as the Brier score or AUC,
however, appear to strongly favour the IDX and PS methods. We argue
that this is due to the biased nature of these methods, which the Brier
score and AUC cannot correct for. Beyond potential concerns about the
CC, IDX and PS methods, this raises broader concerns about how the
performance of prediction methods should be measured in the presence of
missing values.
You can find CSS next to the Botanical Garden, 5 minutes from Nørreport station.
Meeting room 5.2.46 is the library of the Biostatistics section, located in building 5, 2nd floor, room 46. See the map below for directions inside CSS.
