class: center, middle, inverse, title-slide

# Big data
## the inevitable problem with overdiagnosis
### Claus Thorn Ekstrøm
UCPH Biostatistics
.small[
ekstrom@sund.ku.dk
]
### August 21st, 2018
.small[Slides @
biostatistics.dk/talks/
]

---
background-image: url(pics/ghostbusters.jpg)
background-size: 100%

---
background-image: url(pics/iwgsmeeting.png)
background-size: 100%

---
class: inverse, center, middle

# "With enough data, the numbers speak for themselves."

---
background-image: url(pics/ifikserbillede.png)

---
# The lure of big data

.pull-left[
Promises

* Absurdly accurate results
* Everything can be captured
* The correlations will tell all - causality less important
* Statistical models not needed
]

.pull-right[
For genomics we have

* Cheap to collect
* Likely to influence multiple diseases
* Stable
]

---
background-image: url(pics/bigdatateenagesex.jpg)
background-size: 80%

---
# How are big data currently used?

1. Larger samples - better coverage
2. More precise diagnoses
3. Earlier scans
4. Scan for multiple features/diseases/abnormalities
5. Lifetime trajectories
6. Redefine disease definitions

---
# Large samples

Better coverage. `\(N=\)` all? `\(P\)` massive

![](preventing-overdiagnosis-2018_files/figure-html/unnamed-chunk-1-1.png)<!-- -->

---
# More precise diagnoses

*"If I screen positive, do I have the disease?"*

Positive predictive value:

`$$\begin{split}PPV &= P(\text{disease} | \text{screen positive}) \\ &= \frac{P(\text{screen positive}|\text{disease}) \cdot P(\text{disease})}{P(\text{screen positive})} \\ &= \frac{\text{sensitivity} \cdot P(\text{disease})}{\text{sens.} \cdot P(\text{disease}) + (1-\text{spec.}) \cdot P(\text{no disease})} \end{split}$$`

---
# Honest PPV

*"If I screen positive, do I have a disease that will give me problems?"*

`$$\begin{split} hPPV &= P(\text{problematic disease} | \text{screen pos.}) \\ & = P(\text{prob. dis.} | \text{scr pos., disease})\cdot P(\text{dis.}|\text{scr pos.}) \\ & + P(\text{prob. dis.} | \text{scr pos., no dis.})\cdot P(\text{no dis.}|\text{scr pos.}) \end{split}$$`

---
# Honest PPV

*"If I screen positive, do I have a disease that will give me problems?"*

`$$\begin{split} hPPV &= P(\text{problematic disease} | \text{screen pos.}) \\ & = \underbrace{P(\text{prob. dis.} | \text{scr pos., disease})}_{(1-OD)}\cdot \underbrace{P(\text{dis.}|\text{scr pos.})}_{PPV} \\ & + \underbrace{P(\text{prob. dis.} | \text{scr pos., no dis.})}_{0}\cdot \underbrace{P(\text{no dis.}|\text{scr pos.})}_{1-PPV} \\ &= (1-OD)\cdot PPV \end{split}$$`

---
background-image: url(pics/inature.png)
background-size: 90%

---
# Multiple features

![](preventing-overdiagnosis-2018_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

---
# Lifetime trajectories
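
---
# PPV and honest PPV in numbers

The PPV and honest PPV formulas from the earlier slides can be checked numerically. This is only a minimal sketch in Python: the function names are hypothetical, and the inputs (sensitivity 0.90, specificity 0.95, prevalence 0.01, overdiagnosis fraction OD = 0.30) are assumed values chosen for illustration, not figures from the talk.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value P(disease | screen positive) via Bayes' rule."""
    true_pos = sensitivity * prevalence                    # sens. * P(disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # (1 - spec.) * P(no disease)
    return true_pos / (true_pos + false_pos)


def honest_ppv(sensitivity: float, specificity: float,
               prevalence: float, overdiagnosis: float) -> float:
    """Honest PPV, hPPV = (1 - OD) * PPV.

    `overdiagnosis` is OD: the fraction of screen-detected disease
    that would never have caused problems.
    """
    return (1.0 - overdiagnosis) * ppv(sensitivity, specificity, prevalence)


if __name__ == "__main__":
    # Illustrative inputs only (assumptions, not data from the talk).
    p = ppv(0.90, 0.95, 0.01)
    hp = honest_ppv(0.90, 0.95, 0.01, overdiagnosis=0.30)
    print(f"PPV  = {p:.3f}")   # PPV  = 0.154
    print(f"hPPV = {hp:.3f}")  # hPPV = 0.108
```

Under these assumed inputs, a seemingly accurate test leaves only about 15% of screen positives with the disease, and about 11% with a disease that would cause problems.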
---
class: inverse, middle, center

# Modeling overdiagnosis

---
# Step 1: Redefine the outcome(s)

When do you *really* have the disease?

Diagnosis `\(\leadsto\)` Treatment `\(\leadsto\)` No problem

--

Diagnosis `\(\leadsto\)` No treatment `\(\leadsto\)` No problem

--

What is *"no problem"*?

No amount of big data + machine learning techniques will fix this!

---
background-image: url('pics/graph.png')
background-size: 90%
class: top

# .black[Step 2: Define a loss function]

---
# Step 3: Timing

*When* do we have a diagnosis?

Multiple outcomes ...

Competing risks

---
# Step 3: Timing

*When* do we have a diagnosis?

Multiple outcomes ...

Competing risks ... of overdiagnosis

*If I am highly likely to die from XXX in the next 3 weeks, should I worry about YYY?*

---
background-image: url(pics/ghostbusters.jpg)
background-size: 100%
class: middle, center

# The beams MUST cross