class: center, middle, inverse, title-slide

# For whom ML rolls?
## Sense and feasibility
### Claus Thorn Ekstrøm
UCPH Biostatistics
.small[
ekstrom@sund.ku.dk
]

### DES, May 20th 2021
@ClausEkstrom
---
class: animated, fadeIn
layout: true

---
class: middle, center, inverse

# Sorry!

---
class: middle, center

# Can Machine Learning Assist Epidemiologists in Drawing Causal Inference?

???

Yes

What is the special role of ML? Hype

What is ML

How can it help us - and what is ML anyway

How? How not? Pitfalls

---
background-image: url(pics/stat-ml-ai.jpeg)
background-size: 55%

.caption-right-vertical[Comic by sandserif]

---
background-image: url(pics/coffee.png)
background-size: 88%

.caption-right-vertical[NY Times Magazine, March 24th, 2021]

---

# Excerpt from NC on Health Research Ethics

**Protocol**: *The full dataset will be .yellow[analyzed using supervised and unsupervised machine learning methods] to identify associations and patterns in radiological diagnoses that traditional statistical models cannot identify.*

*These associations can be used to explain combinations of factors where patients are potentially scanned unnecessarily.*

--

*It is not possible to make a power calculation for this study since there are more factors in play when research is done with machine learning algorithms.*

???

Note the lack of detail on methods.

---

# Proponents

The .yellow[magic] of ML methods:

.pull-left[
* Allow the data to speak for themselves
* Better
* More flexible
* Have fewer assumptions
]

--

.pull-right[
* Random forest
* .red[Neural networks]
* Penalized regression
* Gradient boosting
* Logistic regression
* Algorithms
]

--

But what about causality?

???

Also ... loss function. Optimization ...

---

# Causal Inference and Directed Acyclic Graphs
Read causal relationships off the graph. Can we identify causal effects?

Confounders, colliders, conditional independence.

The assumptions are untestable. *"Let the DAG be given ..."*

???

DAG

What are the links? Functional relationships

Conditional independencies

Relationships

Causal assumptions

Selection bias - dimension reduction

Algorithm for determining whether or not the causal effect of A on Y given Z can be identified from the complete records

---

# "Let the data speak ..."

.pull-left[
* Massive data
* "Hunt for patterns"
Confounders, colliders, ...
]

--

.pull-right[
**Danish registry data**

* What variables to include?
* Time? Non-equidistant measures. *How long since I last visited my GP?*
]

---

# ML is more flexible

.left-column[
Yes - by choice.

Could achieve the same with traditional models.

Price: interpretability
]

.right-column[
<img src="pics/pooh-parameters.png" width="864" />
]

---

# Non-continuous risk prediction

![](ml_files/figure-html/unnamed-chunk-2-1.png)<!-- -->

???

Ensemble + random forests

Risk prediction

Models

---
class: inverse, middle, center

# Where can ML play a critical role in CI?

---

# Estimating causal effects with ML

`$$\mathbb{E}(Y | A, X) = \beta_0 + \beta_1 A + \beta_2 X$$`

---

# Estimating causal effects with ML

`$$\mathbb{E}(Y | A, X) = \text{"ML"}$$`

--

Average Treatment Effect (ATE)

`$$\mathbb{E}_X [ \mathbb{E}(Y | A=1, X) - \mathbb{E}(Y | A=0, X)]$$`

with estimator

`$$\frac{1}{N}\sum_{i=1}^N [ \widehat{\mathbb{E}}(Y | A=1, X_i) - \widehat{\mathbb{E}}(Y | A=0, X_i)]$$`

???

We can think about defining our parameters more flexibly outside the context of a parametric model!

---

# Machine Learning `\(g\)`-formula algorithm

1. Estimate `\(\mathbb{E}(Y | A, X)\)` using our machine learning tool. Even better: an ensemble tool.
2. Set `\(A=1\)` for all observations and predict outcomes for all.
3. Set `\(A=0\)` for all observations and predict outcomes for all.

`$$\frac{1}{N}\sum_{i=1}^N [ \widetilde{\mathbb{E}}(Y | A=1, X_i) - \widetilde{\mathbb{E}}(Y | A=0, X_i)]$$`

To interpret this *causally* (as the average causal treatment effect) we still need the standard causal assumptions *and* proper models.

---

# Causal discovery / structure learning

*Let the DAG be given ...*

Use ML to discover causal relationships from observational data.

The PC algorithm identifies conditional independencies among the variables.

---

# PC algorithm

Input: .yellow[a set of variables].

Output: .yellow[a *completed partially directed acyclic graph*] (CPDAG).
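At its core the PC algorithm repeatedly tests conditional independence. A minimal sketch of one such test (partial correlation with a Fisher z-transform, which assumes Gaussianity; the function name and the simulated chain are illustrative, not from the talk, and the language here is Python rather than the deck's R):

```python
import numpy as np
from scipy import stats


def ci_test(x, y, z):
    """Test X independent of Y given Z via partial correlation.

    Residualize x and y on z, correlate the residuals, and apply the
    Fisher z-transform. Valid under joint Gaussianity; in principle any
    (ML-based) conditional independence test could be substituted.
    Returns (partial correlation, p-value).
    """
    n = len(x)
    # Design matrix: intercept plus conditioning variables (z may be empty)
    design = np.column_stack([np.ones(n), z]) if z.size else np.ones((n, 1))
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    r = np.corrcoef(rx, ry)[0, 1]
    k = design.shape[1] - 1          # number of conditioning variables
    zstat = np.sqrt(n - k - 3) * np.arctanh(r)
    return r, 2 * stats.norm.sf(abs(zstat))


# Chain X -> M -> Y: X and Y are marginally dependent, but
# independent given M, so PC would drop the X - Y edge.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
m = x + rng.normal(size=n)
y = m + rng.normal(size=n)
r_marg, p_marg = ci_test(x, y, np.empty((n, 0)))
r_cond, p_cond = ci_test(x, y, m.reshape(-1, 1))
```

Conditioning on the mediator removes the association, which is exactly the signal the skeleton phase uses to delete edges.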
Assumptions:

* The set of observed variables is sufficient
    * All common causes are present in the dataset
    * Extensions that account for latent variables do exist!
* The distribution of the observed variables is faithful to a DAG

.caption-right-vertical[Spirtes & Glymour (1991). *An algorithm for fast recovery of sparse causal graphs*. Social Science Computer Review.]

---

# PC algorithm 2

There is an edge `\(A − Y\)` if and only if `\(A\)` and `\(Y\)` are dependent conditional on every possible subset of the other variables.

`\(A \perp Y\)`? `\(A \perp Y | X\)`? `\(A \perp Y | M\)`? `\(A \perp Y | X, M\)`?

Number of tests? Prone to statistical mistakes? ML for (conditional) independence testing? Time?

--

After the skeleton: orient triplets `\(X − Y − Z\)` as `\(X \rightarrow Y \leftarrow Z\)` *iff* `\(X\)` and `\(Z\)` are dependent conditional on every set containing `\(Y\)`.

**Or use additional information**

---
background-image: url(pics/paper.png)
background-size: 80%
class: bottom

Use *temporal information* to help orient edges (Temporal PC).

---

# Metropolit Cohort

* Danish men born in 1953, followed from birth until age 65.
* Surveys at age 12 and 51, plus extensive administrative data from the Danish national registers. `\(N = 2928\)`.
* Consider 33 variables measured in 5 periods over the life course: birth, childhood (age approximately 12), youth (age 18-30), adulthood (age approximately 51), and early old age (age approximately 65).
* Outcome: clinical depression.

.caption-right-vertical[Osler et al. *Cohort profile: the Metropolit 1953 Danish male birth cohort.* International Journal of Epidemiology]

---
background-image: url(pics/result.png)
background-size: 70%

---

# Summary

No inherent benefits of ML with respect to causal inference. Useful in *combination* with existing framework(s) for causal inference. But no free lunch.

* Machine learning provides a useful alternative/addendum to modeling.
* Ideas in ML force us out of the old go-to techniques.
* Improved algorithms can perhaps make these approaches feasible.

--

We still need to **think**. Field knowledge is ever more crucial.
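The machine-learning `\(g\)`-formula algorithm from the earlier slide can be sketched in a few lines. This is a minimal illustration, not the talk's implementation: the choice of scikit-learn's gradient boosting (any regressor, or an ensemble, would do) and the simulated confounded data are assumptions for the example, and Python stands in for the deck's R.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def ml_gformula_ate(model, A, X, Y):
    """Plug-in g-formula ATE.

    1. Fit E(Y | A, X) with the supplied ML model.
    2. Predict for everyone with A set to 1, then with A set to 0.
    3. Average the individual differences.
    """
    AX = model_input = np.column_stack([A, X])
    model.fit(model_input, Y)
    AX1 = AX.copy(); AX1[:, 0] = 1.0   # everyone treated
    AX0 = AX.copy(); AX0[:, 0] = 0.0   # everyone untreated
    return float(np.mean(model.predict(AX1) - model.predict(AX0)))


# Simulated data with confounding: A depends on X[:, 0], which also
# affects Y. The true average treatment effect is 2.0.
rng = np.random.default_rng(42)
n = 5000
X = rng.normal(size=(n, 3))
A = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0])))
Y = 2.0 * A + X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)

ate_hat = ml_gformula_ate(GradientBoostingRegressor(random_state=0), A, X, Y)
naive = Y[A == 1].mean() - Y[A == 0].mean()  # confounded raw contrast
```

The plug-in estimate lands near the true effect while the raw group contrast is biased upward by the confounder, which is the point of steps 1-3: the ML model only does the regression work, and the causal interpretation still rests on the standard assumptions.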