Benjamin Skov Kaas-Hansen
Let \((\Omega, F, P)\) be a measure space. If \(P(E)\) denotes the probability of an event \(E\), then \((\Omega, F, P)\) is a probability space with sample space \(\Omega\), event space \(F\) and probability measure \(P\), satisfying Kolmogorov's three axioms:
The probability of an event is a non-negative real number: \(P(E) \in \mathbb{R} \text{ and } P(E) \geq 0 \quad \forall E \in F\)
At least one of the events in the entire set \(\Omega\) will occur: \(P(\Omega) = 1\)
Any countable sequence of disjoint (= mutually exclusive) sets \(E_1, E_2, \ldots\) satisfies: \(P \left( \bigcup_{i=1}^\infty E_i \right) = \sum_{i=1}^\infty P(E_i)\)
Flip a coin. Tail: 1 up. Head: 1 down. Where do we end?
Good — then let’s start over
A number between 0 and 1 which encompasses my (our?) statement about uncertainty/certainty
1 is complete certainty that something is the case
0 is complete certainty that something is not the case
Subjective measure. Can we ever be completely certain about something in the real world?
Very pragmatic and applied
Get real experience with Bayesian analysis
We won’t get lost in technical details
Off-the-shelf software: as easy as other statistical models
Study design and data collection ≠ data analysis
Don’t be afraid of Bayesian analyses
Describe the applications of Bayesian analysis
Express prior belief with probability distributions
Fit Bayesian GLMs
Complete the five steps of BDA
Follow the guidelines for reporting Bayesian analyses
Follows the five steps of Bayesian analysis
Characterise data
Define appropriate data model
Specify priors
Update parameters in light of new data
Diagnostics (convergence and posterior predictive checks)
Frequent breaks. You present the exercises to each other.
Medical doctor with MSc and PhD
Postdoc in intensive care
Bayesian platform trial (methods and data science)
Mainly real-world evidence (routine-care and PRO data)
“The Bayesian approach is tailored to decision making.
Designing a clinical trial is a decision problem.
Drawing a conclusion from a trial (…) is a decision problem.
Allocating resources among various research projects is a decision problem.
Stopping drug development is a decision problem.”
\(P(\theta | D) \propto P(D | \theta) \times P(\theta)\)
posterior \(\propto\) likelihood \(\times\) prior
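Bayes' rule can be carried out numerically on a grid. A minimal sketch in R (a hypothetical binomial example with 8 successes in 22 trials and a uniform prior; all names are illustrative):

```r
# Grid approximation of Bayes' rule: posterior is proportional to
# likelihood times prior, evaluated at many candidate values of theta.
theta_grid <- seq(0.001, 0.999, length.out = 999)
prior <- dbeta(theta_grid, 1, 1)                      # Beta(1, 1) = uniform
likelihood <- dbinom(8, size = 22, prob = theta_grid) # 8 successes in 22 trials
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)               # normalise to sum to 1
theta_grid[which.max(posterior)]                      # posterior mode, approx. 8/22
```

With a uniform prior the posterior mode coincides with the maximum-likelihood estimate, which makes the grid result easy to check by eye.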
Start with a belief about something
Observe the world to learn something new
Update your belief in light of these new observations
We make probability statements about model parameters
Derived quantities follow directly (as we’ll see)
What is the average weight of patients?
What is the effect of this treatment? Is it better than placebo?
Will our capacity be exhausted tomorrow?
Formalism for human intuition
Priors express pre-existing knowledge (or expectations)
Easier to build complex models
Use priors to regularise estimation
Modelling uncertainty enables decision analysis
Seen as complicated and “mathy”
Erroneous idea that Bayesian analyses are subjective (and, thus, unreliable) while frequentist analyses are objective
It’s a bit more tricky than conventional analyses
Devising reasonable priors not trivial
You force yourself to formalise existing knowledge or lack thereof
Frequentist methods the “go-to” in health research
The proportion of women among patients at a 22-bed ICU on a given day? Prior: Equal distribution between women and men. Data: 8 women. The next day?
What might be wrong with this conclusion? What do we assume?
RCT with 2 arms
We’re interested in the probability of death (= mortality risk) \(\theta\) in each arm
Count outcome => Binomial likelihood
Outcomes across patients assumed independent
| | ECMO | Control |
|---|---|---|
| Deaths | \(z_\text{ecmo}\) | \(z_\text{ctl}\) |
| Patients | \(N_\text{ecmo}\) | \(N_\text{ctl}\) |
| Risk | \(\frac{z_\text{ecmo}}{N_\text{ecmo}}\) | \(\frac{z_\text{ctl}}{N_\text{ctl}}\) |
\(N\) independent trials
Two possible outcomes: success (1) and failure (0)
\(z\) is the number of successes observed
Each trial has the same probability of success = \(\theta\)
The posterior distribution is the same type as the prior
Over-simplistic for any realistic analysis
Prior: \(\theta_\text{prior} \sim\) Beta(\(\alpha\), \(\beta\))
Likelihood: \(z \sim\) Binomial(\(N\), \(\theta_\text{prior}\))
Posterior: \(\theta_\text{posterior} \sim\) Beta(\(\alpha + z\), \(\beta + N - z\))
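The conjugate update is simple enough to verify directly. A sketch with illustrative numbers (Beta(1, 1) prior, 8 successes in 22 trials):

```r
# Conjugate Beta-Binomial update: a Beta(a, b) prior combined with
# z successes in N trials yields a Beta(a + z, b + N - z) posterior.
a <- 1; b <- 1   # uniform Beta(1, 1) prior
z <- 8; N <- 22
post_a <- a + z          # 9
post_b <- b + N - z      # 15
post_a / (post_a + post_b)                    # posterior mean: 9/24 = 0.375
qbeta(c(0.025, 0.5, 0.975), post_a, post_b)   # median and 95% CrI
```

No sampling is needed here: the posterior is available in closed form, which is exactly what conjugacy buys you.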
Naive (but correct) approach for Binomial likelihood:
Assume prior distribution of \(\theta\) (any will do)
Draw one \(\theta'\) from the prior distribution
Draw a sample outcome \(z'\) given \(\theta'\)
Keep \(\theta'\) if \(z'\) is the same as the observed \(z\)
Repeat 2. through 4. a large number of times
n_samples <- 1000
z_observed <- 8   # no. women
N_observed <- 22  # no. beds
theta_prime <- runif(n_samples, 0, 1)                    # 2. draw from uniform prior
z_prime <- rbinom(n_samples, N_observed, theta_prime)    # 3. simulate one outcome per draw
posterior_samples <- theta_prime[z_prime == z_observed]  # 4. keep draws matching the data
median_95CrI <- quantile(posterior_samples, probs = c(0.025, 0.5, 0.975))
HPDinterval(as.mcmc(posterior_samples))  # requires the coda package
lower upper
var1 0.1739939 0.4917233
attr(,"Probability")
[1] 0.9473684
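As a sanity check (a sketch, not part of the original analysis): with a uniform Beta(1, 1) prior, conjugacy gives the exact posterior Beta(1 + 8, 1 + 22 - 8) = Beta(9, 15), which the rejection sampler should approximate closely.

```r
set.seed(4131)        # arbitrary seed for reproducibility
n_samples <- 100000   # many more draws than above, for a tighter comparison
theta_prime <- runif(n_samples, 0, 1)
z_prime <- rbinom(n_samples, 22, theta_prime)
posterior_samples <- theta_prime[z_prime == 8]
mean(posterior_samples)   # should be close to the analytic mean 9/24 = 0.375
c(sampler = unname(quantile(posterior_samples, 0.5)),
  analytic = qbeta(0.5, 9, 15))
```

Only about 1 in 23 draws is accepted here, which is why rejection sampling, though correct, does not scale beyond toy problems.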
Non-informative priors don’t exist
Even entirely uniform priors carry information about your expectations
Uniform priors are implicit in frequentist results
Published results (RCTs, observational studies)
Domain experts
Logical conclusions (RR > 10 very unlikely)
Regularisation of estimates (e.g. lasso or ridge regression)
Beta distribution obvious choice: flexible and easy to intuit
Task: We expect a 30% proportion in a certain group, with the lower bound of the 95% CrI lying at 10%; devise the corresponding Beta prior.
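One way to attack this task numerically (a sketch: the 30% expectation is taken as the prior mean, and base R's optim is used to match both targets; names are illustrative):

```r
# Find (alpha, beta) such that the Beta prior has mean 0.30 and
# 2.5% quantile 0.10. Parameters are optimised on the log scale
# so they stay positive.
discrepancy <- function(log_par) {
  a <- exp(log_par[1]); b <- exp(log_par[2])
  (a / (a + b) - 0.30)^2 + (qbeta(0.025, a, b) - 0.10)^2
}
fit <- optim(c(0, 0), discrepancy)  # start at Beta(1, 1)
ab <- exp(fit$par)
round(ab, 1)                        # candidate (alpha, beta)
```

Plotting `dbeta` with the fitted parameters is a good final check that the prior looks the way you intended.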
Recall that coefficients in logistic regressions are on the log-OR scale.
So how do we express meaningful and realistic priors for such coefficients?
For a 2x2 table we have that the approximate SD of the log-OR is given by,
\(\qquad \sigma = \sqrt{\frac{1}{z_0} + \frac{1}{N_0 - z_0} + \frac{1}{z_1} + \frac{1}{N_1 - z_1}}\)
If we assume the same event proportion \(r\) in both arms and a total of \(N = N_0 + N_1\) patients split equally between arms (\(N_0 = N_1 = N/2\)), we see that,
\(\qquad z_0 = z_1 = \tfrac{1}{2} N r, \quad N_0 - z_0 = N_1 - z_1 = \tfrac{1}{2} N (1-r)\)
Rearranging yields,
\(\qquad \sigma = \sqrt{\frac{4}{N (1-r)} + \frac{4}{Nr}}\)
making it easy to specify a standard deviation \(\sigma\) for a prior to express no difference in a trial with a certain event proportion and of a certain size \(N\)
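The rearranged formula is a one-liner in R (function name hypothetical):

```r
# Approximate SD of the log-OR under no difference, for a trial with
# N patients in total (equal arms) and event proportion r in both arms
prior_sd_log_or <- function(N, r) sqrt(4 / (N * (1 - r)) + 4 / (N * r))
prior_sd_log_or(N = 1000, r = 0.3)   # approx. 0.14
```

Such a value could then serve as the scale of a normal prior centred at 0 on the log-OR coefficient.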
SAMPLING FOR MODEL 'continuous' NOW (CHAIN 1).
Chain 1:
Chain 1: Gradient evaluation took 0.000146 seconds
Chain 1: 1000 transitions using 10 leapfrog steps per transition would take 1.46 seconds.
Chain 1: Adjust your expectations accordingly!
Chain 1:
Chain 1:
Chain 1: Iteration: 1 / 4000 [ 0%] (Warmup)
Chain 1: Iteration: 400 / 4000 [ 10%] (Warmup)
Chain 1: Iteration: 800 / 4000 [ 20%] (Warmup)
Chain 1: Iteration: 1200 / 4000 [ 30%] (Warmup)
Chain 1: Iteration: 1600 / 4000 [ 40%] (Warmup)
Chain 1: Iteration: 2000 / 4000 [ 50%] (Warmup)
Chain 1: Iteration: 2001 / 4000 [ 50%] (Sampling)
Chain 1: Iteration: 2400 / 4000 [ 60%] (Sampling)
Chain 1: Iteration: 2800 / 4000 [ 70%] (Sampling)
Chain 1: Iteration: 3200 / 4000 [ 80%] (Sampling)
Chain 1: Iteration: 3600 / 4000 [ 90%] (Sampling)
Chain 1: Iteration: 4000 / 4000 [100%] (Sampling)
Chain 1:
Chain 1: Elapsed Time: 0.187577 seconds (Warm-up)
Chain 1: 0.138795 seconds (Sampling)
Chain 1: 0.326372 seconds (Total)
Chain 1:
[Analogous sampler output for chains 2-4 omitted]
Default weak priors: normal with scale 2.5 for the intercept and all coefficients, auto-scaled to the data (compare the 'Specified' and 'Adjusted' priors below).
Priors for model 'f1'
------
Intercept (after predictors centered)
Specified prior:
~ normal(location = 30, scale = 2.5)
Adjusted prior:
~ normal(location = 30, scale = 41)
Coefficients
Specified prior:
~ normal(location = 0, scale = 2.5)
Adjusted prior:
~ normal(location = 0, scale = 6.4)
Auxiliary (sigma)
Specified prior:
~ exponential(rate = 1)
Adjusted prior:
~ exponential(rate = 0.061)
------
See help('prior_summary.stanreg') for more details
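For orientation, output like the above could be produced by a call along these lines (a sketch, assuming the rstanarm package and R's built-in trees data, which match the formula and the 31 observations in the model summary):

```r
library(rstanarm)  # assumed available

# Gaussian GLM with rstanarm's default, auto-scaled weak priors;
# 4 chains x 4000 iterations (half warmup) = 8000 posterior draws
f1 <- stan_glm(Volume ~ Height, data = trees, family = gaussian(),
               iter = 4000, seed = 4131)  # seed arbitrary
prior_summary(f1)  # 'Specified' vs 'Adjusted' priors
summary(f1)        # estimates and MCMC diagnostics
```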
Model Info:
function: stan_glm
family: gaussian [identity]
formula: Volume ~ Height
algorithm: sampling
sample: 8000 (posterior sample size)
priors: see help('prior_summary')
observations: 31
predictors: 2
Estimates:
mean sd 2.5% 50% 97.5%
(Intercept) -86.4 30.2 -146.8 -86.7 -25.7
Height 1.5 0.4 0.7 1.5 2.3
sigma 13.8 1.8 10.8 13.6 18.0
Fit Diagnostics:
mean sd 2.5% 50% 97.5%
mean_PPD 30.1 3.5 23.2 30.1 37.0
The mean_ppd is the sample average posterior predictive distribution of the outcome variable (for details see help('summary.stanreg')).
MCMC diagnostics
mcse Rhat n_eff
(Intercept) 0.4 1.0 7236
Height 0.0 1.0 7357
sigma 0.0 1.0 5966
mean_PPD 0.0 1.0 7395
log-posterior 0.0 1.0 3235
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
For all their usefulness, probabilistic results and posterior draws are all conditional on the assumptions you put into the full model.
No different from frequentist results—but because Bayesian analyses make these assumptions explicit, their results can come across as more prone to the whims of the researcher.
See the control argument in ?rstan::stan for sampler tuning options.
library(tidyverse)  # tibble, dplyr, %>% pipes, readr

expit <- function(x) exp(x) / (1 + exp(x))
N <- 1000
N_women <- 0.4 * N
N_men <- N - N_women
set.seed(4131)
tibble(
id = seq_len(N),
treatment = sample(0:1, N, replace = TRUE),
age = round(runif(N, 18, 85)),
woman = c(rep(1, N_women), rep(0, N_men)),
sepsis = c(sample(0:1, N_women, TRUE, c(0.3, 0.7)), sample(0:1, N_men, TRUE)),
icu_los = round(c(
rlnorm(N_women, log(5 + sepsis), log(5)),
rlnorm(N_men, log(7 + sepsis), log(4))
), 2),
hosp_los = round(pmax(icu_los, icu_los * (3.5 + rnorm(N, 0, 0.5))), 2)
) %>%
filter(icu_los < 365 | hosp_los < 365) %>%
mutate(
dead_at_1y = 0L + (0.5 < expit(-18 - 0.5 * treatment + 0.3 * age - 0.2 * woman + 0.4 * sepsis))
) %>%
# summarise(mean(dead_at_1y))
write_tsv("data/icu.tsv")
Description: We would like to estimate the association between ICU length of stay (ICU-LOS) and total hospital length of stay (hosp-LOS). We know each patient's biological sex and whether they were admitted with sepsis.
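A first model for this task could be sketched as follows (assuming rstanarm and the data/icu.tsv file generated above; the Gaussian family and the prior scale are illustrative starting points, not prescriptions):

```r
library(rstanarm)  # assumed available
library(readr)

icu <- read_tsv("data/icu.tsv")

# Hospital LOS as a function of ICU LOS, adjusted for sex and sepsis,
# with weakly informative, auto-scaled normal priors on the coefficients
f_los <- stan_glm(
  hosp_los ~ icu_los + woman + sepsis,
  data = icu, family = gaussian(),
  prior = normal(0, 2.5, autoscale = TRUE),
  seed = 4131  # arbitrary seed for reproducibility
)
summary(f_los)
```

Given the skew of length-of-stay data, a log-transformed outcome or a non-Gaussian family would be natural alternatives to explore in the posterior predictive checks.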