Bayesian statistics and inference

Exercises

All of these exercises use R. We will need the MESS, isdals, rstan, brms, rstanarm, and bayesplot packages, which can all be installed with the install.packages() function.

install.packages(c("isdals", "rstan", "brms", "rstanarm", "bayesplot", "MESS"))

The exercises consist of two types of problems: introductory text that you should work through to make sure you understand how to use the function(s) and interpret the output, and a few additional questions for you to explore and discuss.

Understanding Bayesian analysis “by hand”

In this exercise we will take the horoscope example from the lecture and investigate what happens to the conclusions from a Bayesian analysis when we modify the prior and let the number of observations increase. We will also see why this straightforward approach might not be the most efficient way to do Bayesian analysis and why we need specialized software.

  1. Run the code below and make sure you understand what is going on in each of the single lines.

    nsim <- 1000           # The number of samples to generate
    ssize <- 84            # The sample size (horoscopes)
    observed_succes <- 27  # The number of observed correct guesses
    
    theta <- runif(nsim)   # Draw from prior distribution
    res <- rbinom(nsim, size=ssize, prob=theta)
    keep <- theta[res==observed_succes]
  2. Plot a histogram of the \(\theta\)s that you have kept (i.e., the thetas that gave rise to an outcome matching our original outcome).

  3. Compute the posterior median and provide a guess of the MAP (maximum a posteriori estimate).

  4. Now change your prior distribution such that it is not uniform. You are free to pick any non-uniform distribution, but note that it should only result in values between 0 and 1 since the value will be used subsequently as a probability. [Hint: a beta-distribution could be interesting. Use the rbeta() function and select some interesting shape parameters.]

  5. Draw a histogram or density of your prior distribution.

  6. Run the same algorithm as above. Plot the posterior distribution, and guess the MAP and median posterior value.

  7. State a conclusion of your analysis in words.

  8. How did the results change compared to the initial analysis? Why?

  9. How many thetas are part of the posterior distribution? How does that compare to nsim? What is the problem here?

  10. Now assume that we had had 10 times the number of observations but with the same rate of success: ssize <- 840 ; observed_succes <- 270. Rerun the analysis with your own prior and compare the results from this larger dataset to the previous analysis. How are they similar? How are they different?
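As a worked sketch of steps 1-3 above (reusing the variable names from the code, with a larger nsim so the posterior is smoother), the rejection-sampling approach and its summaries could look like this:

```r
set.seed(1)            # For reproducibility
nsim <- 100000         # Number of prior samples to generate
ssize <- 84            # Sample size (horoscopes)
observed_succes <- 27  # Number of observed correct guesses

theta <- runif(nsim)                            # Draw from the uniform prior
res <- rbinom(nsim, size = ssize, prob = theta) # Simulate data for each draw
keep <- theta[res == observed_succes]           # Keep draws matching the data

hist(keep, breaks = 30, main = "Posterior of theta")
median(keep)                     # Posterior median, close to 27/84
dens <- density(keep)
dens$x[which.max(dens$y)]        # Crude MAP estimate: the density peak
```

Note how few of the nsim draws survive the `res == observed_succes` filter; this inefficiency is the motivation for the specialized software used below.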

Using rstanarm

We will now perform the same analysis as above, but using the rstanarm package. We start by loading the relevant packages and creating a dataset that can be used for logistic regression.

library("rstanarm")   # Load package
library("bayesplot")  # Load package
DF <- data.frame(correct=rep(0:1, times=c(84-27, 27)))
  1. Use the stan_glm() function to fit a logistic model. Remember that a logistic model fits parameters on the log odds scale. [Hint: remember that glm() requires argument family=binomial to run a logistic regression]

  2. Extract the posterior values and plot the posterior distribution of the intercept. [Hint: use as.matrix() and mcmc_areas()]

  3. Compute the MAP and median posterior on the probability scale. [Hint: recall that \(g(x) = \frac{\exp(x)}{(1 + \exp(x))}\) is the inverse function to the logistic transform.]

  4. Note that stan_glm() uses a zero-mean normal distribution as prior for the intercept. Thus, we might not get the same posterior distribution as we did above. How could we modify the prior distribution to make it resemble a uniform distribution? Try to refit the model with this uniform distribution [Hint: look at the prior_intercept argument and this website]

  5. Try changing the prior distribution to, say, normal(0, .1) and rerun the analysis. How do the conclusions change? Why?

  6. Try your own prior and see how it influences the result.

  7. Now assume that we had had 10 times the number of observations but with the same rate of success: ssize <- 840 ; observed_succes <- 270. Rerun the analysis with your own prior and compare the results from this larger dataset to the previous analysis. How are they similar? How are they different?
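A possible sketch of the first three steps (the stan_glm() call may take a little while, since Stan compiles and samples behind the scenes):

```r
library("rstanarm")   # Bayesian regression models via Stan
library("bayesplot")  # Plotting posterior draws

DF <- data.frame(correct = rep(0:1, times = c(84 - 27, 27)))

# Intercept-only logistic regression with the default priors
fit <- stan_glm(correct ~ 1, data = DF, family = binomial, refresh = 0)

post <- as.matrix(fit)                    # Posterior draws of the intercept
mcmc_areas(post, pars = "(Intercept)")    # Posterior density plot

# Back-transform from the log odds scale to the probability scale
invlogit <- function(x) exp(x) / (1 + exp(x))
invlogit(median(post[, "(Intercept)"]))   # Posterior median probability
```

The posterior median on the probability scale should land near the observed proportion 27/84.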

Relying on the prior

Assume that you are working at a production facility with quality control of a particular drug. You want to estimate the probability that the production will fail. In the previous year, 365 batches were produced and none of them failed.

  1. Estimate the probability of failure based on a frequentist analysis of the data. What is your estimated proportion? What is your forecast of the number of failures for the next year? Can any failures occur at all under this estimate?

  2. Use a Bayesian modeling approach to estimate the probability of failure for the next year. How/why can the conclusion change? Which result is more reasonable?
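The contrast between the two approaches can be sketched as follows (assuming, as one possible choice, a uniform Beta(1, 1) prior for the Bayesian version, which gives a Beta posterior by conjugacy):

```r
failures <- 0
batches <- 365

# Frequentist: the maximum likelihood estimate is the observed proportion.
# An estimate of exactly 0 forecasts that failures can never happen.
phat <- failures / batches

# Bayesian: with a Beta(1, 1) prior the posterior is Beta(1 + 0, 1 + 365),
# whose mean is small but strictly positive.
post_mean <- (1 + failures) / (2 + batches)
post_mean
```

The Bayesian estimate still allows for the possibility of a failure next year, which is arguably the more reasonable conclusion.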

Multiple linear regression

In this exercise we will use the fev dataset from the isdals package. The data frame contains results from a study of 600+ children and their lung capacity. The primary aim is to investigate how smoking influences lung capacity, FEV. The data also contains information on Age, height (Ht) in inches, Gender, and Smoke.

  1. Load the package and get the data:

    library("isdals")
    data(fev)
  2. Analyze the data with a regular frequentist model. What are your overall conclusions? What are your conclusions about the impact of smoking on lung capacity?

  3. Analyze the same model using a Bayesian analysis with the stan_glm() function. Make sure you have a reasonable prior.

  4. Examine the fit of the model. Is the convergence okay?

  5. Use shinystan to investigate the model.

  6. Increase the number of chains to, say, 8 and double the number of iterations. Examine the fitted model (for example with `shinystan`). How did modifying the number of chains and iterations change the results? What could changing these potentially fix?

  7. Look at the posterior distribution of the smoking parameter. What is your conclusion about smoking? What is the probability that the smoking effect is greater than zero?

  8. Now fit the same model with the brm() function from the brms package. Look at the output from this model, and how it differs from the stan_glm model. What extra information do you get here?

  9. Use the get_prior() function on a model formula to see the parameters that can have their priors changed. The class b covers the regression coefficients (mean parameters).

  10. If we first define the priors then they can be included in brm using the prior argument. Tweak the following code to work with the fev data:

    ## define some priors          
    prior <- c(prior_string("normal(0,10)", class = "b"),   # Default example
               prior(normal(1,2), class = b, coef = treat), # Specific for treat
               prior(cauchy(0,2), class = "sigma"))         # For the residual SD
  11. Rerun the analysis with the updated priors (of your choice). Check the model convergence.

  12. We can use the leave-one-out cross-validation information criterion (LOOIC) to see how well different models predict the outcome. These LOOICs can then be compared; like other information criteria, lower is better. Importantly, the LOOIC comes with a standard error. Fit a model without Smoke and compare the LOOIC between the original model and the reduced model.

Analyzing data from the Titanic

  1. Load the data. This data frame contains information on Class, Sex, Age (binary), and Survived. The data is stored as a 4-dimensional array, so we need to expand it to have one row per individual on the ship. This can be accomplished with the expand_table() function from the MESS package.

    library("MESS")
    data(Titanic)
    ship <- expand_table(Titanic)
  2. Analyze the data with a regular frequentist logistic regression model. What are your conclusions? [Hint: you can use the glm() function.]

  3. Analyze the model using a Bayesian approach. Remember to consider your priors and check model convergence.

  4. What are the conclusions from the Bayesian analysis?
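A sketch of the frequentist part (step 2), using the expanded data from above:

```r
library("MESS")
data(Titanic)
ship <- expand_table(Titanic)   # One row per individual on the ship

# Logistic regression of survival on class, sex, and age
fit <- glm(Survived ~ Class + Sex + Age, data = ship, family = binomial)
summary(fit)
```

The same formula can then be handed to stan_glm() or brm() for the Bayesian version in step 3.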

(Quasi)-complete separation

Complete separation happens when a predictor variable or a combination of predictor variables completely separates the outcome variable. This is a well-known problem in logistic regression. When such a dataset is analyzed using standard frequentist methods with maximum likelihood, the estimate for the separating variable does not exist (it diverges towards infinity). In this exercise we will see what happens when the data is analyzed using Bayesian statistics.

  1. Load the data frame qsep using the command

    load(url("https://biostatistics.dk/teaching/advtopicsB/data/qsep.rda"))
  2. Analyze y as a function of x and g using a standard logistic regression model. Notice the size and standard error of the estimate for the variable x.

  3. Analyze the same data with a Bayesian model with a vague prior and look at the estimate and standard error. How (and why) does the Bayesian analysis circumvent the problem with complete separation?
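The contrast between steps 2 and 3 can be sketched like this (the normal(0, 2.5) prior shown is just one possible vague choice):

```r
load(url("https://biostatistics.dk/teaching/advtopicsB/data/qsep.rda"))

# Frequentist: under separation the estimate for x diverges,
# producing a huge coefficient and standard error
fit_freq <- glm(y ~ x + g, data = qsep, family = binomial)
summary(fit_freq)

# Bayesian: a weakly informative prior keeps the estimate finite
library("rstanarm")
fit_bayes <- stan_glm(y ~ x + g, data = qsep, family = binomial,
                      prior = normal(0, 2.5), refresh = 0)
summary(fit_bayes)
```

The prior effectively rules out infinitely large log odds ratios, so the posterior remains well behaved even though the likelihood alone has no maximum.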

Stan

Analyze the full lung capacity data with a Stan model and run it. Argue for all the choices that you have made so far regarding priors etc. Recall from the lecture that a basic Stan program contains three sections: data, parameters, and model.

Below is the sample code for a standard linear regression analysis. Modify this code to run the full analysis (see hints below).

data {
  int<lower=1> N;
  vector[N] y;
  vector[N] x;
}

parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;      // Note lower limit
}

model {
  alpha ~ normal(0, 10);    // Prior 
  beta ~ normal(0, 10);     // distributions 
  sigma ~ cauchy(0, 2.5);   // defined here
  y ~ normal(alpha + beta * x, sigma);
}

Hint:

  1. Start by including only the two numeric variables.
  2. Then include the categorical variables as numeric 0-1 variables. This works because the categorical predictors are binary.
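Following the hints, an extended version of the template might look like the sketch below (the variable names are placeholders for the FEV predictors, with the categorical variables already recoded as 0-1):

```stan
data {
  int<lower=1> N;
  vector[N] y;          // FEV
  vector[N] age;
  vector[N] ht;
  vector[N] gender;     // 0-1 coded
  vector[N] smoke;      // 0-1 coded
}

parameters {
  real alpha;
  vector[4] beta;
  real<lower=0> sigma;  // Note lower limit
}

model {
  alpha ~ normal(0, 10);    // Prior
  beta ~ normal(0, 10);     // distributions
  sigma ~ cauchy(0, 2.5);   // defined here
  y ~ normal(alpha + beta[1] * age + beta[2] * ht
                   + beta[3] * gender + beta[4] * smoke, sigma);
}
```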

2024