This is the 2019 website for the course Advanced Statistical Topics in Health Research B held by the University of Copenhagen.

## Practical information

• The course will start on Monday, May 5th and end on Thursday, May 8th.
• All teaching takes place at CSS (Kommunehospitalet) in room 7.0.01 (building 7, ground floor, room 01); see the map at the end of this document. Final information about the room will be added Monday, April 29th.
• The course will generally run every day from 8.15 to 15 with a lunch break in between (see the programme for more information).
• Teachers for the course will be Claus Thorn Ekstrøm, Benoit Liquet, and Anne Helby Petersen.

Additional information will be given on the first day.

We will be following chapters in Computer Age Statistical Inference: Algorithms, Evidence, and Data Science somewhat closely. A PDF copy of the book can be downloaded from the authors’ website.

## Learning objectives

Many modern research projects collect data and use experimental designs that require advanced statistical methods beyond what is taught as part of the curriculum in introductory statistical courses. This course covers some of the more general statistical models and methods suitable for analyzing more complex data and designs encountered in health research such as methods for high-dimensional data, classification, imputation, and dimension reduction.

The course will contain equal parts theory and applications and consists of four full days of teaching and computer lab exercises. The intention is that participants will have a thorough understanding of the statistical methods presented and will be able to apply them in practice after having followed the course. The course is aimed at health researchers with previous knowledge of statistics and the computer language R who need an overview of appropriate analytical methods, and discussions with statisticians, to be able to solve their problems.

A student who has met the objectives of the course will be able to:

• Analyze data using the methods presented and be able to draw valid conclusions based on the results obtained.
• Understand the advantages/disadvantages of the methods presented and be able to discuss potential pitfalls from using these methods.

## Laptops and software

You must bring a laptop as we will not have access to the computer rooms at the university.

### Installation

Before the course starts you should make sure that you have installed the latest version of:

• R

• RStudio is also highly recommended but is not necessary.

• Download and install Anaconda. During the installation, you don’t need to check any boxes or choose anything but the default installation options.

• We will be working with a dataset about patients suffering from metastatic castrate-resistant prostate cancer. In order to access the data, you need to register at Project Data Sphere and fill out a request for data access. It is important to do this as soon as possible in order to get access to the data we will be using for the exercises.

Note that you do not have to fill out “Research Description” and “Research Goals”. If you do wish to describe what you will use the data for, you can mention that you will be utilizing machine learning methods to predict survival and treatment discontinuation for prostate cancer patients in line with the Prostate Cancer DREAM Challenge goals.

• We will be using a few specialized R packages that you need to install before arriving. They are necessary for running the exercises. The code below should be run in R (e.g. by opening RStudio and copying it into the console) to install the packages that we will need. You need to be connected to the internet to install the packages. If you are a Windows user, you may need to run RStudio as an administrator in order to install the packages. This can be done by right-clicking the RStudio program icon and choosing “Run as administrator”.

install.packages("tidyverse")
install.packages("devtools")
devtools::install_github("rstudio/keras")

If all of the above lines of code have run successfully, you must continue to run the final two lines:

library("keras")
install_keras()

and now you should be able to run the following line of code without error messages:

model <- keras_model_sequential()

If you are a Windows user and the install_keras() command produces an error message, it may be because it has problems finding your Anaconda installation. In that case, the following might work:

library(keras)
install_keras(conda = "C:/ProgramData/Anaconda3/condabin/conda.bat")
model <- keras_model_sequential()

If no error messages are returned, keras is installed and ready to use!
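The Anaconda path used in the Windows example depends on where Anaconda was installed on your machine. As a minimal sketch, the helper below returns the first candidate path that actually exists, so it can be passed on to `install_keras(conda = ...)`. The helper name `find_conda` and the second candidate path (a per-user installation) are assumptions made for illustration; they are not part of keras.

```r
# Hypothetical helper: return the first existing conda launcher among the
# candidate paths, or NA_character_ if none of them exists.
find_conda <- function(candidates) {
  existing <- candidates[file.exists(candidates)]
  if (length(existing) == 0) NA_character_ else existing[[1]]
}

candidates <- c(
  # Path used in the Windows example above
  "C:/ProgramData/Anaconda3/condabin/conda.bat",
  # Assumed location of a per-user Anaconda installation
  file.path(Sys.getenv("USERPROFILE"), "Anaconda3", "condabin", "conda.bat")
)

conda_path <- find_conda(candidates)
if (is.na(conda_path)) {
  message("No conda launcher found at the candidate paths; check where Anaconda is installed.")
} else {
  message("Found conda at: ", conda_path)
  # Uncomment to install keras using this conda launcher:
  # keras::install_keras(conda = conda_path)
}
```

If none of the candidates is found, add the location of your own Anaconda installation to `candidates` and run the check again.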

The next step is installing Stan for R. Install the rstan package for R as per these instructions.

Once rstan is installed, we need the rethinking package. That package is not on CRAN, so we will install it directly from GitHub. You do that by running the following three lines:

install.packages(c("devtools","mvtnorm","loo","coda"), repos="https://cloud.r-project.org/",dependencies=TRUE)
library("devtools")
install_github("rmcelreath/rethinking")
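After completing the steps above, a check along these lines may help confirm that everything is in place before the first day. It is only a sketch: the helper name `missing_packages` is hypothetical, and the package list is simply collected from the instructions on this page.

```r
# Hypothetical helper: return the names of the packages that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages mentioned in the installation instructions on this page
course_packages <- c("tidyverse", "devtools", "keras", "rstan", "rethinking")
not_installed <- missing_packages(course_packages)

if (length(not_installed) == 0) {
  message("All course packages are installed.")
} else {
  message("Still missing: ", paste(not_installed, collapse = ", "))
}

# Show which version of R is being used
message(R.version.string)
```

Any package reported as missing can be installed by re-running the corresponding step above.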

There will be wireless internet access for the participants. If you already have an eduroam account, it will work throughout the University of Copenhagen.

## Map of CSS

See the following map for the location of CSS.

Claus Ekstrøm 2019