Statistical methods in bioinformatics

Welcome to the 2025 version of the Statistical methods in bioinformatics course held at the University of Copenhagen. This website serves as your central hub for all course-related information, resources, and materials.

In this course, we will explore statistical and analytical methods applicable to both generic and specific problems in bioinformatics. Through a combination of lectures and hands-on activities, you will gain a deep understanding of the statistical methods used in bioinformatics research and develop practical skills that you can apply in various contexts.

Schedule

Please refer to the schedule below for an outline of topics covered each day. This schedule is subject to change, so be sure to check back regularly for updates. The course will start on Monday, April 28th and end on Friday, May 2nd.

Day 1	Introduction to statistical methods for high-dimensional data. Multiple testing, linear models and regularization methods
Day 2	Analysis of RNA sequencing data
Day 3	Genome-wide association studies
Day 4	Network biology
Day 5	Advanced correlations, zero-inflated and hurdle models, compositional data, and integrated data analysis

Each day will take place at CSS (Kommunehospitalet), and we will be in room 7.0.01 (building 7, ground floor, room 01) on all days (see map at the end of this document).
The course will generally run every day from 8:15 until around 15:00. The format of the individual days might vary slightly, so please be prepared to be flexible.
Teachers for the course will be Claus Thorn Ekstrøm, Nadezhda Tsankova Doncheva, and Stefan Seemann.

Course materials

You can access the syllabus, lecture slides, assigned readings, and additional resources under each of the days. We will not be following a specific textbook closely but recommend Advances in Statistical Bioinformatics: Models and Integrative Inference for High-Throughput Data and Regression with linear predictors.

Learning objectives

Bioinformatics is concerned with the study of the inherent structure of biological information, and statistical methods are the workhorses in many of these studies. Some of this inherent structure is very obvious and can be observed directly through correlations of patterns in high-dimensional data, while other patterns arise through more complicated underlying relationships.

This course covers some of the statistical models and methods suitable for analyzing high-dimensional data, in particular high-dimensional data that rely heavily on statistical methods for their analysis. The course consists of equal parts theory and applications, spread over five full days of teaching and computer lab exercises. The intention is that participants will gain a thorough understanding of the statistical methods and be able to apply them in practice after having followed this course.

A student who has met the objectives of the course will be able to:

Analyze data from a bioinformatics experiment using the methods described below and draw valid conclusions based on the results obtained.
Understand the advantages and disadvantages of the methods presented and be able to discuss potential pitfalls of using these methods.
Discuss and develop new methods that can be used to analyze novel types of bioinformatics data.

Get started

Before the course starts you should make sure that you have installed the latest version of:

R, together with the packages kinship2, coxme, glmnet, MASS, MESS, cluster, data.table, lme4, minerva, and PTAk. The packages can be installed from inside R using the command:
```
install.packages(c("kinship2", "glmnet", "coxme", "MESS", "MASS",
                   "cluster", "data.table", "lme4", "minerva", "PTAk"))
```
RStudio is also highly recommended but is not necessary.

We also need some packages from Bioconductor, namely edgeR, DESeq2, vsn, org.Hs.eg.db, and RCy3. These can be installed by running the following R code, which also installs ggplot2, dplyr, and NMF from CRAN if they are missing:

```
# Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

# Bioconductor packages: install only those that are missing
requiredPackages <- c("edgeR", "DESeq2", "vsn", "org.Hs.eg.db", "RCy3")
for (p in requiredPackages) {
    if (!(p %in% rownames(installed.packages()))) BiocManager::install(p)
}

# Additional CRAN packages (grDevices ships with R, so it is never reinstalled)
requiredPackages <- c("ggplot2", "dplyr", "NMF", "grDevices")
for (p in requiredPackages) {
    if (!(p %in% rownames(installed.packages()))) install.packages(p)
}
```
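
Once the installation commands have run, it can be useful to confirm that every package can actually be loaded. The following is a minimal sketch and not part of the official course instructions: the helper name `find_missing` is hypothetical, and the package lists are simply copied from the installation instructions above.

```r
# Packages named in the installation instructions above
cran_pkgs <- c("kinship2", "coxme", "glmnet", "MASS", "MESS", "cluster",
               "data.table", "lme4", "minerva", "PTAk",
               "ggplot2", "dplyr", "NMF")
bioc_pkgs <- c("edgeR", "DESeq2", "vsn", "org.Hs.eg.db", "RCy3")

# Hypothetical helper: return the names of packages whose namespace
# cannot be loaded (i.e. that are missing or broken)
find_missing <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

missing_pkgs <- find_missing(c(cran_pkgs, bioc_pkgs))

if (length(missing_pkgs) == 0) {
  message("All course packages are available.")
} else {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If any packages are reported as missing, rerunning the relevant installation command above and reading its error messages is usually the quickest way to find out why.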

You should also install the latest version of Cytoscape. You could also try to install plink, but that is not strictly necessary.

Installation instructions are available on the websites of the respective programs.

Extra software might be installed during the course, so make sure you have administrator/root access to your computer.

Map of CSS

See the following map for the location of CSS. The room we will be in will be roughly located under the “C” in the text “Københavns Universitet Center for Sundhed og Samfund” (at least if you do not zoom in or out).

Claus Ekstrøm 2025