The goals of this exercise session are to:
Don’t forget to add your results to the scoreboard!
If you are working with neural networks, you need a bit of extra data prepping:
- Scale the data (for each variable, subtract the training data mean and divide by the training data standard deviation)
- Convert the input data to the matrix format that keras expects
- Convert the labels to one-hot encoding (two dummies).
All this is done for you if you run:
source("prepDataForNNs.R")
and now use NN_traindata_x, NN_traindata_DEATH2YRS, NN_testdata_x, NN_testdata_DEATH2YRS, … in the following exercises.
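For reference, here is a minimal sketch of what a prep script along these lines could do. The data frame names traindata and testdata are assumptions for illustration; the actual prepDataForNNs.R may differ in its details.
library(keras)
#split off the outcome and convert the x-variables to matrices
x_train <- as.matrix(traindata[, names(traindata) != "DEATH2YRS"])
x_test <- as.matrix(testdata[, names(testdata) != "DEATH2YRS"])
#scale both sets with the *training* means and standard deviations
means <- apply(x_train, 2, mean)
sds <- apply(x_train, 2, sd)
NN_traindata_x <- scale(x_train, center = means, scale = sds)
NN_testdata_x <- scale(x_test, center = means, scale = sds)
#one-hot encode the 0/1 labels into two dummy columns
NN_traindata_DEATH2YRS <- to_categorical(traindata$DEATH2YRS, num_classes = 2)
NN_testdata_DEATH2YRS <- to_categorical(testdata$DEATH2YRS, num_classes = 2)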
Below, we fit a “simple” neural network, Mindy, that uses all of the available features (i.e. x-variables) as input and contains a single hidden layer. More specifically, she will have the following structure:
- A hidden layer with 91 units that uses the sigmoid activation function to pass its output on to the next layer.
- An output layer with 2 units and the softmax activation function, so Mindy outputs two probabilities per observation: the first is the estimated probability of that observation having label 0, the second is the estimated probability of that observation having label 1.
We will use 30 epochs to train her and will use batches of 10 observations for each update of the weights.
Below, we define Mindy. Run the code line by line and make sure you understand roughly what is happening in each step.
#Load the keras package for neural networks
library(keras)
#define Mindy and compile her (i.e. make her ready to be trained)
mindy <- keras_model_sequential()
#Build model structure
mindy %>%
layer_dense(units = 91, activation = "sigmoid", input_shape = 91) %>%
layer_dense(units = 2, activation = "softmax")
#Compile: choose settings for how she will be trained
mindy %>% compile(loss = "binary_crossentropy",
optimizer = "rmsprop",
metrics = c("accuracy"))
#Look at the model
summary(mindy)
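#As a check: the hidden layer has 91*91 + 91 = 8372 parameters
#(weights plus biases) and the output layer has 91*2 + 2 = 184,
#so summary() should report 8556 trainable parameters in total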
#train Mindy on the training data
#note: Mindy needs to use the "NN_"-data
mindy_history <- mindy %>% fit(x = NN_traindata_x,
y = NN_traindata_DEATH2YRS,
epochs = 30,
batch_size = 10)
#measure her performance
mindy_perf <- mindy %>% evaluate(NN_testdata_x,
NN_testdata_DEATH2YRS)
#Make predictions from Mindy (probabilities), look at the first ten
mindy_preds <- mindy %>% predict(NN_testdata_x)
head(mindy_preds, 10)
#Predict labels from Mindy and make a confusion matrix
#Note: because of the one-hot encoding of the NN_testdata
#we need to pick the second column to get the dummy variable
#for the "1" label
mindy_predLabels <- mindy %>% predict_classes(NN_testdata_x)
table(mindy_predLabels, NN_testdata_DEATH2YRS[,2])
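#Aside: predict_classes() has been removed in newer versions of keras;
#if the call above errors, the same labels can be recovered from the
#predicted probabilities, e.g.:
#mindy_predLabels <- apply(mindy_preds, 1, which.max) - 1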
#Compute AUC for Mindy
#Note: we need to choose the second column of mindy_preds, as these
#are the probabilities of label 1. Similarly, we choose the second
#column of NN_testdata_DEATH2YRS because these are indicators
#of whether the label is 1.
library(pROC)
#compute the AUC (area under ROC curve)
mindy_roc <- roc(NN_testdata_DEATH2YRS[,2], mindy_preds[,2])
mindy_roc
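If you want to inspect the curve behind the AUC, pROC can plot it directly from the object we just created:
plot(mindy_roc)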
Congratulations! You have now fitted and evaluated your first neural network.
Run Mindy’s code again, but now use the following code for the fitting:
mindy_history <- mindy %>% fit(x = NN_traindata_x,
y = NN_traindata_DEATH2YRS,
epochs = 30,
batch_size = 10,
validation_data = list(NN_testdata_x,
NN_testdata_DEATH2YRS))
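With validation data supplied, keras tracks the test-set loss and accuracy after every epoch alongside the training metrics. Plotting the history object makes it easy to compare the two and spot overfitting:
plot(mindy_history)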
We will now experiment a bit with adding layers. Using Mindy as a template, you will build a new NN, Brad, that differs from Mindy only in the following aspect.
Brad should have the following layers:
- Five hidden layers, each using the sigmoid activation function.
- An output layer using the softmax activation function.
Here’s what you should do: define Brad, train him, and evaluate him the same way you did for Mindy (a sketch of a possible definition follows below).
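As a starting point, here is a sketch of what Brad’s definition could look like. The number of units per hidden layer is an assumption (91, matching the input width); feel free to choose other sizes.
brad <- keras_model_sequential()
brad %>%
  layer_dense(units = 91, activation = "sigmoid", input_shape = 91) %>%
  layer_dense(units = 91, activation = "sigmoid") %>%
  layer_dense(units = 91, activation = "sigmoid") %>%
  layer_dense(units = 91, activation = "sigmoid") %>%
  layer_dense(units = 91, activation = "sigmoid") %>%
  layer_dense(units = 2, activation = "softmax")
brad %>% compile(loss = "binary_crossentropy",
                 optimizer = "rmsprop",
                 metrics = c("accuracy"))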
Build a neural network and see if you can beat Mindy and the previous models from this morning in terms of accuracy. You can experiment with the following:
- Try different activation functions (sigmoid, relu, tanh) for the hidden layers (see the example below).
- Choose more or fewer epochs and see how it impacts performance.
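For example, swapping an activation function only requires changing the activation argument of the relevant layer. Here is a relu variant of Mindy (the name mindy_relu is just for illustration):
mindy_relu <- keras_model_sequential()
mindy_relu %>%
  layer_dense(units = 91, activation = "relu", input_shape = 91) %>%
  layer_dense(units = 2, activation = "softmax")
mindy_relu %>% compile(loss = "binary_crossentropy",
                       optimizer = "rmsprop",
                       metrics = c("accuracy"))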
Try controlling the number of epochs via early stopping. Add the argument below to your fit() call, and see if you can make sense of what it does. Don’t forget to redefine Brad so that you train fresh weights. Try changing the patience and restore_best_weights arguments and see what happens.
# Extra argument to try in fit() call:
callbacks = callback_early_stopping(patience = 5, restore_best_weights = TRUE)
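A sketch of how the full fit() call could look with the callback in place. Note that callback_early_stopping() monitors val_loss by default, so the validation_data argument from earlier is needed; the epoch budget of 100 is an arbitrary choice, since early stopping decides when training ends:
#fit with early stopping on the validation loss
brad_history <- brad %>% fit(x = NN_traindata_x,
                             y = NN_traindata_DEATH2YRS,
                             epochs = 100,
                             batch_size = 10,
                             validation_data = list(NN_testdata_x,
                                                    NN_testdata_DEATH2YRS),
                             callbacks = callback_early_stopping(patience = 5,
                                                                 restore_best_weights = TRUE))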