The goals of this exercise session are to:
Don’t forget to add your results to the score board!
We will use Brad from exercise 3.2 as a starting point to see how dropout may help us avoid overfitting. If you didn’t get as far as defining him, you can use this code as a starting point for Brad:
library(keras)  # load keras (if not already loaded from the earlier exercises)

# Define Brad and compile him
brad <- keras_model_sequential()

# Build model structure
brad %>%
  layer_dense(units = 91, activation = 'sigmoid',
              input_shape = 91) %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dense(units = 2, activation = "softmax")

# Compile: choose settings for how he will be trained
brad %>% compile(
  loss = "binary_crossentropy",
  optimizer = "rmsprop",
  metrics = c("accuracy")
)
Note that you can insert a dropout layer between any two layers like this:
model %>%
  layer_dense(units = 10, activation = "sigmoid", input_shape = 20) %>%
  layer_dropout(0.3) %>%
  layer_dense(units = 2, activation = "softmax")
This means that during each training update, each node in the first (and only) hidden layer has a 30% chance of being dropped, i.e. its output is temporarily set to zero (which has the same effect as setting all its outgoing weights to zero for that update).
Between each pair of layers of Brad, introduce a dropout layer with a 15% dropout rate. Compare the performance of this model with the performance of Brad without dropout.
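One possible way to set up this dropout version of Brad is sketched below (here called brad2, the name also used in the fit() call further down); the layer sizes and compile settings are simply copied from Brad, with layer_dropout(0.15) inserted between each pair of layers:
# Brad with 15% dropout between each pair of layers (a sketch)
brad2 <- keras_model_sequential()

brad2 %>%
  layer_dense(units = 91, activation = 'sigmoid',
              input_shape = 91) %>%
  layer_dropout(0.15) %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dropout(0.15) %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dropout(0.15) %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dropout(0.15) %>%
  layer_dense(units = 91, activation = 'sigmoid') %>%
  layer_dropout(0.15) %>%
  layer_dense(units = 2, activation = "softmax")

brad2 %>% compile(
  loss = "binary_crossentropy",
  optimizer = "rmsprop",
  metrics = c("accuracy")
)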
We will now systematically compare different models in order to tune the dropout rate parameter. We will look at the following possible values for the dropout rate: \(\phi \in \{0, 0.05, 0.10, \ldots, 0.90, 0.95\}\)
A first idea here could be to simply run Brad with all these different choices of dropout rate and then choose the dropout rate that results in the largest accuracy on the test data. But then we would be using the test data to make model decisions - i.e. we would learn from the test data - and thus we would be breaking the two rules of machine learning (and would very likely overfit to the test data).
Instead, we will split our training data into two parts and measure the performance on this “new” test data. This can be done by using the validation_split argument in the fit() function:
brad2_history <- brad2 %>% fit(x = NN_traindata_x,
                               y = NN_traindata_DEATH2YRS,
                               epochs = 20,
                               batch_size = 10,
                               validation_split = 0.2)
Setting validation_split = 0.2 means that only the first 80% of the data (i.e. observations 1 to 962) are used for training, while the remaining 20% are used only for testing.
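After fitting, the per-epoch validation accuracy is stored in the history object. Note that depending on your keras version the metric may be named val_acc (as used in the loop below) or val_accuracy. For example:
# validation accuracy for each of the 20 epochs
brad2_history$metrics$val_acc

# or simply plot the whole training history
plot(brad2_history)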
# initialize two vectors where we will store the accuracies and the
# dropout rates
accuracies <- numeric(20)
dops <- seq(0, 0.95, 0.05)

# for-loop: everything inside the curly brackets {} is repeated 20 times
# the first time, i is set to 1, the second time i is set to 2, ...
for (i in 1:20) {
  # choose the ith value of the dropout rates
  dop <- dops[i]

  # build a Brad with this dropout rate
  thisBrad <- keras_model_sequential()
  thisBrad %>%
    layer_dense(units = 91, activation = 'sigmoid',
                input_shape = 91) %>%
    layer_dropout(dop) %>%
    layer_dense(units = 91, activation = 'sigmoid') %>%
    layer_dropout(dop) %>%
    layer_dense(units = 91, activation = 'sigmoid') %>%
    layer_dropout(dop) %>%
    layer_dense(units = 91, activation = 'sigmoid') %>%
    layer_dropout(dop) %>%
    layer_dense(units = 91, activation = 'sigmoid') %>%
    layer_dropout(dop) %>%
    layer_dense(units = 2, activation = "softmax")
  thisBrad %>% compile(
    loss = "binary_crossentropy",
    optimizer = "rmsprop",
    metrics = c("accuracy")
  )

  # train this Brad on the training data
  # note: verbose = 0 turns off information being printed and plots
  # being made
  thisBrad_history <- thisBrad %>% fit(x = NN_traindata_x,
                                       y = NN_traindata_DEATH2YRS,
                                       epochs = 20,
                                       batch_size = 10,
                                       validation_split = 0.2,
                                       verbose = 0)

  # take the validation accuracy from this Brad's 20th (i.e. last) epoch and save it
  accuracies[i] <- thisBrad_history$metrics$val_acc[20]

  # print the result to the screen
  print(paste("Dropout rate", dop, "resulted in an accuracy of", accuracies[i]))
}
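When the loop is done, you could for example look up which dropout rate gave the highest validation accuracy and plot accuracy against dropout rate. A small sketch, using only the accuracies and dops vectors defined above:
# the dropout rate with the highest validation accuracy
dops[which.max(accuracies)]

# plot validation accuracy against dropout rate
plot(dops, accuracies, type = "b",
     xlab = "dropout rate", ylab = "validation accuracy")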
Here is your last chance to build the best NN you can. Try out the ideas you have picked up throughout the day and see if you can beat your previous best model. There are no rules here, except that you have to leave the test data alone while choosing how to build your model.