by Ilina Mitra, Jackson Brandberg and Alaena Roberds
In the first post of this series, we introduced the Cassava Leaf Dataset from Kaggle and our goal of using Convolutional Neural Networks (CNNs) to classify images of cassava leaves as either healthy or as possessing one of four diseases, for five classes in total. By the end of that post, we had loaded our data using the Kaggle API, performed exploratory data analysis, and created a baseline model that predicted every image as possessing Cassava Mosaic Disease (CMD), the most populous class in our dataset. This model yielded an accuracy of 61.5%. In this post, we detail our exploration of more complex CNNs for this classification problem. More specifically, we implemented three different model architectures and compared their results to our baseline and to each other. In order to implement these models effectively, we first had to preprocess our data appropriately. This blog post covers the preprocessing, then the implementation, reasoning, hyperparameter tuning and results of each of our three models, and finishes by describing our next steps in this multi-class image classification problem.
In the baseline model, because we were using a majority classifier, we did not have to process the input data at all. That is, we could leave each image as a file path and still predict the most populous class for every input. Now that we are beginning to build true models, we need to format our images into the correct dimensions. To do this, we looped over every image in the dataset and used cv2 (the OpenCV library for reading and manipulating images) to resize each image with an appropriate interpolation method. We chose to resize our images to (100, 100, 3). While this is quite small and may distort the images in ways that make them harder for the model to learn from, larger dimensions (such as (224, 224, 3)) were too demanding for our program to run, even in Google Colab Pro. If we continue to see low validation accuracy scores, we may revisit our preprocessing to see whether image windowing or other interpolation methods would help.
We then had to address our labels. Our initial dataset assigns an integer label (0 through 4) to each class. However, our model outputs a vector of length five. Therefore, we one-hot encoded the labels with a depth of five to convert them into a form that matches the model outputs. Finally, we converted our one-hot labels and our processed images into NumPy arrays so that they could be split by sklearn’s train_test_split function.
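The label encoding and split can be sketched as follows. This version builds the one-hot vectors with NumPy instead of a framework one-hot function, and it assumes zero-based integer labels; the random arrays, the 80/20 split, and the random_state value are placeholders, not our actual data or settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

NUM_CLASSES = 5

# Placeholder data: 20 random "images" and integer labels in [0, 4].
rng = np.random.default_rng(0)
images = rng.random((20, 100, 100, 3), dtype=np.float32)
labels = rng.integers(0, NUM_CLASSES, size=20)

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[labels]

# Hold out 20% of the examples for validation (assumed ratio).
X_train, X_val, y_train, y_val = train_test_split(
    images, one_hot, test_size=0.2, random_state=42
)
print(X_train.shape, y_train.shape)  # (16, 100, 100, 3) (16, 5)
```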
With this, our data was ready to run through a model!
For our first model, we wanted something relatively simple that we could build on as the project progressed. As we have learned in class, convolutional layers are fundamental building blocks of neural networks trained for image classification. Given this, we thought a model with a single convolutional layer would be a good starting point. As is evident from the code snippet below, we began with a convolutional step using a rectified linear unit (ReLU) activation function. ReLU is a common default activation because it is cheap to compute and less prone to vanishing gradients than the sigmoid and tanh functions, which often translates into higher accuracy.
After defining the convolutional layer, we added batch normalization and dropout layers to help keep the model from overfitting. More specifically, batch normalization standardizes the inputs to a layer over each mini-batch, which stabilizes training and has a mild regularizing effect. Dropout, on the other hand, targets overfitting more directly: by randomly dropping individual nodes during training, it reduces interdependent learning between neurons.
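A model of this shape could be written in Keras roughly as follows. The filter count, kernel size, pooling layer, dropout rate, and optimizer are assumptions chosen for illustration, so this is a sketch of the idea and not a reproduction of our exact network.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_simple_cnn(input_shape=(100, 100, 3), num_classes=5):
    """One convolutional layer followed by batch norm, dropout and softmax."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=3, activation="relu"),  # assumed 32 filters
        layers.BatchNormalization(),       # standardize activations per batch
        layers.MaxPooling2D(pool_size=2),  # assumed; shrinks the feature maps
        layers.Dropout(0.25),              # assumed dropout rate
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # matches one-hot encoded labels
        metrics=["accuracy"],
    )
    return model


simple_cnn = build_simple_cnn()
simple_cnn.summary()
```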
Somewhat surprisingly, this model took a long time to run (approximately 3 hours) and, despite its simplicity, achieved high training accuracy (approximately 94%) after ten epochs. Please see the graph below, which shows both the training accuracy and training loss for this first model. Our loss decreases as the epochs increase, and the accuracy nears 95% on our training set. Training accuracy this high is often an indication of overfitting, so we checked our validation accuracy, which turned out to be lower than that of our baseline model. This confirms our suspicion of overfitting and is evident in the Loss and Accuracy Plot below as well as the screenshot of our model fit. There are a number of techniques to combat overfitting, such as early stopping and heavier dropout, and we plan to implement them moving forward.
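Early stopping, for instance, is available in Keras as a callback. The sketch below shows one plausible configuration; the monitored metric and the patience value are assumptions, and the names in the commented fit call are placeholders.

```python
from tensorflow import keras

# Stop training once validation loss has not improved for 3 epochs (assumed
# patience), and roll the model back to the weights of its best epoch.
early_stopping = keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,
)

# Placeholder usage; model, X_train, y_train, X_val and y_val are not defined here:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           epochs=10, callbacks=[early_stopping])
```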
The next model we tried building was a GoogLeNet CNN. This network runs several convolutional paths in parallel and concatenates their outputs in order to achieve better results. The parallel paths are implemented through inception blocks (see figure below):
Through this structure, the input is processed with several filter sizes at once. Filters of different sizes can pick up on patterns at different scales, which can ultimately lead to more accurate results. Appropriate padding ensures that the output of each path has the same height and width as the input, so that the paths can be concatenated along the channel dimension. Our model summary is outlined below:
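For readers who prefer code to diagrams, a single inception block can be sketched with the Keras functional API as follows. The helper name and its parameters are hypothetical, and the filter counts are those of the first inception block in the original GoogLeNet design, used here for illustration and not as a record of the exact configuration trained in this post.

```python
from tensorflow import keras
from tensorflow.keras import layers


def inception_block(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """Four parallel paths whose outputs are concatenated along the channels."""
    # Path 1: 1x1 convolution.
    branch1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)

    # Path 2: 1x1 reduction followed by a 3x3 convolution.
    branch3 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    branch3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(branch3)

    # Path 3: 1x1 reduction followed by a 5x5 convolution.
    branch5 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    branch5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(branch5)

    # Path 4: 3x3 max pooling (stride 1) followed by a 1x1 projection.
    pooled = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    pooled = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(pooled)

    # "same" padding keeps height and width equal on every path,
    # so the four outputs can be stacked along the channel axis.
    return layers.Concatenate(axis=-1)([branch1, branch3, branch5, pooled])


inputs = keras.Input(shape=(100, 100, 3))
outputs = inception_block(inputs, 64, 96, 128, 16, 32, 32)
block_model = keras.Model(inputs, outputs)
print(block_model.output_shape)  # (None, 100, 100, 256)
```

The output has 64 + 128 + 32 + 32 = 256 channels, while the spatial size stays at 100 x 100.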
Each sequential layer in the summary is made up of the inception blocks described earlier. After fitting this model to our training set, we saw the following results:
As we can see, our validation accuracy stays the same in every epoch. This suggests that the model is making the same predictions on the validation set in every epoch (most likely a single class for every image), regardless of how it is ‘training.’ Clearly, this is not what we want: while the accuracies on the training and validation sets are quite similar (so we are not overfitting), we want the model to be learning and improving. Therefore, in the next installment of this series, we will experiment with the learning rate as well as the architecture of the model itself.
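One quick way to test this suspicion is to count how often each class is predicted on the validation set. The sketch below does this with NumPy; the helper name is hypothetical, and the probabilities are made-up values standing in for the output of model.predict.

```python
import numpy as np


def predicted_class_counts(probabilities, num_classes=5):
    """Count how many examples are assigned to each class."""
    predicted = np.argmax(probabilities, axis=1)
    return np.bincount(predicted, minlength=num_classes)


# Made-up softmax outputs for 8 validation images, all favoring class 3.
probs = np.tile(np.array([0.1, 0.1, 0.1, 0.6, 0.1]), (8, 1))
counts = predicted_class_counts(probs)
print(counts)  # [0 0 0 8 0]
```

If all of the counts fall into one class, the model has collapsed to a constant prediction, and its validation accuracy simply equals that class's share of the validation set.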
Our third model uses transfer learning, specifically ResNet50. Let’s first briefly cover transfer learning and why we found it applicable to our problem. Transfer learning reuses ‘knowledge’ from machine learning models previously trained on similar problems. Rather than starting completely from scratch, we can build on patterns that other models have already learned to recognize. This is very appealing because freezing the pre-trained layers saves a lot of computation, which is particularly important for us since we are running our models in Google Colab, where disk space and RAM are limited.
Keras offers many pre-trained models to select from for transfer learning. We chose ResNet50 because we felt that it is roughly the right size for our setup and because it has been widely used in similar image classification problems. Additionally, it is a model that we had experience using before.
Now that we have selected a base model, we need to build the head architecture which will learn features specific to our cassava leaf problem. We implemented the following head architecture:
Then, we compiled our model by adding the head layers to our base model. We started by freezing all of the base-layers, and when training on our training set, we saw the following results:
We plotted these results as seen below:
Looking at the training and validation accuracy, we can see that the model is underfitting: both scores are only marginally higher than the baseline accuracy. To improve this, we can unfreeze some of the base (ResNet50) layers so that they are also trained on our data. Additionally, depending on our results, we may also want to increase the learning rate and try ‘adam’ as our optimizer.
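The frozen-base setup and the partial unfreezing described above can be sketched as follows. The head layers, the helper names, and the learning rates mentioned in the comments are hypothetical choices for illustration and do not reproduce the head architecture used in this post. Inputs are assumed to be preprocessed already, for example with keras.applications.resnet50.preprocess_input.

```python
from tensorflow import keras
from tensorflow.keras import layers


def build_transfer_model(weights="imagenet", input_shape=(100, 100, 3),
                         num_classes=5):
    """ResNet50 base (frozen) with a small, hypothetical classification head."""
    base = keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # phase 1: train only the head

    inputs = keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep BatchNorm layers in inference mode
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dense(128, activation="relu")(x)  # assumed head width
    x = layers.Dropout(0.5)(x)                   # assumed dropout rate
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs), base


def unfreeze_last_layers(base, n_layers):
    """Make the last n_layers of the base trainable, keeping BatchNorm frozen."""
    if n_layers < 1:
        raise ValueError("n_layers must be at least 1")
    base.trainable = True
    for layer in base.layers[:-n_layers]:
        layer.trainable = False
    for layer in base.layers[-n_layers:]:
        layer.trainable = not isinstance(layer, layers.BatchNormalization)


def compile_model(model, learning_rate):
    """(Re)compile with Adam; required after changing trainable flags."""
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )


# Placeholder workflow (downloads ImageNet weights when run):
# model, base = build_transfer_model()
# compile_model(model, 1e-3)      # assumed rate for training the head
# unfreeze_last_layers(base, 10)  # assumed number of layers to unfreeze
# compile_model(model, 1e-5)      # assumed smaller rate for fine-tuning
```

Once base layers are unfrozen, a small learning rate is the usual choice so that the pre-trained weights are adjusted gradually and not overwritten.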
Overall Performance and Next Steps:
As seen from the results above, our current best model is either the ResNet50 or the GoogLeNet. The GoogLeNet has slightly higher validation accuracy, but because that accuracy is constant, we decided to upload the ResNet50 model to Kaggle. However, we experienced errors when trying to load our saved model in Kaggle. We used the following callback to save our model
and then uploaded and subsequently loaded that model in Kaggle. Nevertheless, we received this error:
Therefore, in the next installment of this series, we will address these errors so that our models can be uploaded and submitted to Kaggle.*
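For context, saving a model through a checkpoint callback and loading it back generally follows the pattern below. This is a generic sketch and not our actual callback: the file name, the monitored metric, and the tiny stand-in model and data are all assumptions, and it does not reproduce the error we ran into.

```python
import os
import tempfile

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Assumed file name; written to a temporary directory for this demo.
checkpoint_path = os.path.join(tempfile.mkdtemp(), "best_model.h5")

# Save the full model whenever validation accuracy improves.
checkpoint = keras.callbacks.ModelCheckpoint(
    filepath=checkpoint_path,
    monitor="val_accuracy",
    save_best_only=True,
)

# Tiny stand-in model and random data so the sketch runs quickly.
model = keras.Sequential([
    keras.Input(shape=(8,)),
    layers.Dense(5, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

rng = np.random.default_rng(0)
X = rng.random((32, 8)).astype("float32")
y = np.eye(5, dtype="float32")[rng.integers(0, 5, size=32)]

model.fit(X, y, validation_split=0.25, epochs=1, verbose=0,
          callbacks=[checkpoint])

# Reload for inference only; compile=False skips restoring the optimizer.
restored = keras.models.load_model(checkpoint_path, compile=False)
print(restored.output_shape)  # (None, 5)
```

When a saved model fails to load in another environment, comparing the TensorFlow and Keras versions used for saving and loading is a reasonable first check.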
For now, we can track our progress through our validation accuracy scores. We know that we have much to do to improve our validation accuracy. Since our first blog post, we have made major strides in the right direction: we have running CNNs, even though some are underfitting and others are overfitting. We have a clear foundation, and a clear direction, for improving our models through hyperparameter tuning. Many of these specifics were discussed above, and in the third and final installment of this series we will compare the results of our most effective models when adding regularization, changing our learning rates, experimenting with different loss functions, and unfreezing some of the pre-trained layers.
We look forward to seeing you in the final installment of this series!
*We met with our T.A. to go over our Kaggle submission coding errors, and the T.A. could not figure out our issue either. He recommended detailing our issues in this blog post.