Detecting Cassava Leaf Disease, Part 2

by Ilina Mitra, Jackson Brandberg and Alaena Roberds


In the first post of this series, we introduced the Cassava Leaf Dataset from Kaggle and our goal of using Convolutional Neural Networks (CNNs) to classify images of cassava leaves as either healthy or as having one of five diseases. By the end of that post, we had loaded our data using the Kaggle API, performed exploratory data analysis, and created a baseline model that predicted every image as having Cassava Mosaic Disease (CMD), the most populous class in our dataset. This model yielded an accuracy of 61.5%. In this post, we detail our exploration of more complex CNNs for this classification problem. Specifically, we implemented three different model architectures and compared their results to our baseline and to each other. To implement these models effectively, we first had to preprocess our data appropriately. This post covers that preprocessing, then the implementation, reasoning, hyperparameter tuning, and results of each of our three models, and finishes by describing our next steps in this multi-class image classification problem.


In the baseline model, because we were using a majority classifier, we did not have to do any preprocessing of the input data: we could take each image path as-is and still predict the most populous class. Now that we are building true models, we need to format our images to the correct dimensions. To do this, we looped over every image in the dataset and used cv2 (OpenCV's Python library for reading and manipulating images) to resize each one with an appropriate interpolation method. We chose to resize our images to (100, 100, 3). While this is quite small and may distort images in ways that make them harder for the model to learn from, larger dimensions (such as (224, 224, 3)) were too expensive for our program to run, even in Google Colab Pro. If we continue to see low validation accuracy, we may revisit this preprocessing to test whether image windowing or other interpolation methods help.

Model 1:

For our first model, we wanted to produce something relatively simple to build on as we progressed through the project. As we have learned in class, convolutional layers are fundamental building blocks of neural networks trained for image classification. Given this, we thought a model with a single convolutional layer would be a good starting point. As the code snippet below shows, we initiated a convolutional step with a rectified linear unit (ReLU) activation function. ReLU is a common default activation function because it often achieves higher accuracy than the sigmoid or tanh functions.
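The snippet below is a minimal sketch of such a single-convolution model in Keras, assuming the (100, 100, 3) input shape from our preprocessing and five output classes; the exact filter counts and dense-layer width shown here are illustrative, not our tuned values:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # assumed number of leaf categories

# A minimal CNN: one convolutional layer with ReLU, then a small classifier head.
model = models.Sequential([
    layers.Input(shape=(100, 100, 3)),
    layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",  # integer class labels
    metrics=["accuracy"],
)
```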

Model 2:

The next model we built was a GoogLeNet-style CNN. This network uses parallel concatenations to achieve better results; the parallel paths are combined in inception blocks (see figure below):
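An inception block runs four branches in parallel and concatenates their outputs along the channel axis. A sketch of one such block with the Keras functional API follows; the function name and filter arguments are illustrative, not our exact implementation:

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_block(x, f1, f3_reduce, f3, f5_reduce, f5, pool_proj):
    """Four parallel branches, concatenated along the channel axis."""
    # Branch 1: 1x1 convolution
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    # Branch 2: 1x1 reduction, then 3x3 convolution
    b2 = layers.Conv2D(f3_reduce, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(f3, 3, padding="same", activation="relu")(b2)
    # Branch 3: 1x1 reduction, then 5x5 convolution
    b3 = layers.Conv2D(f5_reduce, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f5, 5, padding="same", activation="relu")(b3)
    # Branch 4: 3x3 max pooling, then 1x1 projection
    b4 = layers.MaxPooling2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(pool_proj, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])
```

The 1x1 "reduction" convolutions shrink the channel count before the expensive 3x3 and 5x5 convolutions, which is what keeps GoogLeNet's parameter count manageable.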

Model 3:

Our third model uses transfer learning, specifically ResNet50, to help train our classifier. Let's first briefly cover transfer learning and why it is applicable to our problem. Transfer learning carries over 'knowledge' from models previously trained on similar problems. Rather than starting completely from scratch, we can reuse patterns that other models have already learned to recognize. This is very appealing because freezing the pretrained layers saves a great deal of computation, which is particularly important for us since we are implementing our models in Google Colab, where disk space and RAM are limited.
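A minimal sketch of this setup with Keras follows: load ResNet50 with ImageNet weights, freeze it, and add a small trainable head. The head shown here (global average pooling plus one dense layer) is an assumption for illustration rather than our exact architecture:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Load ResNet50 pretrained on ImageNet, without its classification head.
base = tf.keras.applications.ResNet50(
    weights="imagenet",
    include_top=False,
    input_shape=(100, 100, 3),
)
base.trainable = False  # freeze the pretrained layers

# Small trainable head on top of the frozen base.
model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),  # assumed 5 classes
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```

With the base frozen, only the final dense layer's weights are updated during training, which keeps both memory use and training time low.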

Overall Performance and Next Steps:

As seen from the results above, our current best model is either ResNet50 or GoogLeNet. GoogLeNet has slightly higher validation accuracy, but because that accuracy is constant across epochs, we decided to upload ResNet50 to Kaggle. However, we ran into errors when trying to load our saved model on Kaggle. We used the following callback to save our model:
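The original callback code did not survive; a sketch of a typical Keras checkpoint callback of this kind is below. The filename and monitored metric are assumptions for illustration:

```python
import tensorflow as tf

# Save the best model (by validation accuracy) seen so far during training.
checkpoint = tf.keras.callbacks.ModelCheckpoint(
    "best_model.h5",        # hypothetical output filename
    monitor="val_accuracy",
    save_best_only=True,
    verbose=1,
)

# Passed to training via:
# model.fit(X_train, y_train, validation_data=(X_val, y_val),
#           callbacks=[checkpoint])
```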