Detecting Cassava Leaf Disease, Part 3

by Jackson Brandberg, Ilina Mitra, and Alaena Roberds

View our Google Colab notebook!


Adding a VGG model:

We imported the VGG16 base model and added three dense layers, with dropout, to improve performance. See the implementation of the model below:

First, we defined the build_vgg function as follows:
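A minimal sketch of what such a function might look like, assuming Keras. The layer widths, dropout rate, optimizer, and learning rate are hypothetical placeholders rather than the values we actually used; the 100 x 100 input size and the frozen base match the setup described in this post.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_vgg(input_shape=(100, 100, 3), num_classes=5, weights="imagenet"):
    """Frozen VGG16 base followed by three dense layers with dropout."""
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the pretrained convolutional weights fixed

    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),   # hypothetical width
        layers.Dropout(0.5),                    # hypothetical rate
        layers.Dense(128, activation="relu"),   # hypothetical width
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),  # one unit per class
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),  # assumed
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling build_vgg() with the defaults downloads the ImageNet weights; passing weights=None builds the same architecture with random initialization.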

Immediately, we were able to break the 70% accuracy threshold we had previously been struggling to reach. Starting from our original model, we then tried each of the following changes: we added batch normalization, removed dropout in the layer where batch normalization was added, changed our batch size, changed our learning rate, added a larger dense layer, and then removed a dense layer. Unfortunately, we remained at a standstill: each of these changes moved our validation accuracy only marginally, for better or worse.

With little progress and consequently low spirits about VGG16, we turned our efforts to fine-tuning other models.

Fine-Tuning our other models (again, with little success):

While we won’t detail every change we made and its effect on our evaluation metrics, the example below with our ResNet50 model is typical of the roadblock we had hit:

Initially we implemented Flatten() at the start of our head architecture:
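A hedged sketch of a head like this, with a flag that switches to the GlobalAveragePooling2D() variant tried next. The function name build_resnet, the dense width, and the position of the batch normalization layer are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models


def build_resnet(pooling="flatten", input_shape=(100, 100, 3),
                 num_classes=5, weights="imagenet"):
    """Frozen ResNet50 base with a small head; `pooling` picks the first head layer."""
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False

    if pooling == "flatten":
        # Keeps every spatial position: H * W * 2048 values per image.
        reduce = layers.Flatten()
    else:
        # Averages over the spatial grid: 2048 values per image.
        reduce = layers.GlobalAveragePooling2D()

    return models.Sequential([
        base,
        reduce,
        layers.Dense(256, activation="relu"),  # hypothetical width
        layers.BatchNormalization(),           # assumed placement
        layers.Dense(num_classes, activation="softmax"),
    ])
```

The practical difference is in the size of the first dense layer: Flatten() feeds it the full feature map, while GlobalAveragePooling2D() feeds it one averaged value per channel, so the pooled head has far fewer parameters to fit.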

This yielded the following results:

We then replaced Flatten() with GlobalAveragePooling2D(), which yielded the following results:

We tried replacing our batch normalization with dropout, and saw validation accuracy as follows:

At best, our validation accuracy scores from this model tuning were in the mid-60s, and far, far lower at their worst. It became clear that in order to achieve the jump in validation accuracy we were seeking, we needed to rethink a fundamental aspect of the way we were approaching this problem.

The breakthrough!

We switched from our inefficient for-loop method to an image data generator, which preprocesses the images in batches as it reads them using the flow feature. This allowed us to resize our images to 224 x 224 rather than 100 x 100, retaining much finer detail and, we hoped, improving the models' ability to detect the differences between the various diseases.
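As a rough illustration of this approach, the sketch below uses Keras's ImageDataGenerator with flow_from_dataframe. The choice of flow_from_dataframe over the other flow methods, the column names image_id and label, the 80/20 split, and the batch size are all assumptions, not a record of our exact pipeline.

```python
import pandas as pd
import tensorflow as tf


def make_generators(df, image_dir, target_size=(224, 224), batch_size=32):
    """Return (train, validation) iterators that load and resize images lazily."""
    # class_mode="categorical" expects string labels.
    df = df.assign(label=df["label"].astype(str))

    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        rescale=1.0 / 255,       # scale pixel values to [0, 1]
        validation_split=0.2,    # assumed split
    )
    common = dict(
        dataframe=df,
        directory=image_dir,
        x_col="image_id",        # assumed column holding file names
        y_col="label",           # assumed column holding class labels
        target_size=target_size,
        batch_size=batch_size,
        class_mode="categorical",
        seed=42,
    )
    train = datagen.flow_from_dataframe(subset="training", **common)
    val = datagen.flow_from_dataframe(subset="validation", **common)
    return train, val
```

Because each batch is read and resized only when the model asks for it, the full set of 224 x 224 images never has to sit in memory at once.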

In tandem with this change, we unfroze the base layers in our VGG16 model. We needed to do this because changing our image resolution alone did not provide a large enough accuracy jump (as seen in the results above), though we still felt confident in the potential of our VGG16 model. Therefore, we unfroze the base layer weights and re-ran our model, starting from the weights saved from the fit above. At this point, our VGG16 model architecture looked as follows:
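The unfreezing step itself can be sketched as below. The helper name unfreeze_and_recompile, the optional weights_path argument, and the small learning rate are illustrative assumptions; a low learning rate is a common choice when fine-tuning pretrained layers, but it is not necessarily the value we used.

```python
import tensorflow as tf


def unfreeze_and_recompile(model, weights_path=None, learning_rate=1e-5):
    """Make every layer trainable again and recompile for fine-tuning."""
    if weights_path is not None:
        # Resume from the weights saved after the frozen-base fit.
        model.load_weights(weights_path)

    # Setting trainable on the model propagates to nested layers,
    # including the pretrained base.
    model.trainable = True

    # Recompile so the change in trainable weights takes effect.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Recompiling after changing trainable matters: Keras fixes the set of trainable weights at compile time, so skipping this step would leave the base effectively frozen.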

Our final model / submission:

We also ran TensorBoard, a useful visualization tool. Here are some of the graphs we found most interesting (note that in the first two plots, the validation scores are shown by the blue line and the training scores by the orange line):

Graph 1: Accuracy over Epochs

Graph 2: Loss over Epochs

Graph 3: Model Visualization
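For reference, plots like these can be produced by passing the Keras TensorBoard callback to model.fit. The log directory and histogram_freq setting below are assumptions.

```python
import tensorflow as tf


def tensorboard_callback(log_dir="logs/fit"):
    """Callback that writes per-epoch metrics and the model graph for TensorBoard."""
    return tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1)


# Hypothetical usage with an already compiled model and data generators:
# model.fit(train, validation_data=val, epochs=10,
#           callbacks=[tensorboard_callback()])
#
# In a Colab notebook, the dashboard can then be opened with:
#   %load_ext tensorboard
#   %tensorboard --logdir logs/fit
```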

Lessons learned and takeaways:

Our EDA was critical in understanding that some of the cassava pictures were mislabelled, suggesting that a perfect model is unattainable. Furthermore, it reinforced the fact that not all problems will lend themselves to 100% accuracy. As we learned in class with Sam Watson, what matters is the underlying distribution of the data. The EDA we conducted proved this was the case in the Cassava Leaf Disease problem.

As discussed above, our breakthrough came from preprocessing using an image data generator. This not only allowed us to run more complex models without crashing our RAM, but also allowed us to increase our accuracy by increasing our input image size. While 100x100 images did give us fairly solid accuracy (approximately 75%), a larger image size means less distortion. Because there is no consistency of image type in the set (some are of the whole plant while others are close-ups of specific leaves), a lower resolution muddies the clarity and decreases predictive power. As such, it was critical that we refined our preprocessing to account for this as we worked through the project.

Potential Improvements:

If given more time, we would also hope to implement TFRecords, TensorFlow’s own binary storage format. When working with large datasets such as this, using a binary storage format can improve the performance of the import pipeline and, consequently, training time. Our hope moving forward is to have a better understanding of TFRecords and use that in tandem with the suggestions mentioned above to build a more robust model.
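To give a sense of what that would involve, here is a minimal sketch of writing and reading a TFRecord file. The feature names image and label and the helper names are hypothetical; we have not implemented this in our project.

```python
import tensorflow as tf


def serialize_example(image_bytes, label):
    """Pack one encoded image and its integer label into a serialized tf.train.Example."""
    feature = {
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()


def write_tfrecord(path, samples):
    """Write an iterable of (image_bytes, label) pairs to a TFRecord file."""
    with tf.io.TFRecordWriter(path) as writer:
        for image_bytes, label in samples:
            writer.write(serialize_example(image_bytes, label))


def read_tfrecord(path):
    """Return a tf.data.Dataset of parsed {"image", "label"} records."""
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    dataset = tf.data.TFRecordDataset(path)
    return dataset.map(lambda record: tf.io.parse_single_example(record, spec))
```

In a real pipeline, the image bytes would be the raw JPEG file contents, decoded with tf.io.decode_jpeg inside the map step before resizing and batching.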