Dim Sum Classifier – from Data to App part 1

In a typical machine learning lifecycle, we need to acquire data, process it, train and validate/test models, and finally deploy the trained models in applications or services. In this first of two posts, inspired by fast.ai 2019 lesson 2, we shall build a Dim Sum (a Cantonese bite-sized style of cuisine with many yummy choices) classifier application by leveraging Google Images as a data source. Given the wide variety of choices, we shall focus on 5 common dim sum dishes: har gow, siu mai, char siu sou, chee cheong fun and lo bak go.

The image links were curated with gi2ds (Google Image Search to Dataset), a very convenient JavaScript snippet created by Christoffer Björkskog. Details can be found on his blog here and on GitHub.

After using the tool, 5 text files, each containing 200 image links for one of the dishes, are prepared and saved in Google Drive for import into Google Colab, which will be used for processing and training.

Download and verify data from Google Images

We shall proceed to set up the Google Colab environment, import the files containing image links from Google Drive and download the images using the download_images function. The fast.ai library also provides a very handy verify_images function that checks for valid images and prunes files that cannot be opened.

!curl -s https://course.fast.ai/setup/colab | bash
Updating fastai...
Done.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
dirs_dimsum = ['hargow', 'siumai', 'charsiusou', 'cheecheongfun','lobakgo']
files_dimsum = ['urls_hargow200.txt', 'urls_siumai200.txt', 
                'urls_charsiusou200.txt', 'urls_cheecheongfun200.txt',
                'urls_lobakgo200.txt']
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = '/content/gdrive/My Drive/'
base_dir = root_dir + 'fastaiv3/'
from shutil import copy2
path = Path('data/dimsum')
for folder, file in list(zip(dirs_dimsum, files_dimsum)):
    dest = path/folder
    dest.mkdir(parents=True, exist_ok=True)
    copy2(base_dir+'dimsum_class/'+file, dest)
    
path.ls()
[PosixPath('data/dimsum/charsiusou'),
 PosixPath('data/dimsum/siumai'),
 PosixPath('data/dimsum/hargow'),
 PosixPath('data/dimsum/lobakgo'),
 PosixPath('data/dimsum/cheecheongfun')]
path = Path('data/dimsum')
for folder, file in list(zip(dirs_dimsum, files_dimsum)):
    download_images(path/folder/file, path/folder, max_pics=200)
classes = dirs_dimsum
for c in classes:
    print(c)
    verify_images(path/c, delete=True, max_size=200)

Training the model

Once we have our dataset, we will use the ImageDataBunch.from_folder method to load the images from the folder and preview the images.

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  num_workers=4).normalize(imagenet_stats)
data.classes
['charsiusou', 'cheecheongfun', 'hargow', 'lobakgo', 'siumai']
data.show_batch(rows=3, figsize=(7,8))

From the batch preview, the images seem mostly fine. There is one mislabeled char siu sou (叉烧酥), but we leave it for now since a small amount of noisy data does not typically affect the model much. We are using transfer learning with the ResNet 34 model.

data.classes, data.c, len(data.train_ds), len(data.valid_ds)
(['charsiusou', 'cheecheongfun', 'hargow', 'lobakgo', 'siumai'], 5, 738, 184)
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /home/yoke/.cache/torch/checkpoints/resnet34-333f7ec4.pth
100%|██████████| 87306240/87306240 [00:03<00:00, 24127371.58it/s]
learn.fit_one_cycle(10)
epoch train_loss valid_loss error_rate time
0 1.892826 1.363298 0.516304 00:07
1 1.473497 0.831715 0.282609 00:07
2 1.158775 0.728580 0.244565 00:07
3 0.939784 0.627291 0.217391 00:07
4 0.785094 0.622544 0.211957 00:07
5 0.670714 0.568230 0.157609 00:07
6 0.598721 0.547424 0.163043 00:07
7 0.527969 0.552321 0.173913 00:07
8 0.476596 0.549852 0.179348 00:07
9 0.443554 0.550588 0.179348 00:07
learn.save('stage-1')

After 10 epochs, we have an error rate (1 - accuracy) of 17.9%, and we proceed to save the model.

Finetuning the model

We shall now fine-tune the model by unfreezing the pre-trained layers for training, and use the learning rate finder to pick suitable learning rates. The max_lr=slice(a, b) argument below applies discriminative learning rates: a to the earliest layer groups and b to the final layers.

learn.unfreeze()
learn.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.recorder.plot()
learn.fit_one_cycle(5, max_lr=slice(5e-4, 1e-3))
epoch train_loss valid_loss error_rate time
0 0.296187 0.695414 0.211957 00:07
1 0.390351 1.626430 0.309783 00:08
2 0.425384 2.619203 0.429348 00:08
3 0.367741 0.679681 0.168478 00:08
4 0.326681 0.530172 0.146739 00:08

After fine-tuning, the error rate drops to 14.6%. We shall save this model and use ClassificationInterpretation to examine the model's top losses and confusion matrix.

learn.save('stage-2')
interpret = ClassificationInterpretation.from_learner(learn)
interpret.plot_confusion_matrix()
interpret.plot_top_losses(9, figsize=(15,11))

From the top losses, it seems that several of them are composite pictures, which confuse the model since we only predict one class per picture. We can use the ImageCleaner widget to view and cleanse the dataset of unwanted pictures.

Unfortunately, Google Colab does not support ipywidgets, hence we need to run some portions of the notebook on a local runtime, as described in the following section. We shall zip the images and saved models for download.

!zip -r download_colab.zip /content/data/dimsum
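
To pull the archive down from Colab to the local machine, one option (not shown in the original notebook, so treat it as a suggestion) is the google.colab files helper:

from google.colab import files
files.download('download_colab.zip')  # triggers a browser download of the zipped images and saved models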

Run in local runtime for ImageCleaner

This section, to be run only on a local machine, aims to prune misleading data and labels.
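
If a fresh local environment is needed first, something along these lines should work (the exact package versions are an assumption; the post uses fastai v1, and ipywidgets is required for ImageCleaner):

pip install "fastai<2" ipywidgets jupyter
unzip download_colab.zip   # recreates content/data/dimsum relative to the working directory
jupyter notebook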

Please note that DatasetFormatter does not differentiate between training and validation sets, hence we need to reload the images using the data block API, explicitly telling it not to split into training and validation sets.

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
#local setup rerun
path = Path('content/data/dimsum')
np.random.seed(42)
db = (ImageList.from_folder(path)
                   .split_none()
                   .label_from_folder()
                   .transform(get_transforms(), size=224)
                   .databunch()
     )
db.show_batch(rows=3, figsize=(7,8))
learn_cleaning = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cleaning.load('stage-2')
from fastai.widgets import *
ds, idxs = DatasetFormatter().from_toplosses(learn_cleaning) 
ImageCleaner(ds, idxs, path)

Below is a screenshot of the widget when running locally.

From the cleaning exercise, quite a few images were weeded out. Typical invalid images are:

  • Images expected to be char siu sou (叉烧酥) that are actually char siu bao (叉烧包), another dim sum type, or the char siu (叉烧) roasted meat itself
  • Composite images containing irrelevant content
  • Animated images

Training the model, Round 2

After running the image cleaner, a cleaned.csv file will be generated. Upload this file to Google Colab, reload the data and then retrain the model. Please note that no images have been deleted, hence we need to reload the cleaned data using the cleaned.csv file as the reference.
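
One way to get cleaned.csv back into Colab (again an assumption, not shown in the original notebook) is the google.colab files helper, which saves the uploaded file into the current working directory (/content by default):

from google.colab import files
files.upload()  # select cleaned.csv from the local machine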

!mv /content/cleaned.csv /content/data/dimsum
np.random.seed(42)

data2 = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, 
                                csv_labels='cleaned.csv', 
                                ds_tfms=get_transforms(), 
                                size=224, 
                                num_workers=4).normalize(imagenet_stats)
data2.classes, data2.c, len(data2.train_ds), len(data2.valid_ds)
(['charsiusou', 'cheecheongfun', 'hargow', 'lobakgo', 'siumai'], 5, 472, 118)
learn_cleaned = cnn_learner(data2, models.resnet34, metrics=error_rate)
learn_cleaned.fit_one_cycle(10)
epoch train_loss valid_loss error_rate time
0 1.958007 1.473040 0.644068 00:05
1 1.535733 0.663816 0.161017 00:05
2 1.177153 0.491802 0.152542 00:05
3 0.915417 0.515325 0.152542 00:05
4 0.747979 0.513231 0.169492 00:05
5 0.633824 0.520064 0.169492 00:05
6 0.544484 0.516060 0.169492 00:05
7 0.475155 0.514972 0.169492 00:05
8 0.413666 0.512496 0.169492 00:05
9 0.370536 0.515782 0.169492 00:05

After 10 epochs, the error rate has dropped by about 1 percentage point (16.9% vs 17.9%) compared to the pre-finetuning result in round 1.

Finetuning the model – Round 2

We then proceed with finetuning the model in round 2.

learn_cleaned.unfreeze()
learn_cleaned.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn_cleaned.recorder.plot()
learn_cleaned.fit_one_cycle(5, max_lr=slice(1e-4, 1e-3))
epoch train_loss valid_loss error_rate time
0 0.150190 0.461228 0.169492 00:05
1 0.122265 0.665217 0.177966 00:05
2 0.118568 0.567386 0.152542 00:05
3 0.104063 0.560797 0.127119 00:05
4 0.088834 0.486471 0.127119 00:05

After fine-tuning, the error rate is now 12.7%, compared to 14.6% before cleaning. It is also noted that the validation loss is much higher than the training loss, which would likely improve with more data.

We can examine the model with ClassificationInterpretation again.

learn_cleaned.save('stage-3')
interpret_cleaned = ClassificationInterpretation.from_learner(learn_cleaned)
interpret_cleaned.plot_confusion_matrix()
interpret_cleaned.most_confused()
[('siumai', 'cheecheongfun', 4),
 ('lobakgo', 'cheecheongfun', 3),
 ('charsiusou', 'cheecheongfun', 2),
 ('cheecheongfun', 'hargow', 2),
 ('lobakgo', 'siumai', 2),
 ('hargow', 'cheecheongfun', 1),
 ('siumai', 'charsiusou', 1)]
interpret_cleaned.plot_top_losses(9, figsize=(15,11))

Try with ResNet 50

We can also try a larger model like ResNet 50.

learn_cleaned50 = cnn_learner(data2, models.resnet50, metrics=error_rate)
learn_cleaned50.fit_one_cycle(10)
epoch train_loss valid_loss error_rate time
0 1.655776 1.074644 0.423729 00:07
1 1.060038 0.426286 0.144068 00:06
2 0.765334 0.462426 0.161017 00:06
3 0.602831 0.449330 0.144068 00:06
4 0.478973 0.428820 0.127119 00:06
5 0.397765 0.433457 0.118644 00:06
6 0.332555 0.437746 0.127119 00:06
7 0.288448 0.437288 0.135593 00:06
8 0.253740 0.432407 0.135593 00:06
9 0.220242 0.432829 0.135593 00:06
learn_cleaned50.unfreeze()
learn_cleaned50.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn_cleaned50.recorder.plot()
learn_cleaned50.fit_one_cycle(5, max_lr=slice(1e-4, 5e-3))
epoch train_loss valid_loss error_rate time
0 0.109317 0.682296 0.169492 00:07
1 0.277890 2.939289 0.262712 00:07
2 0.331545 2.263224 0.245763 00:08
3 0.309646 0.597230 0.127119 00:08
4 0.264014 0.429430 0.127119 00:07

After training and fine-tuning, we end up with the same error rate, which isn't much of a surprise, but the validation loss is smaller and the gap between training and validation loss has narrowed.

learn_cleaned50.export('export.pkl')
interpret_cleaned50 = ClassificationInterpretation.from_learner(learn_cleaned50)
interpret_cleaned50.plot_confusion_matrix()
interpret_cleaned50.most_confused()
[('siumai', 'hargow', 3),
 ('siumai', 'lobakgo', 3),
 ('cheecheongfun', 'hargow', 1),
 ('cheecheongfun', 'lobakgo', 1),
 ('cheecheongfun', 'siumai', 1),
 ('hargow', 'charsiusou', 1),
 ('hargow', 'cheecheongfun', 1),
 ('lobakgo', 'charsiusou', 1),
 ('lobakgo', 'cheecheongfun', 1),
 ('siumai', 'charsiusou', 1),
 ('siumai', 'cheecheongfun', 1)]
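
For reference, the exported export.pkl can later be reloaded for inference, which is what the deployment in part 2 builds on. A minimal sketch (the image file name here is hypothetical):

learn_inference = load_learner(path)        # loads export.pkl from the DataBunch's folder
img = open_image('some_dimsum_photo.jpg')   # any single test image
pred_class, pred_idx, probs = learn_inference.predict(img)
print(pred_class)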

Takeaways

In this post, we created an image classifier for dim sum using Google Images as the data source. We have also demonstrated transfer learning, the ImageCleaner widget and model export with the fast.ai library. We shall use the exported model for deployment in a web application in the next and final part – part 2.

The corresponding notebook can be found here for your review in Google Colab. One thing to note is that the acquired image data might not be fully reproducible, since some links might expire.

Rock, paper, scissors – vision transfer learning with fast.ai

In the previous post, we used the Rock, Paper Scissors notebook that trained a custom image classification model from scratch.

While that notebook demonstrates building custom layers, for such a task we can also leverage transfer learning, using models trained on similar image classification tasks. This often reduces training and experimentation time while still achieving fairly good results, as will be shown here using the fastai v1 library, following Jeremy Howard's excellent Practical Deep Learning for Coders 2019 course.


As in the last post, we start with the same dataset, using Google Colab. We shall also update the fastai library version in Colab and then import the fast.ai vision module and metrics.

Setup, load and explore

!curl -s https://course.fast.ai/setup/colab | bash
Updating fastai...
Done.
!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/rps.zip \
    -O /tmp/rps.zip
  
!wget --no-check-certificate \
    https://storage.googleapis.com/laurencemoroney-blog.appspot.com/rps-test-set.zip \
    -O /tmp/rps-test-set.zip
--2019-07-11 14:30:57--  https://storage.googleapis.com/laurencemoroney-blog.appspot.com/rps.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.76.128, 2a00:1450:400c:c07::80
Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.76.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 200682221 (191M) [application/zip]
Saving to: ‘/tmp/rps.zip’

/tmp/rps.zip        100%[===================>] 191.38M   125MB/s    in 1.5s    

2019-07-11 14:30:59 (125 MB/s) - ‘/tmp/rps.zip’ saved [200682221/200682221]

--2019-07-11 14:31:00--  https://storage.googleapis.com/laurencemoroney-blog.appspot.com/rps-test-set.zip
Resolving storage.googleapis.com (storage.googleapis.com)... 173.194.76.128, 2a00:1450:400c:c07::80
Connecting to storage.googleapis.com (storage.googleapis.com)|173.194.76.128|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29516758 (28M) [application/zip]
Saving to: ‘/tmp/rps-test-set.zip’

/tmp/rps-test-set.z 100%[===================>]  28.15M  93.4MB/s    in 0.3s    

2019-07-11 14:31:01 (93.4 MB/s) - ‘/tmp/rps-test-set.zip’ saved [29516758/29516758]
import os
import zipfile

local_zip = '/tmp/rps.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/')
zip_ref.close()

local_zip = '/tmp/rps-test-set.zip'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp/')
zip_ref.close()
!mv /tmp/rps /tmp/train
!mv /tmp/rps-test-set/ /tmp/valid
from fastai.vision import *
from fastai.metrics import error_rate

We shall then set the batch size (bs) to 64 and load the data using the ImageDataBunch.from_folder method, which conveniently loads training and validation datasets stored in separate sub-folders. It also applies the data augmentation defined by the get_transforms() function. Last but not least, since we are using a pre-trained model, we need to normalize the data using the imagenet_stats that the ResNet 34 model (shown later) was trained with. We shall then preview a small batch of the data and the data classes.

bs = 64
data = ImageDataBunch.from_folder('/tmp', ds_tfms=get_transforms(), size=224, bs=bs).normalize(imagenet_stats)
data.show_batch(rows=3, figsize=(7,6))
print(data.classes)
len(data.classes),data.c
['paper', 'rock', 'scissors']

(3, 3)

Training the model

We shall now call the cnn_learner method, which downloads the specified pre-trained model (ResNet 34) upon first-time use and reports the accuracy metric during training. You can also inspect the structure of ResNet 34 using learn.model. Finally, we kick-start the learning process using learn.fit_one_cycle for 4 epochs, using the one-cycle policy that enables training with very high learning rates.

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /root/.cache/torch/checkpoints/resnet34-333f7ec4.pth
100%|██████████| 87306240/87306240 [00:00<00:00, 113309789.33it/s]
learn.model
Sequential(
  (0): Sequential(
    (0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
    (4): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (1): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (5): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (6): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (3): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (4): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (5): BasicBlock(
        (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (7): Sequential(
      (0): BasicBlock(
        (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (downsample): Sequential(
          (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
          (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        )
      )
      (1): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
      (2): BasicBlock(
        (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (relu): ReLU(inplace)
        (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
        (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (1): Sequential(
    (0): AdaptiveConcatPool2d(
      (ap): AdaptiveAvgPool2d(output_size=1)
      (mp): AdaptiveMaxPool2d(output_size=1)
    )
    (1): Flatten()
    (2): BatchNorm1d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): Dropout(p=0.25)
    (4): Linear(in_features=1024, out_features=512, bias=True)
    (5): ReLU(inplace)
    (6): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): Dropout(p=0.5)
    (8): Linear(in_features=512, out_features=3, bias=True)
  )
)
learn.fit_one_cycle(4)
epoch train_loss valid_loss accuracy time
0 0.408251 0.289994 0.879032 00:33
1 0.151992 0.190768 0.932796 00:32
2 0.078730 0.166957 0.935484 00:32
3 0.044160 0.151344 0.946237 00:32

In just 4 epochs, we reach an accuracy of 94%, compared to the 25 epochs used in the previous notebook! We shall now have a closer look at the results using ClassificationInterpretation. This enables us to look at the top losses, most confused labels and the confusion matrix.

interp = ClassificationInterpretation.from_learner(learn)

losses,idxs = interp.top_losses()

len(data.valid_ds)==len(losses)==len(idxs)
True
interp.plot_top_losses(9, figsize=(15,11))

interp.plot_confusion_matrix(figsize=(12,12), dpi=60)
interp.most_confused(min_val=2)
[('scissors', 'paper', 11), ('paper', 'scissors', 6), ('paper', 'rock', 3)]

Finetuning the model

By default, the learner created has the pre-trained model weights frozen (they are not changed during the 4 epochs of training) and only trains the additional head layers added by the cnn_learner method. We can unfreeze the pre-trained weights so that they are trained as well.

learn.unfreeze()
learn.fit_one_cycle(1)
epoch train_loss valid_loss accuracy time
0 0.042922 0.574411 0.862903 00:33

It turns out that simply continuing to train the model after unfreezing lowers the accuracy! To ensure that we continue training with suitable learning rates, we use the learn.lr_find method to find them before training again.

learn.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.recorder.plot()

From the learning rate plot above, a rate between 5e-4 and 3e-3 sits on the "steepest" downward slope, indicating a good range for the maximum learning rates.

learn.fit_one_cycle(2, max_lr=slice(5e-4,3e-3))
epoch train_loss valid_loss accuracy time
0 0.112085 0.033667 0.981183 00:33
1 0.039996 0.010633 1.000000 00:34
This time, we reached an accuracy of 100%, with the confusion matrix as below.
interp2 = ClassificationInterpretation.from_learner(learn)

losses2,idxs2 = interp2.top_losses()

len(data.valid_ds)==len(losses2)==len(idxs2)
True
interp2.plot_confusion_matrix(figsize=(12,12), dpi=60)

Hence, with transfer learning and methods like the one-cycle policy and the learning rate finder, we can train the model in less time (fewer epochs) and leverage established pre-trained models (ResNet 34) rather than training a model from scratch.

You can find the corresponding notebook for this post here.