Dim Sum Classifier – from Data to App part 1

In a typical machine learning lifecycle, we need to acquire data, process it, train and validate/test models, and finally deploy the trained models in applications or services. In this first part of a two-part post, inspired by fast.ai 2019 lesson 2, we shall build a Dim Sum (a Cantonese cuisine of bite-size dishes with many yummy choices) classifier application, using Google Images as the data source. Given the wide variety of dishes, we shall focus on 5 common ones: har gow, siu mai, char siu sou, chee cheong fun and lo bak go.

The image links are curated with gi2ds (Google Image Search to Dataset), a very convenient JavaScript snippet created by Christoffer Björkskog. Details can be found on his blog and on GitHub.

After using the tool, 5 text files with 200 image links corresponding to each of the dishes are prepared and saved in Google Drive for import into Google Colab, which will be used for processing and training.
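
Before downloading, it can be worth sanity-checking each link file. The snippet below is a minimal sketch, assuming one URL per line (the layout download_images reads); the helper name clean_url_lines and the sample URLs are hypothetical and not part of the original notebook.

```python
from urllib.parse import urlparse

def clean_url_lines(lines):
    """Return unique http(s) URLs in their original order, skipping blanks."""
    seen, urls = set(), []
    for line in lines:
        url = line.strip()
        if not url or url in seen:
            continue
        if urlparse(url).scheme in ("http", "https"):
            seen.add(url)
            urls.append(url)
    return urls

# Hypothetical sample standing in for the contents of urls_hargow200.txt
sample = [
    "https://example.com/hargow1.jpg\n",
    "\n",                                   # blank line
    "https://example.com/hargow1.jpg\n",   # duplicate link
    "not-a-url\n",                          # no http(s) scheme
    "http://example.org/hargow2.png\n",
]
urls = clean_url_lines(sample)
print(len(urls), "usable links")
```

Duplicates matter here because the same picture landing in both the training and validation sets would make the validation error look better than it is.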

Download and verify data from Google Images

We shall proceed to set up the Google Colab environment, import the files containing the image links from Google Drive and download the images using the download_images function. The fastai library also provides a very handy verify_images function that checks for valid images and prunes files that cannot be used.

!curl -s https://course.fast.ai/setup/colab | bash
Updating fastai...
Done.
%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
dirs_dimsum = ['hargow', 'siumai', 'charsiusou', 'cheecheongfun','lobakgo']
files_dimsum = ['urls_hargow200.txt', 'urls_siumai200.txt', 
                'urls_charsiusou200.txt', 'urls_cheecheongfun200.txt',
                'urls_lobakgo200.txt']
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = '/content/gdrive/My Drive/'
base_dir = root_dir + 'fastaiv3/'
from shutil import copy2
path = Path('data/dimsum')
for folder, file in list(zip(dirs_dimsum, files_dimsum)):
    dest = path/folder
    dest.mkdir(parents=True, exist_ok=True)
    copy2(base_dir+'dimsum_class/'+file, dest)
    
path.ls()
[PosixPath('data/dimsum/charsiusou'),
 PosixPath('data/dimsum/siumai'),
 PosixPath('data/dimsum/hargow'),
 PosixPath('data/dimsum/lobakgo'),
 PosixPath('data/dimsum/cheecheongfun')]
path = Path('data/dimsum')
for folder, file in list(zip(dirs_dimsum, files_dimsum)):
    download_images(path/folder/file, path/folder, max_pics=200)
classes = dirs_dimsum
for c in classes:
    print(c)
    verify_images(path/c, delete=True, max_size=200)
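
To give a feel for what "verifying" a download involves, here is a minimal sketch that checks a file's leading bytes against the well-known JPEG, PNG and GIF signatures. This is not how verify_images is implemented (as far as we know it opens each file with PIL and can also resize it); sniff_image_type is a hypothetical helper used only for illustration.

```python
# Well-known file signatures (magic numbers) for common image formats
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def sniff_image_type(data: bytes):
    """Return the image type suggested by the leading bytes, or None."""
    for magic, kind in SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None

# A link that returned an HTML error page instead of an image is rejected
print(sniff_image_type(b"\xff\xd8\xff\xe0\x00\x10JFIF"))
print(sniff_image_type(b"<!DOCTYPE html><html>"))
```

A signature check only catches files that are not images at all; truncated or corrupt images still need a real decoder, which is why a library-level check is the better tool in practice.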

Training the model

Once we have our dataset, we will use the ImageDataBunch.from_folder method to load the images from the folder and preview the images.

np.random.seed(42)
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                 num_workers=4).normalize(imagenet_stats)
data.classes
['charsiusou', 'cheecheongfun', 'hargow', 'lobakgo', 'siumai']
data.show_batch(rows=3, figsize=(7,8))
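
The valid_pct=0.2 argument holds out a random 20% of the images for validation, and seeding the random number generator keeps that split repeatable between runs. The sketch below illustrates the idea with Python's random module instead of NumPy, so the exact images chosen will differ from fastai's. The truncating int() is an assumption about how the cut is rounded, chosen because it matches the 738/184 split reported for this run. random_split is a hypothetical helper.

```python
import random

def random_split(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of items and hold out int(valid_pct * n) for validation."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(valid_pct * len(shuffled))
    return shuffled[cut:], shuffled[:cut]

# 922 stands in for the images that survived verification in this run
train_items, valid_items = random_split(range(922))
print(len(train_items), len(valid_items))
```

Fixing the seed is what makes later comparisons meaningful: every training run is scored against the same held-out images.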

From the batch, the images look fine. There is one mislabeled char siu sou (叉烧酥), but we leave it for now since a small amount of noisy data does not typically affect the model much. We are using transfer learning with the ResNet 34 model.

data.classes, data.c, len(data.train_ds), len(data.valid_ds)
(['charsiusou', 'cheecheongfun', 'hargow', 'lobakgo', 'siumai'], 5, 738, 184)
learn = cnn_learner(data, models.resnet34, metrics=error_rate)
Downloading: "https://download.pytorch.org/models/resnet34-333f7ec4.pth" to /home/yoke/.cache/torch/checkpoints/resnet34-333f7ec4.pth
100%|██████████| 87306240/87306240 [00:03<00:00, 24127371.58it/s]
learn.fit_one_cycle(10)
epoch train_loss valid_loss error_rate time
0 1.892826 1.363298 0.516304 00:07
1 1.473497 0.831715 0.282609 00:07
2 1.158775 0.728580 0.244565 00:07
3 0.939784 0.627291 0.217391 00:07
4 0.785094 0.622544 0.211957 00:07
5 0.670714 0.568230 0.157609 00:07
6 0.598721 0.547424 0.163043 00:07
7 0.527969 0.552321 0.173913 00:07
8 0.476596 0.549852 0.179348 00:07
9 0.443554 0.550588 0.179348 00:07
learn.save('stage-1')

After 10 epochs, we have an error rate (1-Accuracy) of 17.9% and proceed to save the model.
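
For reference, the error rate is simply the fraction of validation images whose predicted class differs from the true one. A minimal sketch with made-up labels (this standalone error_rate function is an illustration, not fastai's metric of the same name):

```python
def error_rate(predicted, actual):
    """Fraction of predictions that differ from the true labels (1 - accuracy)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)

# Hypothetical labels for four validation images
actual    = ["hargow", "siumai", "lobakgo", "siumai"]
predicted = ["hargow", "siumai", "siumai",  "siumai"]
print(error_rate(predicted, actual))  # 0.25

# 33 wrong out of 184 validation images is consistent with the reported rate
print(round(33 / 184, 6))  # 0.179348
```

With only 184 validation images, each misclassified image moves the error rate by about 0.5 percentage points, so small differences between runs should be read with care.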

Finetuning the model

We shall now fine-tune the model by unfreezing the pre-trained layers for training, and use the learning rate finder to choose suitable learning rates.

learn.unfreeze()
learn.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn.recorder.plot()
learn.fit_one_cycle(5, max_lr=slice(5e-4, 1e-3))
epoch train_loss valid_loss error_rate time
0 0.296187 0.695414 0.211957 00:07
1 0.390351 1.626430 0.309783 00:08
2 0.425384 2.619203 0.429348 00:08
3 0.367741 0.679681 0.168478 00:08
4 0.326681 0.530172 0.146739 00:08
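
Passing a slice to max_lr asks fastai for discriminative learning rates: the earliest layer groups train with the lower rate and the newly added head with the higher one. The sketch below shows one way such a range can be spread. It assumes geometric spacing and three layer groups; both are assumptions about fastai v1's behaviour rather than something the notebook shows, and spread_learning_rates is a hypothetical helper.

```python
def spread_learning_rates(start, stop, n_groups):
    """Geometrically spaced learning rates from start to stop, one per layer group."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

# Assumption: three layer groups (early body, late body, head)
lrs = spread_learning_rates(5e-4, 1e-3, 3)
print([f"{lr:.2e}" for lr in lrs])
```

The range used here is narrow (a factor of 2), so all layers are updated at nearly the same rate. That may be one reason the validation loss spikes in the middle epochs before recovering.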

After fine-tuning, the error rate drops to 14.6%. We shall save this model and use ClassificationInterpretation to examine the model's top losses and confusion matrix.

learn.save('stage-2')
interpret = ClassificationInterpretation.from_learner(learn)
interpret.plot_confusion_matrix()
interpret.plot_top_losses(9, figsize=(15,11))

From the top losses, it seems that several images are composite pictures, which confuse the model since we are only predicting one class per picture. We can use the ImageCleaner widget to view the dataset and remove unwanted pictures.
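
"Top losses" are the validation images on which the model was most wrong, ranked by their individual loss. Here is a minimal sketch, assuming cross-entropy loss (we believe this is the default for single-label classification in fastai, but treat that as an assumption); the probabilities and the top_losses helper are made up for illustration.

```python
import math

def top_losses(probabilities, actual, k=3):
    """Return (index, loss) pairs for the k largest cross-entropy losses."""
    losses = [
        (i, -math.log(probs[label]))
        for i, (probs, label) in enumerate(zip(probabilities, actual))
    ]
    return sorted(losses, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical predicted class probabilities for three images, two classes
probabilities = [
    {"hargow": 0.9, "siumai": 0.1},    # confident and right
    {"hargow": 0.6, "siumai": 0.4},    # right but unsure
    {"hargow": 0.05, "siumai": 0.95},  # confident and wrong
]
actual = ["hargow", "hargow", "hargow"]
worst = top_losses(probabilities, actual, k=2)
print(worst)
```

Confidently wrong predictions dominate the ranking, which is why mislabeled and composite images tend to surface first.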

Unfortunately, Google Colab does not support ipywidgets, so we need to run some portions of the notebook on a local runtime, as described in the following section. We shall zip the images and saved models for download.

!zip -r download_colab.zip /content/data/dimsum
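
One detail worth knowing before unpacking the archive locally: zip stores each path without its leading slash, which is presumably why the local code refers to content/data/dimsum rather than /content/data/dimsum. Python's zipfile module follows the same convention, so a small sketch with it illustrates the effect. The temporary directory and placeholder file are hypothetical stand-ins for the real dataset.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny stand-in for the dataset folder
    src = os.path.join(tmp, "data", "dimsum", "hargow")
    os.makedirs(src)
    img = os.path.join(src, "00000001.jpg")
    with open(img, "wb") as f:
        f.write(b"placeholder")

    archive = os.path.join(tmp, "download_colab.zip")
    with zipfile.ZipFile(archive, "w") as zf:
        zf.write(img)  # absolute path in, relative member name stored

    with zipfile.ZipFile(archive) as zf:
        stored_names = zf.namelist()

print(stored_names[0])
```

Extracting such an archive recreates the folders relative to the current working directory, so run the local notebook from the directory where the zip was unpacked.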

Run in local runtime for ImageCleaner

This section, which is to be run only locally, aims to prune misleading data and labels.

Please note that DatasetFormatter does not differentiate between the training and validation sets, so we need to load the images using the data block API and explicitly tell it not to split them into training and validation sets.

%reload_ext autoreload
%autoreload 2
%matplotlib inline
from fastai.vision import *
#local setup rerun
path = Path('content/data/dimsum')
np.random.seed(42)
db = (ImageList.from_folder(path)
                   .split_none()
                   .label_from_folder()
                   .transform(get_transforms(), size=224)
                   .databunch()
     )
db.show_batch(rows=3, figsize=(7,8))
learn_cleaning = cnn_learner(db, models.resnet34, metrics=error_rate)
learn_cleaning.load('stage-2')
from fastai.widgets import *
ds, idxs = DatasetFormatter().from_toplosses(learn_cleaning) 
ImageCleaner(ds, idxs, path)

Below is a screenshot of the widget when running locally.

The cleaning exercise removed quite a few images. Typical invalid images are:

  • Images expected to show char siu sou (叉烧酥) that actually show char siu bao (叉烧包), another dim sum type, or the char siu roasted meat (叉烧) itself
  • Composite images that include irrelevant pictures
  • Animated images

Training the model, Round 2

After running the image cleaner, a cleaned.csv will be generated. Upload this file to Google Colab, reload and then retrain the model. Please note that no images have been deleted, hence we need to reload the cleaned data using the cleaned.csv file as reference.
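
Before retraining, it can help to inspect what the cleaner kept. The sketch below assumes that cleaned.csv has a header row with name and label columns and one row per image that was kept; that layout is our assumption about the file ImageCleaner writes, and the rows shown are invented.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt; the real file is written by ImageCleaner
cleaned_csv = io.StringIO(
    "name,label\n"
    "hargow/00000001.jpg,hargow\n"
    "hargow/00000002.jpg,siumai\n"    # relabelled in the widget
    "lobakgo/00000005.jpg,lobakgo\n"
)

rows = list(csv.DictReader(cleaned_csv))
per_class = Counter(row["label"] for row in rows)
print(len(rows), dict(per_class))
```

A per-class count like this quickly reveals whether cleaning has left one dish with far fewer images than the others.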

!mv /content/cleaned.csv /content/data/dimsum
np.random.seed(42)

data2 = ImageDataBunch.from_csv(path, folder=".", valid_pct=0.2, 
                                csv_labels='cleaned.csv', 
                                ds_tfms=get_transforms(), 
                                size=224, 
                                num_workers=4).normalize(imagenet_stats)
data2.classes, data2.c, len(data2.train_ds), len(data2.valid_ds)
(['charsiusou', 'cheecheongfun', 'hargow', 'lobakgo', 'siumai'], 5, 472, 118)
learn_cleaned = cnn_learner(data2, models.resnet34, metrics=error_rate)
learn_cleaned.fit_one_cycle(10)
epoch train_loss valid_loss error_rate time
0 1.958007 1.473040 0.644068 00:05
1 1.535733 0.663816 0.161017 00:05
2 1.177153 0.491802 0.152542 00:05
3 0.915417 0.515325 0.152542 00:05
4 0.747979 0.513231 0.169492 00:05
5 0.633824 0.520064 0.169492 00:05
6 0.544484 0.516060 0.169492 00:05
7 0.475155 0.514972 0.169492 00:05
8 0.413666 0.512496 0.169492 00:05
9 0.370536 0.515782 0.169492 00:05

After 10 epochs, the error rate is 16.9%, about 1 percentage point lower than the 17.9% reached at the same stage (before fine-tuning) in round 1.

Finetuning the model – Round 2

We then proceed with finetuning the model in round 2.

learn_cleaned.unfreeze()
learn_cleaned.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn_cleaned.recorder.plot()
learn_cleaned.fit_one_cycle(5, max_lr=slice(1e-4, 1e-3))
epoch train_loss valid_loss error_rate time
0 0.150190 0.461228 0.169492 00:05
1 0.122265 0.665217 0.177966 00:05
2 0.118568 0.567386 0.152542 00:05
3 0.104063 0.560797 0.127119 00:05
4 0.088834 0.486471 0.127119 00:05

After fine-tuning, the error rate is now 12.7%, compared with 14.6% before cleaning. Note that the validation loss is still much higher than the training loss, a gap that is likely to narrow with more data.

We can examine the model with ClassificationInterpretation again.

learn_cleaned.save('stage-3')
interpret_cleaned = ClassificationInterpretation.from_learner(learn_cleaned)
interpret_cleaned.plot_confusion_matrix()
interpret_cleaned.most_confused()
[('siumai', 'cheecheongfun', 4),
 ('lobakgo', 'cheecheongfun', 3),
 ('charsiusou', 'cheecheongfun', 2),
 ('cheecheongfun', 'hargow', 2),
 ('lobakgo', 'siumai', 2),
 ('hargow', 'cheecheongfun', 1),
 ('siumai', 'charsiusou', 1)]
interpret_cleaned.plot_top_losses(9, figsize=(15,11))
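
The most_confused output above lists the off-diagonal cells of the confusion matrix as (actual, predicted, count) tuples. A minimal sketch of the same idea with made-up labels follows; this standalone most_confused function is an illustration, not fastai's implementation, and its tie-breaking order is our own choice.

```python
from collections import Counter

def most_confused(actual, predicted, min_val=1):
    """List (actual, predicted, count) for misclassifications, most frequent first."""
    pairs = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    ranked = sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(a, p, n) for (a, p), n in ranked if n >= min_val]

# Hypothetical validation labels
actual    = ["siumai", "siumai", "lobakgo", "hargow", "siumai"]
predicted = ["cheecheongfun", "cheecheongfun", "siumai", "hargow", "siumai"]
confused = most_confused(actual, predicted)
print(confused)

# The counts listed above add up to the reported error rate on 118 images
counts = [4, 3, 2, 2, 2, 1, 1]
print(sum(counts), round(sum(counts) / 118, 6))
```

Summing the counts is a quick consistency check: 15 misclassified images out of 118 gives the 12.7% error rate from the last epoch.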

Try with ResNet 50

We can also try a larger model such as ResNet 50.

learn_cleaned50 = cnn_learner(data2, models.resnet50, metrics=error_rate)
learn_cleaned50.fit_one_cycle(10)
epoch train_loss valid_loss error_rate time
0 1.655776 1.074644 0.423729 00:07
1 1.060038 0.426286 0.144068 00:06
2 0.765334 0.462426 0.161017 00:06
3 0.602831 0.449330 0.144068 00:06
4 0.478973 0.428820 0.127119 00:06
5 0.397765 0.433457 0.118644 00:06
6 0.332555 0.437746 0.127119 00:06
7 0.288448 0.437288 0.135593 00:06
8 0.253740 0.432407 0.135593 00:06
9 0.220242 0.432829 0.135593 00:06
learn_cleaned50.unfreeze()
learn_cleaned50.lr_find()
LR Finder is complete, type {learner_name}.recorder.plot() to see the graph.
learn_cleaned50.recorder.plot()
learn_cleaned50.fit_one_cycle(5, max_lr=slice(1e-4, 5e-3))
epoch train_loss valid_loss error_rate time
0 0.109317 0.682296 0.169492 00:07
1 0.277890 2.939289 0.262712 00:07
2 0.331545 2.263224 0.245763 00:08
3 0.309646 0.597230 0.127119 00:08
4 0.264014 0.429430 0.127119 00:07

After training and fine-tuning, ResNet 50 reaches the same error rate of 12.7%, which isn't much of a surprise. However, its validation loss is lower (0.429 versus 0.486), and the gap between training and validation loss is smaller.
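
The comparison can be made concrete with a little arithmetic on the final-epoch losses from the two fine-tuning logs. This is only a rough indicator of how far each model's training fit has run ahead of its validation performance, not a rigorous test for overfitting.

```python
# Final-epoch losses after fine-tuning, copied from the training logs above
final_losses = {
    "resnet34": {"train": 0.088834, "valid": 0.486471},
    "resnet50": {"train": 0.264014, "valid": 0.429430},
}

# Gap between validation and training loss for each model
gaps = {
    name: round(loss["valid"] - loss["train"], 6)
    for name, loss in final_losses.items()
}
for name, gap in gaps.items():
    print(f"{name}: valid - train = {gap:.3f}")
```

Part of the difference may simply reflect the learning rate ranges used, since the two models were fine-tuned with different max_lr slices.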

learn_cleaned50.export('export.pkl')
interpret_cleaned50 = ClassificationInterpretation.from_learner(learn_cleaned50)
interpret_cleaned50.plot_confusion_matrix()
interpret_cleaned50.most_confused()
[('siumai', 'hargow', 3),
 ('siumai', 'lobakgo', 3),
 ('cheecheongfun', 'hargow', 1),
 ('cheecheongfun', 'lobakgo', 1),
 ('cheecheongfun', 'siumai', 1),
 ('hargow', 'charsiusou', 1),
 ('hargow', 'cheecheongfun', 1),
 ('lobakgo', 'charsiusou', 1),
 ('lobakgo', 'cheecheongfun', 1),
 ('siumai', 'charsiusou', 1),
 ('siumai', 'cheecheongfun', 1)]

Takeaways

In this post we created an image classifier for dim sum using Google Images as the data source. We also demonstrated transfer learning, the ImageCleaner widget and model export using the fast.ai library. We shall use the exported model for deployment in a web application in the next and final part, part 2.

The corresponding notebook can be found here for your review in Google Colab. One thing to note is that the image data acquired might not be fully reproducible since some links might expire.