Jupyter Notebooks are a great way to communicate your findings or demonstrate concepts and applications in the Data Science and Machine Learning world. While you can easily convert notebooks to static pages using nbconvert, there are challenges in integrating them with existing publishing platforms like WordPress.
If you are just starting out or don’t mind migrating to another platform, there are awesome alternatives such as FastPages (GitHub and demo site) by Jeremy Howard and Hamel Husain.
However, if for various reasons you want to continue using WordPress, several workarounds do exist, such as these very well-documented posts here by Mike Kale and here by Silver Ringvee. The downside is that they usually involve adding custom PHP functions and editing the CSS.
I came across this neat little WordPress plugin called Documents to Git (plugin page and GitHub) that can convert Markdown files or Jupyter Notebooks into blog posts, and all it requires is a shortcode.
Currently, the Jupyter Notebooks need to be in a public repository, and some CSS edits are still required if you want to change the styling, but the plugin does the conversion with much less fuss (see the documentation in the GitHub repository or plugin page).
Below is a notebook I used when presenting a TensorFlow 2 workshop in January 2020, rendered with the plugin’s default settings. Have a look and try out the plugin!
Shortcode:
[git-github-jupyter url="https://github.com/yoke2/tf2_data_to_app_workshop/blob/master/notebooks/tf2_wksp_data_and_modeling.ipynb"]
Converted Notebook:
Colab Setup
Check Colab Accelerator Mode
If the command below fails in Google Colab, please go to Runtime > Change Runtime Type and change the Hardware Accelerator type to "GPU"
!nvidia-smi
Install Dependencies and import packages
Install tensorflow 2.0.0 and tensorflow-addons 0.6.0. You will need to restart the Google Colab runtime as instructed.
Please note that tensorflow-addons currently supports only macOS and Linux.
!pip install tensorflow-gpu==2.0.0 tensorflow-addons==0.6.0
Install supporting functions for use in the notebook to minimize clutter
!pip install -U git+https://github.com/yoke2/suptools.git
%load_ext autoreload
%autoreload 2
%matplotlib inline
import gdown
import zipfile
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
print(tf.__version__)
from suptools.core import *
from suptools.imgtools import *
from suptools.tftools import *
Data Collection
Download Images
This is for demonstration purposes only. Please upload your own files.
# gdown.download('https://drive.google.com/uc?export=download&id=18QDpvCyO2BsX2ILMDWyFubvFzPBLkoAi', output='food_urls_3class.zip', quiet=False)
# with zipfile.ZipFile('food_urls_3class.zip', 'r') as zipObj:
#     zipObj.extractall()
Upload your list of files here. Once done, please proceed to download.
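If you are running this in Google Colab, a minimal sketch of the upload step could use the built-in files widget (this assumes your URL lists are plain-text files named as in the cells below; it is not part of the original workshop code):
from google.colab import files
uploaded = files.upload()  # opens a file picker; uploaded files land in the current working directory
print(list(uploaded.keys()))  # expect something like ['chicken_rice.txt', 'mee_goreng.txt', 'roti_prata.txt']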
results_cr = download_images('chicken_rice.txt', './data/chicken_rice')
list(filter(lambda x:'Failed' in x, results_cr))
results_mg = download_images('mee_goreng.txt', './data/mee_goreng')
list(filter(lambda x:'Failed' in x, results_mg))
results_rp = download_images('roti_prata.txt', './data/roti_prata')
list(filter(lambda x:'Failed' in x, results_rp))
??download_images
??download_image
Verify Images
verify_images(Path('./data'), recurse=True)
??verify_images
??verify_image_tf
Optional: Save images locally
Due to the large file size, you can efficiently download these verified images using croc.
Windows Install:
Download and unzip the Windows binary from the GitHub release page
Alternatively, you can install it using scoop
$ scoop bucket add schollz-bucket https://github.com/schollz/scoop-bucket.git
$ scoop install croc
Mac Install:
$ brew install schollz/tap/croc
Linux/Windows Subsystem for Linux Install:
$ curl https://getcroc.schollz.com | bash
Command to enter on desktop side:
croc -yes <provided-pass-code>
!zip -r food_data_verified.zip ./data
!curl https://getcroc.schollz.com | bash
!croc send food_data_verified.zip
Optional: Saving images to Google Cloud Storage or Google Drive
Alternatively, you can save the images to Google Cloud Storage or Google Drive.
Save zipped images to Cloud Storage Bucket
Please refer to sample code snippets in Google Colab and modify accordingly. (Code snippets > Saving data with the Cloud Storage Python API)
Save zipped images to Google Drive
Please refer to sample code snippets in Google Colab and modify accordingly. (Code snippets > Saving data to Google Drive)
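For the Google Drive option, a minimal sketch (assuming you are in Colab and using the standard Drive mount rather than the exact snippet from the Code snippets panel) could look like this:
from google.colab import drive
drive.mount('/content/drive')  # prompts for authorization
!cp food_data_verified.zip '/content/drive/My Drive/'  # copy the zip created earlier into your Drive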
Optional: Download prepared data
In case you were unable to follow the previous steps, here is a zip file of images prepared beforehand.
# Delete any previous data
!rm -rf ./data
gdown.download('https://drive.google.com/uc?export=download&id=1ku2anpAgPkdOchsAeIz-rrFDgdY08BKy', output='food_data_verified.zip', quiet=False)
with zipfile.ZipFile('food_data_verified.zip', 'r') as zipObj:
    zipObj.extractall()
Data processing using tf.data
datapath=Path('./data')
Note: We will need to reuse CLASS_NAMES later, in the order displayed below.
CLASS_NAMES = np.array([x.name for x in datapath.glob('*')])
CLASS_NAMES
BATCH_SIZE = 32
IMG_SIZE = 224
Create Train, Valid and Test sets
Note: The split is done at the file-path level.
all_files = get_all_files(datapath, recurse=True)
train_filepaths, tmp_filepaths = train_test_split(all_files, valid_pct=0.3, seed=42)
valid_filepaths, test_filepaths = train_test_split(tmp_filepaths, valid_pct=0.5, seed=42)
len(train_filepaths),len(valid_filepaths),len(test_filepaths)
Image Augmentations
You can experiment with other image augmentation functions provided in suptools.tftools
train_aug = [random_crop, random_flip, random_rotate]
valid_aug = [central_crop]
aug = (train_aug, valid_aug)
Process data and create tf.data.Dataset
train_ds = read_img_dataset([str(x) for x in train_filepaths], CLASS_NAMES=CLASS_NAMES, shuffle_size=1024, img_size=IMG_SIZE, batch_size=BATCH_SIZE, n_parallel=4, augments=aug, mode='train')
valid_ds = read_img_dataset([str(x) for x in valid_filepaths], CLASS_NAMES=CLASS_NAMES, img_size=IMG_SIZE, batch_size=BATCH_SIZE, n_parallel=4, augments=aug, mode='valid')
test_ds = read_img_dataset([str(x) for x in test_filepaths], CLASS_NAMES=CLASS_NAMES, img_size=IMG_SIZE, batch_size=BATCH_SIZE, n_parallel=4, augments=aug, mode='test')
??read_img_dataset
??process_img_path
Observe a single batch across Train, Valid and Test sets
show_batch(train_ds,CLASS_NAMES)
show_batch(valid_ds,CLASS_NAMES)