Post Jupyter Notebooks to WordPress using the Documents from Git plugin

< 1 min. read

Jupyter Notebooks are a great way to communicate your findings or demonstrate concepts and applications in the Data Science and Machine Learning world. While you can easily convert notebooks to static pages using nbconvert, there are challenges integrating them with existing publishing platforms like WordPress.

If you are just starting out or don’t mind migrating to another platform, there are awesome alternatives such as FastPages (GitHub and demo site) by Jeremy Howard and Hamel Husain.

However, if for various reasons you want to continue using WordPress, several workarounds do exist, such as these very well-documented posts by Mike Kale and by Silver Ringvee. The downside is that they usually involve adding custom PHP functions and editing the CSS.

I came across this neat little WordPress plugin called Documents from Git (plugin page and GitHub) that converts Markdown files or Jupyter Notebooks into blog posts, and all it requires is a shortcode.

Currently, the Jupyter Notebooks need to be in a public repository, and styling changes still require some CSS edits, but the plugin does the conversion with far less fuss (see the documentation in the GitHub repository or on the plugin page).

Below is a notebook I used when presenting a TensorFlow 2 workshop in January 2020, rendered with the plugin’s default settings. Have a look and try out the plugin!

Shortcode:

[git-github-jupyter url="https://github.com/yoke2/tf2_data_to_app_workshop/blob/master/notebooks/tf2_wksp_data_and_modeling.ipynb"]

Converted Notebook:

Colab Setup

Check Colab Accelerator Mode

If the command below fails in Google Colab, go to Runtime > Change Runtime Type and set the hardware accelerator type to "GPU".

In [0]:
!nvidia-smi
Mon Dec 30 16:58:27 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   36C    P8     7W /  75W |      0MiB /  7611MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Install Dependencies and import packages

Install tensorflow 2.0.0 and tensorflow-addons 0.6.0. You will need to restart the Google Colab runtime when instructed.

Please note that tensorflow-addons currently supports only macOS and Linux.

In [0]:
!pip install tensorflow-gpu==2.0.0 tensorflow-addons==0.6.0

Install supporting functions for use in the notebook; this minimizes clutter.

GitHub link to supporting functions for the workshop

In [0]:
!pip install -U git+https://github.com/yoke2/suptools.git
In [0]:
%load_ext autoreload
%autoreload 2
%matplotlib inline
In [0]:
import gdown
import zipfile
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
print(tf.__version__)
2.0.0
In [0]:
from suptools.core import *
from suptools.imgtools import *
from suptools.tftools import *

Data Collection

Download Images

This is for demonstration purposes only. Please upload your own files.

In [0]:
#gdown.download('https://drive.google.com/uc?export=download&id=18QDpvCyO2BsX2ILMDWyFubvFzPBLkoAi', output='food_urls_3class.zip', quiet=False)
Downloading...
From: https://drive.google.com/uc?export=download&id=18QDpvCyO2BsX2ILMDWyFubvFzPBLkoAi
To: /content/food_urls_3class.zip
100%|██████████| 36.9k/36.9k [00:00<00:00, 26.1MB/s]
Out[0]:
'food_urls_3class.zip'
In [0]:
# with zipfile.ZipFile('food_urls_3class.zip', 'r') as zipObj:
#     zipObj.extractall()

Upload your lists of image URLs here. Once done, proceed to download the images.
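
If you are running this notebook in Google Colab, a minimal way to upload the URL list files is the built-in files helper:

from google.colab import files

# Opens a file picker in the browser; chosen files are saved
# to the current working directory (/content in Colab)
uploaded = files.upload()
print(list(uploaded.keys()))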

In [0]:
results_cr = download_images('chicken_rice.txt', 'data/chicken_rice')
list(filter(lambda x:'Failed' in x, results_cr))
Out[0]:
["Failed: Can't download https://casaveneracion.com/wp-content/uploads/2016/01/hainanese-chicken-rice-1.jpg",
 "Failed: Can't download https://i0.wp.com/www.guaishushu1.com/wp-content/uploads/2016/12/IMG_4829.jpg?ssl=1",
 "Failed: Can't download https://www.singaporenbeyond.com/wp-content/uploads/2019/01/Hainanese-Chicken-Rice.jpg. Exception: HTTPSConnectionPool(host='www.singaporenbeyond.com', port=443): Max retries exceeded with url: /wp-content/uploads/2019/01/Hainanese-Chicken-Rice.jpg (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))"]
In [0]:
results_mg = download_images('mee_goreng.txt', './data/mee_goreng')
list(filter(lambda x:'Failed' in x, results_mg))
Out[0]:
["Failed: Can't download https://casaveneracion.com/wp-content/uploads/2016/01/mir-goreng-1.jpg",
 "Failed: Can't download https://i1.wp.com/www.guaishushu1.com/wp-content/uploads/2016/04/IMG_3346.jpg?ssl=1",
 "Failed: Can't download x-raw-image:///b2f64b6cb4c7a85cbc3dfd0e93977710cacdb40273a1f1276ac88b5536e732ae. Exception: No connection adapters were found for 'x-raw-image:///b2f64b6cb4c7a85cbc3dfd0e93977710cacdb40273a1f1276ac88b5536e732ae'",
 "Failed: Can't download x-raw-image:///ad8756217c1af3ab15560e9f16ed250213589022eae2dec97b2f875f6ce1a8ef. Exception: No connection adapters were found for 'x-raw-image:///ad8756217c1af3ab15560e9f16ed250213589022eae2dec97b2f875f6ce1a8ef'",
 "Failed: Can't download https://i0.wp.com/www.guaishushu1.com/wp-content/uploads/2016/04/IMG_3361.jpg?ssl=1"]
In [0]:
results_rp = download_images('roti_prata.txt', './data/roti_prata')
list(filter(lambda x:'Failed' in x, results_rp))
Out[0]:
["Failed: Can't download http://www.makansutra.com/images/story/detail/03c84a_sdefregtrhy5.jpg. Exception: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))",
 "Failed: Can't download https://admin.havehalalwilltravel.com/wp-content/uploads/2017/06/14-julaiha_900.jpg. Exception: HTTPSConnectionPool(host='admin.havehalalwilltravel.com', port=443): Max retries exceeded with url: /wp-content/uploads/2017/06/14-julaiha_900.jpg (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:852)'),))",
 "Failed: Can't download https://i2.wp.com/i1296.photobucket.com/albums/ag5/thefifthparlour/6F36498C-A006-4791-84E6-A852D27081C6_zpstiouvd89.jpg",
 "Failed: Can't download https://mothership.sg/canornot/img/uploads/images/000/000/500/original/prata.jpg"]
In [0]:
??download_images
In [0]:
??download_image
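
The ?? commands above display the helper source in a live notebook but do not render in this post. As a rough idea only, a URL-list image downloader along the lines of download_images might look like this (a hypothetical sketch, not the actual suptools implementation):

import requests
from pathlib import Path

def download_images_sketch(url_file, dest):
    """Download every URL listed (one per line) in url_file into dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    results = []
    urls = [u for u in Path(url_file).read_text().splitlines() if u.strip()]
    for i, url in enumerate(urls):
        try:
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            # The naming scheme is illustrative; the real helper may differ
            (dest / f'{i:08d}.jpg').write_bytes(resp.content)
            results.append(f'Downloaded: {url}')
        except Exception as e:
            results.append(f"Failed: Can't download {url}. Exception: {e}")
    return results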

Verify Images

In [0]:
verify_images(Path('./data'), recurse=True)
In [0]:
??verify_images
In [0]:
??verify_image_tf
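
Again, as a sketch of the general idea only (the actual suptools verifier may differ): verification typically tries to decode each file and deletes the ones that fail.

import tensorflow as tf
from pathlib import Path

def verify_images_sketch(path, recurse=False):
    """Try to decode every image under path; delete files that fail."""
    pattern = '**/*' if recurse else '*'
    for f in Path(path).glob(pattern):
        if not f.is_file():
            continue
        try:
            tf.io.decode_image(tf.io.read_file(str(f)))
        except Exception:
            print(f'Removing corrupt file: {f}')
            f.unlink()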

Optional: Save images to your local machine

Due to the large file size, you can transfer these verified images to your local machine efficiently using croc.

Link to GitHub for croc

Windows Install:
Download and unzip the Windows binary from the GitHub releases page

Alternatively, you can install it using scoop:

$ scoop bucket add schollz-bucket https://github.com/schollz/scoop-bucket.git
$ scoop install croc

Mac Install:

$ brew install schollz/tap/croc

Linux/Windows Subsystem for Linux Install:

$ curl https://getcroc.schollz.com | bash

Command to enter on the receiving desktop side:

croc -yes <provided-pass-code>
In [0]:
!zip -r food_data_verified.zip ./data
In [0]:
!curl https://getcroc.schollz.com | bash
In [0]:
!croc send food_data_verified.zip

Optional: Saving images to Google Cloud Storage or Google Drive

Alternatively, you can save the images to Google Cloud Storage or Google Drive.

Save zipped images to Cloud Storage Bucket
Please refer to sample code snippets in Google Colab and modify accordingly. (Code snippets > Saving data with the Cloud Storage Python API)
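
For reference, a minimal sketch along the lines of that snippet; the project ID and bucket name are placeholders to replace with your own:

from google.colab import auth
from google.cloud import storage

auth.authenticate_user()  # sign in with your Google account when prompted

# 'my-project-id' and 'my-bucket' are placeholder values
client = storage.Client(project='my-project-id')
bucket = client.bucket('my-bucket')
bucket.blob('food_data_verified.zip').upload_from_filename('food_data_verified.zip')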

Save zipped images to Google Drive
Please refer to sample code snippets in Google Colab and modify accordingly. (Code snippets > Saving data to Google Drive)
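
Likewise, a minimal sketch using the Colab Drive mount helper:

import shutil
from google.colab import drive

# Mounts your Drive under /content/drive (follow the auth prompt)
drive.mount('/content/drive')
shutil.copy('food_data_verified.zip', '/content/drive/My Drive/food_data_verified.zip')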

Optional: Download prepared data

In case you are unable to follow the previous steps, here is a zip file of images prepared beforehand.

In [0]:
# Delete any previous data
!rm -rf ./data
In [0]:
gdown.download('https://drive.google.com/uc?export=download&id=1ku2anpAgPkdOchsAeIz-rrFDgdY08BKy', output='food_data_verified.zip', quiet=False)
Downloading...
From: https://drive.google.com/uc?export=download&id=1ku2anpAgPkdOchsAeIz-rrFDgdY08BKy
To: /content/food_data_verified.zip
279MB [00:01, 150MB/s]
Out[0]:
'food_data_verified.zip'
In [0]:
with zipfile.ZipFile('food_data_verified.zip', 'r') as zipObj:
    zipObj.extractall()

Data processing using tf.data

In [0]:
datapath=Path('./data')

Note: We will need to reuse CLASS_NAMES later, in the order displayed below.

In [0]:
CLASS_NAMES = np.array([x.name for x in datapath.glob('*')])
CLASS_NAMES
Out[0]:
array(['mee_goreng', 'roti_prata', 'chicken_rice'], dtype='<U12')
In [0]:
BATCH_SIZE = 32
IMG_SIZE = 224

Create Train, Valid and Test sets

Note: This split is done at the filepath level

In [0]:
all_files = get_all_files(datapath, recurse=True)
In [0]:
train_filepaths, tmp_filepaths = train_test_split(all_files, valid_pct=0.3, seed=42)
valid_filepaths, test_filepaths = train_test_split(tmp_filepaths, valid_pct=0.5, seed=42)
In [0]:
len(train_filepaths),len(valid_filepaths),len(test_filepaths)
Out[0]:
(709, 152, 153)
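
The train_test_split used here is the suptools helper, which splits a list of file paths by fraction after shuffling. Conceptually it amounts to something like this (a hypothetical sketch, not the actual implementation):

import random

def train_test_split_sketch(items, valid_pct=0.3, seed=None):
    """Shuffle items, then split off the last valid_pct as the second set."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - valid_pct))
    return items[:cut], items[cut:]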

Image Augmentations

You can experiment with other image augmentation functions provided in suptools.tftools; a sketch of the general pattern follows the code cell below.

List of image augmentation functions to experiment with

In [0]:
train_aug = [random_crop, random_flip, random_rotate]
valid_aug = [central_crop]
aug = (train_aug, valid_aug)
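
These helpers come from suptools.tftools; to illustrate the general pattern, equivalents of a random flip and a central crop can be written directly with tf.image (a sketch only; the actual helpers may differ):

import tensorflow as tf

def random_flip_sketch(img):
    # Randomly mirrors the image left-right with 50% probability
    return tf.image.random_flip_left_right(img)

def central_crop_sketch(img, frac=0.875):
    # Keeps the central fraction of the image, as typically done at eval time
    return tf.image.central_crop(img, central_fraction=frac)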

Process data and create tf.data.Dataset

In [0]:
train_ds = read_img_dataset([str(x) for x in train_filepaths], CLASS_NAMES=CLASS_NAMES, shuffle_size=1024, img_size=IMG_SIZE, batch_size=BATCH_SIZE, n_parallel=4, augments=aug, mode='train')
valid_ds = read_img_dataset([str(x) for x in valid_filepaths], CLASS_NAMES=CLASS_NAMES, img_size=IMG_SIZE, batch_size=BATCH_SIZE, n_parallel=4, augments=aug, mode='valid')
test_ds = read_img_dataset([str(x) for x in test_filepaths], CLASS_NAMES=CLASS_NAMES, img_size=IMG_SIZE, batch_size=BATCH_SIZE, n_parallel=4, augments=aug, mode='test')
In [0]:
??read_img_dataset
In [0]:
??process_img_path
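
Under the hood, a pipeline like read_img_dataset typically maps a decode-and-resize function over the file paths, then batches and prefetches. A minimal sketch follows (hypothetical; it assumes the CLASS_NAMES array defined above and one-hot labels derived from the parent directory name; inspect the ?? output in a live notebook for the real code):

import tensorflow as tf

def process_img_path_sketch(path, img_size=224):
    # The label is derived from the parent directory name
    label = tf.strings.split(path, '/')[-2] == CLASS_NAMES
    img = tf.io.decode_jpeg(tf.io.read_file(path), channels=3)
    img = tf.image.convert_image_dtype(img, tf.float32)
    img = tf.image.resize(img, [img_size, img_size])
    return img, label

def read_img_dataset_sketch(paths, batch_size=32):
    ds = tf.data.Dataset.from_tensor_slices(paths)
    ds = ds.map(process_img_path_sketch, num_parallel_calls=4)
    return ds.batch(batch_size).prefetch(tf.data.experimental.AUTOTUNE)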

Observe a single batch across Train, Valid and Test sets

In [0]:
show_batch(train_ds,CLASS_NAMES)
In [0]:
show_batch(valid_ds,CLASS_NAMES)
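
show_batch plots a single batch with its labels; a minimal matplotlib version might look like this (a sketch, assuming the one-hot style labels produced by the pipeline above):

import matplotlib.pyplot as plt

def show_batch_sketch(ds, class_names, n=25):
    # Take one batch and plot up to n images in a 5x5 grid
    imgs, labels = next(iter(ds))
    plt.figure(figsize=(10, 10))
    for i in range(min(n, imgs.shape[0])):
        plt.subplot(5, 5, i + 1)
        plt.imshow(imgs[i])
        plt.title(class_names[labels[i].numpy().argmax()])
        plt.axis('off')
    plt.show()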