< 1 min. read Jupyter Notebooks are a great way to communicate your findings or demonstrate concepts and applications in the Data Science and Machine Learning world. While you can easily convert notebooks to static pages using nbconvert, there are challenges integrating them with existing publishing platforms like WordPress. If you are just starting out or don’t mind […]
Dim Sum Classifier – from Data to App part 2

5 min. read Picture Credits here In the previous post, we saw how to acquire, process and clean data, and how to train an image classifier to identify some yummy dim sums. In this post, we shall complete the loop by developing the web app using starlette (a framework similar to flask, but with support for asynchronous IO), […]
Dim Sum Classifier – from Data to App part 1

7 min. read Picture Credits here In a typical machine learning lifecycle, we need to acquire data, process data, train and validate/test models, and finally deploy the trained models in applications/services. In this first part of a two-part post, inspired by fast.ai 2019 lesson 2, we shall build a Dim Sum (a Cantonese bite-size style of cuisine with […]
Rock, paper, scissors – vision transfer learning with fast.ai

7 min. read Picture Credits: Wikipedia In the previous post, we used the Rock, Paper, Scissors notebook that trained a custom image classification model from scratch. While the notebook demonstrates building custom layers, for such a task we can also leverage transfer learning, using models trained on similar image classification tasks, which can often reduce time […]
Find difference between CSV files using PowerShell
< 1 min. read If you are on Windows, you can use PowerShell to find the differences between two CSV files. Sample code below:

    $file1 = import-csv -Path "D:\path\to\file1.csv"
    $file2 = import-csv -Path "D:\path\to\file2.csv"
    Compare-Object $file1 $file2 -property column_to_identify_row -IncludeEqual

You can refer to this article for more details.
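For readers not on Windows, the same idea can be roughly sketched in Python. This is only an approximation of the PowerShell snippet, not an equivalent: it compares the distinct values of one identifying column using sets, so duplicate rows are collapsed. The function name `compare_by_column`, the column name `id`, and the sample data are all hypothetical.

```python
import csv
import io


def compare_by_column(file1, file2, column):
    """Compare the distinct values of one column across two CSV sources.

    The result keys mimic the side indicators printed by Compare-Object:
    "==" (in both), "<=" (only in the first), "=>" (only in the second).
    """
    values1 = {row[column] for row in csv.DictReader(file1)}
    values2 = {row[column] for row in csv.DictReader(file2)}
    return {
        "==": sorted(values1 & values2),
        "<=": sorted(values1 - values2),
        "=>": sorted(values2 - values1),
    }


# In-memory stand-ins for two CSV files (made-up sample data).
left = io.StringIO("id,name\n1,apple\n2,banana\n3,cherry\n")
right = io.StringIO("id,name\n2,banana\n3,cherry\n4,durian\n")

result = compare_by_column(left, right, "id")
print(result)
```

To compare real files, pass handles opened with `open(path, newline="")` in place of the `io.StringIO` objects.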
Find version of python package installed
< 1 min. read Below are 3 methods we can try to find the version of an installed python package. We shall use scipy as an example.
Using pip
Method 1 – For pip 1.3 and above: pip show scipy
Method 2 – Alternative (works with older versions of pip): pip freeze | grep scipy
Using version attribute
Method […]
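As a complement to the pip commands above, the version can also be looked up from inside Python. The sketch below uses the standard library's `importlib.metadata` (available from Python 3.8), which is a different technique from the ones the post lists; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the scipy version if scipy is installed, otherwise None.
print(installed_version("scipy"))
```

Returning `None` instead of raising keeps the helper easy to use in scripts that check several packages in a loop.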
Tool to read large text files
< 1 min. read Sometimes, you might need to view and search very large text files like logs or SQL dumps. On Windows and Linux, you can give glogg a try. Please note that this application is read-only.
Running Apache Spark with sparklyr and R in Windows
3 min. read RStudio recently released the sparklyr package that allows users to connect to Apache Spark instances from R. In addition, this package offers dplyr integration, allowing you to utilize Spark as you use dplyr functions like filter and select, which is very convenient. The package will also assist you in downloading and installing Apache Spark if […]