Coursera Data Science Specialization Projects

This page showcases the data science projects I completed as part of the Johns Hopkins University Data Science Specialization track offered through Coursera from 2014 to 2015. Each entry has a brief description of the project, followed by links to applications, repositories, presentations and/or reports, where applicable. (Image credit: Drew Conway)

Capstone Project

In this inaugural run of the capstone, offered in partnership with SwiftKey, I created a text prediction application that lets a user enter a phrase and then predicts the next word the user might type.

– Applied Natural Language Processing concepts such as tokenization, stemming and language modelling
– Created N-grams from a text corpus using R packages like tm and RWeka
– Created presentations and application using knitr, HTML5 R Presentation (created with RStudio), and

Web App ( | Presentation (RPubs) | Milestone report (RPubs)
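The N-gram idea behind next-word prediction can be illustrated with a small, hypothetical Python sketch. It is not the R code (tm, RWeka) used in the project: it assumes a naive regex tokenizer, counts 1- to 3-grams, and predicts by backing off from the longest matching context to shorter ones, returning the most frequent continuation. The toy corpus is made up for illustration.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercase, then keep runs of letters and apostrophes (a deliberately naive tokenizer)
    return re.findall(r"[a-z']+", text.lower())

def build_ngram_tables(corpus, max_n=3):
    # tables[n] maps an (n-1)-word context tuple to a Counter of the words that follow it
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in corpus:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(tables, phrase, max_n=3):
    tokens = tokenize(phrase)
    # Back off from the longest context (last max_n - 1 words) down to no context (unigrams)
    for n in range(max_n, 0, -1):
        if len(tokens) < n - 1:
            continue  # phrase too short for this order
        context = tuple(tokens[len(tokens) - (n - 1):])
        candidates = tables[n].get(context)  # .get avoids inserting empty entries
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # empty model

# Hypothetical toy corpus
corpus = ["I want to go to the park", "I want to eat", "we want to go home"]
tables = build_ngram_tables(corpus)
print(predict_next(tables, "I want"))   # to
print(predict_next(tables, "back to"))  # go (no trigram match, backs off to bigrams)
```

A production model would also smooth the counts (for example with Katz backoff or Kneser-Ney) rather than simply taking the most frequent continuation.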

Developing Data Products

In this project, I’ve created a data product using, a platform that allows the deployment of web applications using the R Environment via the RStudio IDE.

– Created a query tool to convert postal addresses or place names to map coordinates
– Integrated the Google Maps API for geocoding
– Presentation and application created using slidify R package and

Web App ( | Presentation (GitHub Pages)

Practical Machine Learning

For this project, I worked on the Human Activity Recognition dataset, in which data are recorded by sensors in wearable activity trackers similar to products made by Nike and Fitbit. I built a predictive model that recognizes human activities such as sitting down and standing up.

– Built a prediction model with the Random Forest classifier using the caret R package
– Applied prediction study design principles, such as splitting data into training, validation and test sets, model selection and cross-validation
– Created an HTML report with R Markdown and the knitr R package

Report (GitHub Pages)
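The training/validation/test design and cross-validation described above can be sketched in Python as follows. This is an illustrative assumption, not the caret workflow used in the report: it performs a simple random (unstratified) 60/20/20 split of row indices, then deals the training indices into k folds so that each index is held out exactly once. The seed and fractions are arbitrary choices.

```python
import random

def split_indices(n_rows, train_frac=0.6, valid_frac=0.2, seed=2015):
    # Shuffle row indices reproducibly, then cut them into three disjoint sets
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_train = int(n_rows * train_frac)
    n_valid = int(n_rows * valid_frac)
    return idx[:n_train], idx[n_train:n_train + n_valid], idx[n_train + n_valid:]

def k_folds(indices, k=5):
    # Deal indices round-robin into k folds; each fold is held out exactly once
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

train, valid, test = split_indices(100)
for fit_idx, holdout_idx in k_folds(train, k=5):
    pass  # fit a model on fit_idx, score it on holdout_idx, then average the k scores
```

The validation set is used to choose between candidate models, and the test set is touched only once, at the end, to estimate out-of-sample error.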

Regression Models

In this project, I analyzed the provided dataset and built a regression model to answer questions about motor car trends, presenting the analysis as a PDF report.

– Built a multiple linear regression model with R
– Applied statistical techniques like t-tests and stepwise regression
– Created a PDF report using R Markdown and the knitr package
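What fitting a multiple linear regression involves can be illustrated with a minimal Python sketch that solves the normal equations (X^T X) b = X^T y by Gaussian elimination. The project itself used R; this hypothetical version assumes the predictors are not collinear (X^T X is invertible) and uses made-up data generated exactly from y = 1 + 2*x1 + 3*x2.

```python
def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ..., bp] via the normal equations."""
    # Prepend a column of ones so the first coefficient is the intercept
    A = [[1.0] + list(row) for row in X]
    p = len(A[0])
    # Build X^T X and X^T y
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    # Forward elimination with partial pivoting on the augmented matrix [X^T X | X^T y]
    m = [row + [b] for row, b in zip(xtx, xty)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, p):
            factor = m[r][col] / m[col][col]
            for c in range(col, p + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution
    beta = [0.0] * p
    for i in range(p - 1, -1, -1):
        beta[i] = (m[i][p] - sum(m[i][j] * beta[j] for j in range(i + 1, p))) / m[i][i]
    return beta

# Hypothetical data: two predictors per row
X = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]
print(fit_ols(X, y))  # approximately [1.0, 2.0, 3.0]
```

Forming X^T X explicitly is fine for a small example but can lose precision on ill-conditioned data, which is why statistical software usually prefers a QR decomposition.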


Reproducible Research

In this project, I worked with the Storm Events Database to analyze the impact of weather events in the United States. The report concludes by identifying the top 10 events that cause the most casualties and the greatest monetary damage.

– Created a reproducible report using R Markdown and the knitr package
– Cleaned and visualized data using R and the ggplot2 package

Report (RPubs)
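The ranking step of such an analysis can be sketched in Python. The field names `event`, `fatalities` and `injuries` are hypothetical stand-ins, not the database's actual column names, and "casualties" is assumed to mean fatalities plus injuries. Event names are normalized by trimming whitespace and upper-casing before totals are summed and sorted, since inconsistent spelling of event types is a typical cleaning problem in raw records.

```python
from collections import defaultdict

def top_events(records, value_fields, n=10):
    # Sum the chosen numeric fields per normalized event name
    totals = defaultdict(float)
    for rec in records:
        totals[rec["event"].strip().upper()] += sum(rec[f] for f in value_fields)
    # Sort by total, descending; ties are broken alphabetically for a deterministic result
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

# Hypothetical records
records = [
    {"event": "Tornado", "fatalities": 5, "injuries": 20},
    {"event": "FLOOD ", "fatalities": 2, "injuries": 3},
    {"event": "tornado", "fatalities": 1, "injuries": 4},
    {"event": "Heat", "fatalities": 10, "injuries": 0},
]
print(top_events(records, ["fatalities", "injuries"], n=2))
# [('TORNADO', 30.0), ('HEAT', 10.0)]
```

The same function would rank monetary damage by passing damage fields instead, once any unit multipliers in the raw data have been applied.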

Exploratory Data Analysis

In this project, I performed exploratory data analysis on the “Individual household electric power consumption Data Set” from the UC Irvine Machine Learning Repository and created plots using the base R graphics system and the ggplot2 package.

– Visualized data using base R graphics and the ggplot2 package


Getting and Cleaning Data

In this project, I cleaned a raw data source and produced a tidy dataset. The analysis script, tidy dataset and codebook are available on GitHub.

– Created a tidy dataset after cleaning raw data
– Created a complementary codebook for the tidy dataset
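One common cleaning step, reshaping a wide table into tidy long form with one observation per row, can be sketched in Python. The column names and values are hypothetical, and the name-cleaning rule (lowercase, runs of non-alphanumeric characters collapsed to underscores) is an assumed convention, not the one used in the project's codebook.

```python
import re

def clean_name(name):
    # " Heart Rate (bpm) " -> "heart_rate_bpm": lowercase, collapse non-alphanumerics to "_"
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")

def tidy(rows, id_cols):
    # Melt wide rows into long form: one (ids..., variable, value) record per measurement
    long_rows = []
    for row in rows:
        cleaned = {clean_name(k): v for k, v in row.items()}
        ids = {c: cleaned[c] for c in id_cols}
        for var, value in cleaned.items():
            if var not in id_cols:
                long_rows.append({**ids, "variable": var, "value": value})
    return long_rows

# Hypothetical raw data with messy column names
raw = [
    {"Subject": 1, " Heart Rate (bpm) ": 72, "Steps": 5000},
    {"Subject": 2, " Heart Rate (bpm) ": 65, "Steps": 8000},
]
for rec in tidy(raw, id_cols=["subject"]):
    print(rec)
```

Each cleaned variable name would then get an entry in the codebook describing its meaning and units.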