This page showcases the data science projects that I completed as part of the Johns Hopkins University Data Science Specialization track, offered through Coursera, from 2014 to 2015. Each entry gives a brief description of the project, followed by links to applications, repositories, presentations and/or reports, where applicable. (Image credit: Drew Conway)
Capstone Project
In this inaugural capstone run, offered in partnership with SwiftKey, I created a text prediction application that takes a phrase entered by the user and predicts the next word the user might enter.
Highlights
– Applied Natural Language Processing concepts such as tokenization, stemming and language modelling
– Created N-grams from a text corpus using R packages like tm and RWeka
– Created presentations and application using knitr, HTML5 R Presentation (created with RStudio), and Shinyapps.io
Web App (Shinyapps.io) | Presentation (Rpubs) | Milestone report (Rpubs)
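To illustrate the idea behind N-gram next-word prediction, here is a minimal sketch in Python. It is not the capstone code, which was written in R with tm and RWeka. The toy corpus, the function names and the choice of a plain bigram model with no smoothing or backoff are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_bigram_model(sentences):
    """Count, for every word, how often each other word follows it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for previous, following in zip(tokens, tokens[1:]):
            model[previous][following] += 1
    return model


def predict_next(model, phrase):
    """Return the most frequent follower of the last word in the phrase, or None."""
    tokens = tokenize(phrase)
    if not tokens:
        return None
    followers = model.get(tokens[-1])
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "a dog sat on the rug",
]
model = build_bigram_model(corpus)
suggestion = predict_next(model, "I saw the")
```

With this toy corpus, "the" is followed most often by "cat", so that is the suggestion for the phrase "I saw the". A production model would typically use longer N-grams and fall back to shorter ones when a phrase has not been seen.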
Developing Data Products
In this project, I’ve created a data product using Shinyapps.io, a platform that allows the deployment of web applications using the R Environment via the RStudio IDE.
Highlights
– Created a query tool to convert postal addresses or place names to map coordinates
– Integrated with the Google Maps API for geocoding
– Created the presentation and application using the slidify R package and shinyapps.io
Web App (Shinyapps.io) | Presentation (GitHub Pages)
Practical Machine Learning
For this project, I worked on the Human Activity Recognition dataset, in which data are recorded by sensors in wearable activity trackers similar to the products made by Nike and Fitbit. I built a predictive model that recognizes human activities such as sitting down and standing up.
Highlights
– Built a prediction model with the Random Forest classifier using the caret R package
– Applied Prediction Study design principles like creation of training, validation and test sets, as well as model selection and cross validation
– Created an HTML report with R Markdown and the knitr R package
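The study-design steps listed above (separate training, validation and test sets, plus cross validation) can be sketched in Python as follows. This is an illustration only: the project itself used the caret R package, and the function names, the 60/20/20 split and the fixed seed are assumptions chosen for the example.

```python
import random


def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle the rows reproducibly and cut them into train/validation/test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # whatever remains is held out for the final check
    return train, validation, test


def k_fold_indices(n, k):
    """Yield (training_indices, held_out_indices) pairs for k-fold cross validation."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


# Hypothetical data: 100 row identifiers standing in for sensor records.
rows = list(range(100))
train, validation, test = split_dataset(rows)
folds = list(k_fold_indices(len(train), k=5))
```

The model is fitted and tuned on the training and validation parts, for example with the cross-validation folds, while the test part is touched only once to estimate the out-of-sample error.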
Regression Models
In this project, I analyzed the provided dataset and built a regression model to answer questions on motor car trends. The analysis is delivered as a PDF report.
Highlights
– Built a multivariate linear regression model with R
– Applied statistical techniques like t-tests and stepwise regression
– Created a PDF report using R Markdown and the knitr package
Reproducible Research
In this project, I worked on the Storm Events Database to analyze the impact of weather events in the United States. The report concludes by identifying the top 10 event types that cause the most casualties and the greatest monetary damage.
Highlights
– Created a reproducible report using R Markdown and the knitr package
– Cleaned and visualized data using R and the ggplot2 package
Exploratory Data Analysis
In this project, I performed exploratory data analysis on the “Individual household electric power consumption Data Set” provided by the UC Irvine Machine Learning Repository and created plots using the base R graphics system and the ggplot2 package.
Highlights
– Visualized data using base R graphics and the ggplot2 package
Getting and Cleaning Data
In this project, I cleaned a raw data source and produced a tidy dataset. You can see the analysis file, tidy dataset and codebook on GitHub.
Highlights
– Created a tidy dataset after cleaning raw data
– Created a complementary codebook for the tidy dataset