Data Science Learning – A Cross Reference

While learning data science, I’ve found it very useful to think of data science processing as a “pipeline”, i.e. a series of actions in a process. Along this pipeline, you will tackle lots of “How do I…” questions, such as “How do I remove NA values?” and “How do I create N-grams?”
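To make those two example questions concrete, here is a minimal sketch in plain Python. The helper names `drop_na` and `ngrams` are my own, chosen for illustration, and I'm assuming "NA" means either `None` or a floating-point NaN; a library such as pandas would handle this differently.

```python
import math

def drop_na(values):
    """Return values with None and float NaN entries removed (hypothetical helper)."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# "How do I remove NA values?"
cleaned = drop_na([1.0, None, 2.5, float("nan"), 4.0])

# "How do I create N-grams?" (here: bigrams over whitespace-split words)
bigrams = ngrams("the quick brown fox".split(), 2)
```

The same two questions come up again in R, which is exactly the kind of side-by-side lookup a cross reference is meant to help with.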

Furthermore, given the many data science tools and languages available online, you will most likely ask the same questions again when you learn to perform data science tasks in another language. While Google Search and Stack Overflow/Stack Exchange come in very handy when searching for answers, I wanted some structure, a collection of sorts, for these questions, with working examples implemented in different languages.

The first two languages I’m starting with are R and Python, two very popular languages for statistical data analysis. Topic-wise, I’m starting with Linear Algebra and Random Number Generation.

Data Science Cross Reference