A virtual environment for data science

I wanted a convenient way to use data science tools without the hassle of installing the required languages and packages, while still benefiting from the strengths of the Linux command line tools. A pre-packaged VM called the Data Science Toolbox fills this need.

It comes with R and Python installed, along with popular data analysis packages for each. You can install the VM by following the instructions on the website, which include installing prerequisites such as VirtualBox and Vagrant.

Note that the VM is set up to be accessed through SSH and is configured to use 2 CPU cores and 2 GB of RAM by default. To increase the RAM allocation, edit the Vagrantfile and change the vb.memory setting, which is expressed in MB. For example, 8 GB is 8192 MB, as in the code sample below.

    config.vm.provider "virtualbox" do |vb|
      # Customize the amount of memory on the VM (value is in MB):
      vb.memory = "8192"
    end
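
Since a Vagrantfile is plain Ruby, the GB-to-MB arithmetic can also be written out rather than done by hand. The following is only a sketch: gb_to_mb is a hypothetical helper of my own, not part of Vagrant, and the vb.cpus line is an assumption added to show where the core count would go. In practice you would place the helper near the top of the existing Vagrantfile and call it from the provider block already there.

```ruby
# Hypothetical helper (not part of Vagrant): convert a size in GB
# to the MB value that vb.memory expects.
def gb_to_mb(gb)
  (gb * 1024).to_i
end

# The Vagrant constant only exists when this file is loaded by Vagrant,
# so the block below is skipped if the file is run as plain Ruby.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.provider "virtualbox" do |vb|
      vb.memory = gb_to_mb(8) # 8 GB expressed in MB
      vb.cpus = 2             # assumption: keep the 2 cores mentioned above
    end
  end
end
```

After saving the Vagrantfile, running vagrant reload restarts the VM so that the new settings take effect.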