Find the version of an installed Python package

Below are three methods for finding the version of an installed Python package, using scipy as an example.

Using pip

Method 1 – For pip 1.3 and above: pip show scipy

Method 2 – Alternative (works with older versions of pip): pip freeze | grep scipy

Using version attribute

Method 3 – Launch python/ipython, then execute the commands below:

import scipy
scipy.__version__
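Method 3 only works when the package defines a __version__ attribute, which not every package does. As a hedged alternative from inside Python, the standard library's importlib.metadata (available from Python 3.8) reads the version from the installed package metadata. The sketch below is one way to wrap it; the helper name installed_version is made up for illustration, and note that it expects the distribution name used by pip, which can differ from the import name.

```python
from importlib import metadata


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is not installed.

    installed_version is a hypothetical helper name used for this example only.
    """
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the scipy version if scipy is installed, otherwise None.
print(installed_version("scipy"))
```

Returning None instead of raising makes it easy to check several packages in a loop without wrapping each call in its own try/except.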

Reference for Method 1 is here. Reference for Method 2 is here.

Getting session information in Python

We’ve gone through how to get session information in R previously, so how do we do the same for Python? It seems that there is no single convenient function available so here’s one approach.

To get the system information, you can utilize the commonly used IPython package:

import IPython
print(IPython.sys_info())
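If IPython is not installed, a rough equivalent can be put together from the standard library alone. This is only a sketch: the helper name basic_sys_info and the choice of fields are my own, and the output is not the same as what IPython.sys_info() reports.

```python
import platform
import sys


def basic_sys_info():
    """Collect basic interpreter and operating system details in a dict.

    basic_sys_info is a hypothetical helper, not part of IPython or the stdlib.
    """
    return {
        "python_version": platform.python_version(),
        "python_implementation": platform.python_implementation(),
        "executable": sys.executable,
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


for key, value in basic_sys_info().items():
    print(f"{key}: {value}")
```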

To find out which packages are loaded at that point (including modules loaded by Python itself and by any Python IDE), you can use sys.modules, a dictionary keyed by module name. The code below keeps only the top-level package name rather than each submodule.

import sys
packages = set()
for name in sys.modules.keys():
    packages.add(name.split('.')[0])

print(sorted(packages))
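To get closer to a session summary, the package names can be paired with their versions. The sketch below is one possible approach, not a standard function: loaded_package_versions is a name I made up, and it only reports packages that expose a __version__ string, so packages without that attribute are silently left out.

```python
import sys


def loaded_package_versions():
    """Map top-level loaded module names to their __version__, where defined.

    loaded_package_versions is a hypothetical helper name for this example.
    """
    versions = {}
    # Copy the items first, since sys.modules can change while we iterate.
    for name, module in list(sys.modules.items()):
        if "." in name or name.startswith("_"):
            continue  # skip submodules and private modules
        version = getattr(module, "__version__", None)
        if isinstance(version, str):
            versions[name] = version
    return versions


for name, version in sorted(loaded_package_versions().items()):
    print(name, version)
```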

Data Science Learning – A Cross Reference

While learning data science, I’ve discovered that it is very useful to think of the data science process as a “pipeline”, i.e. a series of actions in a process. Along this pipeline, you will be tackling lots of “How do I…” questions like “How do I remove NA values?” and “How do I create N-grams?”

Furthermore, given the many data science tools and languages available online, you will most likely ask the same questions again when you are learning how to perform data science tasks in another language. While Google Search and Stack Overflow/Stack Exchange come in very handy when searching for answers, I wanted some structure – a collection of sorts – for these questions, with working examples in different language implementations.

The first two languages I’m starting with will be R and Python – two very popular languages for statistical data analysis. Topics-wise, I’m starting with Linear Algebra and Random Number Generation.

Data Science Cross Reference

A virtual environment for data science

I wanted to conveniently use data science tools without the hassle of installing the required languages and packages, while benefiting from the strengths of the Linux command line tools. There is a pre-packaged VM called the Data Science Toolbox that fills this need.

It comes with R and Python installed, along with popular data analysis packages for each. You can install the VM by following the instructions on the website, including installation of prerequisites like VirtualBox and Vagrant.

One thing to note is that the VM is set up to be accessed through SSH and that it is configured to use 2 CPU cores and 2 GB RAM by default. If you need to increase the RAM allocation, edit the Vagrantfile and change the vb.memory setting, which is expressed in MB. For example, 8 GB is 8192 MB – see the code sample below.

config.vm.provider "virtualbox" do |vb|
  # Customize the amount of memory on the VM:
  vb.memory = "8192"
end
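For other RAM sizes, the value to put in vb.memory is just the number of GB multiplied by 1024. A small Python sketch of that arithmetic, where gb_to_mb is a throwaway helper name of my own:

```python
def gb_to_mb(gigabytes):
    """Convert GB to MB using 1 GB = 1024 MB, as in the 8 GB example."""
    return int(gigabytes * 1024)


# Print the Vagrantfile line for a few common RAM sizes.
for gb in (2, 4, 8):
    print(f'{gb} GB -> vb.memory = "{gb_to_mb(gb)}"')
```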