Alex McFarlane

Useful Stuff

06: Libraries

Learning Outcomes

  • Awareness of numpy, pandas and matplotlib
  • Basic use of library functions
  • Self help for library function use
  • Familiarity with ipython

numpy, pandas & matplotlib

numpy is the main math library in python, and pandas (paid for by Two Sigma) is probably the most important data manipulation library. matplotlib is the most popular core plotting engine used by python developers.

pandas is nothing more than a convenience tool built on top of numpy. Keeping this in mind is crucial to understanding how the two work together.

First, let's look at simply importing them and calling basic functions

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

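Note the different ways in which the libraries are imported: each gets a short alias (np, pd, plt), and for matplotlib we import the pyplot submodule rather than the top-level package. All good python libraries provide a quick-start guide that shows the conventional import.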

Remember that we are just tying these libraries to object names which can be changed, so the following is a terrible idea!

>>> np = 10
>>> print(np)
10

Now we have overwritten np: the name refers to the number 10 rather than the numpy library! We can easily revert, though, by re-importing as below

import numpy as np

Common Bug: A frequent error when scripting or hacking code together is to forget that you have previously defined a variable name. This is particularly common with variable names like df, which will often be used for pandas.DataFrame objects (as you will see). Coders regularly reuse df and overwrite its previous value, which can lead to bugs.

Another common error is overwriting python's built-in names. A common mistake is assigning new values to built-in types and functions such as the ones below. As we can see, when we simply enter them in python they already refer to objects (recall that if you enter an undefined name you will get a NameError - check this)

>>> object, type, dir, locals
(object, type, <function dir>, <function locals()>)

Just don’t use the names above: You can generally avoid these issues by giving descriptive names like

my_object = 1
curve_type = False
base_dir = r'\\my\path'
local_string = 'this'
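
To see why reusing a built-in name hurts, here is a minimal sketch. The function name shadow_demo is made up for illustration; builtins is the standard-library module that holds the original definitions.

```python
import builtins


def shadow_demo():
    """Show what happens when a local variable reuses the built-in name dir."""
    dir = r'\\my\path'  # bad idea: inside this function, dir is now a string
    try:
        dir()  # we meant the built-in function, but we are calling a string
    except TypeError as error:
        message = str(error)
    # The original built-in is untouched and still reachable via builtins
    return message, callable(builtins.dir)


message, builtin_still_works = shadow_demo()
print(message)
print(builtin_still_works)
```

The error only appears at the point where the name is used, which may be far from the line that overwrote it. That is what makes this kind of bug annoying to track down.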

Using library functions

You will generally be solving a problem, so, for example, to find the exponential function in python just google “exponential function in python”. If, as is common, you know a function exists in a library but are unsure of the spelling (or are lazy), you can start typing the name and hit TAB. For example, typing pd.read_ and then TAB will give a list of autocompletion options to select from, including read_csv

pd.read_csv('06_test.csv')
  Programming language        Designed by  Appeared Extension
0               Python   Guido van Rossum      1991       .py
1                 Java      James Gosling      1995     .java
2                  C++  Bjarne Stroustrup      1983      .cpp

You can find help for a function by entering pd.read_csv? in ipython. A more generic way of finding help for a function, which also works in plain python, is

help(pd.read_csv)
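
TAB completion and ? are ipython features, but you can approximate both in plain python. The sketch below uses the standard-library math module (rather than pandas) purely so that it runs anywhere; the variable names are just for illustration.

```python
import math

# Roughly what TAB completion does for "math.exp<TAB>":
# list every name in the module that starts with the typed prefix
matches = [name for name in dir(math) if name.startswith('exp')]
print(matches)

# Roughly what "math.exp?" shows: the first line of the docstring
summary = math.exp.__doc__.splitlines()[0]
print(summary)
```

The same two tricks work on any module or object, e.g. dir(pd) and pd.read_csv.__doc__.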

PATH and ipython

What follows is a little diversion into the internals of your computer so that you understand how ipython magically starts …

A brief foray into PATH and system env variables

Python scripts can be installed as executable files (ones you can double click) by pip - the tool we encountered before. In order to run these scripts from the command line we need to make sure that the directory containing them is on our system PATH.

The system PATH exists on both windows and linux. It contains a list of directories that (may) contain executable files. Any executable file in these directories can be run directly in the Terminal (or by any other running program). Examples of programs (hopefully!) on your path are:

  • ls, the simple program we used earlier for finding files in a directory
  • python, the python interpreter

The PATH environment variable is simply one of many environment variables that will be on your machine. You can actually create environment variables named whatever you fancy. You may have heard of PYTHONPATH. This is another environment variable, used only by python, that contains a list of extra directories python will search for .py files (modules) to import, hence the nomenclature.
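
You can inspect all of this from python itself. The following is a small sketch using only the standard library; the output will differ from machine to machine, so none is shown.

```python
import os
import shutil
import sys

# PATH is a single string; os.pathsep is ';' on windows and ':' on linux
path_dirs = os.environ.get('PATH', '').split(os.pathsep)
print(path_dirs[:3])  # the first few directories searched for executables

# shutil.which searches those directories, much like the terminal does
python_location = shutil.which('python')
print(python_location)  # None if no 'python' executable is on PATH

# sys.path is the list python searches for modules to import;
# directories from PYTHONPATH (if it is set) are included in it
print(sys.path[:3])
```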

Modifying PATH

We can add directories to our path in various ways

Temporarily
  • Windows set PATH=%PATH%;\\path\to\dir (in cmd) or $env:PATH = "$env:PATH;\\path\to\dir" (in PowerShell)
  • Linux export PATH="$PATH:/path/to/dir"
Permanently

ipython

There are two ways to run ipython depending on your python setup

  • python -m IPython
  • ipython

The second is my preference because if it doesn’t work then the folder containing the ipython executable is not on your PATH. The first is the default and should always work (note that the module name is capitalised as IPython). If it doesn’t then you should pip install ipython again, possibly also with the -IU flags mentioned in a previous exercise!
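
If neither command works, a quick check like the sketch below can tell you which part is missing. It uses only the standard library, and the variable names are just for illustration.

```python
import importlib.util
import shutil

# Is the IPython package installed for *this* python interpreter?
# (this is what the "python -m" form relies on)
package_installed = importlib.util.find_spec('IPython') is not None

# Is there an "ipython" executable in one of the PATH directories?
# (this is what typing plain "ipython" relies on)
executable_location = shutil.which('ipython')

print(package_installed, executable_location)
```

If the package is installed but no executable is found, the problem is most likely your PATH rather than the installation.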

ipython has some great features like

  • TAB-autocompletion (I TAB as soon as I write more than 3 letters of a variable name that already exists!)
  • syntax highlighting
  • auto indentation
  • magic commands e.g. %paste, %timeit, %hist

We will use ipython for the next few sections before introducing other options

Exercises

In this example we are going to use a combination of these libraries to webscrape the Bank of England mortgage rates. This is a jump from what we have done before to get you familiar with more complex python scripts

We can find a relevant dataset of interest by exploring http://www.bankofengland.co.uk/boeapps/iadb/AtoZ.asp?Travel=NIx and after a bit of faff I found the help area for submitting requests in the top left of the home page http://www.bankofengland.co.uk/boeapps/iadb/help.asp?Back=Y&Highlight=CSV#CSV. I decided to use the 2, 3 & 5 year fixed mortgage rates for 75% LTV. (Explicitly this is Monthly interest rate of UK monetary financial institutions (excl. Central Bank) sterling 2, 3 and 5 year (75% LTV) fixed rate mortgages to households (in percent) not seasonally adjusted)

Don’t worry about fully understanding the python structures below yet. This is just to formulate the example

You can copy the text below and then type %paste in ipython to run all the code on your clipboard

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# googled "query website with python using parameters" - you will notice they all use 'requests'
import requests

url_endpoint = 'http://www.bankofengland.co.uk/boeapps/iadb/fromshowcolumns.asp?csv.x=yes'

# This format comes from googling 'python query website with requests' and combining it with the parameters required
# by the help link on the BoE site (second link above)
payload = {
    'Datefrom'   : '01/Jan/1970',
    'Dateto'     : '01/Sep/2019',
    'SeriesCodes': 'IUMBV34,IUMBV37,IUMBV42',
    'CSVF'       : 'TN',
    'UsingCodes' : 'Y',
    'VPD'        : 'Y',
    'VFD'        : 'N'
}
response = requests.get(url_endpoint, params=payload)
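
To demystify params=payload: requests turns the dictionary into a query string and appends it to the URL. The sketch below reproduces roughly the same thing with the standard library. It is an approximation for illustration, not the exact code requests runs, and query_string and full_url are names made up here.

```python
from urllib.parse import urlencode

url_endpoint = 'http://www.bankofengland.co.uk/boeapps/iadb/fromshowcolumns.asp?csv.x=yes'
payload = {
    'Datefrom': '01/Jan/1970',
    'Dateto': '01/Sep/2019',
    'SeriesCodes': 'IUMBV34,IUMBV37,IUMBV42',
    'CSVF': 'TN',
    'UsingCodes': 'Y',
    'VPD': 'Y',
    'VFD': 'N',
}

# Each key/value pair becomes key=value, joined by '&'; characters such as
# '/' and ',' are percent-encoded so that they are safe inside a URL
query_string = urlencode(payload)

# The endpoint already contains '?csv.x=yes', so further parameters follow '&'
full_url = url_endpoint + '&' + query_string
print(full_url)
```

With the real request, response.url shows the URL that was actually sent, which is handy for pasting into a browser when debugging.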

After googling "pandas read_csv from string" I came up with the following

from io import BytesIO
df = pd.read_csv(BytesIO(response.content))
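
Why BytesIO? pd.read_csv expects a file name or a file-like object, whereas response.content is just raw bytes held in memory. BytesIO wraps those bytes so they can be read like a file. A minimal sketch, where raw_bytes is a hand-written stand-in for response.content (the exact layout of the real download is an assumption):

```python
from io import BytesIO

# Hand-written stand-in for response.content (assumed layout)
raw_bytes = b'DATE,IUMBV34,IUMBV37,IUMBV42\n31 Jan 1995,8.13,8.84,9.44\n'

buffer = BytesIO(raw_bytes)  # wrap the bytes so they can be read like a file

# Read it line by line, exactly as you would an open file
header = buffer.readline().decode('utf-8').strip().split(',')
first_row = buffer.readline().decode('utf-8').strip().split(',')
print(header)
print(first_row)
```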

Now peek at the result to check it looks sensible

df.head()
          DATE  IUMBV34  IUMBV37  IUMBV42
0  31 Jan 1995     8.13     8.84     9.44
1  28 Feb 1995     8.38     8.96     9.38
2  31 Mar 1995     8.08     8.64     9.33
3  30 Apr 1995     8.29     8.54      9.4
4  31 May 1995     8.22     8.18      9.4

Now it’s your turn

Exercise 6.1: Using pandas built-in functions

pandas will allow you to plot data directly with little effort if you have it in a particular format. The df.index is usually the x-coord and the columns are each treated as y-coord for plotting simple scatter / line plots.

You will notice the numeric index on the left of the dataframe contained in df. We actually want the 'DATE' to be on the x-axis, so you will need to make that the index. Secondly, the 'DATE' values have been read in as plain text, which tends to not plot particularly well. We want to convert them to a proper datetime format.

  1. Google how to convert a column of a dataframe into datetime format and change the 'DATE' column to datetime.
  2. Google how to set an index in a pandas dataframe and set the index so that df.index will show the 'DATE' field.
  3. Use the .plot method to plot the dataframe.

Note that in ipython you will also need to show the plot. Google "How to show matplotlib plots" if it doesn’t appear.

# Solve me!

Next Topic

07: Flow Control