Learning Outcomes
- Awareness of numpy, pandas and matplotlib
- Basic use of library functions
- Self help for library function use
- Familiarity with ipython
numpy, pandas & matplotlib
numpy is the main math library in python, and pandas (paid for by 2$\sigma$) is probably the most important data manipulation library. matplotlib is the most popular core plotting engine used by python developers.
pandas is nothing more than a convenience tool built on numpy. Understanding this is crucial to understanding how they work together.
First, let's look at simply importing them and calling basic functions.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Note the different ways in which the libraries are imported. All good python libraries provide a quick-start guide.
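As a minimal sketch of "calling basic functions" (the array values and the column name 'x' are made up for illustration), the following computes a mean with numpy, repeats it with pandas, and shows that the pandas column is still backed by a numpy array:

```python
import numpy as np
import pandas as pd

# A small numpy array (arbitrary values, for illustration only)
values = np.array([1.0, 2.0, 3.0, 4.0])
mean_np = np.mean(values)            # basic numpy function

# Wrap the same data in a pandas DataFrame; 'x' is a made-up column name
frame = pd.DataFrame({'x': values})
mean_pd = frame['x'].mean()          # equivalent pandas method

# The data behind a pandas column can be retrieved as a numpy array
underlying = frame['x'].to_numpy()
print(mean_np, mean_pd, type(underlying))
```

Both means agree, which is what you would expect if pandas is a convenience layer over numpy.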
Remember that we are just tying these libraries to object names which can be changed, so the following is a terrible idea!
>>> np = 10
>>> print(np)
10
Now we have overwritten our numpy library with the number 10! We can easily revert, though, by re-importing as below:
import numpy as np
Common bug: when scripting or hacking code together, it is easy to forget that you have previously defined a variable name. This is particularly common with variable names like df, which will often be used for pandas.DataFrame objects (as you will see). Coders regularly reuse df and overwrite previous values, which can lead to bugs.
Another common error is overriding python's built-in names. A common mistake is assigning new values to the built-ins below. As we can see, when we simply enter them in python they return object information (recall that if you enter an undefined name you will get a NameError - check this).
>>> object, type, dir, locals
(object, type, <function dir>, <function locals()>)
Just don’t use the names above. You can generally avoid these issues by giving descriptive names like:
my_object = 1
curve_type = False
base_dir = r'\\my\path'
local_string = 'this'
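To see why reusing a built-in name hurts, here is a small sketch using list as the example (list is not one of the names shown above, but it behaves the same way). It shadows the built-in, shows the resulting error, and then restores it from the standard builtins module:

```python
import builtins

# Shadowing a built-in: 'list' now refers to our data, not the type
list = [1, 2, 3]
try:
    list('abc')                  # fails: our list object is not callable
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Point the name back at the real built-in
# (typing 'del list' at the prompt would also work)
list = builtins.list
restored = list('abc')
print(restored)                  # ['a', 'b', 'c']
```

The error message gives no hint that the name was reassigned earlier, which is what makes this kind of bug hard to spot.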
Using library functions
You will generally be solving a specific problem, so, for example, to find the exponential function in python just google “exponential function in python”. If, as is common, you know a function exists in a library but are unsure of the spelling (or are lazy), you can start typing the name and hit TAB. For example, typing pd.read_ and then TAB will give a series of autocompletion options to select from.
pd.read_csv('06_test.csv')
| | Programming language | Designed by | Appeared | Extension |
|---|---|---|---|---|
| 0 | Python | Guido van Rossum | 1991 | .py |
| 1 | Java | James Gosling | 1995 | .java |
| 2 | C++ | Bjarne Stroustrup | 1983 | .cpp |
You can find help for a function by typing pd.read_csv? in ipython. A more generic way of finding help for a function is:
help(pd.read_csv)
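If you are working outside ipython, a rough stand-in for TAB completion is to filter the output of dir. The sketch below lists the pandas names that start with read_ and prints the first line of the docstring that help would display (the variable names are illustrative):

```python
import pandas as pd

# Rough equivalent of typing "pd.read_" and pressing TAB:
# list every name in pandas that starts with 'read_'
readers = [name for name in dir(pd) if name.startswith('read_')]
print(readers)

# help() displays the function's docstring; here we peek at its first line
first_line = pd.read_csv.__doc__.strip().splitlines()[0]
print(first_line)
```

The exact list of readers depends on the pandas version you have installed.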
PATH and ipython
What follows is a little diversion into the internals of your computer so that you understand how ipython magically starts …
A brief foray into PATH and system env variables
Python scripts can be installed as executable files (ones you can double click) by pip - the tool we encountered before. In order to run these scripts from the command line we need to make sure that they are on our system PATH.
The system PATH exists on both Windows and Linux. It contains a list of directories that (may) contain executable files. Any executable file in these directories can be run directly in the Terminal (or by any other program). Examples of programs (hopefully!) on your path are:
- ls, the simple program we used earlier for finding files in a directory
- python, the python interpreter
The PATH environment variable is simply one of many environment variables that will be on your machine. You can actually create environment variables named whatever you fancy. You may have heard of PYTHONPATH. This is another environment variable, used only by python, that contains a list of directories python will search for importable .py files, hence the nomenclature.
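You can inspect all of this from python itself. The following sketch uses only the standard library; what it prints will depend entirely on your machine:

```python
import os
import shutil
import sys

# PATH is a single string; os.pathsep is ';' on Windows and ':' on Linux
path_dirs = os.environ.get('PATH', '').split(os.pathsep)
print(path_dirs[:3])

# shutil.which searches those directories for an executable and
# returns its location, or None if it is not found
python_location = shutil.which('python')
print(python_location)

# sys.path is where python looks for importable modules;
# directories listed in PYTHONPATH are added to it at start-up
print(sys.path[:3])
```

If shutil.which returns None for a program you expect to run from the Terminal, its directory is probably missing from PATH.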
Modifying PATH
We can add directories to our path in various ways
Temporarily
- Windows (PowerShell)
$env:PATH = "$env:PATH;\\path\to\dir"
- Linux
export PATH="$PATH:/path/to/dir"
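The same temporary change can be made from inside a running python process. This is a sketch only: my_tools is a hypothetical directory name, and the change affects just this process and anything it launches.

```python
import os

# Hypothetical directory, used purely for illustration
extra_dir = os.path.join(os.getcwd(), 'my_tools')

# Append to PATH for this process and its children;
# the change is lost when the process exits
os.environ['PATH'] = os.environ.get('PATH', '') + os.pathsep + extra_dir
print(os.environ['PATH'].split(os.pathsep)[-1])
```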
Permanently
ipython
There are two ways to run ipython, depending on your python setup (note that the module name is case-sensitive):
python -m IPython
ipython
The second is my preference because if it doesn’t work then some folders are incorrectly set up. The first is the default and should always work. If it doesn’t, you should pip install ipython again, possibly with the -IU flags mentioned in a previous exercise!
ipython has some great features like
- TAB-autocompletion (I TAB as soon as I write more than 3 letters of a variable name that already exists!)
- syntax highlighting
- auto indentation
- magic commands e.g. %paste, %timeit, %hist
We will use ipython for the next few sections before introducing other options.
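Magic commands only work inside ipython. For comparison, the standard library's timeit module does a similar job to %timeit in plain python. The sketch below times an arbitrary expression chosen for illustration:

```python
import timeit

# Time 1000 runs of a small expression, similar in spirit to
# typing "%timeit sum(range(100))" at the ipython prompt
elapsed = timeit.timeit('sum(range(100))', number=1000)
print(f'{elapsed:.6f} seconds for 1000 runs')
```

%timeit picks the number of runs for you and reports a per-loop figure, which is why it is the more convenient option when you are already in ipython.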
Exercises
In this example we are going to use a combination of these libraries to web-scrape the Bank of England mortgage rates. This is a jump from what we have done before, intended to get you familiar with more complex python scripts.
We can find a relevant dataset by exploring http://www.bankofengland.co.uk/boeapps/iadb/AtoZ.asp?Travel=NIx and, after a bit of faff, I found the help area for submitting requests in the top left of the home page: http://www.bankofengland.co.uk/boeapps/iadb/help.asp?Back=Y&Highlight=CSV#CSV. I decided to use the 2, 3 & 5 year fixed mortgage rates for 75% LTV. (Explicitly, this is the monthly interest rate of UK monetary financial institutions (excl. Central Bank) sterling 2, 3 and 5 year (75% LTV) fixed rate mortgages to households (in percent), not seasonally adjusted.)
Don’t worry about fully understanding the python structures below yet. This is just to set up the example.
You can copy the text below and then type %paste in ipython to run all the code on your clipboard
(check if you need to modify the below if you are in a corporate environment!)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# googled "query website with python using parameters" - you will notice they all use 'requests'
import requests
url_endpoint = 'http://www.bankofengland.co.uk/boeapps/iadb/fromshowcolumns.asp?csv.x=yes'
# This format comes from googling 'python query website with requests' and combining it with the parameters required
# by the help link on the BoE site (second link above)
payload = {
'Datefrom' : '01/Jan/1970',
'Dateto' : '01/Sep/2019',
'SeriesCodes': 'IUMBV34,IUMBV37,IUMBV42',
'CSVF' : 'TN',
'UsingCodes' : 'Y',
'VPD' : 'Y',
'VFD' : 'N'
}
response = requests.get(url_endpoint, params=payload)
After googling "pandas read_csv from string" I came up with the following:
from io import BytesIO
df = pd.read_csv(BytesIO(response.content))
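If you want to experiment with this step without hitting the website, the same idea works on text you type in yourself. In the sketch below the sample rows are copied from the df.head() output in this section. The comma-separated layout is an assumption about what the site returns, and csv_text and sample_df are illustrative names.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the df.head() output in this section;
# the comma-separated layout is an assumption about the raw download
csv_text = """DATE,IUMBV34,IUMBV37,IUMBV42
31 Jan 1995,8.13,8.84,9.44
28 Feb 1995,8.38,8.96,9.38
31 Mar 1995,8.08,8.64,9.33
"""

# StringIO wraps a str as a file-like object;
# BytesIO does the same for bytes such as response.content
sample_df = pd.read_csv(StringIO(csv_text))
print(sample_df.shape)   # (3, 4)
```

pd.read_csv expects a file path or a file-like object, which is why the text has to be wrapped before being passed in.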
Now peek at the result to check it looks sensible:
df.head()
| | DATE | IUMBV34 | IUMBV37 | IUMBV42 |
|---|---|---|---|---|
| 0 | 31 Jan 1995 | 8.13 | 8.84 | 9.44 |
| 1 | 28 Feb 1995 | 8.38 | 8.96 | 9.38 |
| 2 | 31 Mar 1995 | 8.08 | 8.64 | 9.33 |
| 3 | 30 Apr 1995 | 8.29 | 8.54 | 9.4 |
| 4 | 31 May 1995 | 8.22 | 8.18 | 9.4 |
Now it’s your turn
Exercise 6.1: Using pandas built-in functions
pandas will allow you to plot data directly with little effort if you have it in a particular format. The df.index is usually the x-coordinate and each column is treated as a set of y-coordinates for plotting simple scatter / line plots.
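To make the index-as-x convention concrete, here is a toy sketch on made-up data (not the mortgage dataset). It draws off-screen with the Agg backend so that it runs without a display. In an interactive ipython session you would leave out the matplotlib.use line.

```python
import matplotlib
matplotlib.use('Agg')    # draw off-screen so the sketch runs without a display
import pandas as pd

# Made-up data: the index supplies the x values, each column becomes one line
toy = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1]}, index=[0, 10, 20])
ax = toy.plot()

# One line is drawn per column
n_lines = len(ax.get_lines())
print(n_lines)
```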
You will notice the numeric index on the left of the dataframe contained in df. We actually want the 'DATE' to be on the x-axis, so you will need to make that the index. Secondly, the 'DATE' values are read in as plain text rather than as dates, which tends not to plot particularly well. We want to convert them to a datetime format.
- Google how to convert a column of a dataframe into datetime format and change the 'DATE' column to datetime.
- Google how to set an index in a pandas dataframe and set the index so that df.index will show the 'DATE' field.
- Use the .plot method to plot the dataframe.
Note that in ipython you will also need to show the plot. Google "How to show matplotlib plots" if it doesn’t appear.
# Solve me!
Your plot should look something like the below if successful. Note that the x-axis is in date format, as mentioned above!