Alex McFarlane

Useful Stuff

Theano on Amazon Web Services for Deep Learning

This is the second part of a multi-part guide on GPU cloud computing for Deep Learning:

  1. Set Up Amazon Elastic Compute Cloud (EC2)
  2. Theano on Amazon Web Services for Deep Learning
  3. Set up Microsoft Azure for CUDA Cloud

This entry demonstrates how you can offload computational tasks to an Amazon Elastic Compute Cloud (EC2) instance through Amazon Web Services (AWS). The guide focuses on CUDA support for Theano.


Prerequisites

  • Able to set up an EC2 Instance (see part one)
  • Familiarity with Linux and Bash, e.g. sudo, wget, export
  • Familiarity with Ubuntu's apt-get


Contents

  1. Connect
  2. Load Software
  3. Run Code
  4. Close Instance
  5. Common Errors
  6. References


Connect

Connect to the Instance through SSH. Assuming you followed part 1, this is just

ssh ubuntu@[DNS]

Load Software

See the references for the sources of these instructions. This code is almost identical with a few tweaks.

Note that you will have to run the following each time you start a new Instance.

# update software
sudo apt-get update
sudo apt-get -y dist-upgrade

# install dependencies
sudo apt-get install -y gcc g++ gfortran build-essential \
    git wget linux-image-generic linux-image-extra-virtual libopenblas-dev \
    python-dev python-pip ipython python-nose \
    python-numpy python-scipy \
    gnuplot-qt # a lot quicker than matplotlib for runtime plots

# install bleeding edge theano
sudo pip install --upgrade --no-deps git+git://

# get CUDA
sudo wget

# depackage and install CUDA
sudo dpkg -i cuda-repo-ubuntu1404_7.0-28_amd64.deb
sudo apt-get update
sudo apt-get install -y cuda

# update PATH variables
{
    echo -e "\nexport PATH=/usr/local/cuda/bin:\$PATH";
    echo -e "export LD_LIBRARY_PATH=/usr/local/cuda/lib64";
} >> ~/.bashrc

# reboot for CUDA
sudo reboot

After waiting about a minute for the reboot, ssh back into the Instance and run the following:

# install included samples and test cuda
ver=8.0 # version number -- will get a more robust method in a later edit
echo "CUDA version: ${ver}"
/usr/local/cuda/bin/cuda-install-samples-${ver}.sh ~/
cd ~/NVIDIA_CUDA-${ver}_Samples/1_Utilities/deviceQuery
make          # build the deviceQuery sample
./deviceQuery # run the GPU check

Make sure the test shows that a GPU exists; common errors are listed in the Common Errors section. If you don’t have a GPU, then either skip the next step or switch to a GPU EC2 Instance.

# set up the theano config file to use gpu by default
{
    echo -e "\n[global]\nfloatX=float32\ndevice=gpu";
    echo -e "mode=FAST_RUN"; # mode is an option under [global], not a section
    echo -e "\n[nvcc]";
    echo -e "fastmath=True";
    echo -e "\n[cuda]";
    echo -e "root=/usr/local/cuda";
} >> ~/.theanorc
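Because the snippet above appends with >>, running it twice leaves duplicate sections in ~/.theanorc. The following is a minimal sketch, assuming Python 3, of how the file could be sanity-checked with the standard configparser module. The names read_theanorc and EXAMPLE are illustrative only, and this is not necessarily how Theano itself parses the file.

```python
import configparser


def read_theanorc(text):
    """Parse .theanorc-style INI text into {section: {option: value}}."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names such as floatX case-sensitive
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


# Hypothetical file contents covering a subset of the settings written above.
EXAMPLE = """
[global]
floatX=float32
device=gpu

[nvcc]
fastmath=True

[cuda]
root=/usr/local/cuda
"""

settings = read_theanorc(EXAMPLE)
print(settings["global"]["device"])
```

To check the real file, pass in the contents of ~/.theanorc instead of EXAMPLE. By default the parser is strict, so a file containing the same section twice raises configparser.DuplicateSectionError, which is a quick way to spot an accidental double append.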

Install any other dependencies you may require.


To obtain CuDNN you must register with the NVIDIA developer programme. It’s simplest to download the latest Library for Linux from the CuDNN download page to your local machine and scp it over to EC2 as follows

scp -r ~/Downloads/cudnn-8.0-linux-x64-v5.1.tar ubuntu@[DNS]:/home/ubuntu/

where [DNS] must be replaced with the DNS of your Instance, and the filename will differ as the software is updated. Once scp has transferred the file, switch to the active ssh session on the EC2 Instance and run the following to install CuDNN

tar -xvf cudnn-8.0-linux-x64-v5.1.tar
# use tar -xzf cudnn-8.0-linux-x64-v5.1.tgz if the extension is .tgz
cd cuda
sudo cp lib64/* /usr/local/cuda/lib64/
sudo cp include/* /usr/local/cuda/include/

Now add the following to enable CNMeM

echo -e "\n[lib]\ncnmem=0.5" >> ~/.theanorc

A value between 0 and 1 allocates that fraction of the GPU memory to theano, so here we allocate 50% so as not to be stingy.
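As a rough worked example of what the cnmem value means in bytes, here is a small Python sketch. It assumes, as I understand the Theano configuration documentation, that 0 disables CNMeM, values up to 1 are a fraction of total GPU memory, and values above 1 are read as megabytes. The function name cnmem_bytes and the 12 GiB card size are assumptions for illustration, and any clipping Theano applies to large fractions is ignored.

```python
def cnmem_bytes(cnmem, total_bytes):
    """Estimate the initial CNMeM pool size for a given lib.cnmem value."""
    if cnmem < 0:
        raise ValueError("cnmem must be non-negative")
    if cnmem == 0:
        return 0  # CNMeM disabled
    if cnmem <= 1:
        return int(cnmem * total_bytes)  # fraction of total GPU memory
    return int(cnmem * 1024 ** 2)  # assumed: values above 1 are megabytes


GIB = 1024 ** 3
total = 12 * GIB  # hypothetical GPU with 12 GiB of memory

# cnmem=0.5 on this hypothetical card reserves half of it up front
print(cnmem_bytes(0.5, total) / GIB, "GiB")
```

The remaining memory is still available to the driver and to any other process sharing the GPU, which is the reason for not setting the fraction too close to 1.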

Now check that theano is configured properly by opening ipython and running

import theano.sandbox.cuda

which gave me the output

Using gpu device 0: Tesla K80 (CNMeM is enabled with initial size: 50.0% of memory, cuDNN 5105)
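If the check is run from a script rather than by eye, the banner can be parsed. This is only a sketch in Python 3: the regular expression is inferred from the single line of output shown above, so other Theano versions may print something different, and parse_gpu_banner is a made-up helper name.

```python
import re

# Pattern inferred from the one banner line shown above (an assumption).
BANNER = re.compile(
    r"Using gpu device (?P<index>\d+): (?P<name>.+?) \((?P<details>.*)\)"
)


def parse_gpu_banner(line):
    """Return device index, name, CNMeM status and cuDNN version, or None."""
    match = BANNER.search(line)
    if match is None:
        return None
    details = match.group("details")
    cudnn = re.search(r"cuDNN (\d+)", details)
    return {
        "index": int(match.group("index")),
        "name": match.group("name"),
        "cnmem_enabled": "CNMeM is enabled" in details,
        "cudnn_version": int(cudnn.group(1)) if cudnn else None,
    }


line = ("Using gpu device 0: Tesla K80 (CNMeM is enabled with initial "
        "size: 50.0% of memory, cuDNN 5105)")
info = parse_gpu_banner(line)
print(info["name"], info["cudnn_version"])
```

A result of None means no GPU banner was found, which usually points back at the device setting in ~/.theanorc or at the CUDA install.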

Run Code

Transfer the relevant code across to the Cloud, e.g.

  • Pull from an existing git repository
  • scp files across

If you are running code in a Spot Instance, I would recommend saving results at runtime and passing them back to your local machine. It is sensible to pickle the state of the neural net at runtime so that you can easily continue the training process from a saved state rather than having to run again from scratch.
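The checkpointing idea can be sketched as follows, assuming Python 3. The state dictionary, the file name and the fake training step are placeholders, not the real network: in practice the state would hold the model parameters. Writing to a temporary file and renaming it means an interrupted Spot Instance never leaves a half-written checkpoint behind.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Pickle state to path, replacing any earlier checkpoint atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(state, handle, protocol=pickle.HIGHEST_PROTOCOL)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        os.remove(tmp_path)
        raise


def load_checkpoint(path, default=None):
    """Return the saved state, or default if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as handle:
        return pickle.load(handle)


def train(path, total_epochs):
    """Resume from the checkpoint at path and train up to total_epochs."""
    state = load_checkpoint(path, default={"epoch": 0, "weights": [0.0, 0.0]})
    for epoch in range(state["epoch"], total_epochs):
        # placeholder for a real training step
        state["weights"] = [w + 0.1 for w in state["weights"]]
        state["epoch"] = epoch + 1
        save_checkpoint(state, path)
    return state
```

Calling train("net.pkl", 100) after an interruption picks up from the last completed epoch. The checkpoint file can then be copied back to your local machine with scp like any other result.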


Close Instance

Don’t forget to Stop or Close the instance once it has completed the task!

Make sure that you check in the dashboard that the instance itself has been closed, in addition to the Spot request! I received a bill for 31 hours of an unclosed GPU Compute instance that I thought I had closed, which was rather annoying.

In theory this can be automated by running the following as root after the code has been executed

shutdown -h now

but I no longer particularly trust this method in practice.
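If you do want to try the automated route, one way to make sure the shutdown command runs even when the training script crashes is a small wrapper. This is a sketch only, in Python 3: run_then_shutdown is a hypothetical helper and train.py in the usage note is a placeholder for your own script.

```python
import subprocess


def run_then_shutdown(train_cmd, shutdown_cmd=("sudo", "shutdown", "-h", "now")):
    """Run train_cmd, then run shutdown_cmd whether or not training succeeded.

    Returns the exit code of train_cmd.
    """
    try:
        return subprocess.call(list(train_cmd))
    finally:
        # Runs even if the training command fails or cannot be started.
        subprocess.call(list(shutdown_cmd))
```

Usage would be run_then_shutdown(["python", "train.py"]) from a script started inside screen or nohup, so that it survives the ssh session closing. Whether the shutdown stops or terminates the instance depends on how the instance was configured, so it is still worth checking the dashboard afterwards.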

Common Errors

CUDA Failures

A few common errors encountered with installing CUDA


No NVIDIA devices found

If no GPU exists you will receive the following output

./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

NVIDIA: no NVIDIA devices found
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

The resolution is to terminate the instance and launch a GPU instance instead if you require CUDA support.

Unknown symbol in module

This is a slightly more complicated issue that has arisen since CUDA 7.5

./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

modprobe: ERROR: could not insert 'nvidia_361_uvm': Unknown symbol in module, or unknown parameter (see dmesg)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

The resolution is fairly simple: the error means that linux-image-extra-virtual was not installed. This is probably because you followed one of the guides in the references, which are now out of date.

Simply run these lines

# install the required library
sudo apt-get install linux-image-extra-virtual

# restart instance
sudo reboot

Then wait a minute or so for the restart, ssh back in and run the CUDA check again, which should give the following at the end of the output

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla K80
Result = PASS
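For a scripted setup, the PASS/FAIL check can be read from the deviceQuery output rather than by eye. A minimal sketch in Python 3 follows; the patterns are taken from the sample outputs shown in this section, and device_query_summary is an illustrative name rather than part of any CUDA tool.

```python
import re


def device_query_summary(output):
    """Extract the final result and device count from deviceQuery output."""
    result = re.search(r"^Result = (PASS|FAIL)\s*$", output, re.MULTILINE)
    num_devs = re.search(r"NumDevs = (\d+)", output)
    return {
        "passed": result is not None and result.group(1) == "PASS",
        "num_devices": int(num_devs.group(1)) if num_devs else 0,
    }


# Sample endings copied from the outputs shown above.
PASS_OUTPUT = (
    "deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 8.0, "
    "CUDA Runtime Version = 8.0, NumDevs = 1, Device0 = Tesla K80\n"
    "Result = PASS\n"
)
FAIL_OUTPUT = (
    "NVIDIA: no NVIDIA devices found\n"
    "cudaGetDeviceCount returned 30\n"
    "-> unknown error\n"
    "Result = FAIL\n"
)

print(device_query_summary(PASS_OUTPUT))
print(device_query_summary(FAIL_OUTPUT))
```

The text passed in would normally be the captured standard output of ./deviceQuery, for example collected with the subprocess module.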
