Neural net – cats and dogs

In my previous post, I created and trained a neural network with some toy data.  The intent was to do something simple before hitting real-world data.  Here my “real problem” is image classification.  Specifically, I borrowed a problem from Kaggle where we are asked to train a classifier for distinguishing cats from dogs.  I recognize that my best bet would be to tweak an existing network as training larger networks tends to be difficult, but performance is not my goal.  Rather I want to start from the beginning; I plan to build my classifier from scratch (and hopefully learn something in the process).

cat_dog
Samples from Kaggle’s training set for cats and dogs downsampled and made grayscale for input into my neural net.

Results up front

I created several convolutional neural nets with different layer structures, but due to long training times, I decided not to explore very much and quickly set a goal of a 73% correct classification rate.  I picked that value based on the 215 submissions to the Kaggle competition; it is about the median performance.  Most importantly, this goal gives me an excuse to quit and not waste an inordinate amount of time trying to improve upon this project.  Ultimately, I ended up quitting with 73.1% of the images in my validation set classified correctly.  There was still some hay to be made, but I called it a day.  On my test set, the model had a 73.85% correct classification rate – I’m satisfied.  For reference, the state of the art is far better: the winner of the Kaggle contest correctly classified >98% of the images.  The winner was a student of LeCun, and he applied his PhD thesis to the problem.

The last layer of my net is a logistic regression layer, so for each image, my neural net produces a score for cat and a score for dog. Below I picked out the 3 cats and the 3 dogs with the highest “cat scores” and the highest “dog scores.”  These are the strongest examples of correctly identified cats/dogs and misclassified cats/dogs according to the model.
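That ranking can be sketched with numpy's argsort; the scores and labels below are made-up stand-ins for the net's actual softmax output:

```python
import numpy as np

# Hypothetical softmax outputs: column 0 = cat score, column 1 = dog score
scores = np.array([[0.9, 0.1],
                   [0.2, 0.8],
                   [0.6, 0.4],
                   [0.7, 0.3],
                   [0.1, 0.9]])
labels = np.array([0, 1, 1, 0, 1])  # 0 = cat, 1 = dog

# Indices of true dogs, ranked by cat score descending - the "cat-ish" dogs
dogs = np.where(labels == 1)[0]
catish_dogs = dogs[np.argsort(scores[dogs, 0])[::-1]]
print(catish_dogs)  # dogs ordered from most to least cat-like
```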

cat
Top: Cats with highest cat scores.  Bottom: Dogs with highest cat scores (misclassified – I suppose they look a little cat-ish).
dog
Top: Cats with highest dog scores (misclassified – do they look dog-ish to you?)  Bottom: Dogs with highest dog scores.

Data exploration and pre-processing

Collecting the data was easy – it was done for me.  Kaggle provided a labeled training set and a non-labeled test set.  The test set is useless to me without labels, so I ignored it and subdivided the provided training set into my own training, validation, and testing sets.  My partition used 2,000 images for testing, 2,000 for validation, and the remaining 21,000 went to my training set (equal number of cats and dogs in each).  My first real task was to get the data into a useful form.

Each image is a .jpg, but the pictures come in a wide range of sizes.  For example, one image was as large as 1050 pixels wide and another as small as 50.  I needed a constant, standard size for input into a neural net; I picked 100 x 100 somewhat arbitrarily, but this is large enough to keep much of the detail in the images and not so large that computation becomes too awful for the sizes of nets I was making.  I scaled each image to these dimensions by downsampling or interpolating according to the larger dimension and adding a border of 0’s along the other.  I also made the images grayscale for simplicity, subtracted the mean pixel value (centering), and converted them to numpy arrays before saving.  I also created a few extra tools, such as a simple function for displaying images and another for creating a list containing the locations of the images on my hard drive.  Finally, I created a class called “Data” which I will use for input into my soon-to-be-created neural net.  The specifics are all encapsulated by the Python code below.
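As a quick sanity check of that scale-to-fit arithmetic (the 300 x 200 original size here is just illustrative):

```python
# Target dimensions and a hypothetical original image size (width, height)
width, height = 100, 100
size = (300, 200)

# Scale according to the larger dimension so the image fits inside 100 x 100
ratio = min(float(width) / size[0], float(height) / size[1])
new_size = tuple(int(dim * ratio) for dim in size)

# The short side lands at 66 pixels; the remaining rows become the 0 border
print(new_size)  # (100, 66)
```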

import numpy as np
import matplotlib.pyplot as plt
import glob
from PIL import Image        
import os                                                                                                                        

#------------------------------------------------------------------------------------------#
#DATA EXPLORATION AND PREPROCESSING
#------------------------------------------------------------------------------------------#

def get_dims(
    in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Original'):
    
    """Gets basic data for the collection of images.  This info is used to determine how to
    resize and crop our data in the functions that follow
    
    Args:
        in_directory (string): location where all images are stored
    
    Returns:
        int: number of images
        float: mean image width
        float: mean image height
        int: max image width
        int: max image height
        int: min image width
        int: min image height"""    
    
    path = glob.glob(in_directory+"\*.jpg")
    
    print("finding extreme image dimensions")
    max_w = 0
    max_h = 0
    min_w = 1e10
    min_h = 1e10
    sum_w = 0
    sum_h = 0
    for i in range(len(path)):
        img = Image.open(path[i])
        size = img.size
        sum_w += size[0]
        sum_h += size[1]
        if size[0] > max_w: max_w = size[0]
        if size[1] > max_h: max_h = size[1]
        if size[0] < min_w: min_w = size[0]
        if size[1] < min_h: min_h = size[1]
    mean_h=sum_h/len(path)
    mean_w=sum_w/len(path)
    n_images=len(path)
    print("Number images = {}. \n"
          "Mean width = {}. \n"
          "Mean height = {}. \n"
          "Max width = {}. \n"
          "Max height = {}. \n"
          "Min width = {}. \n"
          "Min height = {}. \n".format(n_images, mean_w, mean_h, max_w, max_h, min_w, min_h))
    return n_images, mean_w, mean_h, max_w, max_h, min_w, min_h
        
def munge(
    in_directory = r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Original',
    out_directory = r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Munged',
    width=100,
    height=100):
        
    """Centers, scales, and crops our images - then saves each image as a 
    numpy array.  Also creates a vector of labels.
    
    Args:
        in_directory (string): location of images
        out_directory (string): location to save munged images
        width: width of munged images
        height: height of munged images
    
    Returns:
        None - function saves munged images as numpy arrays and saves label
            vector in out_directory."""
    
    path = glob.glob(in_directory+"\*.jpg")
    num_imgs = len(path)
    labels = np.zeros(num_imgs,dtype='int32') #initialize labels as all zeros
    print("creating labels \n"
        "centering images \n"
        "converting to grayscale \n"
        "saving as numpy arrays")

    for i in range(len(path)):
        
        if 'dog' in path[i]: labels[i] = 1 #dogs have label 1, cats remain 0
        img = Image.open(path[i])
        size = img.size
        ratio = min(float(width)/size[0], float(height)/size[1])
        new_size = tuple(int(dim * ratio) for dim in size) #PIL expects a (width, height) tuple
        img = img.convert('L') #grayscale
        img = img.resize(new_size, Image.ANTIALIAS)
        
        half_width = img.size[0]//2
        half_height = img.size[1]//2        
        
        img = img.crop((
                    half_width - width//2,
                    half_height - height//2,
                    half_width + width//2,
                    half_height + height//2))         
              
        img = np.array(img)/255. #divide by 255 as PIL is 0-255 but matplotlib uses 0-1
        img = img - np.mean(img) 
        name = os.path.split(path[i])[1][:-3]+'npy'
        np.save(out_directory+'\\'+name, img)
        if i%1000 == 0: print("iteration {} of {} complete".format(i+1,len(path)))
    np.save(out_directory+'\\'+'labels', labels)
    return 

def view_image(location):
    
    """Displays image using matplotlib.  
    
    Args:
        location (string): full path of the image's saved numpy array
    
    Returns:
        None - displays image"""
    
    img = np.load(location)
    img = img - np.min(img) #image was centered, so make entries between 0 and 1
    plt.imshow(img, cmap='Greys_r') #reverse grayscale colormap

#------------------------------------------------------------------------------------------#
#TOOLS FOR USING DATA IN NEURAL NET
#------------------------------------------------------------------------------------------#

def get_image_list(in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Munged'):
    
    """Creates a list of image addresses.
    
    Args:
        in_directory (string): location containing the numpy arrays of images
    
    Returns:
        list of strings: location/name of each numpy array"""
    
    path=glob.glob(in_directory+"\*.npy")
    path=path[:-1] #this removes the last address which is the labels
    return path    
    
class Data():
    """Class which will help organize inputs into our nnets
    
    Attributes:
        images (numpy array of floats): [n_images, n_channels, height, width]
            since gray-scale, n_channels = 1
        labels (list of ints): 0 -> cat, 1 -> dog
        n_classes (int)"""    
    
    def __init__(
        self, 
        path, 
        index, 
        in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Munged'):
        
        """Instantiates Data
        
        Args:
            path (list of strings): list containing every image location
            index (list of ints): indices of the images we will take
            in_directory (string): directory which contains all the images and associated label array"""
        
        self.images = np.zeros([len(index),1,100,100])
        self.labels = np.load(in_directory+'//'+'labels.npy')[index]
        self.n_classes=2
        count = 0
        for i in index:
            self.images[count,0,:,:]=np.load(path[i])
            count+=1

Building the net

After saving all my images as 100 x 100 numpy arrays, it was time to create and train the net.  I decided to learn Theano.  This is a Python library which allows you to symbolically define functions and then evaluate them efficiently; if numpy is comparable to MATLAB and sympy to Mathematica, some describe Theano as a hybrid between the two.  It’s convenient for me in that after symbolically defining a function, calculating gradients is easy.  Further, there are built-in functions for convolution, pooling, and other useful operations for neural nets.

Beyond Theano, the other big improvements I’ve made over my previous toy problem are that the code is very modular (and reusable), and very little is hardcoded.  There are classes for each type of layer (convolution/pooling layer, fully connected layer, regression layer) and a NNet class, an instance of which is a neural net built from instances of the different types of layers.  I did get some of these ideas for how to structure my code from here.  Finally, I have user-set parameters at the beginning of the script where you can control how the NNet class is instantiated (number, type, and size of all layers).  This is done in the function “build_model.”  I then have a second function called “train_model” which fits the model via gradient descent.

I’ll attach my code (mostly for my own reference), but there are a few specific items I want to note:

  • My training set is too large to load all the images into memory – I create a list of the images and permute it in order to load random batches to perform gradient descent.
  • As I trained, I tracked my objective function value and misclassification rate for both my training and validation sets.  I varied my learning rate when my validation objective function value consistently failed to improve and quit training when the improvement in my objective function was small (the model could have benefited from more training, but it wasn’t worth the time needed to continue).
  • I instituted momentum for the first time in my gradient descent process.  The idea here is that at a given point, there will be randomness in my gradient since I am taking a random batch.  Momentum adds a proportion of the previously calculated gradient to the current gradient.  Doing this intuitively cancels some of the randomness in the gradients while accentuating similarities.  Even if I used the entire training set for my gradient calculations, momentum could be useful.  One way to see this is to visualize a response surface shaped as a valley with steep sides and a gentle slope along the valley floor.  Momentum will help prevent us from shooting back and forth across the valley due to the large gradients associated with the steep sides.
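In update-rule form, momentum keeps a running "velocity" that blends the new gradient with the previous step.  A plain-numpy sketch of one common ordering of the update (`grad` stands in for a minibatch gradient):

```python
import numpy as np

learning_rate, mu = 0.01, 0.4  # mu is the momentum coefficient
w = np.zeros(3)                # parameters
v = np.zeros_like(w)           # velocity: running blend of past gradients

def step(w, v, grad):
    """One momentum update: keep a fraction mu of the previous step."""
    v = mu * v - learning_rate * grad
    return w + v, v

# The middle gradient component flips sign between batches, so its steps
# partially cancel; the consistent components build up instead.
w, v = step(w, v, np.array([1.0, -1.0, 1.0]))
w, v = step(w, v, np.array([1.0, 1.0, 1.0]))
print(w)  # [-0.024  0.004 -0.024]
```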
import time
import theano
import theano.tensor as T
import numpy as np
from theano.tensor.signal import pool 
#bug in Theano where 'signal' is not recognized under theano.tensor
#I have to import separately

#------------------------------------------------------------------------------------------#
#PARAMETERS
#Set parameters for the nnet and training
#------------------------------------------------------------------------------------------#

batch_size = 200
learning_rate=0.01
momentum = 0.4
reg=0.01 #using L2 regularization
n_epochs=10

n_filters = [20,50,50,100,100] #number of filters (channels) for each convolutional layer
filter_dims = [(5,5),(3,3),(3,3),(2,2),(2,2)] #receptive field size for filters
strides = [(1,1),(1,1),(1,1),(1,1),(1,1)] #strides for convolution (downsampling)
poolings=[(2,2),None,(2,2),None,None] #effective max pooling sizes
fully_connected_sizes=[300,300] 

#------------------------------------------------------------------------------------------#
#MISC USEFUL FUNCTIONS
#------------------------------------------------------------------------------------------#

def relu(x):
    """The rectified linear unit which will be used as my activation function for most layers.
    Newer versions of Theano have a built in relu function.
    
    Args:
        x (theano tensor): as used in here, x will be a matrix or a 4D tensor
    
    Returns:
        theano tensor var: when theano function is executed, negative values set to zero and
            positive ones are unchanged"""
        
    return T.switch(x<0, 0, x)

def ceildiv(x,y):
    """Ceiling division - rounds up the quotient x/y
    
    Args:
        x (float): numerator
        y (float): denominator
    
    Returns:
        int: ceiling of x/y"""
    
    return -(-x//y)
        
activation = relu

#------------------------------------------------------------------------------------------#
#NNET LAYERS
#classes for each type of layer in the nnet
#------------------------------------------------------------------------------------------#
    
class RegressionLayer():
    """The final regression layer of a nnet   
    
    Attributes:
        input (theano tensor var): when run, is a numpy array of floats with shape [n_images, n_in]
        W (theano shared tensor var): weights - numpy array of floats when run
        b (theano shared tensor var): bias - numpy array of floats when run
        scores (theano tensor var): softmax of output - array of floats when run
        predicted_labels (theano tensor var): array of ints when run
        params: model parameters - list containing attributes W and b"""
    
    def __init__(self, input, n_in, n_out, W=None, b=None):
        """Initializes the regression layer
        
        Args:  
            input (theano tensor var)
            n_in (int): number of input nodes
            n_out (int): number of output nodes (ie number of classes)
            W (theano shared tensor var): weights
            b (theano shared tensor var): bias"""

        
        if W is None: 
            W = np.random.randn(n_in, n_out)*np.sqrt(2.0/n_in) #2.0 avoids integer division on Python 2
            W = theano.shared(value=W, name='W')
            
        if b is None: 
            b = np.zeros([1, n_out])
            b = theano.shared(value=b, name='b', broadcastable=(True,False))
        
        self.input = input
        self.W = W
        self.b = b
        self.scores = T.nnet.softmax(T.dot(input, self.W)+self.b)
        self.predicted_labels = T.argmax(self.scores, axis = 1)
        self.params = [self.W, self.b]
    
    def cross_entropy(self, labels):
        """We will use cross entropy as our objective function for minimization
        
        Args:
            labels (numpy array of ints) - the true labels of the input data
        
        Returns:
            float: the average cross-entropy between model output and the true labels"""
            
        return -T.mean(T.log(self.scores)[T.arange(labels.shape[0]),labels])
    
    def misclass(self, labels):
        """Misclass percentage is a more interpretable metric than cross_entropy
        
        Args:
            labels (numpy array of ints) - the true labels of the input data
        
        Returns:
            float: the percentage of data points misclassified by the model"""
        
        return T.mean(T.neq(self.predicted_labels,labels))

class FullyConnectedLayer():
    """A dense layer in a nnet
    
    Attributes:
        input (theano tensor var) when run, is a numpy array of floats with shape [n_images, n_in]
        W (theano shared tensor var): weights - numpy array of floats when run
        b (theano shared tensor var): bias - numpy array of floats when run
        output (theano tensor var)
        params: model parameters - list containing attributes W and b"""
    
    def __init__(self, input, n_in, n_out, activation, W=None, b=None):
        """Initializes the fully connected layer
        
        Args:  
            input (theano tensor var)
            n_in (int): number of input nodes
            n_out (int): number of output nodes
            activation (function): non-linear activation function - default is rectified linear unit
            W (theano shared tensor var): weights
            b (theano shared tensor var): bias"""

        self.input = input
        
        if W is None:
            W = np.random.randn(n_in, n_out)*np.sqrt(2.0/n_in) #2.0 avoids integer division on Python 2
            W = theano.shared(value=W, name='W')
            
        if b is None:
            b = np.zeros([1, n_out])
            b = theano.shared(value=b, name='b', broadcastable=(True,False))

        self.W = W
        self.b = b  
        
        lin_output = T.dot(input, self.W) + self.b
        
        if activation is None:
            self.output = lin_output
        else:
            self.output = activation(lin_output)
        
        self.params = [self.W, self.b] 

class ConvPoolLayer():
    """A layer which performs convolution and optional max pooling   
    
    Attributes:
        input (theano tensor var): when run, is a numpy array of floats with shape 
            [n_images, n_feature_maps, image_height, image_width]
        W (theano shared tensor var): filter weights - numpy array of floats when run
        b (theano shared tensor var): bias - numpy array of floats when run
        stride (ordered pair of ints): horizontal and vertical stride in convolution
        pooling (ordered pair of ints): horizontal and vertical receptive region for pooling
            None for no pooling
        output (theano tensor var)
        shape (theano tensor var): when run, is a 4-tuple of ints
        params: model parameters - list containing attributes W and b"""    
    
    
    def __init__(self, 
                input, 
                filter_shape, 
                stride, 
                pooling,
                activation,
                W=None, 
                b=None):
        """Initializes a convolution/pooling layer
        
        Args:  
            input (theano tensor var)
            filter_shape (numpy array of ints): [n_feature_maps_output, n_feature_maps_input, filter_height, filter_width]
            stride (ordered pair of ints)
            pooling (ordered pair of ints)
            activation (function): non-linear activation function - default is rectified linear unit
            W (theano shared tensor var): weights
            b (theano shared tensor var): bias"""      
        
        self.input = input
        fan_in = np.prod(filter_shape[1:])
        #fan_out = (filter_shape[0] * np.prod(filter_shape[2:]) // np.prod(pooling))
        
        if W is None:
            W = np.random.randn(filter_shape[0],
                                       filter_shape[1],
                                       filter_shape[2],
                                       filter_shape[3]) * np.sqrt(1.0/fan_in) #1.0 avoids integer division on Python 2
            W = theano.shared(value=W, name='W', borrow=True)
            
        if b is None:
            b = np.zeros((1,filter_shape[0],1,1))
            b = theano.shared(value=b, 
                              name='b', 
                              borrow=True, 
                              broadcastable=(True,False,True,True))
        
        self.stride = stride
        self.pooling = pooling
        self.W = W
        self.b = b  
        self.params = [self.W, self.b]

        output = T.nnet.conv.conv2d(self.input, 
                                         self.W, 
                                         border_mode='valid', 
                                         subsample=stride
                                         )  + self.b
        
        if pooling is not None:
            output = pool.pool_2d(output, 
                                  pooling, 
                                  ignore_border=False, 
                                  st=None, 
                                  padding=(0,0),
                                  mode='max') 

        self.output = activation(output)
        self.shape = np.shape(self.output)

#------------------------------------------------------------------------------------------#
#THE NNET
#class which combines the layers above into an instance of the desired nnet structure
#------------------------------------------------------------------------------------------#

class NNet():
    """Class which will use instances of the layer classes to create a neural
    network.
    
    Attributes:
        input (theano tensor): 4D array of doubles when run
        params (list of shared theano tensors): values of list elements are
            matrices and vectors of floats
        L2 (theano tensor): float when run
        layer_x (instances of layer classes): one attribute for each layer in 
            the net - layer_0, layer_1, etc...
        cross_entropy: regression layer method
        scores (theano tensor): softmax of output - array of floats when run
        predicted_labels (theano tensor): array of ints when run
        misclass: regression layer method"""
    
    def __init__(self, 
                 input, 
                 filter_shapes,
                 strides,
                 image_shape, 
                 n_classes, 
                 poolings, 
                 fully_connected_sizes,
                 activation,
                 *params):
        """Instantiates a neural net
        
        Args:
            input (theano tensor var)
            filter_shapes (list of numpy arrays of ints): [n_feature_maps_output, 
                n_feature_maps_input, filter_height, filter_width] for each layer
            strides (list of ordered pairs of ints): strides for each layer
            image_shape (numpy array of ints): [n_images, n_channels, height, width]
            n_classes (int): number of classes
            poolings (ordered pair of ints): effective pool size for each layer
            fully_connected_sizes (list of ints): number of neurons for each fully connected layer
            activation (function): non-linear activation function - default is rectified linear unit
            *params (theano shared tensors): optional input which assigns model parameters to the shared
                tensor variable rather than randomly generate the parameters.""" 
        
        self.input = input 
        self.params = [] 
        self.L2=0  
        
        flag = False
        if params:
            flag = True
            params = np.array(params).reshape(len(params)//2,2)
        
        #-------------------------------------
        #Convolution and optional pooling layers
        #-------------------------------------
        
        in_shape = image_shape
        n_images = image_shape[0]
        n_filters = len(filter_shapes)
        
        print("input shape: {}".format(image_shape))        
        
        for i in range(n_filters):
            filter_shape = filter_shapes[i]
            stride = strides[i]
            pooling = poolings[i]
            layer_name = 'layer_'+str(i)
            prev_layer_name = 'layer_'+str(i-1)
            
            if i==0: 
                layer_input = input
            else: 
                prev_layer = getattr(self, prev_layer_name)
                layer_input = prev_layer.output
            
            if flag == True:
                W, b = params[i]
            else:
                W = None
                b = None
                
            setattr(self, layer_name, ConvPoolLayer(input=layer_input,
                                                    filter_shape = filter_shape, 
                                                    stride=stride,
                                                    pooling=pooling, 
                                                    activation=activation,
                                                    W=W, 
                                                    b=b))
            
            current_layer = getattr(self, layer_name)
            self.params = self.params + current_layer.params
            #self.L2 += (current_layer.W ** 2).sum()  
           
            
            if pooling is not None:
                height = ceildiv(ceildiv(in_shape[2]-filter_shape[2]+1, stride[0]),pooling[0])
                width = ceildiv(ceildiv(in_shape[3]-filter_shape[3]+1, stride[1]),pooling[1])
            else:
                height = ceildiv(in_shape[2]-filter_shape[2]+1, stride[0])
                width = ceildiv(in_shape[3]-filter_shape[3]+1, stride[1]) 
            out_shape=[n_images,filter_shape[0], height, width]
            print("{} - convolution and optional max pooling - output shape: {}".format(layer_name, out_shape))
            
            in_shape = out_shape
                
        #-------------------------------------
        #Fully connected layers
        #-------------------------------------             
                
        n_connected_layers=len(fully_connected_sizes)
        
        for i in range(n_filters, n_filters+n_connected_layers):
            layer_name = 'layer_'+str(i)
            prev_layer_name = 'layer_'+str(i-1)
            prev_layer = getattr(self, prev_layer_name)
            n_out = fully_connected_sizes[i-n_filters]
            
            if i==n_filters:
                layer_input = prev_layer.output.reshape([prev_layer.output.shape[0],in_shape[1]*in_shape[2]*in_shape[3]])                                
                n_in=in_shape[1]*in_shape[2]*in_shape[3]
            else:
                layer_input = prev_layer.output
                n_in = fully_connected_sizes[i-n_filters-1]
            
            if flag==True:
                W, b = params[i]
            else:
                W = None
                b = None
            
            setattr(self, layer_name,FullyConnectedLayer(
                                    input=layer_input,
                                    n_in=n_in,
                                    n_out=n_out,
                                    activation=activation,
                                    W=W, 
                                    b=b))
             
            current_layer = getattr(self, layer_name)
            self.params = self.params + current_layer.params
            self.L2 += (current_layer.W ** 2).sum() 
            
            out_shape = [n_images, n_out]
            print("{} - fully connected - output shape: {}".format(layer_name, out_shape))
            
            in_shape = out_shape

        #-------------------------------------
        #Regression layer
        #-------------------------------------  
        
        i = n_filters+n_connected_layers
        layer_name = 'layer_' + str(i)
        prev_layer_name = 'layer_' + str(i-1)
        prev_layer = getattr(self, prev_layer_name)
        n_in = in_shape[1]
        n_out = n_classes
        
        if flag == True:
            W, b = params[i]
        else:
            W = None
            b = None

        setattr(self, layer_name,RegressionLayer(
                                input=prev_layer.output, 
                                n_in=n_in,
                                n_out=n_out,
                                W=W, 
                                b=b)) 
        
        out_shape = [n_images, n_out]
        print("{} - regression -  output shape: {}".format(layer_name, out_shape))
        
        #-------------------------------------
        #Setting final attributes for the nnet
        #-------------------------------------         
        
        current_layer = getattr(self, layer_name)
        self.params = self.params + current_layer.params
        self.L2 += (current_layer.W ** 2).sum()
        self.cross_entropy = current_layer.cross_entropy
        self.misclass = current_layer.misclass   
        self.scores = current_layer.scores 
        self.predicted_labels = current_layer.predicted_labels

#------------------------------------------------------------------------------------------#
#BUILDING AND TRAINING NNET
#Create an instance of NNet as our classifier along with theano functions for training and testing 
#------------------------------------------------------------------------------------------#

def build_model(data,
                filter_dims=filter_dims, #[filter_height, filter_width]
                poolings=poolings,
                strides=strides, 
                n_filters=n_filters,
                fully_connected_sizes=fully_connected_sizes,
                activation=activation,
                reg=reg,
                *params):
    """This function creates an instance of the NNet class taking its inputs from the list
    of parameters at the top of this code.
    
    Args:
        filter_dims (list of ordered pairs): height and width of filters for each convolution layer
        poolings (list of ordered pairs): effective max pooling sizes for each layer (None is no pooling)
        strides (list of ordered pairs): strides for each convolution
        n_filters (list of ints): number of filters for each convolution layer
        fully_connected_sizes (list of ints): number of neurons in each fully connected layer
        activation (function)
        reg (float): L2 regularization parameter
        *params (theano shared tensor): optional input - shared variables with which to initialize the nnet
            parameters
        
    Returns:
        NNet instance: classifier
        theano function: update - performs gradient descent
        theano function: test - runs one forward pass through the NNet"""
    
    n_classes = data.n_classes
    image_shape = np.shape(data.images)
    
    filter_shapes = [] 
    n_out_channels = []
    for n_filter, filter_dim in zip(n_filters, filter_dims):
        if n_out_channels == []: filter_shapes.append((n_filter, image_shape[1], filter_dim[0], filter_dim[1]))
        else: filter_shapes.append((n_filter, n_out_channels, filter_dim[0], filter_dim[1]))
        n_out_channels = n_filter
    
    x = T.tensor4('x') #4-D tensor: [n_images, n_feature_maps (channels), image_height, image_width]
    y = T.ivector('y') #1-D vector of labels 
    learning_rate = T.scalar('learning_rate')
    momentum = T.scalar('momentum')
    
    print("building model")
    classifier = NNet(input=x, 
                      filter_shapes=filter_shapes,
                      strides=strides,
                      image_shape=image_shape, 
                      n_classes=n_classes, 
                      poolings=poolings, 
                      fully_connected_sizes=fully_connected_sizes,
                      activation = activation,
                      *params)   
            
    loss = classifier.cross_entropy(y) + reg*classifier.L2
    misclass = classifier.misclass(y)
    scores = classifier.scores
    predicted_labels = classifier.predicted_labels
    
    updates = []
    for param in classifier.params:
        #velocity (delta_param) starts at zero with the same shape as its parameter
        delta_param = theano.shared(value=param.get_value()*0., broadcastable = param.broadcastable)
        #theano applies updates simultaneously: param moves by the previous velocity,
        #then the velocity is refreshed from the current gradient
        updates.append((param,param + delta_param))
        updates.append((delta_param, momentum * delta_param - learning_rate * T.grad(loss, param)))
 
    print("setting up functions")
    
    #theano function which updates parameters via gradient descent
    update = theano.function(
        inputs=[x,y,learning_rate,momentum],
        outputs=[loss, misclass],
        updates=updates
        )
    
    #theano function for testing classifier
    test = theano.function(
        inputs=[x,y],
        outputs=[scores, predicted_labels, misclass, loss]
        )
        
    return classifier, update, test
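The `(param, delta_param)` update pairs in `build_model` implement classic momentum. Here is a minimal sketch of that rule in plain Python (not the code above) that ignores the one-step lag introduced by Theano's simultaneous updates, minimizing f(w) = w², whose gradient is 2w:

```python
def momentum_step(param, velocity, grad, learning_rate, momentum):
    #v <- momentum * v - learning_rate * grad, then param <- param + v
    velocity = momentum * velocity - learning_rate * grad
    return param + velocity, velocity

w, v = 4.0, 0.0
for _ in range(100):
    w, v = momentum_step(w, v, 2.0 * w, learning_rate=0.1, momentum=0.9)
#w spirals in toward the minimum at 0
```

The velocity term lets updates build up speed along directions where the gradient is consistent, which is why momentum tends to outrun plain gradient descent on ravine-shaped loss surfaces.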
    
def train_model(update,
                test,
                path_train,
                path_validate,
                batch_size = batch_size,
                n_epochs = n_epochs,
                learning_rate=learning_rate,
                momentum = momentum):
    """Performs gradient descent with minibatches and prints performance indicators on the training
    and validation sets.

    Args:
        update (theano function): performs a single parameter update via gradient descent (with momentum)
        test (theano function): performs a single forward pass of the net
        path_train (list of strings): locations of each training image
        path_validate (list of strings): locations of each validation image
        batch_size (int)
        n_epochs (int): number of epochs over which to perform gradient descent
        learning_rate (float)
        momentum (float)
    
    Returns:
        None - the updated model parameters are captured via the shared theano variables in the
            instance of NNet associated with update and test"""
    
    n_images = len(path_train)
    mid=n_images//2 #midpoint index: the first half of path_train is cats, the second half dogs, so each batch can draw equally from both
    index_cats = np.arange(mid)
    index_dogs = np.arange(mid,n_images)    
    
    data_validate = Data(path_validate, np.arange(len(path_validate)),in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Validate\Munged')
    
    for epoch in range(n_epochs):
        
        index_perm_cats = np.random.permutation(index_cats)
        index_perm_dogs = np.random.permutation(index_dogs)
        
        tic = time.clock()   
        
        sum_misclass_train = 0
        sum_loss = 0
                
        for i in range(mid//batch_size):
            terms = np.arange(i*batch_size//2, (i+1)*batch_size//2)
            index = np.concatenate((index_perm_cats[terms], index_perm_dogs[terms]), axis=0).astype(int)
            data_train = Data(path_train, index)
            loss, misclass_train = update(data_train.images, data_train.labels, learning_rate, momentum)
            print("epoch {}/{} - batch {}/{}".format(epoch+1, n_epochs, i+1, mid//batch_size))
            print("loss: {} - misclass: {}".format(loss, misclass_train))
            sum_misclass_train += misclass_train
            sum_loss += loss
        _,_,misclass_validate, loss_validate = test(data_validate.images,data_validate.labels)
        
        toc = time.clock()        
        
        print('--------------------------')
        print("epoch number {} of {} complete - running time: {} minutes".format(epoch+1, n_epochs, (toc-tic)/60))
        print("average loss: {}".format(sum_loss/(mid//batch_size)))
        print("average training misclass: {} percent".format(sum_misclass_train/(mid//batch_size)*100))
        print("validation misclass: {} percent".format(misclass_validate*100))
        print("validation loss: {}".format(loss_validate))
        print('--------------------------')

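The balanced-batch indexing inside `train_model` can be distilled into a standalone generator. This is a simplified sketch (the function name and arguments are mine, not the code above): the first half of the index range is cats, the second half dogs, and each minibatch draws batch_size//2 shuffled indices from each half.

```python
import numpy as np

def balanced_batches(n_images, batch_size, rng=np.random):
    """Yield index arrays containing equal numbers of cats and dogs."""
    mid = n_images // 2                               #cats: 0..mid-1, dogs: mid..n_images-1
    cats = rng.permutation(np.arange(mid))
    dogs = rng.permutation(np.arange(mid, n_images))
    half = batch_size // 2
    for i in range(mid // half):
        sl = slice(i * half, (i + 1) * half)
        yield np.concatenate((cats[sl], dogs[sl]))

batches = list(balanced_batches(n_images=20, batch_size=4))
#5 batches of 4 indices each: 2 cats (index < 10) and 2 dogs (index >= 10)
```

Keeping each batch class-balanced means every gradient step sees both classes, which stabilizes training compared to drawing batches uniformly from a concatenated (and therefore ordered) file list.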
Challenges

By far, the limiting factor here was computational resources.  Memory isn’t an issue since I load the training set in batches, but my final model (5 convolution layers, 2 fully connected layers, and a regression layer, just shy of 100,000 total parameters) took about an hour for a single pass through the training set.  Obviously this makes picking hyperparameters (learning rate, batch size, size and shape of the network) difficult, since checking whether any change improves performance takes many hours.

As a side note, I built my net so that the output dimensions of each layer are automatically sized for input into the next.  I have since learned that I need to be much more deliberate about these sizes: I found myself wanting to add a layer to a network I had already trained, and sizing that new layer is hard when the original dimensions weren’t chosen with extension in mind.  Lesson learned.

Finally, greatly improving the model will probably require learning how to leverage my GPU for faster computation (Theano supposedly makes this relatively easy).  Training was tedious enough that it limited how complex I could make my model, and once I surpassed my performance goal, I called it quits.  Maybe I’ll revisit this one if I decide to learn how to use my GPU with Theano.
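For reference, Theano selects its compute device through configuration flags rather than code changes, so in principle the switch is just environment setup. A sketch of what that invocation might look like (the script name here is hypothetical, and Theano’s GPU ops want float32):

```shell
# Run training on the GPU via Theano's configuration flags (script name assumed)
THEANO_FLAGS='device=gpu,floatX=float32' python train_cats_dogs.py
```

The same settings can also live in a `.theanorc` file so every run picks them up automatically.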
