In my previous post, I created and trained a neural network with some toy data. The intent was to do something simple before hitting real-world data. Here my “real problem” is image classification. Specifically, I borrowed a problem from Kaggle where we are asked to train a classifier for distinguishing cats from dogs. I recognize that my best bet would be to tweak an existing network as training larger networks tends to be difficult, but performance is not my goal. Rather I want to start from the beginning; I plan to build my classifier from scratch (and hopefully learn something in the process).

**Results up front**

I created several convolutional neural nets with different layer structures, but due to long training times, I decided not to explore very much and quickly set a goal for a 73% correct classification rate. I picked the value based on the 215 submissions in the Kaggle competition; this is about the median performance. Most importantly, this goal gives me an excuse to quit and not waste an inordinate amount of time trying to improve upon this project. Ultimately, I ended up quitting with 73.1% of the images in my validation set classified correctly. There was still some hay to be made, but I called it a day. With respect to performance on my test set, the model had a 73.85% correct classification rate – I’m satisfied. For reference on state-of-the-art, the winner of the Kaggle contest correctly classified >98% of the images. This winner was a student of LeCun, and he applied his PhD thesis to the problem.

The last layer of my net is a logistic regression layer, so for each image, my neural net produces a score for cat and a score for dog. Below I picked out the 3 cats and the 3 dogs with the highest “cat scores,” and the 3 cats and the 3 dogs with the highest “dog scores.” These are the strongest examples of correctly identified and misclassified cats/dogs according to the model.
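To make the scoring concrete: a softmax turns the two raw outputs into class probabilities that sum to one. Here's a minimal NumPy sketch; the score values are made-up numbers for illustration, not outputs from my model:

```python
import numpy as np

def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    e = np.exp(scores - np.max(scores))
    return e / e.sum()

# Hypothetical raw outputs for one image: [cat_score, dog_score]
probs = softmax(np.array([2.0, 0.5]))
print(probs)             # the two entries sum to 1
print(np.argmax(probs))  # index 0 -> cat, index 1 -> dog
```

The “cat scores” and “dog scores” I refer to are these softmax outputs from the net's final regression layer.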

**Data exploration and pre-processing**

Collecting the data was easy – it was done for me. Kaggle provided a labeled training set and a non-labeled test set. The test set is useless to me without labels, so I ignored it and subdivided the provided training set into my own training, validation, and testing sets. My partition used 2,000 images for testing, 2,000 for validation, and the remaining 21,000 went to my training set (equal number of cats and dogs in each). My first real task was to get the data into a useful form.

Each image is a .jpg, but the pictures come in a wide range of sizes. For example, one image was as big as 1050 pixels wide and another was as small as 50. I needed a constant, standard size for input into a neural net; I picked 100 x 100 somewhat arbitrarily, but this is large enough to keep much of the detail in the images and not so large that computation becomes too awful for the sizes of nets I was making. I scaled each image (downsampling or interpolating as needed) so that its larger dimension fit within 100 pixels, and added a border of 0’s along the other dimension. I also made the images grayscale for simplicity, subtracted the mean pixel value (centering), and converted them to numpy arrays before saving in this format. I also created a few extra tools, such as a simple function for displaying images and another for creating a list containing the locations of the images on my hard drive. Finally, I created a class called “Data” which is what I will use for input into my soon-to-be-created neural net. The specifics are all encapsulated by the Python code below.

```python
import numpy as np
import matplotlib.pyplot as plt
import glob
from PIL import Image
import os

#------------------------------------------------------------------------------------------#
#DATA EXPLORATION AND PREPROCESSING
#------------------------------------------------------------------------------------------#

def get_dims(
        in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Original'):
    """Gets basic data for the collection of images. This info is used to
    determine how to resize and crop our data in the functions that follow.

    Args:
        in_directory (string): location where all images are stored
    Returns:
        int: number of images
        float: mean image width
        float: mean image height
        int: max image width
        int: max image height
        int: min image width
        int: min image height"""
    path = glob.glob(in_directory + "\*.jpg")
    print("finding extreme image dimensions")
    max_w = 0
    max_h = 0
    min_w = 1e10
    min_h = 1e10
    sum_w = 0
    sum_h = 0
    for i in range(len(path)):
        img = Image.open(path[i])
        size = img.size
        sum_w += size[0]
        sum_h += size[1]
        if size[0] > max_w:
            max_w = size[0]
        if size[1] > max_h:
            max_h = size[1]
        if size[0] < min_w:
            min_w = size[0]
        if size[1] < min_h:
            min_h = size[1]
    mean_h = sum_h/len(path)
    mean_w = sum_w/len(path)
    n_images = len(path)
    print("Number images = {}. \n"
          "Mean width = {}. \n"
          "Mean height = {}. \n"
          "Max width = {}. \n"
          "Max height = {}. \n"
          "Min width = {}. \n"
          "Min height = {}. \n".format(n_images, mean_w, mean_h,
                                       max_w, max_h, min_w, min_h))
    return n_images, mean_w, mean_h, max_w, max_h, min_w, min_h

def munge(
        in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Original',
        out_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Munged',
        width=100, height=100):
    """Centers, scales, and crops our images - then saves each image as a
    numpy array. Also creates a vector of labels.

    Args:
        in_directory (string): location of images
        out_directory (string): location to save munged images
        width: width of munged images
        height: height of munged images
    Returns:
        None - function saves munged images as numpy arrays and saves label
        vector in out_directory."""
    path = glob.glob(in_directory + "\*.jpg")
    num_imgs = len(path)
    labels = np.zeros(num_imgs, dtype='int32')  #initialize labels as all zeros
    print("creating labels \n"
          "centering images \n"
          "converting to grayscale \n"
          "saving as numpy arrays")
    for i in range(len(path)):
        if 'dog' in path[i]:
            labels[i] = 1  #dogs have label 1, cats remain 0
        img = Image.open(path[i])
        size = img.size
        ratio = np.array(min(float(width)/size[0], float(height)/size[1]))
        new_size = (size * ratio).astype(int)
        img = img.convert('L')  #grayscale
        img = img.resize(new_size, Image.ANTIALIAS)
        half_width = img.size[0]//2
        half_height = img.size[1]//2
        img = img.crop((half_width - width//2,
                        half_height - height//2,
                        half_width + width//2,
                        half_height + height//2))
        img = np.array(img)/255.  #divide by 255 as PIL is 0-255 but matplotlib uses 0-1
        img = img - np.mean(img)
        name = os.path.split(path[i])[1][:-3] + 'npy'
        np.save(out_directory + '\\' + name, img)
        if i % 1000 == 0:
            print("iteration {} of {} complete".format(i+1, len(path)))
    np.save(out_directory + '\\' + 'labels', labels)
    return

def view_image(location):
    """Displays image using matplotlib.

    Args:
        location (string): full path of a saved numpy array, such as one for 'cat.123'
    Returns:
        None - displays image"""
    img = np.load(location)
    img = img - np.min(img)  #image was centered, so make entries between 0 and 1
    plt.imshow(img, cmap='Greys_r')  #reverse grayscale colormap

#------------------------------------------------------------------------------------------#
#TOOLS FOR USING DATA IN NEURAL NET
#------------------------------------------------------------------------------------------#

def get_image_list(
        in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Munged'):
    """Creates a list of image addresses.

    Args:
        in_directory (string): location containing the numpy arrays of images
    Returns:
        list of strings: location/name of each numpy array"""
    path = glob.glob(in_directory + "\*.npy")
    path = path[:-1]  #this removes the last address which is the labels
    return path

class Data():
    """Class which will help organize inputs into our nnets

    Attributes:
        images (numpy array of floats): [n_images, n_channels, height, width]
            since gray-scale, n_channels = 1
        labels (list of ints): 0 -> cat, 1 -> dog
        n_classes (int)"""

    def __init__(
            self,
            path,
            index,
            in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Train\Munged'):
        """Instantiates Data

        Args:
            path (list of strings): list containing every image location
            index (list of ints): indices of the images we will take
            in_directory (string): directory which contains all the images
                and associated label array"""
        self.images = np.zeros([len(index), 1, 100, 100])
        self.labels = np.load(in_directory + '//' + 'labels.npy')[index]
        self.n_classes = 2
        count = 0
        for i in index:
            self.images[count, 0, :, :] = np.load(path[i])
            count += 1
```

**Building the net**

After saving all my images as 100 x 100 numpy arrays, it was time to create and train the net. I decided to learn Theano. This is a Python library which allows you to symbolically define functions and then efficiently evaluate them; if numpy is comparable to MATLAB and sympy to Mathematica, some describe Theano as a hybrid between the two. It’s convenient for me in that after symbolically defining a function, calculating gradients is easy. Further, there are built-in functions for convolution, pooling, and other useful operations for neural nets.

Beyond Theano, the other big improvements I’ve made over my previous toy problem are that the code is very modular (and reusable), and very little is hardcoded. There are classes for each type of layer (convolution/pooling layer, fully connected layer, regression layer) and a NNet class, an instance of which is a neural net built from instances of the different types of layers. I did get some of these ideas for how to structure my code from here. Finally, I have user-set parameters at the beginning of the script where you can control how the NNet class is instantiated (number, type, and size of all layers). This is done in the function “build_model.” I then have a second function called “train_model” which fits the model via gradient descent.

I’ll attach my code (mostly for my own reference), but there are a few specific items I want to note:

- My training set is too large to load all the images into memory – I create a list of the images and permute it in order to load random batches to perform gradient descent.
- As I trained, I tracked my objective function value and misclassification rate for both my training and validation sets. I varied my learning rate when my validation objective function value consistently failed to improve and quit training when the improvement in my objective function was small (the model could have benefited from more training, but it wasn’t worth the time needed to continue).
- I instituted momentum for the first time in my gradient descent process. The idea here is that at a given point, there will be randomness in my gradient since I am taking a random batch. Momentum adds a proportion of the previously calculated gradient to the current gradient. Doing this intuitively cancels some of the randomness in the gradients while accentuating similarities. Even if I used the entire training set for my gradient calculations, momentum could be useful. One way to see this is to visualize a response surface shaped as a valley with steep sides and a gentle slope along the valley floor. Momentum will help prevent us from shooting back and forth across the valley due to the large gradients associated with the steep sides.
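The momentum idea can be sketched in a few lines of plain NumPy. The toy quadratic “valley” objective and the coefficients here are purely illustrative, not my actual training setup:

```python
import numpy as np

# Toy objective f(w) = 0.5*(10*w0^2 + 0.1*w1^2): steep across the
# valley (w0), gentle along the valley floor (w1).
def grad(w):
    return np.array([10.0 * w[0], 0.1 * w[1]])

w = np.array([1.0, 1.0])
velocity = np.zeros_like(w)
learning_rate, mu = 0.05, 0.4

for _ in range(100):
    # Keep a fraction mu of the previous step and add the new gradient
    # step - this damps oscillation across the steep direction while
    # accumulating progress along the gentle one.
    velocity = mu * velocity - learning_rate * grad(w)
    w = w + velocity
```

This mirrors the update pair in “build_model,” where the shared variable `delta_param` plays the role of `velocity`.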

```python
import time
import theano
import theano.tensor as T
import numpy as np
from theano.tensor.signal import pool  #bug in Theano where 'signal' is not
                                       #recognized under theano.tensor -
                                       #I have to import separately

#------------------------------------------------------------------------------------------#
#PARAMETERS
#Set parameters for the nnet and training
#------------------------------------------------------------------------------------------#

batch_size = 200
learning_rate = 0.01
momentum = 0.4
reg = 0.01  #using L2 regularization
n_epochs = 10
n_filters = [20, 50, 50, 100, 100]  #number of filters (channels) for each convolutional layer
filter_dims = [(5,5), (3,3), (3,3), (2,2), (2,2)]  #receptive field size for filters
strides = [(1,1), (1,1), (1,1), (1,1), (1,1)]  #strides for convolution (downsampling)
poolings = [(2,2), None, (2,2), None, None]  #effective max pooling sizes
fully_connected_sizes = [300, 300]

#------------------------------------------------------------------------------------------#
#MISC USEFUL FUNCTIONS
#------------------------------------------------------------------------------------------#

def relu(x):
    """The rectified linear unit which will be used as my activation function
    for most layers. Newer versions of Theano have a built-in relu function.

    Args:
        x (theano tensor): as used in here, x will be a matrix or a 4D tensor
    Returns:
        theano tensor var: when theano function is executed, negative values
        are set to zero and positive ones are unchanged"""
    return T.switch(x < 0, 0, x)

def ceildiv(x, y):
    """Ceiling division - rounds up the quotient x/y

    Args:
        x (float): numerator
        y (float): denominator
    Returns:
        int: ceiling of x/y"""
    return -(-x//y)

activation = relu

#------------------------------------------------------------------------------------------#
#NNET LAYERS
#classes for each type of layer in the nnet
#------------------------------------------------------------------------------------------#

class RegressionLayer():
    """The final regression layer of a nnet

    Attributes:
        input (theano tensor var): when run, is a numpy array of floats with
            shape [n_images, n_in]
        W (theano shared tensor var): weights - numpy array of floats when run
        b (theano shared tensor var): bias - numpy array of floats when run
        scores (theano tensor var): softmax of output - array of floats when run
        predicted_labels (theano tensor var): array of ints when run
        params: model parameters - list containing attributes W and b"""

    def __init__(self, input, n_in, n_out, W=None, b=None):
        """Initializes the regression layer

        Args:
            input (theano tensor var)
            n_in (int): number of input nodes
            n_out (int): number of output nodes (ie number of classes)
            W (theano shared tensor var): weights
            b (theano shared tensor var): bias"""
        if W is None:
            W = np.random.randn(n_in, n_out)*np.sqrt(2/n_in)
            W = theano.shared(value=W, name='W')
        if b is None:
            b = np.zeros([1, n_out])
            b = theano.shared(value=b, name='b', broadcastable=(True, False))
        self.input = input
        self.W = W
        self.b = b
        self.scores = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.predicted_labels = T.argmax(self.scores, axis=1)
        self.params = [self.W, self.b]

    def cross_entropy(self, labels):
        """We will use cross entropy as our objective function for minimization

        Args:
            labels (numpy array of ints): the true labels of the input data
        Returns:
            float: the average cross-entropy between model output and the true labels"""
        return -T.mean(T.log(self.scores)[T.arange(labels.shape[0]), labels])

    def misclass(self, labels):
        """Misclass percentage is a more interpretable metric than cross_entropy

        Args:
            labels (numpy array of ints): the true labels of the input data
        Returns:
            float: the percentage of data points misclassified by the model"""
        return T.mean(T.neq(self.predicted_labels, labels))

class FullyConnectedLayer():
    """A dense layer in a nnet

    Attributes:
        input (theano tensor var): when run, is a numpy array of floats with
            shape [n_images, n_in]
        W (theano shared tensor var): weights - numpy array of floats when run
        b (theano shared tensor var): bias - numpy array of floats when run
        output (theano tensor var)
        params: model parameters - list containing attributes W and b"""

    def __init__(self, input, n_in, n_out, activation, W=None, b=None):
        """Initializes the fully connected layer

        Args:
            input (theano tensor var)
            n_in (int): number of input nodes
            n_out (int): number of output nodes
            activation (function): non-linear activation function - default is
                rectified linear unit
            W (theano shared tensor var): weights
            b (theano shared tensor var): bias"""
        self.input = input
        if W is None:
            W = np.random.randn(n_in, n_out)*np.sqrt(2/n_in)
            W = theano.shared(value=W, name='W')
        if b is None:
            b = np.zeros([1, n_out])
            b = theano.shared(value=b, name='b', broadcastable=(True, False))
        self.W = W
        self.b = b
        lin_output = T.dot(input, self.W) + self.b
        if activation is None:
            self.output = lin_output
        else:
            self.output = activation(lin_output)
        self.params = [self.W, self.b]

class ConvPoolLayer():
    """A layer which performs convolution and optional max pooling

    Attributes:
        input (theano tensor var): when run, is a numpy array of floats with
            shape [n_images, n_feature_maps, image_height, image_width]
        W (theano shared tensor var): filter weights - numpy array of floats when run
        b (theano shared tensor var): bias - numpy array of floats when run
        stride (ordered pair of ints): horizontal and vertical stride in convolution
        pooling (ordered pair of ints): horizontal and vertical receptive region
            for pooling - None for no pooling
        output (theano tensor var)
        shape (theano tensor var): when run, is a 4-tuple of ints
        params: model parameters - list containing attributes W and b"""

    def __init__(self, input, filter_shape, stride, pooling, activation, W=None, b=None):
        """Initializes a convolution/pooling layer

        Args:
            input (theano tensor var)
            filter_shape (numpy array of ints): [n_feature_maps_output,
                n_feature_maps_input, filter_height, filter_width]
            stride (ordered pair of ints)
            pooling (ordered pair of ints)
            activation (function): non-linear activation function - default is
                rectified linear unit
            W (theano shared tensor var): weights
            b (theano shared tensor var): bias"""
        self.input = input
        fan_in = np.prod(filter_shape[1:])
        #fan_out = (filter_shape[0] * np.prod(filter_shape[2:]) // np.prod(pooling))
        if W is None:
            W = np.random.randn(filter_shape[0],
                                filter_shape[1],
                                filter_shape[2],
                                filter_shape[3])*np.sqrt(1/fan_in)
            W = theano.shared(value=W, name='W', borrow=True)
        if b is None:
            b = np.zeros((1, filter_shape[0], 1, 1))
            b = theano.shared(value=b, name='b', borrow=True,
                              broadcastable=(True, False, True, True))
        self.stride = stride
        self.pooling = pooling
        self.W = W
        self.b = b
        self.params = [self.W, self.b]
        output = T.nnet.conv.conv2d(self.input,
                                    self.W,
                                    border_mode='valid',
                                    subsample=stride) + self.b
        if pooling != None:
            output = pool.pool_2d(output,
                                  pooling,
                                  ignore_border=False,
                                  st=None,
                                  padding=(0,0),
                                  mode='max')
        self.output = activation(output)
        self.shape = np.shape(self.output)

#------------------------------------------------------------------------------------------#
#THE NNET
#class which combines the layers above into an instance of the desired nnet structure
#------------------------------------------------------------------------------------------#

class NNet():
    """Class which will use instances of the layer classes to create a neural network.

    Attributes:
        input (theano tensor): 4D array of doubles when run
        params (list of shared theano tensors): values of list elements are
            matrices and vectors of floats
        L2 (theano tensor): float when run
        layer_x (instances of layer classes): one attribute for each layer in
            the net - layer_0, layer_1, etc...
        cross_entropy: regression layer method
        scores (theano tensor): softmax of output - array of floats when run
        predicted_labels (theano tensor): array of ints when run
        misclass: regression layer method"""

    def __init__(self, input, filter_shapes, strides, image_shape, n_classes,
                 poolings, fully_connected_sizes, activation, *params):
        """Instantiates a neural net

        Args:
            input (theano tensor var)
            filter_shapes (list of numpy arrays of ints): [n_feature_maps_output,
                n_feature_maps_input, filter_height, filter_width] for each layer
            strides (list of ordered pairs of ints): strides for each layer
            image_shape (numpy array of ints): [n_images, n_channels, height, width]
            n_classes (int): number of classes
            poolings (list of ordered pairs of ints): effective pool size for each layer
            fully_connected_sizes (list of ints): number of neurons for each
                fully connected layer
            activation (function): non-linear activation function - default is
                rectified linear unit
            *params (theano shared tensors): optional input which assigns model
                parameters to the shared tensor variables rather than randomly
                generating the parameters."""
        self.input = input
        self.params = []
        self.L2 = 0
        flag = False
        if params:
            flag = True
            params = np.array(params).reshape(len(params)//2, 2)

        #-------------------------------------
        #Convolution and optional pooling layers
        #-------------------------------------
        in_shape = image_shape
        n_images = image_shape[0]
        n_filters = len(filter_shapes)
        print("input shape: {}".format(image_shape))
        for i in range(n_filters):
            filter_shape = filter_shapes[i]
            stride = strides[i]
            pooling = poolings[i]
            layer_name = 'layer_' + str(i)
            prev_layer_name = 'layer_' + str(i-1)
            if i == 0:
                layer_input = input
            else:
                prev_layer = getattr(self, prev_layer_name)
                layer_input = prev_layer.output
            if flag == True:
                W, b = params[i]
            else:
                W = None
                b = None
            setattr(self, layer_name, ConvPoolLayer(input=layer_input,
                                                    filter_shape=filter_shape,
                                                    stride=stride,
                                                    pooling=pooling,
                                                    activation=activation,
                                                    W=W, b=b))
            current_layer = getattr(self, layer_name)
            self.params = self.params + current_layer.params
            #self.L2 += (current_layer.W ** 2).sum()
            if pooling != None:
                height = ceildiv(ceildiv(in_shape[2]-filter_shape[2]+1, stride[0]), pooling[0])
                width = ceildiv(ceildiv(in_shape[3]-filter_shape[3]+1, stride[1]), pooling[1])
            else:
                height = ceildiv(in_shape[2]-filter_shape[2]+1, stride[0])
                width = ceildiv(in_shape[3]-filter_shape[3]+1, stride[1])
            out_shape = [n_images, filter_shape[0], height, width]
            print("{} - convolution and optional max pooling - output shape: {}".format(
                layer_name, out_shape))
            in_shape = out_shape

        #-------------------------------------
        #Fully connected layers
        #-------------------------------------
        n_connected_layers = len(fully_connected_sizes)
        for i in range(n_filters, n_filters + n_connected_layers):
            layer_name = 'layer_' + str(i)
            prev_layer_name = 'layer_' + str(i-1)
            prev_layer = getattr(self, prev_layer_name)
            n_out = fully_connected_sizes[i - n_filters]
            if i == n_filters:
                layer_input = prev_layer.output.reshape(
                    [prev_layer.output.shape[0], in_shape[1]*in_shape[2]*in_shape[3]])
                n_in = in_shape[1]*in_shape[2]*in_shape[3]
            else:
                layer_input = prev_layer.output
                n_in = fully_connected_sizes[i - n_filters - 1]
            if flag == True:
                W, b = params[i]
            else:
                W = None
                b = None
            setattr(self, layer_name, FullyConnectedLayer(input=layer_input,
                                                          n_in=n_in,
                                                          n_out=n_out,
                                                          activation=activation,
                                                          W=W, b=b))
            current_layer = getattr(self, layer_name)
            self.params = self.params + current_layer.params
            self.L2 += (current_layer.W ** 2).sum()
            out_shape = [n_images, n_out]
            print("{} - fully connected - output shape: {}".format(layer_name, out_shape))
            in_shape = out_shape

        #-------------------------------------
        #Regression layer
        #-------------------------------------
        i = n_filters + n_connected_layers
        layer_name = 'layer_' + str(i)
        prev_layer_name = 'layer_' + str(i-1)
        prev_layer = getattr(self, prev_layer_name)
        n_in = in_shape[1]
        n_out = n_classes
        if flag == True:
            W, b = params[i]
        else:
            W = None
            b = None
        setattr(self, layer_name, RegressionLayer(input=prev_layer.output,
                                                  n_in=n_in,
                                                  n_out=n_out,
                                                  W=W, b=b))
        out_shape = [n_images, n_out]
        print("{} - regression - output shape: {}".format(layer_name, out_shape))

        #-------------------------------------
        #Setting final attributes for the nnet
        #-------------------------------------
        current_layer = getattr(self, layer_name)
        self.params = self.params + current_layer.params
        self.L2 += (current_layer.W ** 2).sum()
        self.cross_entropy = current_layer.cross_entropy
        self.misclass = current_layer.misclass
        self.scores = current_layer.scores
        self.predicted_labels = current_layer.predicted_labels

#------------------------------------------------------------------------------------------#
#BUILDING AND TRAINING NNET
#Create an instance of NNet as our classifier along with theano functions
#for training and testing
#------------------------------------------------------------------------------------------#

def build_model(data,
                filter_dims=filter_dims,  #[filter_height, filter_width]
                poolings=poolings,
                strides=strides,
                n_filters=n_filters,
                fully_connected_sizes=fully_connected_sizes,
                activation=activation,
                reg=reg,
                *params):
    """This function creates an instance of the NNet class taking its inputs
    from the list of parameters at the top of this code.

    Args:
        filter_dims (list of ordered pairs): height and width of filters for
            each convolution layer
        poolings (list of ordered pairs): effective max pooling sizes for each
            layer (None is no pooling)
        strides (list of ordered pairs): strides for each convolution
        n_filters (list of ints): number of filters for each convolution layer
        fully_connected_sizes (list of ints): number of neurons in each fully
            connected layer
        activation (function)
        reg (float): L2 regularization parameter
        *params (theano shared tensors): optional input - shared variables with
            which to initialize the nnet parameters
    Returns:
        NNet instance: classifier
        theano function: update - performs gradient descent
        theano function: test - runs one forward pass through the NNet"""
    n_classes = data.n_classes
    image_shape = np.shape(data.images)
    filter_shapes = []
    n_out_channels = []
    for n_filter, filter_dim in zip(n_filters, filter_dims):
        if n_out_channels == []:
            filter_shapes.append((n_filter, image_shape[1], filter_dim[0], filter_dim[1]))
        else:
            filter_shapes.append((n_filter, n_out_channels, filter_dim[0], filter_dim[1]))
        n_out_channels = n_filter
    x = T.tensor4('x')  #4-D tensor: [n_images, n_feature_maps (channels), image_height, image_width]
    y = T.ivector('y')  #1-D vector of labels
    learning_rate = T.scalar('learning_rate')
    momentum = T.scalar('momentum')
    print("building model")
    classifier = NNet(input=x,
                      filter_shapes=filter_shapes,
                      strides=strides,
                      image_shape=image_shape,
                      n_classes=n_classes,
                      poolings=poolings,
                      fully_connected_sizes=fully_connected_sizes,
                      activation=activation,
                      *params)
    loss = classifier.cross_entropy(y) + reg*classifier.L2
    misclass = classifier.misclass(y)
    scores = classifier.scores
    predicted_labels = classifier.predicted_labels
    updates = []
    for param in classifier.params:
        delta_param = theano.shared(value=param.get_value()*0.,
                                    broadcastable=param.broadcastable)
        updates.append((param, param + delta_param))
        updates.append((delta_param,
                        momentum*delta_param - learning_rate*T.grad(loss, param)))
    print("setting up functions")
    #theano function which updates parameters via gradient descent
    update = theano.function(inputs=[x, y, learning_rate, momentum],
                             outputs=[loss, misclass],
                             updates=updates)
    #theano function for testing classifier
    test = theano.function(inputs=[x, y],
                           outputs=[scores, predicted_labels, misclass, loss])
    return classifier, update, test

def train_model(update,
                test,
                path_train,
                path_validate,
                batch_size=batch_size,
                n_epochs=n_epochs,
                learning_rate=learning_rate,
                momentum=momentum):
    """Performs gradient descent with minibatches and prints performance
    indicators on training and validation sets.

    Args:
        update (theano function): performs a single parameter update via
            gradient descent (with momentum)
        test (theano function): performs a single forward pass of the net
        path_train (list of strings): locations of each training image
        path_validate (list of strings): locations of each validation image
        batch_size (int)
        n_epochs (int): number of epochs over which to perform gradient descent
        learning_rate (float)
        momentum (float)
    Returns:
        None - the updated model parameters are captured via the shared theano
        variables in the instance of NNet associated with update and test"""
    n_images = len(path_train)
    mid = n_images//2  #mid point index of path so we can take equal number of
                       #cats and dogs in each batch
    index_cats = np.arange(mid)
    index_dogs = np.arange(mid, n_images)
    data_validate = Data(path_validate,
                         np.arange(len(path_validate)),
                         in_directory=r'C:\Users\Jesse\Desktop\Python\Neural Net\Data\Validate\Munged')
    for epoch in range(n_epochs):
        index_perm_cats = np.random.permutation(index_cats)
        index_perm_dogs = np.random.permutation(index_dogs)
        tic = time.clock()
        sum_misclass_train = 0
        sum_loss = 0
        for i in range(mid//batch_size):
            terms = np.arange(i*batch_size//2, (i+1)*batch_size//2)
            index = np.concatenate((index_perm_cats[terms],
                                    index_perm_dogs[terms]), axis=0).astype(int)
            data_train = Data(path_train, index)
            loss, misclass_train = update(data_train.images,
                                          data_train.labels,
                                          learning_rate,
                                          momentum)
            print("epoch {}/{} - batch {}/{}".format(epoch+1, n_epochs,
                                                     i+1, mid//batch_size))
            print("loss: {} - misclass: {}".format(loss, misclass_train))
            sum_misclass_train += misclass_train
            sum_loss += loss
        _, _, misclass_validate, loss_validate = test(data_validate.images,
                                                      data_validate.labels)
        toc = time.clock()
        print('--------------------------')
        print("epoch number {} of {} complete - running time: {} minutes".format(
            epoch+1, n_epochs, (toc-tic)/60))
        print("average loss: {}".format(sum_loss/(mid//batch_size)))
        print("average training misclass: {} percent".format(
            sum_misclass_train/(mid//batch_size)*100))
        print("validation misclass: {} percent".format(misclass_validate*100))
        print("validation loss: {}".format(loss_validate))
        print('--------------------------')
```

**Challenges**

By far, the limiting factor here was computational resources. I don’t have an issue with memory as I am loading my training set in batches, but my final model (5 convolution layers, 2 fully connected layers, and a regression layer – just shy of 100,000 total parameters) took about an hour to run once through the training set. Obviously this makes picking hyperparameters (learning rate, batch size, size and shape of the network) difficult, as any change takes many hours to check whether performance has improved.

As a side note, I created my net so that the dimensions of the output of each layer would automatically be sized correctly for input into the next. I have since learned that I need to be much more intentional about sizing these inputs and outputs. I found myself wanting to add another layer into a network which I had already trained, but sizing this added layer is challenging without being very intentional about the original dimensions – lesson learned.
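For future reference, the sizing arithmetic itself is simple; here's a small pure-Python sketch of the output-size bookkeeping, mirroring the `ceildiv` logic my NNet class uses (valid convolution, then pooling that rounds up because `ignore_border=False`):

```python
import math

def conv_pool_out(size, filt, stride=1, pool=None):
    """Side length after a 'valid' convolution and optional max pooling.

    Valid convolution with the script's ceildiv arithmetic gives
    ceil((size - filt + 1) / stride); pooling with ignore_border=False
    rounds the pooled size up."""
    out = math.ceil((size - filt + 1) / stride)
    if pool is not None:
        out = math.ceil(out / pool)
    return out

# Trace a 100 x 100 input through conv layers matching the filter_dims
# and poolings parameter lists at the top of the training script:
size = 100
for filt, pool in [(5, 2), (3, None), (3, 2), (2, None), (2, None)]:
    size = conv_pool_out(size, filt, pool=pool)
    print(size)  # 48, 46, 22, 21, 20
```

Running the trace like this before training would have told me immediately whether an added layer fits the existing dimensions.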

Finally, if I wish to greatly improve my model, this will probably require I learn how to leverage my GPU for faster computation (Theano supposedly makes this relatively easy). Training was so tedious that I was limited to how complex I could make my model, and once I surpassed my performance goal, I called it quits. Maybe I’ll revisit this one if I decide to learn how to use my GPU with Theano.