“What is this picture of?” You ask a friend this question and he will reply instantly. However, in the past it seemed impossible for computers to do this, Until a few years ago when Deep Learning and the use of GPUs (Graphical Processing Units) changed the way we looked at this problem.
Google Photos’ tagging feature is no longer unknown to anyone. I won’t say Google always gets it right, but I think it does quite well most of the time. And, what is particularly fascinating is that when it does get something wrong, the mistakes it makes seem surprisingly all too human.
Let’s split the task of Image Identification into two parts, where we will first make use of a pre-trained model to check how it performs on our given test dataset, for which we already know the contents of the image. In the second case, we will make use of the famous MNIST dataset, train our model on it and then finally identify the content in the images.
The detailed notebook with the required code is present at the following link : Github repo
Case 1 : Pre-Trained VGG Model
Once trained, any network can be used as inference, or to make predictions about the data it is trained on. Inference happens to be a much less computation-intensive process.
VGG-16 is a pre-trained model which we are going to use., Anybody can download and use pre-trained models without having to master the skills necessary to tune and train these sets of models. Loading the pre-trained models and using the model for prediction is relatively straightforward.
VGG16 is a Convolutional Neural Network model proposed by K. Simonyan and A. Zisserman in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition” (refer here). VGG refers to a Deep Convolutional Network for object recognition developed and trained by Oxford’s renowned Visual Geometry Group (VGG), which achieved very good performance on the ImageNet dataset. This model achieves 92.7% top-5 test accuracy in ImageNet, which is a dataset of over 14 million images belonging to 1000 classes. VGG-19 is another such structure with 19 layers. VGG-16 has 16 layers.
Here is how a VGG-16 structure looks. You can probably observe all 16 layers.
In this part we will use the pre-trained weight of VGG-16 to build an Image identification model. So, let’s get started.
As usual, import all the necessary modules.
Since we are using the pre-trained weights all we have to do is import the weight and define the architecture (i.e. Convolutional Layer, Maxpool Layer or Fully Connected Layer).
What You Can Do With The Model
Sometimes, you want to freeze the weight for the first few layers so that they remain intact throughout the fine-tuning process. Say you want to freeze the weights for the first 10 layers. This can be done by the setting layer.trainable=False for first 10 layers.
You can play around the model by changing the parameters. I would advise you to do that, it will be a good learning process.
The next step is to resize the image that we are providing as input so that it can be tested upon the build model. Generally, in a simple Convolutional Neural Network all the training and test images are of same size.
Now we will compile the model using Stochastic Gradient Descent Optimizer and Categorical_Cross-Entropy as Loss function. You can play around with the learning rate if you want.
We will format the output so that it can be read clearly. We will limit ourselves to 10 best guesses.
Now finally we will visualize the result.
This is the input that we have given for prediction.
And here is what our prediction looks like:
This seems quite an accurate prediction because the image does look like Egyptian cat, which is the best guess in this case.
Well, result seems quite fascinating but I know you are not happy. Using someone else’s model to predict something is not a big deal, Right? And probably this is not why you came in the first place to read this blog. Implementation of Convolutional Neural Network means implementation, and nothing else!
Alright, I understand. Now we are going to build our first toy Convolutional Network from scratch.
Case 2 : Training Model on MNIST
The dataset that we are going to use for building the model is an MNIST dataset. The MNIST database is one of the most widely used datasets for the toy implementation of CNN among people who want to try learning identification techniques and pattern recognition methods. It consists of handwritten digits, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
You can download this dataset from one of the following sources or directly load using MNIST module available in tensorflow or keras.
It has been quite some time since we are implementing the known theory into models – first in Artificial Neural Network and now in Convolutional Neural Network. So, you must be familiar with what the first step should be.
You guessed it right – the first step is to import the necessary modules.
The next step is to read the downloaded dataset and split it into train and test set. If you have downloaded the data manually you can use the below code or else you can directly import the data using the MNIST module available in tensorflow or keras provided by C ase 1 .
Now that we have the dataset, we are going to reshape it so that it will be taken as valid input in a tensorflow placeholder. At the same time, we will normalize X_train and X_test.
If we didn’t scale our input training vectors, the ranges of our distributions of feature values would likely be different for each feature, and thus the learning rate would cause corrections in each dimension that would differ from one another-which will ultimately make finding optimal parameters difficult and thus will finally affect the performance). Also we will convert Y_train and Y_test to dummies to one hot encode the categorical variable into binary form (Instead of 1 column will now have 10 columns with row filled by 1 corresponding to correct category and rest all zeros).
Now we have the dataset in the correct format that can be fed into the Convolutional Neural Network.
We have 60,000 images for training. This is quite a large dataset. It will not be a good idea to feed the input individually because of the time it will take. To solve this problem, we can create batches of images. Let’s say we have defined the batch size as 50 – what this implies is that, at one go, we will feed 50 images into the model to ensure faster array calculation and thus faster training time. When you import the dataset using the MNIST library you do not have to explicitly define the next batch function. You can use:
But since we have loaded the dataset manually we have to define this function.
Next, we will define weight and bias variable and also convolutional and maxpool layers.
We will create the placeholder for input and label and convert image to a 4D tensor.
Using the functions defined in Step 5, we will create weight and bias variables for all the layers i.e. convolutional layer, maxpool layer and fully connected layer.
Now comes the most important part of creating the network. We must define the loss function and optimizer function. By optimizing the loss function, the model starts learning (training). For this purpose we are going to use cross_entropy as loss function and AdamOptimizer as optimizer.
Now we are all set to go. We will train the model for 1,000 epoch and print the result (train accuracy) after every 100 steps.
Now; that the model is trained. We will test the model on 10,000 dataset and see the result both for training and testing.
96% accuracy on the training set can be considered good. If you want, you can play around with other parameters to see if this result improves.
This was a simple implementation of Convolutional Neural Network. I hope you understood the basic idea and will be able to build your own model on different datasets. To dive deep into mathematics and proper understanding of Convolutional Neural Network you can refer this and solve the assignment.
In the next article we will build Image search Engine using Convolutional Neural Network.