We covered the basic idea behind Recurrent Neural Networks (RNNs) in the previous tutorial. In this tutorial we are going to implement such a network on a simple task: sentence generation.
A Quick Recap
We saw that Recurrent Neural Networks differ from simple feed-forward networks in that they have additional connections that feed a layer's output back into the same layer or even into lower layers (the ones closer to the inputs). These non-forward connections are called recurrent connections. They carry a time delay (usually one time step when using discrete time), which makes the model aware of its previous inputs.
In a traditional neural network all inputs (and outputs) are assumed to be independent of each other. But if we want to predict the next word in a sentence, we need to know which words came before it. RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. We can think of RNNs as neural networks that have a “memory” which captures information about what has been calculated so far.
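To make the “memory” idea concrete, here is a minimal sketch of a single recurrent unit in plain Python. The function name and the scalar weights `w_x`, `w_h` and `b` are illustrative assumptions, not values from the notebooks; a real RNN uses weight matrices and learns them during training.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent cell over a sequence of numbers.

    The same weights are applied at every time step; the hidden
    state h carries information from earlier inputs forward.
    """
    h = 0.0  # initial hidden state: no memory yet
    states = []
    for x in inputs:
        # New state depends on the current input AND the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states are still non-zero:
# the recurrent connection remembers the earlier input.
states = rnn_forward([1.0, 0.0, 0.0])
```

Comparing this with `rnn_forward([0.0, 0.0, 0.0])`, which stays at zero throughout, shows that the later states differ only because of what the unit saw earlier.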
Two example cases of RNNs are provided as notebooks. The first, a basic RNN, is included in the GitHub repo as ‘Recurrent_Neural_Network_Basic_Case.ipynb’ and is left for the reader to work through. The second, which is specific to text, is shared in the same repository as ‘RNN_Detailed_Case.ipynb’.
Let’s start implementing:
As usual, import all the necessary modules.
The next thing we are going to do is read the text file and convert all the characters to lowercase, so that upper-case and lower-case versions of a character are treated as the same symbol.
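A minimal sketch of this step is shown below. The helper name `load_text` and the file name in the comment are hypothetical; substitute the path to your own text file.

```python
from pathlib import Path

def load_text(path):
    """Read a UTF-8 text file and return its contents in lower case."""
    return Path(path).read_text(encoding="utf-8").lower()

# Example call (hypothetical file name):
# text = load_text("my_corpus.txt")
```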
Now we will collect the unique characters from the whole text (that is, the set of all characters present in it) and map each of them to an integer. We convert characters to integers because a neural network operates on numbers, not on text.
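One way to build this mapping, sketched here on a tiny stand-in string rather than the real corpus (the names `build_vocab`, `char_to_int` and `int_to_char` are illustrative):

```python
def build_vocab(text):
    """Return the sorted unique characters plus both lookup tables."""
    chars = sorted(set(text))
    char_to_int = {c: i for i, c in enumerate(chars)}
    int_to_char = {i: c for c, i in char_to_int.items()}
    return chars, char_to_int, int_to_char

# Tiny stand-in text; the real input is the lower-cased corpus.
chars, char_to_int, int_to_char = build_vocab("hello world")
n_vocab = len(chars)
```

The reverse table `int_to_char` is not needed for training, but it is handy later for turning the model's integer predictions back into readable text.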
Now we have 50 unique characters in integer format. The next thing we want to do is create the input for our model, which is a sequence of characters. For the time being we use a sequence length of 100, which gives a dataset of size (number of original characters – sequence length). For each input (a 100-character sequence), the single character that follows it is the output.
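The sliding-window construction can be sketched as follows. The function name `make_sequences` is an assumption, and the demo uses a toy string with a window of 3 instead of 100 so the result is easy to inspect.

```python
def make_sequences(text, char_to_int, seq_length=100):
    """Slide a window over the text; each window predicts the next character."""
    data_x, data_y = [], []
    for i in range(len(text) - seq_length):
        window = text[i:i + seq_length]   # input: seq_length characters
        target = text[i + seq_length]     # output: the character that follows
        data_x.append([char_to_int[c] for c in window])
        data_y.append(char_to_int[target])
    return data_x, data_y

# Toy example with a short window.
text = "abcabcab"
char_to_int = {c: i for i, c in enumerate(sorted(set(text)))}
data_x, data_y = make_sequences(text, char_to_int, seq_length=3)
n_patterns = len(data_x)  # equals len(text) - seq_length
```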
Now we have our datasets. It is always a good idea to standardize them. We are going to convert our target variable to one-hot encoding. (Suppose you have a ‘Name’ feature which can take the values ‘Ram’, ‘Shyam’, and ‘Mohan’. One-hot encoding converts the ‘Name’ feature into three binary features: ‘is_Ram’, ‘is_Shyam’, and ‘is_Mohan’.) We are also going to reshape and normalize our input variable.
Normalizing here means rescaling the input values to a common range, typically 0 to 1 (for example, by dividing each integer code by the number of unique characters). Inputs on a small, consistent scale generally make training more stable.
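Both transformations can be sketched in plain Python. The helper names are illustrative, and dividing by the vocabulary size is an assumption about how the inputs are scaled; in practice these steps are usually done with NumPy arrays and a library utility.

```python
def one_hot(index, n_classes):
    """Return a list of zeros with a single 1 at position `index`."""
    vec = [0] * n_classes
    vec[index] = 1
    return vec

def reshape_and_normalize(sequences, n_vocab):
    """Shape the data as [samples][time steps][1 feature], scaled to [0, 1)."""
    return [[[idx / n_vocab] for idx in seq] for seq in sequences]

# Toy example with a vocabulary of 4 characters.
X = reshape_and_normalize([[0, 2, 1], [2, 1, 3]], n_vocab=4)
y = [one_hot(t, 4) for t in [3, 0]]
```

The extra innermost dimension reflects the fact that a recurrent layer expects one feature vector per time step, even when that vector has a single element.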
Now we have the dataset in a standard format. Here comes the most important part of the process: defining the model. We are going to use a single hidden LSTM layer with 256 hidden units and a dropout probability of 0.2.
Dropout is a regularization technique in which, during training, some units of the network are randomly “dropped out”, meaning their outputs are set to zero. On each training step a unit is dropped with probability “dropout” and kept with probability “1-dropout”; at prediction time all units are used. This discourages the network from relying too heavily on any single unit and so reduces overfitting.
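The idea can be illustrated with a small sketch in plain Python. This uses the common “inverted dropout” variant, which rescales the kept values so their expected sum is unchanged; the function and parameter names are illustrative, not taken from any library.

```python
import random

def dropout(activations, rate=0.2, training=True, rng=random):
    """Zero each activation with probability `rate` during training.

    Kept activations are divided by (1 - rate) so the expected total
    stays the same. At prediction time the input passes through unchanged.
    """
    if not training or rate == 0.0:
        return list(activations)
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# During training roughly 20% of the values become zero.
dropped = dropout([1.0] * 10, rate=0.2, rng=random.Random(0))
# At prediction time nothing is dropped.
unchanged = dropout([1.0] * 10, rate=0.2, training=False)
```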
We are also using “categorical_crossentropy” as the loss function, “adam” as the optimizer, and “softmax” as the activation function of the output layer.
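Put together, a model matching this description might look like the sketch below. It assumes the Keras API bundled with TensorFlow is installed; the function name `build_model` and its parameter names are illustrative, and the notebook's actual code may differ.

```python
def build_model(seq_length, n_vocab, hidden_units=256, dropout_rate=0.2):
    """Single LSTM layer, dropout, then a softmax over the vocabulary."""
    # Imported inside the function so this file can be loaded
    # even on a machine where TensorFlow is not installed.
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        keras.Input(shape=(seq_length, 1)),       # one feature per time step
        layers.LSTM(hidden_units),                # 256 hidden units by default
        layers.Dropout(dropout_rate),             # drop 20% of LSTM outputs
        layers.Dense(n_vocab, activation="softmax"),  # one probability per character
    ])
    model.compile(loss="categorical_crossentropy", optimizer="adam")
    return model

# Example call for 100-character inputs and a 50-character vocabulary:
# model = build_model(seq_length=100, n_vocab=50)
```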
You can play around with all these things to have a better idea as to how LSTM network works.
We are going to train our model for 20 epochs with a batch size of 128. You can always increase the number of epochs, as long as the model continues to improve. We are also going to create checkpoints, which save the model’s weights during training so that the model can later be reloaded and used without going through the training process again.
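A training call with checkpointing could be sketched as follows, again assuming the Keras API from TensorFlow. The function name and the weights file name are hypothetical; the callback here keeps only the weights from the epoch with the lowest training loss.

```python
def train_with_checkpoints(model, X, y, epochs=20, batch_size=128,
                           weights_path="weights-best.weights.h5"):
    """Fit the model and save the best weights seen so far after each epoch."""
    # Imported inside the function so this file can be loaded
    # even on a machine where TensorFlow is not installed.
    from tensorflow.keras.callbacks import ModelCheckpoint

    checkpoint = ModelCheckpoint(
        weights_path,
        monitor="loss",          # watch the training loss
        save_best_only=True,     # overwrite only when the loss improves
        save_weights_only=True,  # store weights, not the whole model
        mode="min",
        verbose=1,
    )
    model.fit(X, y, epochs=epochs, batch_size=batch_size,
              callbacks=[checkpoint])
    return weights_path

# Later, rebuild the same architecture and restore the weights:
# model.load_weights("weights-best.weights.h5")
```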
Now the weight file is saved in the location we chose, and we are going to reuse it for prediction. We will pick a random sequence of characters as the first input, predict the next character, append it to the sequence while dropping the oldest character, and repeat until we have the desired number of output characters.
In our case we are going to define a sequence of 100 characters as input and will generate the next 100 characters as output.
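The generation loop itself does not depend on any particular library, so it can be sketched with a stand-in for the model. Here `predict_next` is a hypothetical callable that takes the current window of integer codes and returns the code of the next character; with a trained network it would reshape and normalize the window, run the model, and pick the most probable character.

```python
def generate(seed, predict_next, n_chars=100):
    """Generate n_chars integer codes by repeatedly sliding the window."""
    window = list(seed)
    output = []
    for _ in range(n_chars):
        next_idx = predict_next(window)
        output.append(next_idx)
        # Drop the oldest character and append the new one,
        # so the window length stays constant.
        window = window[1:] + [next_idx]
    return output

# Stand-in "model": always predicts the code after the last one, modulo 5.
def toy_predict(window):
    return (window[-1] + 1) % 5

generated = generate([0, 1, 2], toy_predict, n_chars=4)
```

Mapping the generated integers back through the integer-to-character table then yields the output text.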
The result produced is not quite satisfying, but two things are worth observing. First, the model has learned roughly how many characters appear on a line, so it starts a new line after about the average number of characters (the last three lines are output).
Second, it produces some real words, such as “ice”, “the”, “added” and “then”, although many of the other words make no sense. This limitation could be overcome by building an LSTM network with more than one layer, increasing the number of epochs, and increasing the size of the dataset.
Training on large datasets with a CPU takes too much time, which is why a GPU has become almost essential for training deep-learning models quickly.
Training a Recurrent Neural Network is a fun exercise. The same algorithm could be extended for many other exercises like music generation, speech generation etc. It can also be efficiently extended to real life applications like “video captioning” and “language translation”.
I hope you enjoyed this article and are ready to make your own text generating system. Generating music lyrics will be a fun exercise, if you are considering doing it.