You saw the title of this blog and are probably eager to learn about backpropagation, gradient descent, applications to real datasets, and what not. However, before we get there, don’t you think it’s a good idea to see how neural nets originated in the first place? As they say: don’t become a mere recorder of facts, but try to penetrate the mystery of their origin.
The history of the neural network:
The history of neural networks takes us back to 1943, when McCulloch and Pitts introduced the concept of the “artificial neuron” based on their understanding of neurology. It was an extremely simple artificial neuron: its output was either a zero or a one. Here is the model they proposed.
Fig. 1- http://aishack.in/tutorials/artificial-neurons-mccullochpitts-model/
W represents a weight: a weight of +1 represents an excitatory input and a weight of -1 an inhibitory input. X1, X2, and X3 are the inputs to the model. A sum was calculated by multiplying each input by its corresponding weight.
Sum = x1w1 + x2w2 + x3w3 +…
This sum was called the weighted sum. A threshold value was then chosen: if the weighted sum exceeded the threshold, the output was 1; otherwise it was 0.
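The weighted-sum-and-threshold rule above can be sketched in a few lines of Python. This is only an illustrative sketch; the specific weights and thresholds below are my own choices for demonstrating logic gates, not values from the original paper:

```python
def mcculloch_pitts(inputs, weights, threshold):
    """McCulloch-Pitts neuron: output 1 if the weighted sum of the
    inputs exceeds the threshold, otherwise output 0."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum > threshold else 0

# An AND gate: both excitatory inputs (weight +1) must be active
# for the sum (2) to exceed the threshold (1).
print(mcculloch_pitts([1, 1], [1, 1], threshold=1))  # 1
print(mcculloch_pitts([1, 0], [1, 1], threshold=1))  # 0

# An inhibitory input (weight -1) can veto the neuron.
print(mcculloch_pitts([1, 0], [1, -1], threshold=0))  # 1 (inhibitor off)
print(mcculloch_pitts([1, 1], [1, -1], threshold=0))  # 0 (inhibitor vetoes)
```

Notice that everything the neuron does is fixed by hand here; there is no learning yet, which is exactly why later work on training rules mattered so much.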
After that, many people attempted to replicate biological neurons, but with little or no success. Belmont Farley and Wesley Clark of MIT succeeded in running the first simple neural network in 1954. They were able to train networks containing at most 128 neurons to recognize simple patterns.
Rosenblatt (1958) took considerable interest in the field and succeeded in designing and developing the Perceptron. According to Rosenblatt, the Perceptron contained three layers, the middle of which was known as the association layer. This system was able to connect, or associate, a given input to a random output unit.
Fig. 2- https://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.perceptron.jpg
This model was capable of learning to sort simple images into categories such as triangles and squares.
Despite poor public interest and minimal funding, several researchers kept working religiously and consistently on models for problems such as pattern recognition.
In 1974, Paul Werbos came up with the idea of backpropagation in his PhD thesis; his later paper “Backpropagation Through Time: What It Does and How to Do It” extended the idea to networks with loops. Backpropagation is probably the most well-known and widely applied neural-network algorithm to this day.
Fig. 3- http://www.scaruffi.com/mind/ai.html
In the late 1970s and early 1980s, comprehensive books and conferences provided a forum for people from diverse fields with specialized technical languages. Funding became available throughout Europe, Japan, and the US. This paved the way for a vibrant emerging field of neural networks. In 1979, Kunihiko Fukushima came up with the groundbreaking idea behind convolutional neural networks. It proved to be extremely useful for visual tasks such as image identification.
In 1998, Sutton and Barto published their influential book on reinforcement learning, which remains one of the most active research areas in artificial intelligence today. Reinforcement learning works on a reward-based mechanism: the model is rewarded when it performs well and penalized otherwise, so it ultimately learns its task by trial and error.
The evolution was consistent throughout the journey. As computational power increased tremendously due to the advent of distributed and GPU systems, neural networks emerged at a swift pace and gained the shape that we see today.
A Resurgence of the neural network:
Fig. 4- https://s3.amazonaws.com/re-workproduction/post_images/207/numbers/original.jpg?1466618950
While all of this was happening, neural networks never became a major tool in industry and remained limited to research. There were multiple issues with the backpropagation algorithm: if someone wanted to extend a network beyond two or three hidden layers, training often got stuck in local optima, and the algorithm simply did not perform well across many hidden layers. This led to poor results on benchmark datasets. The initialization of weights in a deep network was another major issue. There were further limitations too, chief among them the availability of data: datasets were very small, and nobody bothered to collect and store data because nobody was familiar with the kind of benefits they could get by analyzing it.
Now companies and industries have realized the benefits of collecting data, and large unstructured training datasets, what we today call big data, are widely available. Meanwhile GPUs, which pack thousands of relatively simple processing cores onto a single chip, make it practical to train the 10-, 25-, even 50-layer networks of today, compared to the one-layer networks of the 1960s and the two- to three-layer networks of the 1980s. This is where “deep learning” gets its name: “deep” refers to the depth of the network’s layers. Currently, deep learning powers the best-performing systems in almost every area of artificial-intelligence research. Large-scale networks can be built and trained in very little time, which has led to ground-breaking improvements across a variety of applications, including image classification, video analytics, speech recognition, and natural language processing.

We can’t connect the dots looking forward; we can only connect them looking backward. So when we discuss the happenings of the past and try to connect them, all of it makes sense, and we realize that those small contributions had a huge influence on making deep learning so deep.
Large companies like Google and IBM are betting big on deep learning, given the kind of datasets they have and what they are gaining from them. More and more companies are looking to provide smarter solutions for their customers, and new AI-related companies delivering these solutions are emerging rapidly. Industries, if they choose not to be left behind, have to adapt to this change. Fraud detection, medical diagnostics, personal assistants, defense: you name the field, and deep learning has certainly created an impact there. The future belongs to it, and there is no denying this fact.
Why the name “deep learning”?
Before we start with what deep learning does, let’s break down and understand the two terms “deep” and “learning”. We will come to the term “deep” later. First, let’s understand the meaning of “learning”. Here, “learning” stands for learning through the artificial neural network we have been talking about for so long; you probably already have a picture in your mind of what it looks like.
So here is a simple artificial neural network.
Fig. 5- https://www.tutorialspoint.com/artificial_intelligence/images/atypical_ann.jpg
To put it simply, it is a rough mimic of the neurons in the brain: inputs are passed through the network, which processes them and gives us the final output (more on this later).
Now, coming to the second part of the question: what does “deep” stand for?
If you have even a minimal awareness of technological terminology, then chances are you have come across this term quite a few times. Over the past two years it has been “the trending word”, tossed around a lot, and it is something that has seized everybody’s curiosity.
We hear the term “deep” and instantly get intimidated by it. However, it’s not as intimidating as it sounds. OK, now you’re probably thinking: enough beating around the bush, tell me what it is! So here it goes.
A deep neural network is simply a feedforward network with many hidden layers. Isn’t that simple to understand? Or does it still sound intimidating?
This is more or less all I have to say about the definition. The only thing I should add is that neural networks can be recurrent or feedforward.
Fig. 6- https://www.researchgate.net/profile/Wim_De_mulder/publication/266204519/figure/fig5/AS:[email protected]/Recurrent-versus-feedforward-neural-network.png
What are Feedforward and Recurrent Networks?
Feedforward networks do not have any loops and can be organized in layers, while recurrent neural networks have loops in their graphs.
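A minimal sketch can make the distinction concrete. The weights (0.5) and the single-unit “networks” below are purely illustrative assumptions of mine, chosen to keep the arithmetic readable; the point is only where the hidden state comes from:

```python
def feedforward_step(x, w_in):
    # A feedforward unit: the output depends only on the current
    # input, so there are no loops in the computation graph.
    return max(0.0, w_in * x)  # ReLU of one weighted input

def recurrent_step(x, h_prev, w_in, w_rec):
    # A recurrent unit: the output also depends on the previous
    # hidden value, which is the loop in the graph (unrolled in time).
    return max(0.0, w_in * x + w_rec * h_prev)

xs = [1.0, 0.0, 0.0]

# Feedforward: each input is mapped independently.
ff = [feedforward_step(x, w_in=0.5) for x in xs]

# Recurrent: the first input echoes through later steps via h.
h, rec = 0.0, []
for x in xs:
    h = recurrent_step(x, h, w_in=0.5, w_rec=0.5)
    rec.append(h)

print(ff)   # [0.5, 0.0, 0.0]
print(rec)  # [0.5, 0.25, 0.125]
```

The feedforward outputs forget the first input immediately, while the recurrent outputs carry a decaying memory of it, which is exactly why recurrent networks suit sequential data.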
How deep is your “deep”?
If there are “many” layers in the network, then we say the network is deep. The question that should be flashing through your mind right now is: how many layers does a network need in order to qualify as deep?
Well, there is no definite answer to this question. It’s like asking how many hours you have to study to pass a particular exam. Only one thing is definite here: a network with only a single hidden layer cannot be called “deep” and is conventionally called “shallow”, while a network comprising two or more hidden layers counts as deep. Ten years down the line, it may happen that a network with two or three layers will be called shallow and one with ten or more layers will be called deep. “Deep” and “shallow” are relative terms, and they can and will change with the frame of reference.
Fig. 7- https://media.licdn.com/mpr/mpr/AAEAAQAAAAAAAAhlAAAAJDAzNjhmZjdiLWNjMzktNGYzYi04ZmYyLWQ3Y2Y1NDkxYzA1MA.png
While the logic behind artificial neural networks and deep learning is fundamentally the same, that does not mean two artificial neural networks stacked together will perform the same as a deep neural network when trained with the same algorithm and training data.
So what differentiates deep neural nets from ordinary networks? One of the main differences is the way we use backpropagation. Usually, backpropagation trains later layers more efficiently than it trains earlier layers: as we go back into the network, the errors get smaller and more diffuse.
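A toy calculation shows why the error signal shrinks on the way back. It is a standard fact that the sigmoid activation’s derivative never exceeds 0.25; the depth values and the best-case assumption (every unit at its maximum-derivative point, weights of 1) are illustrative choices of mine, not a claim about any specific network:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # peaks at 0.25, when z = 0

# By the chain rule, the gradient reaching an early layer picks up
# one derivative factor per layer it passes through. Even in the
# best case (factor 0.25 at every layer), it decays geometrically:
factor = sigmoid_deriv(0.0)  # 0.25
for depth in [1, 3, 5, 10]:
    print(depth, factor ** depth)
```

At depth 10 the surviving signal is below one millionth of the original error, which is why early layers in a naively trained deep sigmoid network barely learn, and why layer-by-layer strategies like the one described next were attractive.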
So what we do in a deep neural network is first solve the problem of building a good first layer, then the problem of building a good second layer, and so on; eventually we have a deep feature space that we can feed into our actual problem.
What makes these deep neural nets so useful is their capability of discovering latent structures (what we call feature learning) within the vast amounts of unlabeled, unstructured data, i.e. big data.
Fig. 8- https://blog.algorithmia.com/wp-content/uploads/2016/11/deep-learning-importance.png
There is also the fact that deep learning excels in problem domains where the input data comes as images (pixel data), documents (text data), or files (audio and video data) rather than in conventional tabular form, which makes it unique and useful.
This was the first part in the series “Deep Learning: Imagine the Unimaginable”. I hope this write-up has cleared some of your doubts about why deep learning is called “deep” and, at the same time, given you some flavor of the history of neural networks. Nothing will make me happier than knowing this article has helped you in some way. Do share your feedback. ☺
We will continue this series with the next part, “Applications of Deep Learning in Today’s World”, coming soon; till then, adios. Stay tuned.