Data Science and Artificial Intelligence

Data Science Glossary: Deep Learning Key Terms

This post presents a collection of key data science terms with concise definitions, grouped by topic. Getting familiar with data science terminology takes time, because these words are not part of everyday vocabulary. However, once you start learning the terms and understand how they are used, recalling them is no longer intimidating. In this article, I have grouped some of the key data science terms into categories, as follows:

Deep Learning Key Terms

The easiest way to begin a journey in deep learning without hassle or intimidation is to do some groundwork on its core concepts. Otherwise, deep learning terminology can quickly overwhelm newcomers. Quite a few terms recur once you start working with deep learning applications.

This glossary outlines the key deep learning terms to help you cross the initial hurdles. These terms can be hard to understand and remember, and I want all the aspiring data scientists out there to ace their field of study. So I have defined the most commonly used deep learning terms in a simple and understandable way.

Activation Function

In an artificial neural network, an activation function takes the weighted sum of a node’s inputs and transforms it into the node’s output, which in turn shapes the network’s decision boundaries. There are different kinds of activation functions in deep learning; some of them are as follows:

  • Identity Function (linear)
  • Sigmoid Function (logistic, or soft step)
  • Hyperbolic tangent function (tanh)
  • ReLU function (Rectified linear unit) and others
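As a rough illustration, the four functions listed above can be sketched in plain Python. The function names and the sample weighted input `z` are illustrative choices, not taken from any particular library.

```python
import math


def identity(z: float) -> float:
    """Identity (linear) activation: returns the input unchanged."""
    return z


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes a real input into the range (0, 1).

    Fine for moderate inputs; very negative z can overflow math.exp.
    """
    return 1.0 / (1.0 + math.exp(-z))


def tanh(z: float) -> float:
    """Hyperbolic tangent: squashes a real input into the range (-1, 1)."""
    return math.tanh(z)


def relu(z: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, z)


# Hypothetical weighted input to one node: two inputs times two weights.
z = 0.5 * 2.0 + (-1.0) * 3.0

for activation in (identity, sigmoid, tanh, relu):
    print(activation.__name__, activation(z))
```

Each function receives the same weighted sum but produces a different output, which is why the choice of activation changes how a node responds.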

Adam Optimization

It is an extension of stochastic gradient descent that is widely used to train deep learning models. The Adam optimization algorithm keeps running averages of both the gradients and the second moments of the gradients, and uses them to compute an adaptive learning rate for each parameter. Some of its features are as follows:

  • It is computationally efficient and has small memory requirements.
  • It is invariant to diagonal rescaling of the gradients.

Adam works well in practice and compares favorably with other stochastic optimization methods.
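The update rule can be sketched for a single parameter as follows. This is a minimal sketch, not a production optimizer: the function name `adam_minimize`, the toy cost function, and the hyperparameter values are assumptions made for illustration.

```python
import math


def adam_minimize(grad, theta, lr=0.1, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=500):
    """Minimize a one-parameter function with Adam, given its gradient."""
    m = 0.0  # running average of the gradients (first moment)
    v = 0.0  # running average of the squared gradients (second moment)
    for t in range(1, steps + 1):
        g = grad(theta)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        # Bias correction: both averages start at zero and are
        # underestimated during the first steps.
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        # The effective step size adapts to the gradient history.
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Toy cost (theta - 3)^2 with gradient 2 * (theta - 3); minimum at 3.
result = adam_minimize(lambda theta: 2.0 * (theta - 3.0), theta=0.0)
print(result)
```

With several parameters, the same update is applied to each one separately, which is what gives every parameter its own learning rate.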

Back Propagation

It is a widely used algorithm for training artificial neural networks in supervised learning, in combination with gradient descent. Given an artificial neural network and an error function, the method computes the gradient of the error function with respect to the network’s weights.
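A minimal sketch of the idea, assuming a tiny network with one sigmoid hidden node, one linear output, and a squared error function. The network shape, the names, and the sample values are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    """Forward pass: input -> sigmoid hidden node -> linear output."""
    h = sigmoid(w1 * x)
    y_hat = w2 * h
    return h, y_hat


def loss(w1, w2, x, y):
    """Squared error between the prediction and the target."""
    _, y_hat = forward(w1, w2, x)
    return 0.5 * (y_hat - y) ** 2


def backprop(w1, w2, x, y):
    """Apply the chain rule from the output back to each weight."""
    h, y_hat = forward(w1, w2, x)
    d_y_hat = y_hat - y                 # d loss / d y_hat
    grad_w2 = d_y_hat * h               # d loss / d w2
    d_h = d_y_hat * w2                  # d loss / d h
    grad_w1 = d_h * h * (1.0 - h) * x   # sigmoid'(z) = h * (1 - h)
    return grad_w1, grad_w2


grad_w1, grad_w2 = backprop(w1=0.5, w2=-1.2, x=2.0, y=1.0)
print(grad_w1, grad_w2)
```

Gradient descent would then move each weight a small step against its gradient to reduce the error.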

Computer Vision

It is a field of computer science that deals with enabling computers to see, process, and identify images and videos in the same way that human vision does. At present, advances in deep learning, the surge in computational power, and the huge amount of available image data are the driving forces behind computer vision. Below are some of its important uses:

  • Detection of pedestrians, roads, and other cars by smart (self-driving) cars
  • Object identification
  • Object tracking
  • Motion analysis
  • Image restoration

Convolutional Neural Network

It is typically associated with computer vision and image recognition. Convolutional Neural Networks (CNNs) use the mathematical concept of convolution to mimic the neural connectivity of the biological visual cortex.
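To give a feel for the operation, here is a small sketch that slides a kernel over a toy image. It computes cross-correlation without flipping the kernel, which is what deep learning libraries commonly call convolution. The function name, the image, and the kernel are illustrative assumptions.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image with no padding and stride 1."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            # Weighted sum of the image patch under the kernel.
            for a in range(kernel_h):
                for b in range(kernel_w):
                    total += image[i + a][j + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


# Toy 3x4 image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Kernel that responds where brightness rises from left to right.
kernel = [[-1, 1]]

edges = conv2d_valid(image, kernel)
print(edges)  # [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
```

The output is non-zero only at the boundary between the dark and bright regions. In a CNN the kernel values are not written by hand but learned during training.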

Cost Function

A cost function is a measure of how well a neural network performed with respect to its given training samples and the expected output. It also depends on variables such as weights and biases. A cost function returns a single value, not a vector, because it rates how well the neural network did as a whole.

The cost function quantifies the error of the model. One widely used form, the mean squared error, is:

$$J = \frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2$$

  • h(x) is the prediction
  • y is the actual value
  • m is the number of rows in the training set
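As a small worked example, the mean squared error is one common cost function that uses exactly these three quantities. The function name `mse_cost` and the sample numbers are assumptions made for illustration.

```python
def mse_cost(predictions, targets):
    """Mean squared error: average of (h(x) - y)^2 over the m rows."""
    m = len(targets)
    if m == 0 or m != len(predictions):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum((h - y) ** 2 for h, y in zip(predictions, targets)) / m


# Three training rows: the squared errors are 1, 0, and 4.
cost = mse_cost(predictions=[2.0, 4.0, 6.0], targets=[1.0, 4.0, 8.0])
print(cost)
```

The result is a single number, so two models can be compared directly: the one with the lower cost fits the training set better.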

Feedforward Neural Network

It is an artificial neural network in which the connections between nodes do not form a cycle. Information in a feedforward network moves in a single direction, from the input nodes, through the hidden layers (if any), to the output nodes, and no cycles or loops occur along the way. The feedforward neural network was the first and simplest type of artificial neural network devised.
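The one-directional flow can be sketched with a tiny network of two inputs, two hidden nodes, and one output node. The weights below are hand-picked, hypothetical values, not trained ones.

```python
import math


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One layer: each node applies the activation to its weighted sum."""
    return [
        activation(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]


# Hypothetical weights: 2 inputs -> 2 hidden nodes -> 1 output node.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, 1.0]]
output_biases = [0.0]


def feedforward(inputs):
    """Information moves input -> hidden -> output, never backwards."""
    hidden = dense(inputs, hidden_weights, hidden_biases, relu)
    return dense(hidden, output_weights, output_biases, sigmoid)


output = feedforward([2.0, 1.0])
print(output)
```

Each layer only reads the output of the layer before it, so a single pass from input to output is enough to compute the prediction.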

Gradient Descent

It is an optimization algorithm for finding the values of the parameters (coefficients) of a function (f) that minimize a cost function (cost). Gradient descent is especially useful when the parameters are hard to compute analytically, for example by setting the derivatives to zero and solving.
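A minimal sketch of the procedure on a one-parameter cost function. The toy cost, the learning rate, and the number of steps are illustrative assumptions. The cost is deliberately simple enough to solve by hand, so the result is easy to check.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient to reduce the cost."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x


# Toy cost (x - 3)^2 with derivative 2 * (x - 3).
# Setting the derivative to zero gives the exact answer x = 3.
minimum = gradient_descent(lambda x: 2.0 * (x - 3.0), start=0.0)
print(minimum)
```

For a real model the cost depends on many parameters and has no such closed-form solution, but the loop stays the same: compute the gradient, then take a small step against it.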


Hyperparameter

It is a parameter whose value is set before the training of a machine learning or deep learning model begins. Different models need different hyperparameters, and some need none. Hyperparameters must not be confused with the parameters of the model, which are estimated or learned from data.

Some important characteristics of hyperparameters are:

  • Hyperparameters help in estimating model parameters.
  • They are mostly set manually.
  • They can be tuned to change the performance of a model.

Long Short-Term Memory Network (LSTM)

It is a type of recurrent neural network capable of learning order dependence in sequence prediction problems. LSTMs are a complex area of deep learning. This behavior is required in complex problem domains such as machine translation, speech recognition, and others.

Multilayer Perceptron (MLP)

The multilayer perceptron (MLP) is a feedforward artificial neural network that produces a set of outputs from a given set of inputs. It is a deep learning method characterized by several layers of nodes connected as a directed graph between the input and output layers. An MLP is trained with backpropagation.

Neural Network

A neural network maps inputs to outputs, for example images to features, and much more in the deep learning field. The distinctive benefit of neural networks is their use of hidden layers of weighted functions called neurons. With these, one can effectively build a network that is capable of mapping many other functions.


Perceptron

A perceptron is a simple linear binary classifier. It takes inputs and their corresponding weights (which represent the relative importance of each input) and combines them to produce an output that is used for classification.
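The perceptron described above can be sketched in a few lines. The training rule shown is the classic perceptron learning rule. The AND-gate data, the function names, and the learning rate are assumptions chosen for illustration.

```python
def predict(weights, bias, inputs):
    """Combine inputs and weights, then threshold the sum into class 0 or 1."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Perceptron learning rule: nudge the weights after each mistake."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Logical AND is linearly separable, so a perceptron can learn it.
and_gate = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_gate)

for inputs, label in and_gate:
    print(inputs, "expected", label, "predicted", predict(weights, bias, inputs))
```

Because the classifier is linear, it can only separate classes that a straight line (or plane) can divide. A problem such as XOR is out of reach for a single perceptron.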


PyTorch

It is an open source machine learning library for Python, based on Torch. It offers flexibility as a deep learning development platform. Below are the key features responsible for the wide use of PyTorch:

  • Simple and accessible API
  • Python support
  • Dynamic computation graphs

Recurrent Neural Network

Unlike feedforward neural networks, the connections of recurrent neural networks form a directed cycle. This feedback allows the network to keep an internal temporal state, which in turn permits sequence processing and provides the capabilities needed for recognizing speech and handwriting.
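A minimal sketch of the internal state, assuming a single recurrent node with a tanh activation. The weights `w_x` and `w_h` are hypothetical fixed values, not trained ones.

```python
import math


def rnn_final_state(sequence, w_x=0.5, w_h=0.8, bias=0.0):
    """Run one recurrent node over a sequence and return its last state.

    The state h is fed back into the node at every step, which is the
    cycle that a feedforward network does not have.
    """
    h = 0.0
    for x in sequence:
        h = math.tanh(w_x * x + w_h * h + bias)
    return h


# Same values, different order: the final states differ.
early_signal = rnn_final_state([1.0, 0.0, 0.0])
late_signal = rnn_final_state([0.0, 0.0, 1.0])
print(early_signal, late_signal)
```

Because the state carries information from earlier steps, the network is sensitive to the order of its inputs, which is what sequence tasks such as speech recognition require.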

Supervised Neural Network

It is a category of machine learning algorithm that uses a known dataset (also called the training dataset) to make predictions. The training dataset comprises input data and response values. From these, the supervised learning algorithm builds a model that predicts the response values for new datasets. In general, a larger training dataset gives higher predictive power and helps the model generalize well to new datasets.


Torch

Torch is an open source machine learning library. It is built around the Lua programming language. It offers a wide spectrum of algorithms for deep learning.

Unsupervised Neural Network

It is a type of machine learning algorithm that draws inferences from datasets consisting of input data without labeled responses. One of the most common unsupervised learning methods is cluster analysis. It is used in exploratory data analysis to discover hidden patterns or groupings in data.
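Cluster analysis can be sketched with k-means, one common clustering method picked here purely for illustration. The one-dimensional data, the starting centers, and the function name are assumptions.

```python
def kmeans_1d(points, centers, iterations=10):
    """Group unlabeled points by alternating two steps: assign, then update."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled data with two visible groups.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centers, clusters = kmeans_1d(points, centers=[0.0, 5.0])
print(centers)   # [1.5, 10.0]
print(clusters)  # [[1.0, 1.5, 2.0], [9.0, 10.0, 11.0]]
```

No labels were supplied at any point. The grouping comes only from the structure of the input data.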

Vanishing Gradient Problem

In machine learning, the vanishing gradient problem is a difficulty found in training some artificial neural networks with gradient-based learning methods and backpropagation. The problem worsens as the number of layers in the architecture increases.
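A rough numerical illustration, assuming a chain of sigmoid layers with one node each and a weight of 1. This is a simplification chosen to make the effect visible, not a model of any real architecture.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid; it never exceeds 0.25."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_scale(num_layers, pre_activation=0.0, weight=1.0):
    """Factor by which backpropagation scales a gradient across the layers.

    The chain rule multiplies one (weight * derivative) term per layer.
    """
    scale = 1.0
    for _ in range(num_layers):
        scale *= weight * sigmoid_derivative(pre_activation)
    return scale


for depth in (1, 5, 10, 20):
    print(depth, gradient_scale(depth))
```

Even in the best case for the sigmoid, each layer shrinks the gradient to a quarter of its size, so the early layers of a deep network receive almost no learning signal.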

Word Embedding

Word embedding is the collective term for a set of language modeling and feature learning techniques in natural language processing (NLP). Words or phrases from the vocabulary are mapped to vectors of real numbers.
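The mapping can be pictured with a toy lookup table. The three-dimensional vectors below are made-up values for illustration only; real embeddings are learned from text and usually have many more dimensions.

```python
import math

# Hypothetical embeddings: each word maps to a vector of real numbers.
embeddings = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.7, 0.2],
    "apple": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


royal = cosine_similarity(embeddings["king"], embeddings["queen"])
fruit = cosine_similarity(embeddings["king"], embeddings["apple"])
print(royal, fruit)
```

Once words are vectors, closeness in meaning can be estimated with ordinary arithmetic. In this toy table, "king" is closer to "queen" than to "apple".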


So, there you have it: technical deep learning terms explained for beginners. I hope this helps you get your head around some of the complex and important terms you might encounter when you begin to explore deep learning.

I hope this list of “Deep Learning Key Terms” serves as a handy cheat sheet whenever you need it. This concludes the “Data Science Glossary” series.


Pavithra Vasist

Pavithra Vasist is a Content Writer working with Aeon Learning Pvt Ltd. She was previously working with MetricFox, a marketing outsourcing firm as a Copy Writer. She holds a bachelor's degree in Electrical and Electronics Engineering. Besides writing, she's fascinated with electronic gadgets and mostly spends her spare time drawing or traveling. She resides in Bangalore.
