All CategoriesCareers

Introduction to Data Structures

Data structure is about structuring or arranging and storing of data in such a way so that it can be accessed and used efficiently. 

This defines the relationship between data and the operations that can be performed on the data. 

In this tutorial, we will be discussing Data Structures and Algorithms. As we know Data Structure is an essential and important topic to learn, therefore we have tried to keep this article as basic as we could.

So let us dive into in detail. 

The main idea behind Data structure is to 

  • reduces the memory usage of a Data Structure operation i.e., space complexity
  • reduces the execution time of operations on Data Structure i.e., the time complexity

Data structures and algorithms that we will see later in this blog are concepts that are independent of language. Therefore once we master it in any one language, it’s relatively easy to switch to another one, though it differs in the built-in methods and APIs for different languages.

Therefore, the data structures and algorithm as concepts are the same across languages, the implementation, however, varies greatly.

For Example, the most common use of Data Structure is a Dictionary, where all the words are arranged in alphabetical order that is, in a sorted manner which makes the searching of words to find its meaning, very easy.

We have two types of Data Structures:

  • Linear Data Structure
  • Non-Linear Data Structure

Linear Data Structure

A data structure is said to be linear if the elements are arranged in a linear and sequential order.

Data items can be traversed in a single run. The implementation of this type of Data structure is easy.

Common Examples of Linear Data Structure are:

  1. Arrays
  2. Queues
  3. Stacks
  4. Linked lists

Non-Linear Data Structure

A data structure is said to be non-linear if the elements are arranged in a non – sequential order. 

Data cannot be traversed in a single run and also the implementation is difficult. 

Examples of Non-linear Data Structure are:

  1. Trees
  2. Graphs

Why do we need Data Structure?

  1. Allows easier processing of data.
  2. Data Structures are important for designing efficient algorithms.
  3. Secure way of storing data.
  4. We can access data anytime and anywhere.

Execution Time Cases

There are three cases which are used to compare Data Structure’s execution time:

  • Worst Case: The scenario where a specific Data structure operations take the maximum time it can take. 
  • Best Case: The scenario that takes the least execution time to perform a Data Structure’s operation.
  • Average Case: This is the scenario that depicts the average execution time of the operation of a Data Structure. 

Algorithm

An important aspect to Data Structures are the Algorithms as Data structures are implemented using Algorithms. It is generally described that Data Structure + Algorithm = Programs. 

An algorithm is a step-by-step procedure, which defines a set of instructions to be executed in a certain order to get the desired output.

Algorithms can be implemented in more than one programming language.

A data structure is a systematic way of organizing and accessing data, and an algorithm is a step-by-step procedure for performing some task in a finite amount of time.

Operations performed on Data Structure

There are various operations performed on Data Structures. Few of them are listed as:

  • Sorting:  arranging the data elements of a data structure in a specific order is called sorting
  • Searching: searching for a particular element in a data structure is called searching
  • Merging:  combining elements of two similar data structure to form a new data structure of the same type is called merging
  • Insertion: Inserting or adding a new element to a data structure 
  • Deletion: removal of an element from a data structure if present is called deletion
  • Traversal: processing all the elements present in a data structure is called Traversing

Asymptotic Notations 

Before writing a program we create a blueprint in the form of Algorithms that serves as a flowchart depicting the steps according to which we could implement the program. 

There could be various approaches to solve a particular problem, we can call these approaches as algorithms. 

Among these approaches, we choose the best algorithm based on its time and space complexity. 

To represent these complexities Asymptotic Notations are used. These are expressions that allow us to analyze an algorithm’s time and space complexities by identifying its behavior as the input size(n) for the algorithm increases. This is also known as Algorithm’s Growth Rate.

Using Asymptotic Notation we can very well conclude the Best Case, Average Case and Worst-Case scenario of an algorithm. 

Types of Asymptotic Notations

There are commonly three types of Asymptotic Notation used for calculating the running time complexity of an algorithm.

  1. Big Oh(O) Notation → it is the asymptotic notation for the worst case for a given expression and is represented by O. It gives us an asymptotic upper bound for the growth rate of the runtime of an algorithm.

It measures the maximum amount of time an algorithm can take to complete its operation.

Common Big O notations:

  • O(1):  The big O notation O(1) represents the complexity of an algorithm whose execution time is constant or same for operations regardless of the input data.
  • O(n):  The big O notation O(n) represents the complexity of an algorithm whose performance is directly proportional (grows linearly) to the size of the input data.
  • O(n^2):  The big O notation O(n^2) represents the complexity of an algorithm is directly proportional to the square of the size of the input data.
  • O(log n): The big O notation O(log n) basically means time goes up linearly while ‘n’ goes up exponentially.  

2. Omega Notation → it is the asymptotic notation for the best case for a given expression and is represented by the symbol Ω. It expresses the lower bound of an algorithm’s running time.

In other words, if we represent the time complexity of an algorithm in the form of Ω, it means that the algorithm will at least take this much time to complete its execution, it can take more than this too.

3. Theta Notation → it is the asymptotic notation, which is represented by the symbol Θ.  This is used to denote the asymptotically tight bound that is both the upper and the lower bound of an algorithm’s running time.

The theta notation Θ represents the average running time that lies between the best and the worst cases.

Sorting techniques

The sorting operation mentioned above can be done in many ways. These various sorting techniques are analyzed for the best, worst and the average.

As we know sorting refers to arranging data in a particular format.

These sorting techniques are:

  • Bubble sort: this is the simplest of sorting techniques. This is a comparison based algorithm, in which for a given array, each element compares with the adjacent element in the array and the elements are swapped if not in order.

Bubble sort has worst-case complexity: O(n2) and best case complexity: Ω(n)

where n is the number of elements being sorted.

Let us see the working of the Bubble sort:

Bubble sorting starts from the first element present at the 0th index, compares it with the adjacent element to check which one is greater and swaps to keep it in ascending order and the process continues.

Let’s say we have an array with 5 elements

156229010

In this case, sorting starts from the first two elements, compares themselves, here 15 is greater than 6, therefore the two element gets swapped.

615229010

Now the second element compares itself with the third since they are in sorted order already, no swapping is done.

615229010

Next, we compare the next two values, that is, 22 and 90, since they are in sorted order, no swapping is done.

615229010

Then we move on to the last two elements, compare the two values and swap if needed.

After the first iteration the sorting looks like this:

615221090

After the second iteration it looks like this:

615102290

After the third iteration it looks like this:

610152290

And when no swapping is required, the array is completely sorted and bubble sorting is done.

  • Selection sort: in this sorting algorithm the first element of the array is selected and is compared with all the other elements in the array. If any other element in the array is less than the first element, then swapping of those two elements takes place.

Selection sort has worst-case complexity: O(n2) and best case complexity: Ω(n2)

Let us see the working of selection sort:

Taking the same array of elements as above

156229010

We select the first element that is, 15 and compares it with the rest of the element. Here we found 6 to be the lowest value and hence swaps the two elements.

615229010

After the first iteration, the minimum value in the array that is 6 appears to be in the first position

615229010

For the second iteration, the second element 15 is selected and is compared to the rest of the elements in the list, 10 is found to be the least and hence the two get swapped. The array looks like this

610229015

The same process continues and after the third iteration the array looks like this

610159022

After the fourth iteration, the array looks like this and is finally in the sorted order.

610152290
  • Insertion sort:  It is another comparison based sorting algorithm. Insertion sorting starts from index 1 that is the key. This key element is compared to the elements to its left. 

If the key element is less than the element at the left, we swap the two and if the key element is greater than the element at the left we left it as it is. 

Then we make the element at index 2 as the key and compare the element at its left and do the swapping accordingly until it gets sorted and repeats the same procedure until the array is sorted.

Insertion sort has worst-case complexity: O(n2) and best case complexity: Ω(n)

Let us take an array with 4 elements as below:

7452

We make the element at index 1 that is, 4 as the key and compares it with the value at the left. Since 4 is less than 7 hence the two gets swapped. 

4752

Element at the 2nd index that is 5 is the key now. Comparing it with the values at the left that is 7, since 7 is greater than 5 and 4 is less than 5, so no change in position of 4 and 5 will be moved to the position of 7.

4572

The last element that is 2 is the key now. Comparing it with all the elements at the left we found all the elements are greater than 2, so all the elements will move forward and 2 will be shifted to position of 4.

2457

Hence the above array results in the final sorted array.

  • Merge sort: this is the sorting technique that is based on divide and conquer algorithm. In this algorithm, sorting is done by first dividing the array into equal halves and then combining it in a sorted manner.

Merge sort has worst-case complexity: O(n log(n)) and best case complexity: Ω(n log(n))

Let us see the working of this algorithm.

We take an unsorted array of length 8 as follows:

774165332210460

We will divide the array into two equal halves, which will make two arrays with 4 elements each.

77416533
2210460

Now we will further divide the two arrays into their respective halves

7741
6533
2210
460

We will further divide it until it can no more be divided and we get the atomic values

77
41
65
33
22

10
4
60

Since they are completely broken down into atomic values, we will now combine them in exactly the same manner as they were divided

While combining we will first compare the two elements being combined and sort them. Therefore while combining the elements 77  and 41 since 77 is greater hence we swap the two elements. 

Likewise while combining 65 and 33, since 65 is greater we swap the two. Similarly, 22 and 10 are swapped whereas 4 and 60 remain as it is. Hence the combined array looks as below.

4177

3365

1022

460

Now in the second iteration, we will compare lists of two data values, and merge them into an array of 4 each

33416577

4102260

And after final merging, the list looks like this

410223341606577

  • Quicksort: It is a recursive sorting method that keeps calling itself. It is based on divide and conquer algorithm and is efficient for large data sets.

In this, an element is selected as a Pivot value and two subarrays are created one for storing values less than the pivot value and other for storing values larger than the pivot value.

The main question arises here is which element to be picked up as pivot. The value of pivot is generally chosen based on some logic and implementation. There are four possibilities:

  1. Selecting first element of the list as the pivot
  2. Selecting last element of the list as the pivot
  3. Selecting a random element as pivot
  4. Selecting median as the pivot

However we prefer to choose the last element of the list as Pivot.

Quicksort has worst-case complexity: O(n2) and best case complexity: Ω(n log(n))

Let us see the working of the same.

We have an array with element as follows.

29604419305957

Here we will take the last element(70) as Pivot. Now we have to do partitioning of the array in such a way such that the pivot comes to its correct position in a sorted array and all the elements smaller than the pivot comes to the left side of the pivot and elements larger than the pivot comes to the right side of the pivot. This method is also known as Partitioning. We will continue to do this until we get the sorted list.

We will take two variables i and j, i will be used to calculate the final position of the pivot and j is used to iterate through the array.  Low is the starting index and high is the ending index(the index of pivot). 

Initially, i and j are initialized to low-1 and low respectively.

It will start by comparing each value of j with the pivot. j is in the range from low to high.

If arr[j] <= pivot

i++

arr[i] = arr[j] 

For every time the value of j is less than or equal to value of pivot, i is incremented by 1 and value at index if i is swapped with the value at the index of j. Note that no swapping is done if the i=j.

These steps are repeated as long as j<=high-1. Now the control comes out of the iteration and pivot is swapped from arr[i+1] to arr[high]. 

In the above array that we have taken to perform quick sort on the pivot value is array[high] = 57. 

i = -1 and j = 0, 29 <= 57, i = 0, no swap(since i=j)

I = 0 and j = 1, 60 >57, no swap

i = 0 and j = 2, 44<=57, i = 1, swap 60<>44

29446019305957

i = 1 and j = 3, 19<=57, i = 2, swap 60 <> 19

29441960305957

i = 2, j = 4, 30<=57, i = 3, swap 60<>30

29441930605957

i = 3, j = 5, 59>57, no swap

We come out of the loop because j is now equal to high.

Finally, we place pivot at the correct position by swapping

arr[i+1] and arr[high]

So since i = 3, we will replace the pivot value 57 with i+1 that is, at i= 4.

29441930575960

Now the pivot value 57  is placed at the correct position with all values smaller to it has been moved left to it and all values larger than it is placed the right side of it.

But we see the array is still not sorted. Hence we will again split the array into two parts. That is: 

29441930

And 

5960

We will again perform partitioning on these two halves until we get the sorted list.

Finally, the sorted array would be:

19293044575960

The Quicksort pictorial view :

Searching Techniques: The searching operation mentioned above can be done in many ways. The two common ways of searching are:

  • Linear Search: this is the simplest method for searching, in this technique of searching the element to be found is searched sequentially in the list. This operation can be performed both on the sorted and unsorted list.  

Searching is done from the 0th index and is traversed through the complete list until the element is found or the end of the list is reached.

Linear Search has worst-case complexity: O(n) and best case complexity: O(1)

  • Binary Search: binary search the fastest search algorithm. This algorithm works on the principle of divide and conquer. It requires the list to be in sorted order. 

Given a sorted array of n elements, to search an element we compare it with the middle element of the sorted list. If the element matches with the middle element the search is successful, otherwise the list is divided into two equal halves, one from the 0th element to the middle element and another from the center element to the last element. 

The searching mechanism proceeds from either of the two halves, if the value of the search key is less than the item in the middle of the interval then it checks on the first half of the division otherwise in the second half of the division. 

Binary Search has worst-case complexity: O(log n) and best case complexity: O(1)

This brings us to the end of this blog. So far we have only read about the Data Structure, Algorithms, Sorting and Searching technique. A lot more is still left to cover like the types of Data Structure and its implementation. So stay tuned for our next blog.

Do leave us a comment for any query or suggestions.

Keep visiting our website for more blogs on Data Science and Data analytics.

Happy Learning 🙂

Series Navigation<< What Are the Skills Needed to Be a Data Scientist? | People Ask Google

Mitali Singh

Python|| Machine Learning|| Statistics|| Data Science

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Related Articles

Close
Close