Data Science and Artificial Intelligence

Data Manipulation with NumPy – 1

In this blog, we will learn methods for manipulating data with the Python NumPy library. Before moving ahead, let us revise some basics first.

PYTHON


Python is a popular programming language. It was created by Guido van Rossum and first released in 1991. It was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code.

Python features:

  1. It is an object-oriented language
  2. It is a high-level language
  3. It is open-source
  4. It is cross-platform
  5. It can easily be integrated with languages like C, C++, Java, etc.

Why Learn Python for Data Science?

Python is one of the most widely used languages for scientific computing. Matrix and vector manipulation is central to scientific computation, and NumPy and Pandas are essential Python libraries for it, thanks to their built-in functions and high-performance array operations.

Let us now understand NumPy and its working.

NumPy

NumPy stands for ‘Numerical Python’. It is a Python package that provides fast mathematical computation on one-dimensional and multidimensional arrays and matrices, along with a large collection of functions for numeric computation.
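To make the idea of fast array computation concrete, here is a minimal sketch (the variable names are illustrative) that squares four numbers first with a Python list, which needs an explicit loop, and then with a NumPy array, where a single expression applies the operation to every element. NumPy runs that element-wise loop in compiled code, which is where its speed advantage over pure Python loops comes from.

```python
import numpy as np

# A plain Python list needs an explicit loop (here a list comprehension)
values = [1.0, 2.0, 3.0, 4.0]
squared_list = [v ** 2 for v in values]

# A NumPy array applies the operation to every element at once (vectorization)
arr = np.array(values)
squared_arr = arr ** 2

print(squared_list)   # [1.0, 4.0, 9.0, 16.0]
print(squared_arr)    # [ 1.  4.  9. 16.]
```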

Working with NumPy

For illustrating Python code in this blog, we use the Jupyter Notebook, a browser-based environment that lets you work with Python interactively. We can think of Jupyter as a digital notebook in which we can execute commands, take notes and draw charts.

NOTE: To start working in Python, we need Python installed on our system. To install Python, follow these steps:

  1. Go to the website https://www.python.org/download
  2. Download the latest version of Python (v3.7.3 at the time of writing)
  3. Run the setup file to install it

To install Jupyter, refer to the blog below:

https://acadgild.com/blog/anaconda-python-tutorial

How to use NumPy

First, import the library:

import numpy as np

This makes the NumPy namespace available under the alias ‘np’.

Next, check the installed version:

np.__version__

Creating Arrays Using NumPy

There are several ways to create an array in NumPy. To create a simple 1D array, we execute the code below:

a = np.array([1,2,3])
print(a)
type(a) # the built-in type() function returns the object's type

[1 2 3]
numpy.ndarray

To create a 3×4 matrix filled with ones, use the code below:

np.ones((3,4), dtype = float) # the dtype parameter sets the data type of the elements

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

To create a 3×4 matrix filled with a given value, use the code below:

np.full((3,4), 0.11)

array([[0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11],
       [0.11, 0.11, 0.11, 0.11]])

To create an array of values from 10 up to, but not including, 30 in steps of 5, use the code below:

np.arange(10,30,5)

array([10, 15, 20, 25])

To create an array of 5 evenly spaced values from 0 to 1, with both endpoints included, use the code below:

np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
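arange and linspace look similar but fix different things: arange fixes the step and lets the number of elements follow from it, while linspace fixes the number of points. The sketch below illustrates this using linspace's optional retstep and endpoint parameters, which are part of NumPy's documented signature.

```python
import numpy as np

# arange: start, stop (exclusive), step -> the step is fixed, the count follows
a = np.arange(10, 30, 5)
print(a)            # [10 15 20 25]

# linspace: start, stop (inclusive by default), number of points -> the count is fixed
b, step = np.linspace(0, 1, 5, retstep=True)
print(b, step)      # [0.   0.25 0.5  0.75 1.  ] 0.25

# endpoint=False excludes the stop value, so the spacing becomes 1/5
c = np.linspace(0, 1, 5, endpoint=False)
print(c)            # [0.  0.2 0.4 0.6 0.8]
```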

To create an identity matrix, use the below code:

np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

To create a 3×3 array of random values uniformly distributed over the half-open interval [0, 1), use the code below:

np.random.random((3,3))

 

 

To create an array of normally distributed random values with mean 0 and standard deviation 1, we will use the following code:

np.random.normal(0,1,(3,3))

 

 

To create a 2×2 array of random integers from the half-open interval [0, 10), i.e. from 0 to 9, use the code below:

np.random.randint(0, 10, (2,2))
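The functions above use NumPy's legacy global random state. NumPy 1.17 and later also provide a Generator object, created with np.random.default_rng, which the NumPy documentation recommends for new code. Here is a sketch of the same three arrays with that API; the seed 42 is an arbitrary choice that makes the results reproducible.

```python
import numpy as np

# Create a Generator with a fixed seed so the results are reproducible
rng = np.random.default_rng(42)

uniform = rng.random((3, 3))            # uniform values in [0, 1)
normal = rng.normal(0, 1, (3, 3))       # mean 0, standard deviation 1
integers = rng.integers(0, 10, (2, 2))  # integers from 0 to 9 (high is exclusive)

print(uniform)
print(normal)
print(integers)
```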

 

 

Now we will see some of the frequently used data types in NumPy.

DataTypes in NumPy

Data type                            Description
bool_                                Boolean (True or False) stored as a byte
int_                                 Default integer type
int8, int16, int32, int64            Signed integer types, by size in bits
uint8, uint16, uint32, uint64        Unsigned integer types, by size in bits
float_, float16, float32, float64    Floating-point types, by size in bits
complex_, complex64, complex128      Complex number types

Note: the float_ and complex_ aliases were removed in NumPy 2.0; use float64 and complex128 instead.
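The data type can be requested explicitly when creating an array and changed later with astype. Here is a short sketch with illustrative values, showing how the dtype affects the size of each element and what happens when floats are converted to integers:

```python
import numpy as np

# Request a specific data type when creating an array
small = np.array([1, 2, 3], dtype=np.int8)
print(small.dtype, small.itemsize)        # int8 1

# astype() returns a converted copy; the original array is unchanged
as_float = small.astype(np.float32)
print(as_float.dtype, as_float.itemsize)  # float32 4

# Unsigned types cannot hold negative numbers: uint8 covers 0..255
print(np.iinfo(np.uint8).min, np.iinfo(np.uint8).max)   # 0 255

# Converting floats to ints truncates toward zero
print(np.array([1.7, -1.7]).astype(np.int32))           # [ 1 -1]
```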

 

OPERATIONS PERFORMED ON NUMPY ARRAY

We will perform the following operations on NumPy arrays:

  • Array Attributes
  • Array Indexing
  • Array Slicing
  • Array Reshaping
  • Array Concatenation and Splitting

1. Array Attributes

  • ndim: the number of dimensions of the array
  • shape: a tuple of integers giving the size of each dimension
  • size: the total number of elements in the array
  • dtype: the data type of the elements, e.g. int64 or float64
  • itemsize: the size in bytes of each element

For example, consider the code below, where we define three random arrays: a 1D, a 2D and a 3D array. We use NumPy's random number generator and seed it with a fixed value so that the same random arrays are generated each time the code is executed.

import numpy as np
np.random.seed(0)  # fixed seed so the same "random" arrays are produced on every run

x1 = np.random.randint(10, size = 5)        # 1D array of 5 integers from 0 to 9
x2 = np.random.randint(10, size = (3,3))    # 2D array, 3×3
x3 = np.random.randint(10, size = (3,4,5))  # 3D array, 3×4×5

print("x3 ndim: ", x3.ndim)
print("x3 shape: ", x3.shape)
print("x3 size: ", x3.size)
print("dtype: ", x3.dtype)
print("x3 itemsize:", x3.itemsize, "bytes")
print("x3 nbytes: ", x3.nbytes, "bytes")

The attributes printed above are described in the list at the start of this section; nbytes, which was not listed there, gives the total size of the array's elements in bytes, i.e. itemsize × size.
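Because itemsize depends on the dtype, and the default integer dtype can differ between platforms, the sketch below sets dtype=np.int64 explicitly (an assumption added for reproducibility, not part of the code above) so the byte counts are predictable and the relation nbytes = itemsize × size can be checked:

```python
import numpy as np

np.random.seed(0)
# dtype is fixed explicitly so itemsize does not depend on the platform default
x3 = np.random.randint(10, size=(3, 4, 5), dtype=np.int64)

print(x3.ndim)       # 3
print(x3.shape)      # (3, 4, 5)
print(x3.size)       # 60  (3 * 4 * 5)
print(x3.itemsize)   # 8   (bytes per int64 element)
print(x3.nbytes)     # 480 (itemsize * size)
```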

2. Array Indexing

Indexing in NumPy works as it does for Python lists: it starts from 0. A single element of an array is accessed by specifying its index in square brackets.

x1 = np.random.randint(10, size = 5)
x1

 

 

x1[0] # fetch the first element (index 0)

3

x1[4] # fetch the fifth element (index 4)

0

x1[-1] # negative indices count from the end; -1 is the last element

0

In a multidimensional array, items are accessed using a comma-separated tuple of indices.

x2 = np.random.randint(10, size = (3,3))
x2

 

 

x2[0, 0] #to fetch the data present at 1st row and 1st column

4

x2[2, -1] #to fetch the data present at 3rd row and last column

9

Values can also be modified using any of the index notations above:

x2[0, 0] = 12 # to modify the value of array present at 1st row and 1st column
x2
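One detail worth knowing when modifying values: a NumPy array has a single fixed dtype, so a float assigned into an integer array is converted to an integer and its fractional part is discarded without any warning. The sketch below uses an illustrative 2×3 array rather than the random x2 above, and also shows how a single index selects a whole row while ':' selects a column:

```python
import numpy as np

x2 = np.array([[4, 1, 7],
               [2, 5, 9]])     # integer array (illustrative values)

x2[0, 0] = 3.9                 # the float is cast to the array's integer dtype
print(x2[0, 0])                # 3  (the fractional part is discarded)

row = x2[1]                    # a single index on a 2D array returns a whole row
col = x2[:, 2]                 # ':' selects all rows, 2 selects the third column
print(row, col)                # [2 5 9] [7 9]
```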

 

 

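3. Array Slicing

We can access subarrays using slice notation, marked by a colon (:). The general form is x[start:stop:step], where stop is excluded; if omitted, start defaults to 0, stop to the length of the dimension, and step to 1 (with a negative step, the defaults for start and stop are swapped).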

For 1D subarrays:

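x = np.arange(10)
x

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

x[:5],x[5:],x[4:7]

(array([0, 1, 2, 3, 4]), array([5, 6, 7, 8, 9]), array([4, 5, 6]))

In the above code, x[:5] returns the elements before index 5 (index 5 is excluded), x[5:] returns the elements from index 5 onward (index 5 included), and x[4:7] returns the middle subarray from index 4 up to, but not including, index 7.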

x[::2] #to access every alternate element

array([0, 2, 4, 6, 8])

x[::-1] #to access all elements in a reversed order

array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])

x[5::-2] # every other element, walking backwards from index 5

array([5, 3, 1])

For multi-dimensional arrays:

x2 = np.random.randint(10, size = (3,3))
x2

 

 

x2[:2, :3] # first two rows and first three columns

 

 

x2[:3,::2] # all rows, every other column

 

 

x2[::-1,::-1] #to reverse the whole matrix
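Unlike slices of Python lists, NumPy slices are views: they share memory with the original array, so modifying a slice also modifies the original. When an independent subarray is needed, call .copy(). Here is a minimal sketch that uses np.shares_memory to check whether two arrays overlap:

```python
import numpy as np

x2 = np.arange(9).reshape((3, 3))

# Slicing returns a view: it shares memory with the original array
sub = x2[:2, :2]
sub[0, 0] = 99
print(x2[0, 0])          # 99 -> the original array changed too

# .copy() creates an independent array
sub_copy = x2[:2, :2].copy()
sub_copy[0, 0] = -1
print(x2[0, 0])          # 99 -> the original is unaffected
print(np.shares_memory(sub, x2), np.shares_memory(sub_copy, x2))   # True False
```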

 

 

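4. Reshaping of Arrays

Reshaping changes the shape of an array, for example from 1D to 2D, while keeping its elements in the same order. The total number of elements must stay the same: the 9 values below fit exactly into a 3×3 matrix.

x = np.arange(1, 10).reshape((3,3))
print(x)

[[1 2 3]
 [4 5 6]
 [7 8 9]]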

To convert a 1D array into a row vector (a 2D array with one row), we can use either the reshape method or np.newaxis:

x = np.array([1,2,3])
x

array([1, 2, 3])

x.reshape([1, 3])

array([[1, 2, 3]])

x[np.newaxis, :]

array([[1, 2, 3]])

Similarly, we can get a column vector (a 2D array with one column) via reshape or np.newaxis:

x.reshape((3, 1))

array([[1],
       [2],
       [3]])

x[:, np.newaxis]
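When reshaping, one dimension may be given as -1, and NumPy infers it from the total number of elements; if the requested shape does not hold exactly the same number of elements, reshape raises a ValueError. Here is a short sketch with an illustrative 12-element array:

```python
import numpy as np

x = np.arange(12)

# -1 tells NumPy to infer that dimension from the array's size
print(x.reshape(3, -1).shape)    # (3, 4)
print(x.reshape(-1, 6).shape)    # (2, 6)

# The total number of elements must match: 12 values cannot fill a 5x3 array
try:
    x.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```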

 

 

5. Array Concatenation and Splitting

These operations combine multiple arrays into one and, conversely, split a single array into multiple arrays.

Concatenation:

NumPy provides the following functions for concatenating arrays:

  • concatenate
  • vstack (vertical stack)
  • hstack (horizontal stack)

For 1D array:

x = np.array([3,4,5])
y = np.array([6,7,8])
np.concatenate([x, y])

array([3, 4, 5, 6, 7, 8])

For 2D array:

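a = np.array([[1,2,3],
              [4,5,6]])
np.concatenate([a,a]) # concatenate along the first axis (axis 0): rows are stacked

array([[1, 2, 3],
       [4, 5, 6],
       [1, 2, 3],
       [4, 5, 6]])

np.concatenate([a,a], axis = 1) # concatenate along the second axis (axis 1): side by side

array([[1, 2, 3, 1, 2, 3],
       [4, 5, 6, 4, 5, 6]])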

For arrays of mixed dimensions, vstack and hstack are preferred:

x = np.array([1,2,3])
y = np.array([[8,7,6],
              [5,6,4]])
np.vstack([x,y])

array([[1, 2, 3],
       [8, 7, 6],
       [5, 6, 4]])

z = np.array([[22],
              [25]])
np.hstack([y,z])
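np.concatenate only works when the arrays agree in size on every axis except the one being joined. The sketch below uses illustrative arrays a and b to show one shape combination that succeeds, one that raises ValueError, and that vstack gives the same result as concatenate along axis 0 for 2D inputs:

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])          # shape (2, 3)
b = np.array([[7, 8, 9]])          # shape (1, 3)

# Along axis 0 only the column counts must match (3 == 3)
print(np.concatenate([a, b], axis=0).shape)   # (3, 3)

# Along axis 1 the row counts must match; 2 rows vs 1 row fails
try:
    np.concatenate([a, b], axis=1)
except ValueError as err:
    print("ValueError:", err)

# For 2D arrays, vstack is equivalent to concatenate along axis 0
print(np.array_equal(np.vstack([a, b]), np.concatenate([a, b], axis=0)))   # True
```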

 

 

Splitting:

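Use the following functions to split arrays in NumPy:

  • split
  • vsplit (vertical split)
  • hsplit (horizontal split)

x = [1, 2, 3, 22, 35, 6, 7, 8]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)

[1 2 3] [22 35] [6 7 8]

The list [3, 5] gives the split points: x1 holds the elements before index 3, x2 the elements at indices 3 and 4, and x3 the rest. N split points produce N + 1 subarrays.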
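Besides a list of split points, np.split also accepts an integer number of equal sections; if the array cannot be divided equally, it raises ValueError, whereas np.array_split allows unequal pieces. Here is a sketch with an illustrative 8-element array, which also checks that concatenating the pieces restores the original:

```python
import numpy as np

x = np.arange(8)

# An integer splits the array into that many equal parts
parts = np.split(x, 4)
print(parts)    # [array([0, 1]), array([2, 3]), array([4, 5]), array([6, 7])]

# np.split requires an equal division; np.array_split allows unequal parts
try:
    np.split(x, 3)
except ValueError as err:
    print("ValueError:", err)
print(np.array_split(x, 3))   # [array([0, 1, 2]), array([3, 4, 5]), array([6, 7])]

# Concatenating the pieces restores the original array
print(np.array_equal(np.concatenate(parts), x))   # True
```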

 

 

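y = np.arange(9).reshape((3, 3))
y

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

up, down = np.vsplit(y, [2])  # split the rows before row index 2
print(up)

[[0 1 2]
 [3 4 5]]

print(down)

[[6 7 8]]

left, right = np.hsplit(y, [2])  # split the columns before column index 2
print(left)

[[0 1]
 [3 4]
 [6 7]]

print(right)

[[2]
 [5]
 [8]]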

Example: Let us see how the above operations are performed on arrays and matrices with the help of some use cases.

Reshaping:

Problem Statement 1: Reshape numbers from 0 to 5 into 3 rows and 2 columns and store it in an array.

Problem Statement 2: Reshape the same array as used in Problem Statement 1 into 2 rows and 3 columns.

Problem Statement 3: Reshape the same array to have 3 columns and an unspecified number of rows.
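Here is a minimal sketch of one way to solve the three problem statements, using np.arange for the numbers 0 to 5 and -1 to let NumPy infer the number of rows:

```python
import numpy as np

# Problem 1: numbers 0 to 5 arranged as 3 rows and 2 columns
arr = np.arange(6).reshape(3, 2)
print(arr)
# [[0 1]
#  [2 3]
#  [4 5]]

# Problem 2: the same values as 2 rows and 3 columns
print(arr.reshape(2, 3))
# [[0 1 2]
#  [3 4 5]]

# Problem 3: 3 columns, number of rows inferred with -1
print(arr.reshape(-1, 3).shape)   # (2, 3)
```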

 

Indexing and Slicing

Vector indexing

Indexing a 1D array according to the given statement:
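As an illustration, here are a few typical 1D indexing and slicing operations on an example vector v, whose values are our own choice:

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50, 60])   # example vector (illustrative values)

print(v[2])       # 30 -> third element
print(v[-2])      # 50 -> second element from the end
print(v[1:4])     # [20 30 40] -> indices 1, 2 and 3
print(v[::2])     # [10 30 50] -> every other element
```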

Matrix Indexing

 

Indexing a matrix according to the given statement:
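Similarly, here is a sketch of typical matrix indexing operations on an example 3×4 matrix m, filled with the values 1 to 12 for illustration:

```python
import numpy as np

m = np.arange(1, 13).reshape(3, 4)   # example 3x4 matrix with values 1..12

print(m[1, 2])      # 7 -> row 1, column 2 (zero-based)
print(m[0])         # [1 2 3 4] -> the whole first row
print(m[:, 1])      # [ 2  6 10] -> the whole second column
print(m[1:, 2:])    # bottom-right 2x2 block
# [[ 7  8]
#  [11 12]]
```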

 

 

 

 

 

 

 

We can follow the steps above to create NumPy arrays and perform the related operations on them.

We hope this post clearly explains the concept of data manipulation with NumPy.

In the next blog, we will discuss NumPy further and illustrate more examples. Keep visiting our website Acadgild for more updates on Data Science and other technologies.


 
