In this blog, we will learn how to manipulate different kinds of data using Python's NumPy library. Before moving ahead, let us first review some background.
PYTHON
Python is a popular programming language. It was created by Guido van Rossum and released in 1991. It was designed with an emphasis on code readability, and its syntax allows programmers to express concepts in fewer lines of code.
Python features:
- It is an Object-Oriented language
- It is a high-level language
- It is open-source
- It is cross-platform
- It can easily be integrated with languages like C, C++, Java, etc.
Why Learn Python for Data Science?
Python is one of the most widely used languages for scientific computing. Matrix and vector manipulation is central to scientific computation, and NumPy and Pandas are essential Python libraries for it because of their built-in functions and high-performance matrix operations.
Let us now understand NumPy and its working.
NumPy
NumPy stands for ‘Numerical Python’. It is a Python package that provides fast mathematical computation on one-dimensional and multidimensional arrays and matrices, along with a wide range of functions for numeric computation.
Working with NumPy
To illustrate the Python code in this blog, we are using Jupyter Notebook, a browser-based environment that lets you work with Python interactively. We can think of Jupyter as a digital notebook that lets us execute commands, take notes and draw charts.
NOTE: To start working in Python we need to have Python installed in our systems. To install Python follow these steps:
- Go to the website https://www.python.org/download
- Download the latest version of Python (3.7.3 at the time of writing)
- Run the setup and install the same
To install Jupyter we can refer to the below blog:
How to use NumPy
First, import the NumPy library. By convention it is imported under the alias ‘np’, which is then used to access everything in its namespace.
Next, check which version is installed.
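The import and version check described above might look like the following minimal sketch. The variable name `version` is only illustrative, and the printed value depends on your installation.

```python
import numpy as np  # 'np' is the conventional alias for the numpy namespace

# __version__ holds the installed NumPy version as a string
version = np.__version__
print(version)
```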
Creating Arrays Using NumPy
There are several ways to create an array in NumPy. To create a simple 1-D array we will execute the below code
a = np.array([1,2,3])
print(a)
type(a) #the built-in function 'type' returns the type of the object, here numpy.ndarray
To create a matrix of 3×4 dimension with all ones, we will be using the below code:
np.ones((3,4), dtype = float) #the 'dtype' argument sets the data type of the array elements
To create a matrix of 3×4 dimension with a predefined value we will be using the below code:
np.full((3,4), 0.11)
To create an array with a set sequence we will execute the below code:
np.arange(10,30,5)
array([10, 15, 20, 25])
To create an array of evenly spaced values over the range 0 to 1 (both endpoints included), we will be using the below code:
np.linspace(0,1,5)
array([0.  , 0.25, 0.5 , 0.75, 1.  ])
To create an identity matrix, use the below code:
np.eye(3)
To create an array of uniformly distributed random values between 0 and 1, we will be using the below code:
np.random.random((3,3))
To create an array of normally distributed random values with mean 0 and standard deviation 1, we will use the following code:
np.random.normal(0,1,(3,3))
To create an array of random integers from 0 up to, but not including, 10, we will be using the below code:
np.random.randint(0, 10, (2,2))
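As a quick check of the three random functions above, the sketch below generates one array with each and confirms the shapes and value ranges. The variable names and the seed value are assumptions made for illustration.

```python
import numpy as np

np.random.seed(0)  # fix the seed so the run is repeatable

uniform = np.random.random((3, 3))           # floats in [0.0, 1.0)
normal = np.random.normal(0, 1, (3, 3))      # mean 0, standard deviation 1
integers = np.random.randint(0, 10, (2, 2))  # integers from 0 to 9

print(uniform.shape, normal.shape, integers.shape)  # (3, 3) (3, 3) (2, 2)

# The upper bound of randint is exclusive, so 10 itself never appears
print(integers.min() >= 0, integers.max() < 10)     # True True
```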
Now we will see some of the frequently used data types in NumPy.
DataTypes in NumPy
Data type | Description |
bool_ | Boolean (True or False) stored as a byte |
int_ | Default integer type |
int8, int16, int32, int64 | Signed integer types of 8, 16, 32 and 64 bits |
uint8, uint16, uint32, uint64 | Unsigned integer types of 8, 16, 32 and 64 bits |
float_, float16, float32, float64 | Floating-point types of increasing precision; float_ is an alias for float64 (the alias was removed in NumPy 2.0) |
complex_, complex64, complex128 | Complex number types; complex_ is an alias for complex128 (the alias was removed in NumPy 2.0) |
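To see how the data types in the table affect an array, here is a small sketch that builds arrays with explicit types and inspects how many bytes each element takes. The variable names are illustrative only.

```python
import numpy as np

# The same values stored with different integer widths
small = np.array([1, 2, 3], dtype=np.int8)
wide = np.array([1, 2, 3], dtype=np.int64)
print(small.dtype, small.itemsize)  # int8 1
print(wide.dtype, wide.itemsize)    # int64 8

# astype returns a new array converted to the requested type
as_float = small.astype(np.float32)
print(as_float.dtype)               # float32

# Booleans are stored as one byte each
flags = np.array([True, False], dtype=np.bool_)
print(flags.itemsize)               # 1
```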
OPERATIONS PERFORMED ON NUMPY ARRAY
We will be performing the following operations on NumPy:
- Array Attributes
- Array Indexing
- Array Slicing
- Array Reshaping
- Array Concatenation and Splitting
1. Array Attributes
- ndim: returns the number of dimensions of the array
- shape: returns a tuple of integers giving the size of the array along each dimension
- size: returns the total number of elements in the NumPy array
- dtype: returns the data type of the elements in the array, e.g. int64
- itemsize: returns the size in bytes of each item
For example, consider the following code, where we define three random arrays: a 1D, a 2D and a 3D array. We’ll use NumPy's random number generator, which we will seed with a set value to ensure that the same random arrays are generated each time the code is executed.
import numpy as np
np.random.seed(0)
x1 = np.random.randint(10, size = 5)
x2 = np.random.randint(10, size = (3,3))
x3 = np.random.randint(10, size = (3,4,5))
print("x3 ndim: ", x3.ndim)
print("x3 shape: ", x3.shape)
print("x3 size: ", x3.size)
print("dtype: ", x3.dtype)
print("x3 itemsize:", x3.itemsize, "bytes")
print("x3 nbytes: ", x3.nbytes, "bytes")
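The attributes ndim, shape, size, dtype and itemsize used in the above code are explained in the list above. The remaining attribute, nbytes, returns the total size of the array in bytes, which equals size multiplied by itemsize.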
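Because the arrays above are random, it can help to check the attributes on an array whose contents are fixed. The sketch below uses an array of zeros with the same shape as x3; the name `x` and the explicit int64 type are assumptions chosen so the byte counts are the same on every platform.

```python
import numpy as np

# A 3D array of zeros with an explicit 8-byte integer type
x = np.zeros((3, 4, 5), dtype=np.int64)

print(x.ndim)      # 3
print(x.shape)     # (3, 4, 5)
print(x.size)      # 60
print(x.dtype)     # int64
print(x.itemsize)  # 8
print(x.nbytes)    # 480, i.e. size * itemsize
```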
2. Array Indexing
Indexing in NumPy is similar to indexing Python lists: it starts from 0. A value in an array can be accessed by specifying the desired index in square brackets.
x1 = np.random.randint(10, size = 5)
x1
x1[0] #to fetch the first element (index 0)
3
x1[4] #to fetch the fifth element (index 4)
0
x1[-1] #negative indices count from the end of the array, so this fetches the last element
0
In a multi-dimensional array, items can be accessed using a comma separated tuple of indices.
x2 = np.random.randint(10, size = (3,3))
x2
x2[0, 0] #to fetch the data present at 1st row and 1st column
4
x2[2, -1] #to fetch the data present at 3rd row and last column
9
The values can also be modified using any of the above index notations.
x2[0, 0] = 12 #to modify the value present at 1st row and 1st column
x2
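Since the arrays above hold random values, the outputs will vary from run to run. The following sketch repeats the same indexing operations on small arrays with fixed contents; the names `v` and `m` and their values are made up for illustration.

```python
import numpy as np

v = np.array([5, 0, 3, 3, 7])
print(v[0])   # 5, the first element
print(v[4])   # 7, the fifth element
print(v[-1])  # 7, the last element

m = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
print(m[0, 0])   # 1, first row and first column
print(m[2, -1])  # 9, third row and last column

# Assignment through an index changes the array in place
m[0, 0] = 12
print(m[0].tolist())  # [12, 2, 3]
```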
3. Array Slicing
We can use array slicing to access subarrays using the slice notation, marked by a colon(:).
For 1D subarrays:
x = np.arange(10)
x
x[:5],x[5:],x[4:7]
In the above code, we have sliced the array to get the elements before index 5 (excluding index 5), the elements from index 5 onwards (including index 5) and a middle sub-array (including index 4 and excluding index 7).
x[::2] #to access every alternate element
array([0, 2, 4, 6, 8])
x[::-1] #to access all elements in a reversed order
array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])
x[5::-2] #to access every alternate element in reverse order, starting from index 5
array([5, 3, 1])
For multi-dimensional arrays:
x2 = np.random.randint(10, size = (3,3))
x2
x2[:2, :3] #to access the first 2 rows and the first 3 columns
x2[:3,::2] #to access all rows and every alternate column
x2[::-1,::-1] #to reverse both the rows and the columns of the matrix
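The multi-dimensional slices above are easier to follow on an array with known contents. This sketch applies the same three slice patterns to a 3×4 grid of consecutive integers; the name `grid` and its shape are illustrative choices.

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)
# grid is:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

top_left = grid[:2, :3]      # first 2 rows, first 3 columns
alternate = grid[:, ::2]     # all rows, every alternate column
reversed_grid = grid[::-1, ::-1]  # rows and columns both reversed

print(top_left.tolist())       # [[0, 1, 2], [4, 5, 6]]
print(alternate.tolist())      # [[0, 2], [4, 6], [8, 10]]
print(reversed_grid.tolist())  # [[11, 10, 9, 8], [7, 6, 5, 4], [3, 2, 1, 0]]
```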
4. Reshaping of Arrays
This involves changing the shape of the array, that is, the arrangement of its items into rows and columns, while keeping the total number of elements the same.
x = np.arange(1, 10).reshape((3,3))
print(x)
To get a row vector using the reshape method and np.newaxis:
x = np.array([1,2,3])
x
x.reshape([1, 3])
x[np.newaxis, :]
To get a column vector using the reshape method and np.newaxis:
x.reshape((3, 1))
x[:, np.newaxis]
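One way to confirm what the row and column vector conversions do is to compare the shapes before and after. The sketch below does this for the same array; the variable names for the results are illustrative.

```python
import numpy as np

x = np.array([1, 2, 3])
print(x.shape)  # (3,), a plain 1D array

# Row vector: one row, three columns
row_a = x.reshape((1, 3))
row_b = x[np.newaxis, :]
print(row_a.shape, row_b.shape)  # (1, 3) (1, 3)

# Column vector: three rows, one column
col_a = x.reshape((3, 1))
col_b = x[:, np.newaxis]
print(col_a.shape, col_b.shape)  # (3, 1) (3, 1)

print(col_a.tolist())  # [[1], [2], [3]]
```

Both approaches give the same result; reshape states the target shape explicitly, while np.newaxis inserts a new axis at the position where it appears in the index.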
5. Array Concatenation and Splitting
By using this operation, we can combine multiple arrays into one and conversely split a single array into multiple arrays.
Concatenation:
Using the following functions, we can concatenate two or more arrays in NumPy:
- concatenate
- vstack(vertical stack)
- hstack(horizontal stack)
For 1D array:
x = np.array([3,4,5])
y = np.array([6,7,8])
np.concatenate([x, y])
For 2D array:
a = np.array([[1,2,3], [4,5,6]])
np.concatenate([a,a]) #to concatenate along the first axis
np.concatenate([a,a], axis = 1) #to concatenate along the second axis
When working with arrays of mixed dimensions, vstack and hstack are usually clearer than concatenate.
x = np.array([1,2,3])
y = np.array([[8,7,6], [5,6,4]])
np.vstack([x,y])
z = np.array([[22], [25]])
np.hstack([y,z])
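The concatenation calls above can be checked by looking at the shapes and contents of their results. The sketch below reuses the same input arrays; the names given to the results are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
rows_joined = np.concatenate([a, a])          # along the first axis
cols_joined = np.concatenate([a, a], axis=1)  # along the second axis
print(rows_joined.shape)  # (4, 3)
print(cols_joined.shape)  # (2, 6)

x = np.array([1, 2, 3])
y = np.array([[8, 7, 6], [5, 6, 4]])
z = np.array([[22], [25]])

stacked_v = np.vstack([x, y])  # x becomes a new first row
stacked_h = np.hstack([y, z])  # z becomes a new last column
print(stacked_v.tolist())  # [[1, 2, 3], [8, 7, 6], [5, 6, 4]]
print(stacked_h.tolist())  # [[8, 7, 6, 22], [5, 6, 4, 25]]
```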
Splitting:
Use the following functions to split arrays in NumPy:
- split
- vsplit(vertical split)
- hsplit(horizontal split)
x = [1, 2, 3, 22, 35, 6, 7, 8]
x1, x2, x3 = np.split(x, [3, 5])
y = np.arange(9).reshape((3, 3))
y
up, down = np.vsplit(y, [2])
print(up)
print(down)
left, right = np.hsplit(y, [2])
print(left)
print(right)
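The second argument to the split functions is a list of indices at which to cut. The sketch below repeats the splits above and shows the pieces each one produces; the inputs are the same as in the code above.

```python
import numpy as np

x = [1, 2, 3, 22, 35, 6, 7, 8]
# Cut before index 3 and before index 5, giving three pieces
x1, x2, x3 = np.split(x, [3, 5])
print(x1.tolist(), x2.tolist(), x3.tolist())  # [1, 2, 3] [22, 35] [6, 7, 8]

y = np.arange(9).reshape((3, 3))

# vsplit cuts between rows: the first two rows, then the rest
up, down = np.vsplit(y, [2])
print(up.tolist())    # [[0, 1, 2], [3, 4, 5]]
print(down.tolist())  # [[6, 7, 8]]

# hsplit cuts between columns: the first two columns, then the rest
left, right = np.hsplit(y, [2])
print(left.tolist())   # [[0, 1], [3, 4], [6, 7]]
print(right.tolist())  # [[2], [5], [8]]
```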
Example: Let us see how the above operations are performed on arrays and matrices with the help of some use cases.
Reshaping:
Problem Statement 1: Reshape numbers from 0 to 5 into 3 rows and 2 columns and store it in an array.
Problem Statement 2: Reshape the same array as used in Problem Statement 1 into 2 rows and 3 columns.
Problem Statement 3: Reshape the same array to have 3 columns with an unspecified number of rows.
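One possible way to solve the three reshaping problems is sketched below. It assumes that "numbers from 0 to 5" means the six integers 0 through 5, and the variable names are made up for illustration.

```python
import numpy as np

# Problem Statement 1: 0..5 arranged into 3 rows and 2 columns
a = np.arange(6).reshape(3, 2)
print(a.tolist())  # [[0, 1], [2, 3], [4, 5]]

# Problem Statement 2: the same array with 2 rows and 3 columns
b = a.reshape(2, 3)
print(b.tolist())  # [[0, 1, 2], [3, 4, 5]]

# Problem Statement 3: 3 columns, with -1 letting NumPy work out the row count
c = a.reshape(-1, 3)
print(c.shape)     # (2, 3)
```

Passing -1 for one dimension tells reshape to infer it from the total number of elements, here 6 / 3 = 2 rows.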
Indexing and Slicing
Vector indexing
Indexing a 1D array according to the given statement:
Output:
Matrix Indexing
Indexing a matrix according to the given statement
We can follow the above steps to work on NumPy array creation and its related operations.
We hope this post clearly explains the concept of data manipulation with NumPy.
In the next blog, we will discuss NumPy further and illustrate more examples. Keep visiting our website Acadgild for more updates on Data Science and other technologies.