Data Analytics with R, Excel & Tableau

Hierarchical Clustering with R

Hierarchical clustering is another form of unsupervised learning, alongside K-means clustering. It is a type of machine learning algorithm used to draw inferences from unlabeled data. Unlike K-means, this approach does not require us to specify the number of clusters in advance.

We will carry out this analysis on the popular USArrests dataset, which ships with R's built-in datasets package. We have already analyzed this dataset with K-means clustering in our previous blog, and I suggest you go through it for a better understanding of the dataset. You can find it here: Analyzing USArrest dataset using K-means Clustering

Let us now dive into the code:

We will load the dataset and get the first few records.
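The original code for this step is not reproduced here, so the following is a minimal sketch. USArrests is part of R's datasets package, so no download or file path is needed; the variable name `df` is our own choice.

```r
# USArrests ships with R's built-in datasets package, so nothing needs downloading
df <- USArrests

# First six rows: one row per state (row names), four numeric columns
head(df)
```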

Getting the structure of the dataset using the str() function.
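A minimal sketch of this step, assuming the data is held in a variable we call `df`:

```r
df <- USArrests

# Compact overview: a data.frame with 50 obs. of 4 variables.
# Murder and Rape are stored as numeric, Assault and UrbanPop as integer.
str(df)
```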

Checking for any missing (NA) values:

There are no missing values in the dataset.
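In R, missing entries are represented as `NA` rather than null. A minimal sketch of the check, with `df` as our own variable name:

```r
df <- USArrests

# is.na() flags each missing cell; summing the logical matrix counts them
sum(is.na(df))      # total missing values in the data frame (0 for USArrests)
colSums(is.na(df))  # missing values per column
```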

Summarizing the dataset using the summary() function.
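A minimal sketch of the summary step (variable name `df` is ours):

```r
df <- USArrests

# Min, quartiles, median, mean and max for each of the four columns
summary(df)
```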

Having summarized the dataset, we can see that it has 50 rows and 4 columns.

Loading the necessary libraries.
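The post does not list which libraries it loads. As an assumption, this sketch loads only the cluster package, which is bundled with standard R installations and provides `agnes()` and `diana()`; `dist()` and `hclust()` come from base R's stats package and need no extra library.

```r
# 'cluster' is a recommended package that comes with standard R installations
library(cluster)

# dist() and hclust() are in the base 'stats' package, attached by default
```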

Data Preparation

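Scaling the dataset and displaying the first few records. Because the variables are measured on very different scales (Assault values run into the hundreds, while Murder values are all below 20), we standardize each column so that no single variable dominates the distance calculations.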
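A minimal sketch of the scaling step using base R's `scale()`, with `df` as our own variable name:

```r
# scale() centres each column on its mean and divides by its standard deviation,
# returning a numeric matrix that keeps the state names as row names
df <- scale(USArrests)
head(df)
```

After scaling, every column has mean 0 and standard deviation 1.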

Based on the algorithmic structure, there are two ways of clustering the data points:

  • Agglomerative (bottom-up): begins with each observation in its own cluster and successively merges the most similar clusters until a stopping criterion is satisfied or all observations belong to one big cluster.
  • Divisive (top-down): the inverse of agglomerative clustering. It begins with all observations in one cluster and successively splits clusters until each observation is in a cluster of its own.
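To make the two approaches concrete, here is a minimal sketch using the cluster package, which provides `agnes()` (agglomerative nesting) and `diana()` (divisive analysis). The post itself follows the agglomerative route; `diana()` is included only for contrast, and the variable names are our own.

```r
library(cluster)

df <- scale(USArrests)

# Agglomerative (bottom-up): start from 50 singleton clusters and merge
hc_agnes <- agnes(df, method = "complete")

# Divisive (top-down): start from one cluster holding all 50 states and split
hc_diana <- diana(df)

# Both record the full hierarchy: 49 merge/split steps for 50 observations
nrow(hc_agnes$merge)
nrow(hc_diana$merge)

# Agglomerative and divisive coefficients, each between 0 and 1
c(agglomerative = hc_agnes$ac, divisive = hc_diana$dc)
```

Both coefficients lie between 0 and 1, and values closer to 1 suggest a stronger clustering structure.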

Performing Agglomerative Hierarchical Clustering

We perform agglomerative hierarchical clustering with the hclust() function.

First, we compute the dissimilarity values with the dist() function and then pass them to hclust().

We also specify the agglomeration method to be used (e.g. “complete”, “average”, “single”, or “ward.D”). Here we use complete linkage: the distance between two clusters is defined as the maximum distance between any observation in one cluster and any observation in the other, and at each step the algorithm merges the pair of clusters with the smallest such distance.
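To see what complete linkage does, here is a small toy example on five hypothetical one-dimensional points (not from USArrests), chosen so that no two distances tie. The merge heights reported by `hclust()` match a hand calculation, and switching to single linkage (the minimum cross-pair distance) lowers the later merges.

```r
# Five hypothetical points on a line (illustration only)
x <- c(a = 0, b = 1, c = 4, d = 6, e = 15)
d <- dist(x)

hc_complete <- hclust(d, method = "complete")
hc_complete$height  # merge heights: 1 2 6 15

# {a,b} and {c,d} merge at the MAXIMUM distance between their members
m <- as.matrix(d)
max(m[c("a", "b"), c("c", "d")])  # 6, i.e. |0 - 6|

# Single linkage uses the MINIMUM cross-pair distance instead
hclust(d, method = "single")$height  # 1 2 3 9
```

Complete linkage tends to produce compact clusters, because a merge is only cheap when every pair of members is close.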

We will then plot the dendrogram, a multilevel hierarchy in which clusters at one level are joined together to form the clusters at the next level. The height at which two branches join indicates the dissimilarity between the clusters being merged.

Dissimilarity Matrix
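A minimal sketch of computing the dissimilarity matrix on the scaled data with Euclidean distance (variable names `df` and `d` are ours):

```r
df <- scale(USArrests)

# Pairwise Euclidean distances between the 50 states (stored as a lower triangle)
d <- dist(df, method = "euclidean")

# Peek at a 4 x 4 corner of the full 50 x 50 matrix
round(as.matrix(d)[1:4, 1:4], 2)
```

A `dist` object stores only the 50 × 49 / 2 = 1225 distinct pairwise distances.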

Hierarchical clustering using Complete Linkage
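A minimal sketch of this step using base R, with `hc1` as our own variable name. `hang = -1` aligns all the state labels at the bottom of the plot.

```r
df <- scale(USArrests)
d <- dist(df, method = "euclidean")

# Agglomerative clustering with complete linkage
hc1 <- hclust(d, method = "complete")

# Dendrogram of the 50 states
plot(hc1, cex = 0.6, hang = -1)
```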

This produces a dendrogram of the 50 states.

Working with Dendrogram

In this step, we divide the tree into four groups, count the number of members in each cluster, and then plot the graph.
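A minimal sketch of cutting the tree, assuming base R's `cutree()` and `rect.hclust()`; the names `hc1`, `sub_grp` and `profile` are our own. The last step, averaging the original unscaled variables per cluster, is an optional addition to help interpret the groups.

```r
df <- scale(USArrests)
hc1 <- hclust(dist(df, method = "euclidean"), method = "complete")

# Cut the tree into 4 groups: one cluster label per state
sub_grp <- cutree(hc1, k = 4)

# Number of states in each cluster
table(sub_grp)

# Redraw the dendrogram and outline the 4 clusters in different colours
plot(hc1, cex = 0.6, hang = -1)
rect.hclust(hc1, k = 4, border = 2:5)

# Optional: mean of each original (unscaled) variable per cluster
profile <- aggregate(USArrests, by = list(cluster = sub_grp), FUN = mean)
profile
```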

Visualizing K-Means Clustering

We will use the agnes() function, in which each observation is first assigned to its own cluster. Then the similarity between each pair of clusters is computed and the two most similar clusters are merged into one, repeating until a single cluster remains.
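A minimal sketch using `agnes()` from the cluster package. Besides the tree, `agnes()` reports an agglomerative coefficient (`ac`), which makes it easy to compare linkage methods; the comparison loop and variable names are our own addition.

```r
library(cluster)

df <- scale(USArrests)

# Agglomerative nesting with complete linkage
hc2 <- agnes(df, method = "complete")
hc2$ac  # agglomerative coefficient, closer to 1 = stronger structure

# Compare the coefficient across several linkage methods
linkages <- c("average", "single", "complete", "ward")
names(linkages) <- linkages
ac <- sapply(linkages, function(m) agnes(df, method = m)$ac)
round(ac, 3)

# Dendrogram of the agnes result
pltree(hc2, cex = 0.6, hang = -1, main = "Dendrogram of agnes")
```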

Creating two separate dendrograms
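The post does not say which two dendrograms it draws. As an assumption, this sketch places complete linkage and Ward's method (`method = "ward.D2"` in `hclust()`) side by side using base graphics, then cross-tabulates the 4-cluster cuts to show how far the two trees agree.

```r
df <- scale(USArrests)
d <- dist(df, method = "euclidean")

hc_complete <- hclust(d, method = "complete")
hc_ward <- hclust(d, method = "ward.D2")

# Two plots in one row; restore the previous layout afterwards
op <- par(mfrow = c(1, 2))
plot(hc_complete, cex = 0.5, hang = -1, main = "Complete linkage")
plot(hc_ward, cex = 0.5, hang = -1, main = "Ward's method (ward.D2)")
par(op)

# How the 4-cluster solutions from the two trees line up
agreement <- table(complete = cutree(hc_complete, k = 4),
                   ward = cutree(hc_ward, k = 4))
agreement
```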

Hence we have computed the optimal number of clusters and visualized the resulting clusters.
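The post does not show how the number of clusters was chosen. One common approach, sketched here as an assumption rather than the author's method, is to cut the complete-linkage tree at several values of k and compare the average silhouette width computed with `silhouette()` from the cluster package; the k with the highest average is a reasonable candidate.

```r
library(cluster)

df <- scale(USArrests)
d <- dist(df, method = "euclidean")
hc1 <- hclust(d, method = "complete")

# Average silhouette width for k = 2..10 clusters
ks <- 2:10
avg_sil <- sapply(ks, function(k) {
  sil <- silhouette(cutree(hc1, k = k), d)
  mean(sil[, "sil_width"])
})
names(avg_sil) <- ks
round(avg_sil, 3)

# Candidate number of clusters: highest average silhouette width
best_k <- ks[which.max(avg_sil)]
best_k
```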

We hope you found this blog helpful. If you have any queries or suggestions, drop us a comment below.

Keep visiting our website for more blogs on Data Science and Data Analytics.


Keep visiting our site www.acadgild.com for more updates on Data Analytics and other technologies.


Badal Kumar

Data Analyst at Aeon Learning
