LDA stands for Linear Discriminant Analysis, a supervised machine learning technique used both for classification and for dimensionality reduction. One of its main advantages is that the model is interpretable and prediction is straightforward.
The main goal of dimensionality reduction techniques is to reduce the number of dimensions by removing redundant and dependent features, transforming the data from a higher-dimensional space to a lower-dimensional one.
The key difference between PCA and LDA is that PCA ignores the class labels, while LDA attempts to find the feature subspace that maximizes class separability.
In this blog we will make predictions on the iris dataset. We have already analysed the iris dataset with Principal Component Analysis in our previous blog.
So let us begin coding in R and understand the difference between PCA and LDA.
Loading the data and displaying the first few records.
Getting the structure of the dataset.
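The loading and inspection steps above can be sketched as follows; the `iris` data frame ships with base R, so no external file is needed:

```r
# iris ships with base R, so no download or file path is required
data(iris)

head(iris)  # first six records
str(iris)   # 150 obs. of 5 variables: four numeric measurements plus the Species factor
```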
Checking for null values, if present.
Hence, there are no null values anywhere in the dataset.
Summarizing the dataset.
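A minimal sketch of the null-value check and the summary, using base R:

```r
data(iris)

sum(is.na(iris))                             # total missing values; 0 confirms none
sapply(iris, function(col) sum(is.na(col)))  # missing values per column

summary(iris)  # five-number summaries for the measurements, counts per species
```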
Finding the correlation between the independent variables.
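The correlation matrix of the four numeric predictors (Species, being a factor, is excluded) can be computed as:

```r
data(iris)

# pairwise Pearson correlations between the four measurements
cor(iris[, 1:4])
```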
As we can see, the correlation between Petal.Length and Petal.Width is the highest.
Data preparation:
Splitting the data into training data and test data
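One way to perform the split; the 75/25 ratio and the seed are our own choices for illustration, not taken from the original post:

```r
data(iris)

set.seed(123)  # hypothetical seed, for reproducibility only
train_idx <- sample(seq_len(nrow(iris)), size = floor(0.75 * nrow(iris)))
train <- iris[train_idx, ]   # 112 training rows
test  <- iris[-train_idx, ]  # 38 test rows
```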
Computing LDA
The Linear Discriminant Analysis can be easily computed using the function lda() from the MASS package.
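A sketch of the fit, assuming the hypothetical 75/25 split described earlier (the object name `model` is our choice):

```r
library(MASS)

data(iris)
set.seed(123)  # same hypothetical split as before
train_idx <- sample(seq_len(nrow(iris)), size = floor(0.75 * nrow(iris)))
train <- iris[train_idx, ]

# fit LDA with Species as the response and all four measurements as predictors
model <- lda(Species ~ ., data = train)
model  # prints priors, group means, LD coefficients, and proportion of trace
```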
It gives the following output
LDA determines the group means and computes, for each individual, the probability of belonging to each group. The individual is then assigned to the group with the highest probability score.
The percentage separation achieved by the first discriminant function is 99.14%, which is very high.
Let's check the attributes of the LDA object.
Checking the prior probabilities of class membership using the attribute ‘prior’.
Counting the number of each species using the attribute ‘counts’
Checking the coefficients of the two linear discriminants obtained in the earlier result, using the attribute 'scaling'.
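The three attributes described above can be inspected directly on the fitted object (fitting here on the full data for brevity):

```r
library(MASS)

data(iris)
model <- lda(Species ~ ., data = iris)

attributes(model)  # all components stored in the fit
model$prior        # prior probabilities of class membership
model$counts       # number of observations of each species
model$scaling      # coefficients of the two linear discriminants LD1 and LD2
```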
Visualization
Stacked histograms of discriminant function values.
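The stacked histograms can be produced with `ldahist()` from the MASS package, applied to the first discriminant scores:

```r
library(MASS)

data(iris)
model <- lda(Species ~ ., data = iris)
scores <- predict(model)$x  # discriminant scores, one row per observation

# one histogram panel per species along the LD1 dimension
ldahist(scores[, 1], g = iris$Species)
```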
The above code displays histograms and density plots for the observations in each group on the linear discriminant dimension.
As we can see, group 1, that is, Setosa, does not overlap with the other species, while Versicolor and Virginica overlap at some points.
Visualizing Biplot
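A simple biplot-style view is to scatter the observations on LD1 and LD2, coloured by species; this is a base-graphics sketch (`plot(model)` gives a similar picture):

```r
library(MASS)

data(iris)
model <- lda(Species ~ ., data = iris)
scores <- predict(model)$x  # LD1 and LD2 for each observation

plot(scores[, 1], scores[, 2],
     col = as.integer(iris$Species), pch = 19,
     xlab = "LD1", ylab = "LD2",
     main = "Iris observations in discriminant space")
legend("topright", legend = levels(iris$Species), col = 1:3, pch = 19)
```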
Creating Confusion Matrix and Accuracy on Training Data
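A sketch of the training confusion matrix and accuracy under the same hypothetical split; since the original post's split differs, the exact figure here may not match the 97.34% reported below:

```r
library(MASS)

data(iris)
set.seed(123)  # hypothetical split, as before
train_idx <- sample(seq_len(nrow(iris)), size = floor(0.75 * nrow(iris)))
train <- iris[train_idx, ]

model <- lda(Species ~ ., data = train)
pred <- predict(model, train)$class

table(Predicted = pred, Actual = train$Species)  # confusion matrix
mean(pred == train$Species)                      # training accuracy
```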
We got an accuracy of 97.34% on the training data.
Creating Confusion Matrix and Accuracy on Test Data
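The same evaluation on the held-out rows, again under our hypothetical split:

```r
library(MASS)

data(iris)
set.seed(123)  # hypothetical split, as before
train_idx <- sample(seq_len(nrow(iris)), size = floor(0.75 * nrow(iris)))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

model <- lda(Species ~ ., data = train)
pred <- predict(model, test)$class

table(Predicted = pred, Actual = test$Species)  # confusion matrix on unseen data
mean(pred == test$Species)                      # test accuracy
```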
As we can see, we got the best accuracy on the test data; therefore we can infer that all test flowers were assigned to their respective species correctly.
And this brings us to the end. Do drop us a comment below for any query or suggestions.
Suggested Reading:
Keep visiting our site www.acadgild.com for more updates on Data Analytics and other technologies. Click here to learn about the data science course in Bangalore.