Data Analytics with R, Excel & Tableau

Association Rule Mining Using R

Market basket analysis is a proven technique for finding hidden patterns in customers' purchases. The relationships it finds are expressed as conditional (if-then) rules, for example:

              IF {beer, whiskey} THEN {diaper}

In general, such a rule says that the items on the right-hand side are likely to be bought together with the items on the left-hand side:

{Ai} -> {Ci}
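For reference, here are the standard definitions of the three measures used throughout this article, for a rule {A} -> {C} mined from N transactions. A lift above 1 means A and C are bought together more often than they would be if they were independent.

```latex
\begin{aligned}
\operatorname{supp}(A \Rightarrow C) &= \frac{\lvert \{\, t : A \cup C \subseteq t \,\} \rvert}{N} \\
\operatorname{conf}(A \Rightarrow C) &= \frac{\operatorname{supp}(A \cup C)}{\operatorname{supp}(A)} \\
\operatorname{lift}(A \Rightarrow C) &= \frac{\operatorname{conf}(A \Rightarrow C)}{\operatorname{supp}(C)}
\end{aligned}
```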

To learn about the theory of market basket analysis, please refer to the article linked below.

In this article, we analyse customers' purchase behaviour using a cosmetic data set.

The cosmetic Dataset

Description:

The cosmetic data set contains point-of-sale transaction data from a typical local retail outlet. The items in the transactions are aggregated into product categories.

Let's dive into the coding part.

First, we store the data in mydata.

For association rule analysis, we use the arules package. Install it with the install.packages() command and then load it with the library() command.
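A minimal sketch of this setup, assuming the cosmetic data has one row per transaction and one 0/1 column per product category. The file name in the comment and the tiny inline data frame are hypothetical stand-ins, not the article's actual data:

```r
# install.packages("arules")  # run once
library(arules)

# Hypothetical: the real data would be read from a file, e.g.
# mydata <- read.csv("cosmetics.csv")
# Tiny stand-in with one 0/1 column per product category:
mydata <- data.frame(
  Bag        = c(1, 0, 1, 0, 0, 1),
  Blush      = c(1, 0, 0, 0, 1, 0),
  Foundation = c(1, 1, 0, 1, 1, 1),
  Lip.Gloss  = c(0, 1, 0, 1, 0, 1)
)

# arules needs logical (or factor) columns; with logical columns,
# each TRUE cell becomes an item in that row's transaction.
mydata[] <- lapply(mydata, as.logical)
trans <- as(mydata, "transactions")
summary(trans)
```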

Let's create some rules with the apriori() function from arules, using its default settings.

The console output shows that 68880 rules were generated.
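Assuming the rules were mined with apriori(), calling it without a parameter list uses its defaults: minimum support 0.1, minimum confidence 0.8 and at most 10 items per rule. The sketch below runs on a small hypothetical basket list rather than the cosmetic data, and checks that the default call and the explicit one return the same rules:

```r
library(arules)

# Hypothetical toy baskets standing in for the cosmetic data
baskets <- list(
  c("Bag", "Blush", "Foundation"),
  c("Foundation", "Lip.Gloss"),
  c("Bag"),
  c("Foundation", "Lip.Gloss"),
  c("Blush", "Foundation"),
  c("Bag", "Foundation", "Lip.Gloss")
)
trans <- as(baskets, "transactions")

# Default thresholds
rules_default <- apriori(trans, control = list(verbose = FALSE))

# The same thresholds spelled out explicitly
rules_explicit <- apriori(trans,
  parameter = list(supp = 0.1, conf = 0.8, maxlen = 10),
  control   = list(verbose = FALSE))

length(rules_default)
length(rules_explicit)
```

On such a small data set, apriori() may warn that the absolute support count is very low; that is expected here and harmless.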

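Let's check the summary() of the rules.

By default, the minimum confidence is 80% and the minimum support is 10%. A minimum support of 10% keeps only rules whose items appear together in at least 10% of all transactions. In a bigger data set, many products appear in only about 1% of transactions, so a lower minimum support may be needed.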
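To make these percentages concrete, here is a small worked example in base R (no arules needed) for the rule {Lip.Gloss} -> {Foundation} on a hypothetical six-transaction data frame. Support is the share of transactions containing both items, confidence divides it by the support of the left-hand side, and lift divides confidence by the support of the right-hand side:

```r
# Hypothetical six transactions; TRUE means the category was bought
mydata <- data.frame(
  Bag        = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  Foundation = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE),
  Lip.Gloss  = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Rule {Lip.Gloss} -> {Foundation}
support    <- mean(mydata$Lip.Gloss & mydata$Foundation)  # 3 of 6 rows
confidence <- support / mean(mydata$Lip.Gloss)           # 0.5 / 0.5
lift       <- confidence / mean(mydata$Foundation)       # 1 / (5/6)

c(support = support, confidence = confidence, lift = lift)
# support 0.5, confidence 1, lift 1.2
```

With the default thresholds, this rule would be kept: its support (50%) is above 10% and its confidence (100%) is above 80%.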

Rules with specified parameter values

Now let's create rules with a minimum support of 70%, keeping the default minimum confidence of 80%.

This time, only 15 rules are identified.
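In arules, the thresholds are passed to apriori() through its parameter list. A sketch of a 70% minimum support that leaves confidence at its 0.8 default, again on hypothetical toy baskets, so the result will differ from the article's 15 rules:

```r
library(arules)

baskets <- list(  # hypothetical toy data
  c("Bag", "Blush", "Foundation"),
  c("Foundation", "Lip.Gloss"),
  c("Bag"),
  c("Foundation", "Lip.Gloss"),
  c("Blush", "Foundation"),
  c("Bag", "Foundation", "Lip.Gloss")
)
trans <- as(baskets, "transactions")

# Only support is overridden; confidence keeps its default of 0.8
rules <- apriori(trans, parameter = list(supp = 0.7),
                 control = list(verbose = FALSE))
rules
quality(rules)
```

On this toy data, Foundation is the only item bought in at least 70% of baskets, so the only possible rule has an empty left-hand side, {} => {Foundation}. Adding minlen = 2 to the parameter list drops such rules.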

Now let's inspect the rules.

The console lists the top 15 rules. Taken at face value, the 9th rule says that someone who buys eyebrow.pencil is 90% likely to buy a bag too. The problem is that all the rules in the console have 0 values, i.e. they describe items that were not purchased, which is not useful for targeting.
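One likely cause of the 0 values, assuming the 0/1 columns were read as factors: arules turns each factor level into its own item (Bag=0, Bag=1, ...), so rules about items that were not bought flood the output. This hedged sketch contrasts factor and logical columns on a hypothetical two-column data frame, then uses inspect() to print the rules with the highest lift first:

```r
library(arules)

# Hypothetical 0/1 purchase flags for five transactions
flags <- data.frame(
  Bag        = c(1, 0, 1, 0, 0),
  Foundation = c(1, 1, 0, 1, 1)
)

# As factors, every value becomes an item: Bag=0, Bag=1, ...
as_factor <- as(data.frame(lapply(flags, factor)), "transactions")
itemLabels(as_factor)

# As logicals, only purchases (TRUE) become items
as_logical <- as(data.frame(lapply(flags, as.logical)), "transactions")
itemLabels(as_logical)

# Inspect the strongest rules (by lift) first
rules <- apriori(as_factor,
  parameter = list(supp = 0.4, conf = 0.5, minlen = 2),
  control   = list(verbose = FALSE))
inspect(head(sort(rules, by = "lift"), 5))
```

Converting the 0/1 columns with as.logical() keeps only the purchases as items.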

Targeting items

Let's create rules with a minimum support of 70% and 80% confidence, this time keeping Foundation on the right-hand side (rhs) and allowing the other items on the left-hand side (lhs) by default.

16 rules are generated, all with foundation on the right-hand side. Let's inspect these rules.

The output shows that customers who purchased lip.gloss also purchased foundation, with a support of 35.6% and a confidence of 72%.
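In arules, the appearance argument of apriori() controls which items may appear on each side of a rule. A sketch on hypothetical toy baskets, with Foundation fixed on the right-hand side and every other item allowed only on the left (default = "lhs"); the thresholds are illustrative values for the toy data, not the article's:

```r
library(arules)

baskets <- list(  # hypothetical toy data
  c("Bag", "Blush", "Foundation"),
  c("Foundation", "Lip.Gloss"),
  c("Bag"),
  c("Foundation", "Lip.Gloss"),
  c("Blush", "Foundation"),
  c("Bag", "Foundation", "Lip.Gloss")
)
trans <- as(baskets, "transactions")

# Foundation may only appear on the right-hand side;
# every other item may only appear on the left-hand side.
rules <- apriori(trans,
  parameter  = list(supp = 0.1, conf = 0.7, minlen = 2),
  appearance = list(rhs = "Foundation", default = "lhs"),
  control    = list(verbose = FALSE))
inspect(rules)
```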

Data visualization: 

To visualize the rules, we install the arulesViz package and load it with the library() command.

Let's plot the rules.

The plot shows the support, confidence and lift of each rule. A dot's position gives its support and confidence; darker red dots denote higher lift values and lighter dots denote lower lift.
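A minimal arulesViz sketch. Because the cosmetic data is not included here, it uses the Groceries transactions that ship with arules as a stand-in, with illustrative thresholds. The measure and shading arguments spell out the scatter plot's defaults: support on the x-axis, confidence on the y-axis and colour by lift.

```r
# install.packages("arulesViz")  # run once
library(arules)
library(arulesViz)

# Stand-in data: the Groceries transactions bundled with arules
data("Groceries")
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Scatter plot: support (x) against confidence (y), shaded by lift
plot(rules, measure = c("support", "confidence"), shading = "lift")
```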

Next, let's plot the rules with method = "grouped".

This is a slightly different presentation of the same information: darker colour again shows higher lift, and the size of each bubble denotes support.
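The grouped view is one argument away. This sketch reuses the Groceries stand-in (an assumption, since the cosmetic data is not bundled); rules are grouped by their left-hand sides, bubble size shows support and colour shows lift:

```r
library(arules)
library(arulesViz)

# Stand-in data: the Groceries transactions bundled with arules
data("Groceries")
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Grouped matrix: antecedent groups as columns, consequents as rows
plot(rules, method = "grouped")
```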

Now let's plot the rules with method = "graph".

This is one of the clearest presentations of the same information: darker colour shows higher lift and the size of each bubble denotes support. Here lift ranges from 1.34 to 1.577 and support from 0.116 to 0.356.
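A graph of thousands of rules is unreadable, so this sketch (same Groceries stand-in, illustrative thresholds) keeps only the 10 rules with the highest lift before plotting; the rule vertices are sized by support and coloured by lift:

```r
library(arules)
library(arulesViz)

# Stand-in data: the Groceries transactions bundled with arules
data("Groceries")
rules <- apriori(Groceries, parameter = list(supp = 0.001, conf = 0.5),
                 control = list(verbose = FALSE))

# Keep the 10 rules with the highest lift, then draw them as a graph
top10 <- head(sort(rules, by = "lift"), 10)
plot(top10, method = "graph")
```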

Next, let's create rules with a minimum support of 1% and 70% confidence, keeping Foundation on the right-hand side and bag and blush on the left-hand side.

The console shows the top 19 rules. The 1st rule says that someone who buys lip.gloss is 72.6% likely to buy a bag too.
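Restricting the left-hand side works the same way. In this sketch (toy baskets and thresholds are hypothetical), lhs lists the allowed antecedent items and default = "none" keeps every unlisted item out of the rules; with default = "both", other items such as Lip.Gloss could still appear on either side:

```r
library(arules)

baskets <- list(  # hypothetical toy data
  c("Bag", "Blush", "Foundation"),
  c("Foundation", "Lip.Gloss"),
  c("Bag"),
  c("Foundation", "Lip.Gloss"),
  c("Blush", "Foundation"),
  c("Bag", "Foundation", "Lip.Gloss")
)
trans <- as(baskets, "transactions")

# Only Bag and Blush on the left, only Foundation on the right
rules <- apriori(trans,
  parameter  = list(supp = 0.1, conf = 0.7, minlen = 2),
  appearance = list(lhs = c("Bag", "Blush"), rhs = "Foundation",
                    default = "none"),
  control    = list(verbose = FALSE))
inspect(rules)
```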

Let's plot these rules with method = "graph".

In this graph, darker colour again shows higher lift and the size of each bubble denotes support. Here lift ranges from 1.306 to 1.577 and support from 0.021 to 0.356.

Let's create more rules with a minimum support of 10% and 50% confidence, keeping Foundation on the right-hand side and bag and all the other items on the left-hand side.

The console shows the top 22 rules. The 1st rule says that someone who buys lip.gloss is likely to buy foundation, a combination observed 167 times, and so on for the other rules.
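The "observed 167 times" figure corresponds to the count column that recent arules versions report in the rule quality, next to support, confidence and lift (count = support × number of transactions). A sketch on hypothetical toy baskets with the 10% support and 50% confidence used here, keeping Foundation on the right-hand side:

```r
library(arules)

baskets <- list(  # hypothetical toy data
  c("Bag", "Blush", "Foundation"),
  c("Foundation", "Lip.Gloss"),
  c("Bag"),
  c("Foundation", "Lip.Gloss"),
  c("Blush", "Foundation"),
  c("Bag", "Foundation", "Lip.Gloss")
)
trans <- as(baskets, "transactions")

rules <- apriori(trans,
  parameter  = list(supp = 0.1, conf = 0.5, minlen = 2),
  appearance = list(rhs = "Foundation", default = "lhs"),
  control    = list(verbose = FALSE))

q <- quality(rules)
# count = number of transactions that contain all items of the rule
q[order(-q$count), c("support", "confidence", "lift", "count")]
```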

Let's plot these rules with method = "graph".

This graph shows the same information for all 22 rules: darker colour shows higher lift and the size of each bubble denotes support. Here lift ranges from 0.953 to 1.37 and support from 0.1 to 0.356.

Redundancies

Some of these rules are redundant: a more general rule with the same right-hand side already has the same or higher confidence, so we can ignore them.

To find redundant rules, we use the is.redundant() function.
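A sketch of the pruning step, assuming arules' definition of redundancy: is.redundant() flags a rule when a more general rule with the same right-hand side has the same or higher confidence. On the hypothetical toy baskets, {Bag, Blush} -> {Foundation} is redundant because {Blush} -> {Foundation} already has confidence 1:

```r
library(arules)

baskets <- list(  # hypothetical toy data
  c("Bag", "Blush", "Foundation"),
  c("Foundation", "Lip.Gloss"),
  c("Bag"),
  c("Foundation", "Lip.Gloss"),
  c("Blush", "Foundation"),
  c("Bag", "Foundation", "Lip.Gloss")
)
trans <- as(baskets, "transactions")

rules <- apriori(trans,
  parameter  = list(supp = 0.1, conf = 0.7, minlen = 2),
  appearance = list(rhs = "Foundation", default = "lhs"),
  control    = list(verbose = FALSE))

# Drop rules that a more general rule with the same rhs already covers
pruned <- rules[!is.redundant(rules)]
inspect(pruned)

# Keep at most the 10 rules with the highest lift
best <- head(sort(pruned, by = "lift"), 10)
```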

The console output shows that the redundant rules have been removed.

This leaves us with the 10 best rules.

In the plot, we can see these 10 rules and how each item is connected to the other items.


I hope this blog helped you understand the market basket analysis modelling technique and its uses.

Keep visiting our site www.acadgild.com for more updates on Data Analytics and other technologies.


Badal Kumar

Data Analyst at Aeon Learning
