
Market Basket Analysis

In today’s retail world, market basket analysis is one of the most important modelling techniques: it helps retailers improve their business by revealing the purchasing behavior of their customers. In this article, we’ll discuss how market basket analysis works and how it can be used to improve a business.

Market basket analysis is a technique for finding patterns in purchases. The relationships it finds are expressed as conditional, if-then rules.


For example, if a customer is buying bread and milk, the retailer can suggest, or recommend, eggs as well:

IF {bread, milk} THEN {egg}

In general, a rule has the form below, which reads “the items on the right-hand side are likely to be ordered with the items on the left-hand side”:

{Ai} -> {Ci}

A bunch of items purchased by a consumer is a set of items, or an itemset. The itemset on the left-hand side ({bread, milk} in the example above) is the antecedent of the rule (sometimes called the precedent), while the itemset on the right-hand side ({egg}) is the consequent (or subsequent).
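To make the two sides of a rule concrete, here is a minimal Python sketch of how a rule could be represented. The class name `Rule` and its methods are hypothetical, chosen for illustration only; they are not part of the arules package used later in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A toy association rule {antecedent} -> {consequent} (hypothetical helper)."""
    antecedent: frozenset
    consequent: frozenset

    def __post_init__(self):
        # The two sides of a rule must not share any items.
        if self.antecedent & self.consequent:
            raise ValueError("antecedent and consequent must be disjoint")

    def itemset(self) -> frozenset:
        """All items mentioned by the rule (antecedent plus consequent)."""
        return self.antecedent | self.consequent

    def __str__(self) -> str:
        lhs = ", ".join(sorted(self.antecedent))
        rhs = ", ".join(sorted(self.consequent))
        return f"{{{lhs}}} -> {{{rhs}}}"


rule = Rule(frozenset({"bread", "milk"}), frozenset({"egg"}))
print(rule)  # {bread, milk} -> {egg}
```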

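The support of a rule is the probability that all of its items occur together in a transaction, i.e., the proportion of transactions in which a customer buys bread, milk and eggs. The support of a single itemset, such as {bread, milk}, is defined the same way.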

In other words, support measures how frequently a set of items appears in the transaction data. The support of an item combination can be used to analyse and classify products. For example, if bread and milk together have high support, the pair can attract people to the store, and the products can be priced and displayed accordingly.
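As a minimal sketch, support can be computed by counting the baskets that contain every item of an itemset. The five baskets below are a hypothetical toy dataset, not the Groceries data used later.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)


# Hypothetical toy baskets (not taken from the Groceries data).
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk"},
    {"bread", "milk", "egg", "butter"},
    {"milk", "egg"},
    {"bread", "butter"},
]

print(support({"bread", "milk"}, transactions))         # 0.6
print(support({"bread", "milk", "egg"}, transactions))  # 0.4
```

Here {bread, milk} appears in 3 of the 5 baskets, while the full rule itemset {bread, milk, egg} appears in only 2.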

The confidence of the rule is the probability that a consumer will purchase eggs given that they purchase bread and milk, i.e., support({bread, milk, egg}) / support({bread, milk}). Confidence is used to guide product placement and to increase a business’s profit: placing higher-priced items near the items identified as “drivers” in high-confidence rules can increase the overall value of purchases.
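The conditional probability above can be sketched in a few lines of Python. This is an illustration on the same hypothetical five-basket toy data, computed from raw counts; it is not how arules computes confidence internally.

```python
def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in transactions if wanted <= set(basket))


def confidence(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) = support(A and C) / support(A)."""
    lhs = count(antecedent, transactions)
    if lhs == 0:
        return 0.0  # the antecedent never occurs, so the rule never fires
    both = count(set(antecedent) | set(consequent), transactions)
    return both / lhs


# Hypothetical toy baskets (not taken from the Groceries data).
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk"},
    {"bread", "milk", "egg", "butter"},
    {"milk", "egg"},
    {"bread", "butter"},
]

conf = confidence({"bread", "milk"}, {"egg"}, transactions)
print(round(conf, 3))  # 0.667
```

Of the 3 baskets containing bread and milk, 2 also contain eggs, so the confidence is 2/3.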

What is lift?

The lift of a rule is the ratio between how often the left-hand side (bread, milk) and the right-hand side (egg) actually occur together and how often they would be expected to occur together if they were independent: lift = support({bread, milk, egg}) / (support({bread, milk}) × support({egg})). Equivalently, lift is the confidence of the rule divided by the support of the right-hand side.

  • A lift greater than 1 says that the presence of the antecedent increases the chances that the consequent will occur in a given transaction.
  • A lift below 1 indicates that purchasing the antecedent reduces the chances of purchasing the consequent in the same transaction.
  • When the lift equals 1, purchasing the antecedent makes no difference to the chances of purchasing the consequent.
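A minimal Python sketch of the lift formula and the three cases above, again on hypothetical toy baskets. Lift is computed from counts as n × count(A and C) / (count(A) × count(C)), which is algebraically the same as the support-based formula.

```python
import math


def count(itemset, transactions):
    """Number of transactions containing every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for basket in transactions if wanted <= set(basket))


def lift(antecedent, consequent, transactions):
    """support(A and C) / (support(A) * support(C)), computed from raw counts."""
    n = len(transactions)
    both = count(set(antecedent) | set(consequent), transactions)
    lhs = count(antecedent, transactions)
    rhs = count(consequent, transactions)
    if lhs == 0 or rhs == 0:
        return 0.0
    return n * both / (lhs * rhs)


def interpret(value):
    """Map a lift value onto the three cases described in the text."""
    if math.isclose(value, 1.0):
        return "independent"
    return "positive association" if value > 1 else "negative association"


# Hypothetical toy baskets (not taken from the Groceries data).
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk"},
    {"bread", "milk", "egg", "butter"},
    {"milk", "egg"},
    {"bread", "butter"},
]

for lhs, rhs in [({"bread", "milk"}, {"egg"}), ({"butter"}, {"egg"})]:
    value = lift(lhs, rhs, transactions)
    print(sorted(lhs), "->", sorted(rhs), round(value, 3), interpret(value))
```

In this toy data, {bread, milk} -> {egg} has a lift of 10/9 (about 1.111), while {butter} -> {egg} has a lift of 5/6 (about 0.833).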

Apriori algorithm

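The Apriori algorithm is a data mining technique, widely applied in business analytics, that identifies the itemsets occurring with a support greater than a chosen minimum and then computes the confidence of all possible rules built from those itemsets. It keeps the search manageable by relying on the fact that every subset of a frequent itemset must itself be frequent, so candidate itemsets that contain an infrequent subset can be skipped.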
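The level-wise idea behind Apriori can be sketched in plain Python. This is a simplified teaching version (join, prune, count) on hypothetical toy data, not the optimised C implementation that arules wraps.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Level-wise search for itemsets whose support is at least `min_support`.

    Returns a dict mapping each frequent itemset (a frozenset) to its support.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    def keep_frequent(candidates):
        level = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                level[c] = s
        return level

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    level = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


# Hypothetical toy baskets (not taken from the Groceries data).
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk"},
    {"bread", "milk", "egg", "butter"},
    {"milk", "egg"},
    {"bread", "butter"},
]

freq = apriori_frequent_itemsets(transactions, min_support=0.4)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With a minimum support of 0.4, {bread, milk, butter} is never counted: its subset {milk, butter} is infrequent, so the prune step discards it first.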

In market basket analysis, the rules of interest are usually those with a lift greater than 1, combined with high confidence and, often, high support.
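The filtering described above (lift greater than 1 plus a minimum confidence and support) can be sketched as follows. To stay self-contained, this Python example enumerates every itemset by brute force instead of running Apriori, which is only reasonable for tiny, hypothetical datasets like this one.

```python
from itertools import combinations
from typing import NamedTuple


class MinedRule(NamedTuple):
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float
    lift: float


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force rule mining over every itemset; only suitable for toy data."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def count(itemset):
        return sum(1 for b in baskets if itemset <= b)

    items = sorted({i for b in baskets for i in b})
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            both = count(itemset)
            if both == 0 or both / n < min_support:
                continue
            # Try every split of the itemset into antecedent -> consequent.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    a = frozenset(lhs)
                    c = itemset - a
                    conf = both / count(a)
                    lift = n * both / (count(a) * count(c))
                    if conf >= min_confidence and lift > 1:
                        rules.append(MinedRule(a, c, both / n, conf, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda r: r.lift, reverse=True)


# Hypothetical toy baskets (not taken from the Groceries data).
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk"},
    {"bread", "milk", "egg", "butter"},
    {"milk", "egg"},
    {"bread", "butter"},
]

for r in mine_rules(transactions, min_support=0.4, min_confidence=0.6):
    print(sorted(r.antecedent), "->", sorted(r.consequent),
          f"conf={r.confidence:.2f} lift={r.lift:.2f}")
```

Note that {bread} -> {milk} is dropped despite a confidence of 0.75: its lift is 15/16, so buying bread does not make milk more likely than it already is.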

Applications:

  • Product recommendation 
  • Content optimisation  
  • Movie recommendation 

The Groceries Dataset

Description:

The Groceries data set contains 1 month (30 days) of real-world point-of-sale transaction data from a typical local grocery outlet. The data set contains 9835 transactions and the items are aggregated to 169 categories.

Source : rdrr.io

Let’s dive into the coding part.

For association rule analysis, we are going to use the arules package.

First, install the arules package with the install.packages() command, and then load it with the library() command.

Next, let’s learn about the dataset, which comes bundled with the arules package.

Load the data into the R environment for further analysis with the read.csv() command, and look at a summary of the dataset.

In the console output above, we can see the item-level details: there are 9835 rows (transactions) and 169 columns (items), along with the most frequent items.

Explore the data before making any rules:

Let us visualize the frequency of the items that are purchased most often.

In the plot above, we can see that whole milk, other vegetables, rolls/buns, etc. are frequently bought by customers.

Let us now visualize only the items with a support of more than 10%, i.e., items that appear in more than 10% of all transactions.

In the plot above, we can see which products are bought frequently, with a support of more than 10%.
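The frequency plot with a support cut-off can be mimicked in Python as a rough text version. This sketch uses hypothetical toy baskets and a cut-off of 50% (instead of 10%) because the toy data is so small; it is an analogy to the R plot, not a reproduction of it.

```python
from collections import Counter


def item_frequencies(transactions):
    """Relative frequency (support) of each individual item."""
    n = len(transactions)
    counts = Counter(item for basket in transactions for item in set(basket))
    return {item: c / n for item, c in counts.items()}


def frequent_items(transactions, min_support):
    """Items with support >= min_support, most frequent first (ties by name)."""
    freqs = item_frequencies(transactions)
    kept = [(item, s) for item, s in freqs.items() if s >= min_support]
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical toy baskets (not taken from the Groceries data).
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk"},
    {"bread", "milk", "egg", "butter"},
    {"milk", "egg"},
    {"bread", "butter"},
]

# A crude text "bar chart": one # per 5% of transactions.
for item, s in frequent_items(transactions, min_support=0.5):
    print(f"{item:<8} {s:.2f} {'#' * round(s * 20)}")
```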

Let’s create some rules with the apriori() function, using its default parameters.

In the console output above, we notice that 0 rules were generated.

This is because, by default, the minimum confidence is 80% and the minimum support is 10%. A support of 10% means that an itemset must occur in at least 10% of all transactions. In a bigger dataset, even common product combinations may appear in only 1% of transactions or fewer, so the default threshold is too strict here.
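Translating a relative support threshold into a number of transactions shows why the default finds nothing. This Python arithmetic sketch uses the 9835 transactions stated above and rounds up, since an itemset must appear in a whole number of baskets; the exact rounding rule used by arules is an assumption here.

```python
import math
from fractions import Fraction

N_TRANSACTIONS = 9835  # size of the Groceries data, as stated above


def min_transactions(min_support: str, n: int = N_TRANSACTIONS) -> int:
    """Smallest whole number of transactions whose share reaches min_support.

    The threshold is passed as a decimal string so Fraction keeps it exact.
    """
    return math.ceil(Fraction(min_support) * n)


for label, supp in [("10%", "0.1"), ("1%", "0.01"), ("0.1%", "0.001")]:
    print(f"support {label:>4}: at least {min_transactions(supp)} transactions")
# support  10%: at least 984 transactions
# support   1%: at least 99 transactions
# support 0.1%: at least 10 transactions
```

An itemset therefore needs to show up in roughly a thousand baskets to pass the default 10% threshold, but only about ten baskets at 0.1%.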

Rules with specified parameter values

Now let’s create rules by tuning the parameters, with a minimum support of 0.1% and a minimum confidence of 80%.

In the console output above, the algorithm has identified 410 rules.

Now let’s inspect the rules.

In the console output above, we can see the first 10 rules. For example, if someone buys house keeping products and sour cream, they are 92% likely to buy whole milk too.

Sorting rules

Sort the rules by support in descending order.

Often we want the most relevant rules first. Sorting by support puts the rules that occur most often at the top (sorting by confidence would instead put the most reliable rules first). We can easily sort the rules by support.

In the console output above, we can see the top 10 rules by support created by the Apriori algorithm.
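Sorting a rule set by one measure and keeping the first n rules is easy to sketch in Python. The rule metrics below were computed by hand from a hypothetical five-basket toy dataset (not from Groceries), and `top_rules` is a hypothetical helper, not an arules function.

```python
from typing import NamedTuple


class RuleStats(NamedTuple):
    rule: str
    support: float
    confidence: float
    lift: float


# Metrics for a few rules from a toy five-basket example (not from Groceries).
rules = [
    RuleStats("{bread, milk} -> {egg}", 0.4, 2 / 3, 10 / 9),
    RuleStats("{milk} -> {egg}", 0.6, 0.75, 1.25),
    RuleStats("{butter} -> {bread}", 0.4, 1.0, 1.25),
]


def top_rules(rules, by="support", n=10):
    """Return the first n rules ordered by the chosen measure, largest first."""
    return sorted(rules, key=lambda r: getattr(r, by), reverse=True)[:n]


for r in top_rules(rules, by="support", n=3):
    print(f"{r.rule:<24} support={r.support:.2f} confidence={r.confidence:.2f}")
```

Passing `by="confidence"` instead moves {butter} -> {bread} (confidence 1.0) to the top, which illustrates how the choice of measure changes which rules look most relevant.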

Redundancies

Some rules add nothing new. A rule is considered redundant if a more general rule (one with the same right-hand side and a subset of its left-hand side) has the same or a higher confidence, so the extra items on the left-hand side do not improve the prediction. As analysts, we can choose to drop these redundant rules.
We can remove them in a few steps.

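To find redundant rules, we use the is.redundant() function.

The console output above shows, for each rule, whether it is redundant (TRUE) or not (FALSE).

Next, let’s look at a summary of the redundancy check.

It shows that there are 18 redundant rules.

To remove these redundant rules, we keep only the rules that are not flagged, i.e., we subset the rules with the negated logical vector (!redundant).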
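The redundancy check and the removal step can be sketched in Python. This follows the definition given above (a more general rule with the same right-hand side and the same or higher confidence makes the specific rule redundant) and compares only rules within the list; it is a simplified illustration, not arules' is.redundant(). The confidences come from a hypothetical toy dataset.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    lhs: frozenset
    rhs: frozenset
    confidence: float


def is_redundant(rules):
    """Flag each rule for which a more general rule in `rules` exists.

    A more general rule has the same right-hand side and a left-hand side
    that is a proper subset; the specific rule is flagged when the general
    one has the same or a higher confidence.
    """
    return [
        any(g.rhs == r.rhs and g.lhs < r.lhs and g.confidence >= r.confidence
            for g in rules)
        for r in rules
    ]


# Confidences computed from a toy five-basket example (not from Groceries).
rules = [
    Rule(frozenset({"bread"}), frozenset({"milk"}), 0.75),
    Rule(frozenset({"bread", "egg"}), frozenset({"milk"}), 1.0),
    Rule(frozenset({"bread", "butter"}), frozenset({"milk"}), 0.5),
]

redundant = is_redundant(rules)
print(redundant)  # [False, False, True]

# Keep only the non-redundant rules (the Python analogue of subsetting with !redundant).
pruned = [r for r, flag in zip(rules, redundant) if not flag]
print(len(pruned))  # 2
```

{bread, egg} -> {milk} survives because adding egg raises the confidence from 0.75 to 1.0, whereas adding butter lowers it to 0.5.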

So now we have 392 rules (410 - 18) after removing the redundant rules.

Let’s inspect the rules after removing the redundant ones.

Let’s plot the rules (plotting rules in R requires the arulesViz package).

In the plot above, we can see the rules, but because there are 392 of them, the plot is too crowded to read clearly.

Targeting items

Let’s target the item whole milk and look at the rules with a confidence of 80% and a support of 0.01%.

No rules are generated with these settings.

Now let’s target whole milk again, this time with a confidence of 8% and a support of 0.01%.

In the console output above, we can see that 35 rules were generated. You may need to experiment quite a bit with the parameters to find a useful number of rules. Real datasets are usually large, so we need to understand which support and confidence values make sense for them.
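Targeting an item means generating only rules whose right-hand side is that item. The Python sketch below shows the idea on hypothetical toy baskets, with "milk" standing in for whole milk; `rules_for_consequent` and its `max_lhs` parameter are hypothetical and unrelated to the arules API.

```python
from itertools import combinations


def rules_for_consequent(transactions, target, min_support, min_confidence, max_lhs=2):
    """Mine only rules of the form {antecedent} -> {target}.

    Returns (antecedent, support, confidence) tuples.
    """
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)
    others = sorted({i for b in baskets for i in b} - {target})
    found = []
    for size in range(1, max_lhs + 1):
        for lhs in combinations(others, size):
            a = frozenset(lhs)
            lhs_count = sum(1 for b in baskets if a <= b)
            both = sum(1 for b in baskets if a | {target} <= b)
            if lhs_count == 0 or both / n < min_support:
                continue
            conf = both / lhs_count
            if conf >= min_confidence:
                found.append((a, both / n, conf))
    # Highest confidence first; shorter antecedents before longer ones on ties.
    return sorted(found, key=lambda r: (-r[2], len(r[0]), sorted(r[0])))


# Hypothetical toy baskets (not taken from the Groceries data).
transactions = [
    {"bread", "milk", "egg"},
    {"bread", "milk"},
    {"bread", "milk", "egg", "butter"},
    {"milk", "egg"},
    {"bread", "butter"},
]

for antecedent, supp, conf in rules_for_consequent(transactions, "milk", 0.4, 0.6):
    print(sorted(antecedent), "-> ['milk']", f"support={supp:.2f} confidence={conf:.2f}")
```

As in the R session, lowering or raising `min_support` and `min_confidence` changes how many targeted rules survive.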

Let’s inspect the rules.

Having done this for whole milk, we will also look at beer.

Plot the rules.

In the plot above, we can see the 35 rules and how the items connect with one another.

Let’s do the same for beer.

We got 22 rules.

In the plot above, we can see the 22 rules and how the items connect with one another.


Hope this blog helped you understand the market basket analysis modelling technique and its uses.

Keep visiting our site www.acadgild.com for more updates on Data Analytics and other technologies.


Badal Kumar

Data Analyst at Aeon Learning
