
Customer Segmentation with RFM Model.

Introduction

RFM (recency, frequency, monetary) analysis is a marketing research technique for identifying your best customers by scoring three things: how recently a customer has purchased (recency), how often they purchase (frequency), and how much they spend (monetary).

RFM looks at the recency, frequency and monetary values for each customer, combines them, and then groups customers into distinct segments for campaign targeting. RFM analysis is useful for understanding how receptive your customers are and for segmentation-driven database marketing.

RFM analysis is an easy way to find your best customers, understand their behavior, and then run targeted marketing campaigns to increase sales, satisfaction and customer lifetime value.

Customer Segmentation

Are you struggling with conversion rates in spite of setting up the best marketing campaigns?

Then the most likely reason is that you aren’t targeting the right customers.

Each customer responds differently to a given marketing campaign.

You need to classify your customers based on their buying behavior and then approach each customer segment in its own way, rather than treating your entire customer database the same.

To achieve that, you need RFM (Recency, Frequency, Monetary) analysis.

RFM is a proven marketing research model for building customer relationships and for behaviour-based customer segmentation. It groups customers based on their transaction history: how recently they bought, how frequently they buy, and how much they spent.

The RFM model gives us an idea of what percentage of your existing customers fall into each of these segments, and helps you figure out how effective the recommended marketing actions can be for your business.

Here’s how RFM analysis is useful…

Sending a customized message to a customer group identified by the RFM model tends to generate a much higher conversion rate.

Isn’t it wonderful?

Every marketing or promotional campaign should first pick a target customer segment, then create promotional material that will resonate with that audience.

RFM makes identifying customer groups easy.

RFM segmentation easily answers these questions about your business…

  • Who are your loyal customers?
  • Which customers are about to churn?
  • Which potential customers can be converted into more profitable customers?
  • Which customers don’t need much of your attention?
  • Which kind of customers must you retain?
  • Which group of customers is most likely to respond to your current campaign?

Proven Effectiveness 

Pareto’s rule says that 80% of the results come from 20% of the causes.

Applied to sales, roughly 20% of customers contribute about 80% of your total revenue.

People who have purchased once are more likely to purchase again, and people who spend a lot are more likely to repeat their purchases.

The Pareto principle is at the core of the RFM model. Focusing your efforts on the critical customer segments is likely to give you a much higher return on investment!
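
As a rough illustration of the Pareto idea, here is a minimal R sketch that measures how much revenue the top 20% of customers contribute. The spend figures are made up for this example and are not taken from the dataset used in this article.

```r
# Toy data: total spend per customer (hypothetical values)
spend <- c(A = 500, B = 300, C = 60, D = 40, E = 30,
           F = 25, G = 20, H = 15, I = 6, J = 4)

# Sort customers from highest to lowest spend
sorted_spend <- sort(spend, decreasing = TRUE)

# Number of customers that make up the top 20%
top_n <- ceiling(0.2 * length(sorted_spend))

# Share of total revenue contributed by those customers
top_share <- sum(sorted_spend[seq_len(top_n)]) / sum(sorted_spend)
print(round(top_share, 2))
```

On real data the share will rarely be exactly 80%, but a heavily skewed result is a good sign that segmenting customers is worth the effort.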

Data Sources:  https://docs.google.com/spreadsheets/d/1cfEo-HT6H0Gwve3Kvar-R_3NvlEJ89Mof-RLjTT442w/edit#gid=223842759   

About this file

This dataset contains all purchases made from a UK-based online retail company during an eight-month period.

Columns

  • InvoiceNo
  • StockCode
  • Description
  • Quantity
  • InvoiceDate
  • UnitPrice
  • CustomerID
  • Country

Let’s dive into the coding part:

To load the .xlsx file we have to import a library, readxl.

To see the first 6 rows of every column we use the head() command. We then inspect the structure of the data with the str() command.
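
A minimal sketch of this step follows. The file name in the comment is hypothetical, and a tiny made-up data frame with the same eight columns stands in for the real workbook so that the snippet runs on its own.

```r
# With the real workbook you would load it with readxl, for example:
#   library(readxl)
#   retail <- read_excel("OnlineRetail.xlsx")   # hypothetical file name
# Stand-in data with the same eight columns (all values are made up)
retail <- data.frame(
  InvoiceNo   = c("1001", "1001", "1002", "C1003"),
  StockCode   = c("S1", "S2", "S3", "S1"),
  Description = c("MUG", "LAMP", NA, "MUG"),
  Quantity    = c(6, 2, 1, -6),
  InvoiceDate = c("2010-12-01", "2010-12-01", "2010-12-02", "2010-12-03"),
  UnitPrice   = c(2.5, 10, 4, 2.5),
  CustomerID  = c(101, 101, NA, 101),
  Country     = c("United Kingdom", "United Kingdom", "France", "United Kingdom"),
  stringsAsFactors = FALSE
)

# First rows (up to six) of every column
head(retail)

# Column names, types and a preview of the values
str(retail)
```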

The output of str() shows that the data has 541909 rows and 8 variables.

Let’s find out whether there is any missing data in our dataset with the is.na() command.

The console returns TRUE, which means our dataset contains missing values.

Let’s find out how many missing values are present in each column of our dataset.

The output shows 135080 missing values in CustomerID.

Let’s first convert the date to R’s default date format.

Next, we add one more column to our dataset: Amount.
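
A small sketch of these two steps on made-up rows. It assumes that InvoiceDate arrives as text in year-month-day order and that Amount is the line total, Quantity multiplied by UnitPrice. Neither assumption is confirmed by the original console output.

```r
# Toy rows (hypothetical values)
retail <- data.frame(
  InvoiceDate = c("2010-12-01", "2011-03-15"),
  Quantity    = c(6, 2),
  UnitPrice   = c(2.5, 10),
  stringsAsFactors = FALSE
)

# Parse the text into R's Date class (printed as YYYY-MM-DD by default)
retail$InvoiceDate <- as.Date(retail$InvoiceDate, format = "%Y-%m-%d")

# Assumption: Amount is the value of each line item
retail$Amount <- retail$Quantity * retail$UnitPrice

print(retail)
```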

With the sapply() function we can see the number of missing values in each column.

The output shows that Description has 1454 missing values and CustomerID has 135080.
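
The missing-value checks can be sketched as follows, using a few made-up rows rather than the real dataset.

```r
# Toy rows with some missing values (hypothetical data)
retail <- data.frame(
  Description = c("MUG", NA, "LAMP", "CLOCK"),
  CustomerID  = c(101, NA, NA, 104),
  Quantity    = c(6, 2, 1, 3),
  stringsAsFactors = FALSE
)

# TRUE if at least one value anywhere in the data frame is missing
has_missing <- any(is.na(retail))
print(has_missing)

# Count of missing values per column
missing_per_column <- sapply(retail, function(col) sum(is.na(col)))
print(missing_per_column)
```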

Here we use nrow() to see the total number of rows.

The dataset has 541909 rows in total.

Let’s clean the dataset by dropping rows with missing values, using !is.na() on both CustomerID and Description.

Now the total number of rows is 406829.
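
A minimal sketch of the cleaning step on made-up rows, keeping only the rows where both CustomerID and Description are present:

```r
# Toy rows with some missing values (hypothetical data)
retail <- data.frame(
  Description = c("MUG", NA, "LAMP", "CLOCK"),
  CustomerID  = c(101, NA, NA, 104),
  Quantity    = c(6, 2, 1, 3),
  stringsAsFactors = FALSE
)

rows_before <- nrow(retail)

# Keep rows where neither CustomerID nor Description is missing
retail_clean <- retail[!is.na(retail$CustomerID) & !is.na(retail$Description), ]

rows_after <- nrow(retail_clean)
print(c(before = rows_before, after = rows_after))
```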

Data Preparation:  

Let’s convert the month and year of each date into R’s default format.

Outliers detection: 

Install and load the ggplot2 library for visualization.

Box Plot 

For outlier detection we typically use a box plot.

The box plot shows that some of the Amount values are greater than 2000. We treat these as outliers.

To remove the outliers we keep only the rows where Amount is less than 2000.
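
The sketch below shows the idea on made-up amounts. It uses base R’s boxplot() instead of ggplot2 so that it runs without extra packages. The 2000 cut-off is the one used in this article.

```r
# Toy line-item amounts (hypothetical values)
retail <- data.frame(Amount = c(15, 20, 35.5, 2500, 8, 4100, 120))

# Box plot of Amount; points far above the box are candidate outliers
boxplot(retail$Amount, main = "Amount per line item", ylab = "Amount")

# Keep only rows below the chosen cut-off of 2000
retail_trimmed <- retail[retail$Amount < 2000, , drop = FALSE]

print(nrow(retail_trimmed))
```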

Now the data is ready for further analysis. We can check the top 10 products sold by month-year and by year, as well as the top five selling products by sales revenue in 2010 and 2011.

The output shows the top 5 products for 2010 and 2011.
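
One way to compute top products by revenue per year is sketched below with base R’s aggregate(). The rows are made up, and Amount is assumed to be the line total.

```r
# Toy sales rows (hypothetical values)
retail <- data.frame(
  InvoiceDate = as.Date(c("2010-12-01", "2010-12-05", "2010-12-09",
                          "2011-01-10", "2011-02-11", "2011-03-12")),
  Description = c("MUG", "LAMP", "MUG", "CLOCK", "LAMP", "CLOCK"),
  Amount      = c(15, 40, 10, 60, 20, 30),
  stringsAsFactors = FALSE
)

# Extract the year from the date
retail$Year <- format(retail$InvoiceDate, "%Y")

# Total revenue per product per year
revenue <- aggregate(Amount ~ Year + Description, data = retail, FUN = sum)

# Sort by year, then by revenue from highest to lowest
revenue <- revenue[order(revenue$Year, -revenue$Amount), ]

# Keep at most the top 5 products within each year
top_n <- 5
top_products <- do.call(rbind, lapply(split(revenue, revenue$Year), head, n = top_n))
rownames(top_products) <- NULL
print(top_products)
```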

Let’s see the top 5 selling products by month in 2010-2011.

The bar plot shows the top-selling products by amount.

Sales revenue by month: 

Visualizing sales revenue in 2010.

The plot shows the 2010 sales revenue by month.

Visualizing sales revenue in 2011.

The plot shows the 2011 sales revenue by month.
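
A minimal sketch of a monthly revenue summary for one year, with made-up rows and base R’s barplot() standing in for the original charts:

```r
# Toy sales rows (hypothetical values)
retail <- data.frame(
  InvoiceDate = as.Date(c("2011-01-10", "2011-01-20", "2011-02-11",
                          "2011-03-12", "2010-12-01")),
  Amount      = c(60, 15, 20, 30, 99)
)

# Split the date into year and month
retail$Year  <- format(retail$InvoiceDate, "%Y")
retail$Month <- format(retail$InvoiceDate, "%m")

# Total revenue per month for 2011 only
sales_2011   <- subset(retail, Year == "2011")
monthly_2011 <- aggregate(Amount ~ Month, data = sales_2011, FUN = sum)
print(monthly_2011)

# Bar chart of revenue by month
barplot(monthly_2011$Amount,
        names.arg = as.character(monthly_2011$Month),
        main = "Sales revenue 2011 by month",
        xlab = "Month", ylab = "Revenue")
```

Changing the year in the subset() call gives the same view for 2010.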

Top products by description and quantity ordered.

The output shows the top products by description and quantity ordered in 2010.

The output shows the top products by description and quantity ordered in 2011.

Top countries by amount spent per year.

The output shows that the United Kingdom, Germany and France were the top 3 countries by spend in 2010.

The output shows that the United Kingdom, the Netherlands and Eire were the top 3 countries by spend in 2011.
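
Ranking countries by spend within each year can be sketched like this. The amounts are made up and only illustrate the mechanics, not the real ranking.

```r
# Toy sales rows (hypothetical values)
retail <- data.frame(
  Year    = c("2010", "2010", "2010", "2011", "2011", "2011"),
  Country = c("United Kingdom", "Germany", "United Kingdom",
              "United Kingdom", "EIRE", "EIRE"),
  Amount  = c(100, 40, 50, 80, 30, 60),
  stringsAsFactors = FALSE
)

# Total spend per country per year
spend <- aggregate(Amount ~ Year + Country, data = retail, FUN = sum)

# For each year, list countries from highest to lowest spend
top_spenders <- lapply(split(spend, spend$Year), function(d) {
  d[order(-d$Amount), c("Country", "Amount")]
})
print(top_spenders)
```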

Top countries by quantity of a product ordered per year.

The table shows that Eire ordered 1440 Assorted incense packs in 2010.

RFM Analysis

Here we prepare the data for Recency, Frequency and Monetary analysis, which we will use for customer segmentation and classification. There is not much to prepare beyond formatting the dates, which we already did in the data preparation part above. We need the following columns for the analysis: InvoiceNo, Quantity, InvoiceDate, UnitPrice, CustomerID, Country, and Amount. I have included cancelled transactions because they can affect how a customer is classified.

Let’s store our data in dtatRFM with the following columns: InvoiceNo, Quantity, InvoiceDate, UnitPrice, CustomerID, Country, and Amount. We need a package named didrooRFM for the RFM analysis, so we install it with the command install.packages("didrooRFM").

We then load the package with the library() function to perform the remaining tasks.

To find the RFM score, we score each of recency, frequency, and monetary out of five. Based on those scores we can find the most loyal customers as well as the customers who have left our ecosystem.
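
To make the scoring idea concrete, here is a simplified base R sketch on made-up transactions. It is not the algorithm used by the RFM package in this article: the rank-based 1 to 5 scores, the unweighted average and the rounded class are all assumptions chosen for illustration.

```r
# Toy transactions, one row per invoice (hypothetical values)
tx <- data.frame(
  CustomerID  = c(1, 1, 1, 2, 2, 3, 4, 4, 5, 5),
  InvoiceNo   = c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j"),
  InvoiceDate = as.Date(c("2011-12-01", "2011-12-05", "2011-12-09",
                          "2011-10-01", "2011-11-09", "2011-03-01",
                          "2011-06-01", "2011-09-01",
                          "2011-01-15", "2011-02-15")),
  Amount      = c(100, 150, 250, 80, 120, 50, 300, 400, 10, 20),
  stringsAsFactors = FALSE
)

# Reference date: the day after the last transaction in the data
ref_date   <- max(tx$InvoiceDate) + 1
tx$DaysAgo <- as.numeric(ref_date - tx$InvoiceDate)

# One row per customer: days since last purchase, invoice count, total spend
rfm <- data.frame(
  CustomerID = sort(unique(tx$CustomerID)),
  Recency    = as.vector(tapply(tx$DaysAgo, tx$CustomerID, min)),
  Frequency  = as.vector(tapply(tx$InvoiceNo, tx$CustomerID,
                                function(x) length(unique(x)))),
  Monetary   = as.vector(tapply(tx$Amount, tx$CustomerID, sum))
)

# Rank-based score from 1 (worst) to 5 (best); tied values share the lower score
score_1_to_5 <- function(x) ceiling(5 * rank(x, ties.method = "min") / length(x))

rfm$R_score <- score_1_to_5(-rfm$Recency)   # fewer days since purchase is better
rfm$F_score <- score_1_to_5(rfm$Frequency)
rfm$M_score <- score_1_to_5(rfm$Monetary)

# Assumption: equal weights for the three scores, class is the rounded average
rfm$FinalScore <- (rfm$R_score + rfm$F_score + rfm$M_score) / 3
rfm$Class      <- round(rfm$FinalScore)
print(rfm)
```

In this toy data, customer 1 bought most recently and most often and ends up in class 5, while customer 5 has not bought for months and lands in class 1.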

Based on the RFM scores, we can see that fewer customers have a score of 4 or 5, which are very good scores. Most customers have a score of 3. Those below 3 need attention to bring them back into the ecosystem.

Using these scores, the company can run email marketing campaigns tailored to each score, offering deals or discounts to bring customers back into the ecosystem. Many retail companies, such as Walmart, Flipkart, Amazon and eBay, use this technique to build customer relationships.

Here we have divided our customers into classes. The output shows how many customers fall into each class.

We can see that most of the customers belong to classes 2 and 3. Customers belonging to classes 5 and 4 are our most valuable customers.

Customer class by country.

RFM score by country.  

The output shows which countries the customers in classes 1 to 5 belong to.
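
Counting customers per class, and per class within each country, can be sketched with base R’s table(). The customers and classes below are made up for the example.

```r
# Toy per-customer results (hypothetical values)
rfm <- data.frame(
  CustomerID = 1:6,
  Country    = c("United Kingdom", "United Kingdom", "Germany",
                 "France", "United Kingdom", "Germany"),
  Class      = c(5, 3, 2, 3, 1, 3),
  stringsAsFactors = FALSE
)

# Number of customers in each class
class_counts <- table(rfm$Class)
print(class_counts)

# Cross-tabulation: rows are countries, columns are classes
class_by_country <- table(Country = rfm$Country, Class = rfm$Class)
print(class_by_country)
```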

This kind of analysis can help any organization understand its customers, make better decisions and boost business.

Suggested Reading:

https://acadgild.com/blog/data-cleaning-using-mice-package-in-r

Badal Kumar

Data Analyst at Aeon Learning
