Machine Learning with Spark – Part 4 : Determining Credibility of a Customer

In the earlier posts of this Machine Learning with Spark series, we saw what our data looks like, along with its headers. However, that description was not enough to give a complete business view. To fully grasp the problem, we need to know what each attribute of the data means.
The following list maps each attribute’s numerical values to their actual categorical meanings. This tells us what each attribute value signifies in business terms.
Attributes:

Creditability :

  • 1 : Yes
  • 0 : No

Account Balance (DM, i.e. Deutsche Mark, the West German currency):

  • 1 : No Account
  • 2 : None
  • 3 : Below 200 DM
  • 4 : 200 DM or above

Duration of Credit (month): Continuous variable
Payment Status of Previous Credit :

  • 0 : Delayed
  • 1 : Other Credits
  • 2 : Paid
  • 3 : No problem
  • 4 : Previous credits cleared

Purpose of loan :

  • 1 : New car
  • 2 : Used car
  • 3 : Furniture
  • 4 : Radio/TV
  • 5 : Appliances
  • 6 : Repair
  • 8 : Vacation
  • 9 : Retraining
  • 10 : Business
  • 0 : Other

Credit Amount : Continuous variable
Value Savings/Stocks :

  • 1 : None
  • 2 : Below 100 DM
  • 3 : 100 – 500 DM
  • 4 : 500 – 1000 DM
  • 5 : > 1000 DM

Length of current employment :

  • 1 : Unemployed
  • 2 : < 1 Year
  • 3 : 1 – 4 Years
  • 4 : 4 – 7 Years
  • 5 : > 7 Years

Instalment per cent :

  • 1 : > 35%
  • 2 : 25% – 35%
  • 3 : 20% – 25%
  • 4 : < 20%

Sex & Marital Status :

  • 1 : Male(Divorced)
  • 2 : Male(Single)
  • 3 : Male(Married/Widowed)
  • 4 : Female

Guarantors:

  • 1 : None
  • 2 : Co-applicant
  • 3 : Guarantor

Duration in Current address:

  • 1 : < 1 Year
  • 2 : 1 – 4 Years
  • 3 : 4 – 7 Years
  • 4 : > 7 Years

Most valuable available asset :

  • 1 : None
  • 2 : Car
  • 3 : Life Insurance
  • 4 : Real Estate

Age (years) : Continuous variable
Concurrent Credits :

  • 1 : Other Banks
  • 2 : Dept Stores
  • 3 : None

Type of apartment :

  • 1 : Free
  • 2 : Rented
  • 3 : Owned

No of Credits at this Bank :

  • 1 : 1
  • 2 : 2 or 3
  • 3 : 4 or 5
  • 4 : More than 6

Occupation :

  • 1 : Unemployed, unskilled
  • 2 : Unskilled, permanent resident
  • 3 : Skilled
  • 4 : Executive

No of dependents:

  • 1 : >3
  • 2 : <3

Telephone :

  • 1 : Yes
  • 2 : No

Foreign Worker :

  • 1 : Yes
  • 2 : No
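
To make the mapping above easier to use while exploring the data, the codes can be kept in plain lookup tables. The following is only a minimal sketch: the names CODE_LABELS and decode are hypothetical helpers, not part of Spark or of this series’ code. The keys "creditability" and "balance" follow the column names used in the query later in this post, while "purpose" is an assumed column name. Only three attributes are shown; the others would follow the same pattern.

```python
# Lookup tables copied from the attribute lists above.
# "creditability" and "balance" match the column names used in this post's
# query; "purpose" is an assumed column name (hypothetical).
CODE_LABELS = {
    "creditability": {1: "Yes", 0: "No"},
    "balance": {
        1: "No Account",
        2: "None",
        3: "Below 200 DM",
        4: "200 DM or above",
    },
    "purpose": {
        1: "New car", 2: "Used car", 3: "Furniture", 4: "Radio/TV",
        5: "Appliances", 6: "Repair", 8: "Vacation", 9: "Retraining",
        10: "Business", 0: "Other",
    },
}


def decode(column, code):
    """Return the label for a coded value, or a marker if the code is not listed."""
    labels = CODE_LABELS.get(column, {})
    return labels.get(code, "Unknown ({})".format(code))


print(decode("creditability", 1))  # Yes
print(decode("balance", 3))        # Below 200 DM
print(decode("purpose", 7))        # Unknown (7), since code 7 is not in the list
```

Returning an explicit "Unknown" marker instead of raising an error is a design choice for exploration: unexpected codes show up in the output rather than stopping the analysis.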

Now, let’s do a quick check on the average account balance, credit amount and loan duration for each creditability class.

# register the Customers DataFrame as a temporary table named "credit"
Customers.registerTempTable("credit")
# query the credit table to check the average balance, average loan amount and
# average duration for each class of customer, i.e. 1 and 0
results = sqlContext.sql("SELECT creditability, avg(balance) as avgbalance, avg(amount) as avgamt, \
                         avg(duration) as avgdur FROM credit GROUP BY creditability")
# check the result of the query
results.show()

+-------------+------------------+------------------+------------------+
|creditability|        avgbalance|            avgamt|            avgdur|
+-------------+------------------+------------------+------------------+
|