In our earlier posts on Machine Learning with Spark, we saw what our data looks like, along with its headers. However, that description alone does not give a complete business view. To grasp the problem fully, we need to understand every attribute of the data.
Below is the mapping from each attribute's numerical values to its actual categorical values. This tells us which attribute value corresponds to what in real business terms.
Account Balance (in DM, i.e. Deutsche Mark, the West German currency):
Duration of Credit (months): Continuous variable
Purpose of loan:
Credit Amount: Continuous variable
Length of current employment:
Instalment per cent:
Sex & Marital Status:
Duration in current address:
Most valuable available asset:
Age (years): Continuous variable
Type of apartment:
No. of Credits at this Bank:
No. of dependents:
Foreign Worker:
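Since the categorical attributes are stored as numeric codes, it is often convenient to decode them back to readable labels when presenting results. A minimal sketch in plain Python (the `decode` helper and the `mappings` dict are illustrative assumptions; fill in the real code-to-label pairs from the attribute descriptions above):

```python
def decode(mappings, attribute, code):
    """Return the label for a numeric code, or the code itself if unmapped
    (continuous attributes like Age have no mapping and pass through)."""
    return mappings.get(attribute, {}).get(code, code)

# Placeholder mapping -- replace with the dataset's actual encoding.
# Creditability 1/0 is the good/bad customer class used throughout this series.
mappings = {
    "creditability": {0: "bad credit", 1: "good credit"},
}

print(decode(mappings, "creditability", 1))   # good credit
print(decode(mappings, "Age (years)", 35))    # 35 (continuous, left as-is)
```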
Now, let's do a quick check on the average account balance, credit amount and loan duration for each creditability class.
# register the customers DataFrame as a temporary table so it can be queried
# with SQL (assuming the DataFrame is named creditDf, as in the earlier posts)
creditDf.registerTempTable("credit")

# query the credit table to check the average balance, average loan amount and
# average duration for each class of customer, i.e. creditability 1 and 0
results = sqlContext.sql("SELECT creditability, avg(balance) as avgbalance, \
    avg(amount) as avgamt, avg(duration) as avgdur \
    FROM credit GROUP BY creditability")

# check the result of the query
results.show()

|creditability| avgbalance| avgamt| avgdur|
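For readers without a Spark shell handy, the same GROUP BY aggregation can be sketched in plain Python on a few illustrative rows (the sample values below are made up for demonstration, not taken from the actual dataset):

```python
from collections import defaultdict

# Illustrative rows: (creditability, balance, amount, duration) -- made-up values.
rows = [
    (1, 2, 1500, 12),
    (1, 4, 3000, 24),
    (0, 1, 5000, 36),
    (0, 1, 2500, 24),
]

# Equivalent of: SELECT creditability, avg(balance), avg(amount), avg(duration)
#                FROM credit GROUP BY creditability
groups = defaultdict(list)
for creditability, balance, amount, duration in rows:
    groups[creditability].append((balance, amount, duration))

# Average each column within each creditability class.
averages = {
    cls: tuple(sum(col) / len(col) for col in zip(*vals))
    for cls, vals in groups.items()
}
print(averages)  # {1: (3.0, 2250.0, 18.0), 0: (1.0, 3750.0, 30.0)}
```

This mirrors what Spark SQL does behind the scenes: partition rows by the grouping key, then compute each aggregate over every partition.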