Big Data Hadoop & Spark

Hive Use Case – Real Estate Analysis

This blog will help you learn data analysis in Hive, and the application described in this article will strengthen your grasp of Hive querying.
In the industry, real estate activity is defined as any economic dealing associated with the purchase, sale, owner-operation, or lease of property. This also includes income-generating residential properties, such as flats, apartment buildings, and single-room rentals.
Real estate services are included within the sector as well; examples include brokerages, property management, appraisers, investment property analysts, and other consultants.
Analysts working with Big Data use Hive or similar tools to query datasets and obtain results with ease. Although other query languages exist, Hive offers a variety of features that traditional approaches lack.
So, to serve the growing number of consumers in the real estate field, a filtered set of data was collected and handed over to the data analytics team.
Given this dataset, perform the analysis and answer the following top-rated questions from buyers.
Use the link below to download the dataset for the analysis.

Load data into Hive.

load data local inpath '/home/hadoop/Downloads/real_state.csv' into table realEstate;
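The LOAD DATA statement above assumes the realEstate table already exists. A minimal sketch of a possible schema, inferred from the column names used in the queries below (the names and types are assumptions; sale_date is kept as STRING because the CSV's date format may not parse as a Hive TIMESTAMP, as one commenter notes at the end of this post):

```sql
-- Hypothetical schema inferred from the columns referenced later in this post.
-- sale_date is kept as STRING; it can be converted later with unix_timestamp().
CREATE TABLE realEstate (
  street    STRING,
  city      STRING,
  zip       STRING,
  state     STRING,
  beds      INT,
  baths     INT,
  sq__ft    INT,
  flat_type STRING,
  sale_date STRING,
  price     INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```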

Problem Statement:

City-wise, list all the Condos priced above ten thousand.
Hence the query formed: select * from realEstate where ((price>10000) AND (flat_type='Condo')) group by city;

hive> select * from realEstate where ((price>10000) AND (flat_type=='Condo')) group by city;
Query ID = acadgild_20161128193737_259037e2-32d6-4916-8a47-e7ae4b84ac65
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1479427486045_0007, Tracking URL = http://localhost:8088/proxy/application_1479427486045_0007/
Kill Command = /home/acadgild/hadoop-2.6.0/bin/hadoop job -kill job_1479427486045_0007
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-11-28 19:42:19,269 Stage-1 map = 0%, reduce = 0%
2016-11-28 19:43:19,547 Stage-1 map = 0%, reduce = 0%
2016-11-28 19:44:20,442 Stage-1 map = 0%, reduce = 0%
2016-11-28 19:45:08,760 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.58 sec
2016-11-28 19:46:00,946 Stage-1 map = 100%, reduce = 35%, Cumulative CPU 7.33 sec
2016-11-28 19:46:05,239 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 9.05 sec
2016-11-28 19:46:13,085 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 10.86 sec
2016-11-28 19:46:24,593 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12.77 sec
MapReduce Total cumulative CPU time: 12 seconds 770 msec
Ended Job = job_1479427486045_0007
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 13.16 sec HDFS Read: 63459 HDFS Write: 137 SUCCESS
Total MapReduce CPU Time Spent: 13 seconds 160 msec
Time taken: 553.351 seconds, Fetched: 13 row(s)
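As two readers point out in the comments at the end of this post, SELECT * combined with GROUP BY city fails on current Hive releases with SemanticException [Error 10025], because every column in the select list must be a grouping key or sit inside an aggregate. A sketch of a corrected query that still produces the intended city-wise listing, using ORDER BY instead:

```sql
-- Corrected form: ORDER BY city replaces the invalid GROUP BY, so SELECT *
-- no longer references columns that are not grouping keys.
SELECT *
FROM realEstate
WHERE price > 10000
  AND flat_type = 'Condo'
ORDER BY city;
```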

Problem Statement:

In the city of GALT, which residences have more than 800 sq__ft? Display their respective details: street, sq__ft, sale_date, city.

Hence the query formed:
select street, sq__ft, sale_date, city from realEstate where ((city='GALT') AND (sq__ft>800));

Problem Statement:

Which is the cheapest Condo in CA? Name the city, street, and price for that Condo.
Hence query formed:

select street, city, price from realEstate where flat_type='Condo' order by price limit 1;

Problem Statement:

List the top 5 residences that fall within the budget of 60,000–120,000, have an area of more than 1,450 sq ft, were sold after 17th May, and have a minimum of 3 bedrooms and 2 bathrooms.

Hence query formed:
Select street,city,zip,state,beds,baths,sq__ft,type,sale_date,price from realEstate where((price>=60000 and price<=120000) AND (sq__ft>1150) AND (beds>2) AND (baths>1)) order by price limit 5;
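Note that the query as published filters on sq__ft > 1150 and omits the sale-date condition from the problem statement. A hypothetical variant that applies all of the stated conditions, assuming sale_date is stored as a string such as 'Wed May 21 00:00:00 EDT 2008' (both the format and the year are assumptions about the dataset):

```sql
-- Hypothetical variant; the unix_timestamp() pattern must match the actual
-- sale_date format in the CSV, and the cutoff year (2008) is an assumption.
SELECT street, city, zip, state, beds, baths, sq__ft, flat_type, sale_date, price
FROM realEstate
WHERE price BETWEEN 60000 AND 120000
  AND sq__ft > 1450
  AND beds >= 3
  AND baths >= 2
  AND unix_timestamp(sale_date, 'EEE MMM dd HH:mm:ss zzz yyyy') >
      unix_timestamp('Sat May 17 00:00:00 EDT 2008', 'EEE MMM dd HH:mm:ss zzz yyyy')
ORDER BY price
LIMIT 5;
```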


Now design a query using partitioning to help the data analysts find the solution faster, dividing the dataset according to property type and number of bedrooms.
To understand partitioning in Hive, follow the link.

Problem Statement:

Create a separate list of residential apartments with more than 2 beds. Include the columns in the following order: City, Baths, Sq_feet, Price, flat_type, Beds.

Hence, creating a table to do the partitioning:
CREATE TABLE interestedBHK (City string, Baths int, Sq_feet int, Price int) PARTITIONED BY (flat_type string, Beds int) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

Then load the data into it; hence the query formed:
insert into table interestedBHK partition(flat_type,Beds) select City,Baths,Sq_feet,Price,flat_type,Beds from realEstate where (Beds>2 and flat_type='Residential');
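If this dynamic-partition insert fails with an error about strict mode, Hive's dynamic-partitioning settings (whose defaults vary by version) usually need to be enabled in the session first:

```sql
-- Required before an INSERT that derives the partition column values
-- (flat_type, Beds) from the SELECT itself rather than from static literals.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
```

After loading, `SHOW PARTITIONS interestedBHK;` lists the generated (flat_type, Beds) partitions.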


We hope this use case helps you work with Hive.
For further queries, you can comment below, and visit ACADGILD for more trending blogs on Big Data and other technologies.


  1. Very informative and interesting use cases are provided in the blog. One can get good expertise in Big Data Technologies by following this blog regularly. Thanks to all the authors for sharing the knowledge with us.

  2. Hi Prateek
    I have tried to load the data into the hive table, but it was not loaded successfully. The problem is the column "sale_date". I have taken the 'timestamp' datatype; what have you taken? Please let me know.

  3. Hi sir,
    i have tried to run this query
    hive> select * from realEstate where ((price>10000) AND (type=='Condo')) group by city;
    FAILED: SemanticException [Error 10025]: Expression not in GROUP BY key street
    Can you please let me know the issue

  4. SemanticException [Error 10025]: Expression not in GROUP BY key street
    select * from realestate where ((price>10000) AND (type=='Condo')) group by city;
    your query is wrong

