Hive Real Life Use Cases

Practice Query Creation

This blog of Big Data will be a good practice for Hive Beginners, for practicing query creation. At the end, you will be able to create a table, load data to the table and perform analytical analysis on the dataset provided in Hive real life use cases.
The Topic for this blog as referring to the real life dataset of Petrol suppliers. And in the second part, we are covering real world Olympic records.
The sample dataset is made in reference to today’s rate of petrol consumption.But is not an actual data. It has been modified and published in terms to learning only.
Here in this blog we will compare top petrol sellers of the world and get some hands-on experience in Hive. Go to the link If you need to download and setup Hive.

PETROL:

DATA SET DESCRIPTION:

 ColumnNO. Name Example DataType Column1: District.ID I4N 1M1 varchar Column2: , Distributor name shell varchar Column3: Buy rate (million) \$957.70 varchar Column4: Sell rate(million) \$5779.92 varchar Column5: volumeIN(millioncubic litter) 933 int Column6: volumeOUT(millioncubic litter) 843, int Column7: Year 1624 int

create table petrol (distributer_id STRING,distributer_name STRING,amt_IN STRING,amy_OUT STRING,vol_IN INT,vol_OUT INT,year INT) row format delimited fields terminated by ‘,’ stored as textfile;

1)In real life what is the total amount of petrol in volume sold by every distributor?
SELECT distributer_name,SUM(vol_OUT) FROM petrol GROUP BY distributer_name;

2)Which are the top 10 distributors ID’s for selling petrol and also display the amount of petrol sold in volume by them individually?
SELECT distributer_id,vol_OUT FROM petrol order by vol_OUT desc limit 10;

3)Find real life 10 distributor name who sold petrol in the least amount.

SELECT distributer_id,vol_OUT FROM petrol order by vol_OUT limit 10;

4)Try One yourself
The constraint to this query is the difference between volumeIN and volumeOuT is illegal in real lifeÂ if greater than 500. As we see all distributors are receiving patrols on every next cycle.
List all distributors who have this difference, along with the year and the difference which they have in that year.
Hint: (vol_IN-vol_OUT)>500

——————————xxx xxx xxx—————————

We all know how much interesting Olympics held every time, and we all love it.InÂ real life Olympic Games are considered the world’s foremost sports competition with more than 200 nations participating. The Olympic Games are held every four years, with the Summer and Winter Games alternating by occurring every four years but two years apart.

The sample dataset is made in reference to real life Olympic competition.But is not an actual data. It has been modified and published in terms to learning only.

Olympic:

Olympic Data analysis using Hive

 ColumnNO. Name Example DataType Column1: AthleteName Michael Phelps STRING Column2: , Age 23 INT Column3: Country United States STRING Column4: Year 2008 INT Column5: Closing Date 8/24/2008 STRING Column6: Sport Swimming STRING Column7: Gold Medals 8 INT Column8: Silver Medals 0 INT Column9: Bronze Medals 0 INT Column10: Total Medals 8 INT

create table olympic (athelete STRING,age INT,country STRING,year STRING,closing STRING,sport STRING,gold INT,silver INT,bronze INT,total INT) row format delimited fields terminated by ‘\t’ stored as textfile;

1. Using the dataset list the total number of medals won by each country in swimming.

select country,SUM(total) from olympic where sport = “Swimming” GROUP BY country;

2)Display real life number of medals India won year wise.
select year,SUM(total) from olympic where country = “India” GROUP BY year;

3)Find the total number of medals each country won display the name along with total medals.
select country,SUM(total) from olympic GROUP BY country;

4)Find the real life number of gold medals each country won.
select country,SUM(gold) from olympic GROUP BY country;

5)Try One yourself
Which country got medals for Shooting, year wise classification?
Hope this blog helped you in learning Hive with real life scenario, which we come through our everyday life.
While I leave you with a simple query to solve, keep visiting our site ACADGILD Â for more practice, queries on Hive and other trending technologies.

Related Popular Courses:

BIG DATA DEVELOPER

ANDROID CERTIFICATE COURSE

KAFKA CONSUMER EXAMPLE

DATA SCIENCE COURSES ONLINE

BEST ANALYTICS COURSES

prateek

An alumnus of the NIE-Institute Of Technology, Mysore, Prateek is an ardent Data Science enthusiast. He has been working at Acadgild as a Data Engineer for the past 3 years. He is a Subject-matter expert in the field of Big Data, Hadoop ecosystem, and Spark.

1. Jitendra says:

I have basic knowledge and practice of all Hadoop tools.
I want to make carrier in bigdata, but how ???

2. Sangeetha says:

Very useful.

3. vaish says:

select country,year from olympic where sport=â€™Shootingâ€™ order by year;
is this correct??

1. Gopikrishna K S says:

select country, year, sum(total) from olympics where sport = ‘Shooting’ group by country, year;

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Close