Big Data Hadoop & Spark

Frequently Asked Hive Technical Interview Queries

This blog will be a good practice for big data developer/engineers to learn and execute common hive technical interview questions like the query to find friends of friends, transpose rows to columns.

Before moving to this blog user can refer our below link blogs to more on hive.

https://acadgild.com/blog/hive-beginners-guide

https://acadgild.com/blog/bucketing-in-hive

https://acadgild.com/blog/hive-real-life-use-cases

Then, let us begin with queries.

Scenario 1: write a query to find friends of friends i.e, mutual friends of a user.

DATASET:

Here we are using dataset freind_details to achieve the above objective which contains 2 columns user_name, and user_friend_name.

DOWNLOAD LINK:

You can download the friend_details dataset from the below link.

Friends_List_Dataset_Link

Once the input dataset is downloaded you use the below command to create table and load the dataset into the created table.

CREATE TABLE:

CREATE TABLE FRIEND_DETAILS(USER_NAME STRING, USER_FRIEND_NAME STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA:

LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/Blog/Hive_Queries_Interview_Questions/Datasets/friend_list.txt' INTO TABLE FRIEND_DETAILS;

You can view the table values using the select all command.

VIEW DATA:

SELECT * FROM FRIEND_DETAILS;

Now, let us write the query to convert to find friend’s friends of a user.

QUERY:

SELECT  f1.USER_NAME,

        f2.USER_FRIEND_NAME as user_friend_of_friend

FROM    FRIEND_DETAILS f1,

        FRIEND_DETAILS f2

WHERE   f1.USER_NAME = 'prateek'

AND     f1.USER_FRIEND_NAME = f2.USER_NAME;

OUTPUT:

Here in the above query, we are creating a alias table name as f1 and f2, applying where condition to get mutual friends of the user prateek.

As we can see from the above result Onkar is a friend of Prateek and ajit and sumit are friends of Onkar, thus mutual friends of Prateek are ajit and sumit.

Scenario 2: In the SALES table quantity of each product is stored in rows for every year. Write a query to transpose the quantity for each product and display it in columns?

DATASET:

Here we are using dataset prod and sales to achieve the above objective.

DOWNLOAD LINK:

You can download the product details and product sales datasets from the below link.

PRODUCT_DETAILS_LINK

PRODUCT_SALES_LINK

DATASET DESCRIPTION:

PROD:

  1. PROD_ID
  2. PROD_NAME

SALES:

  1. SALE_ID
  2. PRODUCT_ID
  3. YEAR
  4. QUANTITY
  5. PRICE

Once the input dataset is downloaded you use the below command to create table and load the dataset into the created table.

CREATE TABLE PROD:

CREATE TABLE PROD(PROD_ID INT,PROD_NAME STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA:

LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/Blog/Hive_Queries_Interview_Questions/Datasets/product_details.txt' INTO TABLE PROD;

VIEW DATA:

SELECT * FROM PROD;

CREATE TABLE SALES:

CREATE TABLE SALES(SALE_ID INT, PRODUCT_ID INT, YEAR INT, Quantity INT, PRICE INT)ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA:

LOAD DATA LOCAL INPATH '/home/cloudera/Desktop/Blog/Hive_Queries_Interview_Questions/Datasets/product_sales.txt' INTO TABLE SALES;

VIEW DATA:

SELECT * FROM SALES;

Now, let us write the query to convert to transpose the quantity for each product and display it in columns.

Here the result should contain the product name and the total product quantity sold in each year.

EXPECTED RESULT:

prod_name quantity_sold_in_2016    quantity_sold_in_2017 quantity_sold_in_2018

QUERY:

SELECT P.PROD_NAME,

      MAX(IF(S.YEAR=2016, S.QUANTITY, NULL)) QUAN_2016,

      MAX(IF(S.YEAR=2017, S.QUANTITY, NULL)) QUAN_2017,

      MAX(IF(S.YEAR=2018, S.QUANTITY, NULL)) QUAN_2018

FROM   PROD P,

      SALES S

WHERE (P.PROD_ID = S.PRODUCT_ID)

GROUP BY P.PROD_NAME;

In the above query, we have used IF condition to create year columns and performed groupby operation based on the product name.

OUTPUT:

We can see from the output we have successfully shown the result of the total amount of products sold in yearly wise.

Scenario 3: Write a query to find the number of products sold in each year?

For, the above query we will be referring to the sales table which we have used in the above scenario.

QUERY:

SELECT YEAR,

      COUNT(1) NUM_PRODUCTS

FROM   SALES

GROUP BY YEAR;

OUTPUT:

As we can observe we have successfully achieved our object to find the number of products sold in each year.

We hope this post has been helpful in understanding how to perform data analysis using Hive. In the case of any queries, feel free to comment below and we will get back to you at the earliest.

Keep visiting our site ACADGILD for more trending blogs updates on Big Data and other technologies.

Related Popular Courses:

BIG DATA DEVELOPER

INSTALL NIFI ON LINUX

DATA SCIENCE COURSES ONLINE

BEST ANALYTICS COURSES

Manjunath

is working with AcadGild as Big Data Engineer and is a Big Data enthusiast with 2+ years of experience in Hadoop Development. He is passionate about coding in Hive, Spark, Scala. Feel free to contact him at [email protected] for any further queries.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Related Articles

Close
Close