
Running Hive Queries Using Java API

We can run Hive queries from a Java program by using a Hive JDBC connection. Let us learn how to run Hive queries using the Java API.

Prerequisites for Running Hive Queries Using Java Programs


  • For the Hadoop cluster setup, follow the instructions mentioned in the document given below:

https://acadgild.com/blog/hadoop-2-x-installation-guide/

  • For Hive installation, follow the instructions mentioned in the document given below:

https://drive.google.com/open?id=0B1QaXx7tpw3SMDktdk5RbHk1MVE

Note: We have configured the Hive metastore to use MySQL. You can also configure your metastore to use MySQL by following this blog.

https://acadgild.com/blog/mysql-metastore-integration-with-hive/

Configure the HiveServer2 port number by adding the below property in your hive-site.xml file:

<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
  <description>TCP port number to listen on, default 10000</description>
</property>

In newer versions of Hive, JDBC connections go through HiveServer2. So, you need to start HiveServer2 in order to work with hive-jdbc.

Note: Please make sure that all the Hadoop daemons, the Hive metastore, and HiveServer2 have been started successfully before executing your Java program.

To start the Hadoop daemons, use the command start-all.sh

To start the Hive metastore, use the command hive --service metastore

To start HiveServer2, use the command hive --service hiveserver2
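The startup sequence above can be sketched as a short shell session. This assumes the Hadoop and Hive bin directories are on your $PATH; the long-running services are sent to the background here, but you can equally run each in its own terminal:

```shell
# Start the HDFS and YARN daemons
start-all.sh

# Start the Hive metastore service in the background
hive --service metastore &

# Start HiveServer2, which listens on the Thrift port (10000 by default)
hive --service hiveserver2 &

# Verify that the daemons are up
jps
```

The jps listing should show the Hadoop daemons (NameNode, DataNode, and so on) plus a RunJar entry for each of the two Hive services.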

Before writing the Java program, let us first check these queries in the Hive shell.

We are using Hive's default database, which contains a list of tables, one of which is the olympics table. We have described the olympics table to check its schema. You can see the result in the below screenshot.

Now let us write a Java program for running the same queries.

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;
public class Hive_java{
  private static String driver = "org.apache.hive.jdbc.HiveDriver";  //driver used for hiveserver2
  public static void main(String[] args) throws SQLException {
    try {
      Class.forName(driver);
    } catch (ClassNotFoundException e) {
      e.printStackTrace();
      System.exit(1);
    }
    Connection connect = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");   //connecting to default database
    Statement state = connect.createStatement();
    System.out.println("!!!!!!!!!!Running 1st query!!!!!!!!!!");
    System.out.println("Listing tables in hive");
    ResultSet show_tables = state.executeQuery("show tables");
    while (show_tables.next()) {
        System.out.println(show_tables.getString(1));
    }
    System.out.println("!!!!!!!!!!Running 2nd query!!!!!!!!!!");
    System.out.println("Describing olympics table");
    ResultSet describe_table = state.executeQuery("describe olympics");
    while (describe_table.next()) {
        System.out.println(describe_table.getString(1) + "\t" + describe_table.getString(2));
    }
  }
}


To run this program, you need to add some dependencies. Below is the list of jars that are required:

hive-jdbc-*-standalone.jar

hive-jdbc-*.jar

hive-metastore-*.jar

hive-service-*.jar

Note: Here * represents your Hive version. You can find all these jars in your $HIVE_HOME/lib directory. In addition to these, you need to add two more jars, hadoop-core-1.2.1.jar and the mysql-connector jar. You can download them from the below link.

https://drive.google.com/open?id=0ByJLBTmJojjzdmY0MzBHb1F5YXc
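With the jars in place, the program can be compiled and run from the command line; a sketch, assuming the jars listed above have been copied into a local lib directory (adjust the jar names to your Hive version):

```shell
# Compile against the Hive JDBC jars
javac -cp "lib/*" Hive_java.java

# Run with the current directory and the same jars on the classpath
java -cp ".:lib/*" Hive_java
```

If you work in an IDE instead, add the same jars to the project's build path.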

Now if you run this program, you will get the list of tables present in Hive and the schema of the olympics table. You can see the same in the below screenshot.

Now let us try to run some Hive queries that require launching a MapReduce job.

Add the below code in the same program.

    System.out.println("!!!!!!!!!!Running 3rd query!!!!!!!!!!");
    System.out.println("Calculating the average age of athletes");
    String select = "select avg(olympic_age) from olympics";
    ResultSet avg_age = state.executeQuery(select);
    while (avg_age.next()) {
        System.out.println(avg_age.getString(1));
    }

Now let us try to run the program with the added code.
This time, we get an error!

Calculating the average age of athletes
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
	at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
	at Hive_java.main(Hive_java.java:41)

To resolve this error, we need to provide the username and password of the Hadoop user in the database connection line. This user should have complete access and all permissions on HDFS.

This is our database connection line:

Connection connect = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");   //connecting to default database

Now we will provide the details of a Hadoop user with all permissions on HDFS.

Connection connect = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "kiran", "");   //connecting to default database
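Assembling the connection string by hand is easy to get wrong, so it can help to build it in one place. A minimal sketch of such a helper; the class and method names here are our own for illustration, not part of the Hive API:

```java
// Small helper for assembling a HiveServer2 JDBC URL.
// HiveUrlBuilder and hiveJdbcUrl are illustrative names, not part of the Hive API.
public class HiveUrlBuilder {
    public static String hiveJdbcUrl(String host, int port, String database) {
        return "jdbc:hive2://" + host + ":" + port + "/" + database;
    }

    public static void main(String[] args) {
        // Builds the same URL as the connection line above
        System.out.println(hiveJdbcUrl("localhost", 10000, "default"));
    }
}
```

The username and password are then passed as the second and third arguments of DriverManager.getConnection, as shown above, instead of being embedded in the URL.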

Now let us run the code again.

Yes! This time it is successful, and you can see the result in the below screenshot.

!!!!!!!!!!Running 3rd query!!!!!!!!!!
Calculating the average age of athletes
26.405433646812956

This is how we can run Hive queries through Java programs using the hive-jdbc connector.

We hope this blog helped you in running Hive queries through Java programs. Keep visiting our site www.acadgild.com for more updates on big data and other technologies.
