We can run Hive queries from a Java program by using a Hive JDBC connection. Let us learn how to run Hive queries using the Java API.
Pre-requisites for running Hive Queries Using Java Programs
-
For the Hadoop cluster setup, follow the instructions mentioned in the document given below:
https://acadgild.com/blog/hadoop-2-x-installation-guide/
-
For Hive installation, follow the instructions mentioned in the document given below:
https://drive.google.com/open?id=0B1QaXx7tpw3SMDktdk5RbHk1MVE
Note: We have configured the Hive metastore to use MySQL. You can also configure the metastore to use MySQL by following this blog.
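For reference, a typical MySQL metastore configuration in hive-site.xml looks like the sketch below. The database name, username, and password here are placeholders; replace them with your own values.

```xml
<!-- illustrative MySQL metastore settings; replace URL, user, and password with your own -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hiveuser</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hivepassword</value>
</property>
```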
Configure the HiveServer2 port number by adding the below property to your hive-site.xml file:
<property>
<name>hive.server2.thrift.port</name>
<value>10000</value>
<description>TCP port number to listen on, default 10000</description>
</property>
In newer versions of Hive, JDBC connections go through HiveServer2, so you need to start HiveServer2 to work with hive-jdbc.
Note: Please make sure that all the Hadoop daemons, the Hive metastore, and HiveServer2 have started successfully before executing your Java program.
To start the Hadoop daemons, use the command start-all.sh
To start the Hive metastore, use the command hive --service metastore
To start HiveServer2, use the command hive --service hiveserver2
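Putting the three startup steps together, a typical session looks like the sketch below. Running the metastore and HiveServer2 in the background is just a convenience; the commands assume the Hadoop and Hive binaries are on your PATH.

```shell
# start the Hadoop daemons (HDFS and YARN)
start-all.sh

# start the Hive metastore service in the background
hive --service metastore &

# start HiveServer2, which listens on the Thrift port (10000 by default)
hive --service hiveserver2 &
```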
Before writing the Java program, let us check these queries in the Hive shell.
We are using Hive's default database, which contains a list of tables, one of which is the olympics table. We have described the olympics table to check its schema. You can see the result in the screenshot below.
Now let us write a Java program to run the same queries.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class Hive_java {
    // JDBC driver used for HiveServer2
    private static String driver = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driver);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }

        // connecting to the default database
        Connection connect = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "");
        Statement state = connect.createStatement();

        System.out.println("!!!!!!!!!!Running 1st query!!!!!!!!!!");
        System.out.println("Listing tables in hive");
        ResultSet show_tables = state.executeQuery("show tables");
        while (show_tables.next()) {
            System.out.println(show_tables.getString(1));
        }

        System.out.println("!!!!!!!!!!Running 2nd query!!!!!!!!!!");
        System.out.println("Describing olympics table");
        ResultSet describe_table = state.executeQuery("describe olympics");
        while (describe_table.next()) {
            System.out.println(describe_table.getString(1) + "\t" + describe_table.getString(2));
        }
    }
}
To run this program, you need to add some dependencies. Below is the list of jars that are required:
hive-jdbc-*-standalone.jar
hive-jdbc-*.jar
hive-metastore-*.jar
hive-service-*.jar
Note: Here * represents your Hive version. You can find all these jars in your $HIVE_HOME/lib directory. In addition to these, you need to add two more jars, hadoop-core-1.2.1.jar and the mysql-connector jar. You can download them from the link below.
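With those jars in place, compiling and running the program from the command line might look like the sketch below. The jar names and paths are illustrative; match them to the actual files in your $HIVE_HOME/lib directory and wherever you placed the two extra jars.

```shell
# compile against the Hive jars (jar locations are illustrative)
javac -cp ".:$HIVE_HOME/lib/*" Hive_java.java

# run with the Hive jars plus hadoop-core and the MySQL connector on the classpath
java -cp ".:$HIVE_HOME/lib/*:hadoop-core-1.2.1.jar:mysql-connector-java.jar" Hive_java
```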
https://drive.google.com/open?id=0ByJLBTmJojjzdmY0MzBHb1F5YXc
Now if you run this program, you will get the list of tables present in Hive and the schema of the olympics table. You can see the same in the screenshot below.
Now let us try to run a Hive query that requires launching a MapReduce job.
Add the below code to the same program.
System.out.println("!!!!!!!!!!Running 3rd query!!!!!!!!!!");
System.out.println("Calculating the average age of athletes");
String select = "select avg(olympic_age) from olympics";
ResultSet avg_age = state.executeQuery(select);
while (avg_age.next()) {
    System.out.println(avg_age.getString(1));
}
Now let us try to run the program with the added code.
This time we have got an error!!
Calculating the average age of athletes
Exception in thread "main" java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
    at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:296)
    at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
    at Hive_java.main(Hive_java.java:41)
To resolve this error, we need to provide, in the database connection line, the username of a Hadoop user who has complete access and all permissions on HDFS.
This is our database connection line.
Connection connect = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", ""); //connecting to default database
Now we will provide the details of a Hadoop user with all permissions on HDFS.
Connection connect = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "kiran", ""); //connecting to default database
Now let us run the code again.
Yes! This time it is successful, and you can see the result in the screenshot below.
!!!!!!!!!!Running 3rd query!!!!!!!!!!
Calculating the average age of athletes
26.405433646812956
This is how we can run Hive queries from Java programs using the hive-jdbc connector.
We hope this blog helped you run Hive queries through Java programs. Keep visiting our site www.acadgild.com for more updates on big data and other technologies.