As we know, Apache Hive is data warehouse software that facilitates reading, writing, and managing large data sets residing in distributed storage using SQL.

Let’s consider a scenario where a user wants to perform operations on a Hive server, but Hadoop and Hive are not installed on the user's own system. In this case, the user can write client code in another language and access the Hive server remotely through its Apache Thrift interface.

In this post, we will learn about the concept of Thrift and how the Hive Thrift server works, using a Java code sample that accesses the Hive server.

Note:
To follow along with this post, you need a Hadoop cluster with Hive installed, so that you can start the Hive Thrift server and run the example against it.

So, what is Apache Thrift?

Apache Thrift is a software framework for scalable cross-language services development, which combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Perl, C#, JavaScript, Node.js and other languages.

When should you use Thrift?

Thrift is useful when a service written in one language has to be called by clients written in other languages.
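To see why a service written in one language can be called from another, it helps to look at what Thrift actually sends over the wire. The sketch below hand-encodes a struct with a single string field the way Thrift's binary protocol lays it out: a one-byte field type (11 for string), a two-byte field id, a four-byte length, the UTF-8 bytes, and a zero byte that marks the end of the struct. It is only an illustration, not the libthrift API. The class and method names are hypothetical, and real clients use code generated by the Thrift compiler.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical, hand-rolled illustration of Thrift's binary protocol for a
 * struct with one string field. NOT the libthrift API.
 */
public class ThriftBinarySketch {
    // Type ids used by the Thrift binary protocol.
    static final byte T_STOP = 0;
    static final byte T_STRING = 11;

    /** Encodes struct { fieldId: string value } in big-endian binary form. */
    static byte[] encodeStringField(short fieldId, String value) {
        byte[] utf8 = value.getBytes(StandardCharsets.UTF_8);
        // ByteBuffer is big-endian by default, matching the binary protocol.
        ByteBuffer buf = ByteBuffer.allocate(1 + 2 + 4 + utf8.length + 1);
        buf.put(T_STRING);        // field header: 1-byte type
        buf.putShort(fieldId);    // field header: 2-byte field id
        buf.putInt(utf8.length);  // string: 4-byte length prefix
        buf.put(utf8);            // string: UTF-8 bytes
        buf.put(T_STOP);          // end-of-struct marker
        return buf.array();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeStringField((short) 1, "hi")) {
            hex.append(String.format("%02x ", b));
        }
        // prints 0b 00 01 00 00 00 02 68 69 00
        System.out.println(hex.toString().trim());
    }
}
```

Because the layout is fixed by the protocol rather than by any one language, a Java client and a server written in C++ or Python can read each other's bytes.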

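What is HiveServer?

HiveServer is a service that allows remote clients, written in a variety of programming languages, to submit requests to Hive and retrieve the results. It is built on Apache Thrift, which is why it is sometimes called the Thrift server. The example in this post uses its successor, HiveServer2, which Java clients reach through JDBC URLs that start with jdbc:hive2://.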

In the context of Hive, a Java program can access the Hive server through the Hive JDBC driver, which talks to HiveServer2 over Thrift. Because the server exposes a Thrift interface, clients written in other languages can access Hive in the same way.
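A Java client identifies HiveServer2 by a JDBC URL of the form jdbc:hive2://host:port/database, which is the format the example program in this post uses. The helper below assembles such a URL from its parts. It is a hypothetical sketch: it covers only host, port, and database, and leaves out the optional session and authentication settings a real deployment may need. The fallback to the "default" database is an assumption made for illustration.

```java
/**
 * Hypothetical helper that builds a HiveServer2 JDBC URL of the form
 * jdbc:hive2://<host>:<port>/<database>. Session settings are omitted.
 */
public class HiveUrlSketch {
    // Port used by the example program in this post.
    static final int DEFAULT_PORT = 10000;

    static String hiveUrl(String host, int port, String database) {
        if (host == null || host.isEmpty()) {
            throw new IllegalArgumentException("host must not be empty");
        }
        if (port <= 0 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        // Assumption for this sketch: an empty database name means "default".
        String db = (database == null || database.isEmpty()) ? "default" : database;
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    public static void main(String[] args) {
        // prints jdbc:hive2://localhost:10000/default
        System.out.println(hiveUrl("localhost", DEFAULT_PORT, "default"));
    }
}
```

The resulting string can be passed to DriverManager.getConnection(url, user, password) in place of a hard-coded URL.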
Now, let’s look at an example of accessing Hive Server using Thrift in Java.

Example of accessing Hive Server using Thrift in Java:

The example below creates a table named testHiveDriverTable1, with columns named key and value, on the Hive server. The Java program uses the Hive JDBC driver, which communicates with the server through its Thrift interface.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient
{
    // HiveServer2 JDBC driver class shipped with Hive
    private static final String driverName = "org.apache.hive.jdbc.HiveDriver";

    /**
     * @param args unused
     * @throws SQLException if connecting to Hive or running a query fails
     */
    public static void main(String[] args) throws SQLException {
        try {
            // Load the driver class; it registers itself with DriverManager
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            // The Hive JDBC jar is not on the classpath
            e.printStackTrace();
            System.exit(1);
        }
        // Replace "acadgild" with the user the queries should run as (the password is empty here)
        try (Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "acadgild", "");
             Statement stmt = con.createStatement()) {
            String tableName = "testHiveDriverTable1";
            stmt.execute("drop table if exists " + tableName);
            stmt.execute("create table " + tableName + " (key int, value string)");

            // show tables matching the table name
            String sql = "show tables '" + tableName + "'";
            System.out.println("Running: " + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                if (res.next()) {
                    System.out.println(res.getString(1));
                }
            }

            // describe table: one row per column (name, type, comment)
            sql = "describe " + tableName;
            System.out.println("Running: " + sql);
            try (ResultSet res = stmt.executeQuery(sql)) {
                while (res.next()) {
                    System.out.println(res.getString(1) + "\t" + res.getString(2));
                }
            }
        }
    }
}

 

Code Explanation:

  • We declare a class named HiveJdbcClient.
  • The private static String variable driverName holds the name of the Hive JDBC driver class, “org.apache.hive.jdbc.HiveDriver”.
  • Inside main, a try-catch block loads the driver with Class.forName(driverName). This method loads and initializes the class with the given name, and the Hive driver registers itself with DriverManager when it is initialized.
  • If the driver class cannot be found (for example, because the Hive JDBC jar is not on the classpath), a ClassNotFoundException is caught, its stack trace is printed, and the program exits.
  • DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "acadgild", "") opens a connection to the Hive server. Here, localhost is the host, 10000 is the HiveServer2 port, default is the database, acadgild is the user name, and the password is empty.
  • con.createStatement() creates a Statement object for sending SQL statements to Hive. Statement is the JDBC interface that represents an SQL statement.
  • The String variable tableName stores the table name “testHiveDriverTable1”.
  • stmt.execute("drop table if exists " + tableName) drops the table from the Hive default database if it already exists, so the program can be run more than once.
  • The next execute call creates the table testHiveDriverTable1 with two columns, key and value, whose data types are int and string, respectively.
  • The String variable sql stores a show tables command for the table name testHiveDriverTable1, and its value is printed.
  • stmt.executeQuery(sql) runs the command and returns its rows in the ResultSet object res.
  • The if condition checks whether res contains a row; if it does, the first column of that row, the table name, is printed.
  • sql is then set to the describe command for testHiveDriverTable1, and its value is printed.
  • executeQuery runs the describe command, and res now holds the table's schema, one row per column, rather than the table's data.
  • The while loop iterates over the rows of res and prints each column's name and data type, separated by a tab.
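The program builds its HiveQL by concatenating tableName into strings. That is harmless with a fixed name, but if the name ever comes from user input, it is worth checking it first, because JDBC parameter placeholders stand for values, not table names. The sketch below is a hypothetical guard: the class and method names are illustrative, and the allowed pattern (letters, digits, and underscores, not starting with a digit) is a conservative assumption rather than Hive's full identifier grammar.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical allow-list check for table names concatenated into HiveQL.
 * The pattern is a conservative assumption, not Hive's full grammar.
 */
public class TableNameGuard {
    private static final Pattern SAFE_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static String requireSafeTableName(String name) {
        if (name == null || !SAFE_NAME.matcher(name).matches()) {
            throw new IllegalArgumentException("unsafe table name: " + name);
        }
        return name;
    }

    // Same statement the example program runs, but with the name checked first.
    static String createTableSql(String tableName) {
        return "create table " + requireSafeTableName(tableName) + " (key int, value string)";
    }

    public static void main(String[] args) {
        // prints create table testHiveDriverTable1 (key int, value string)
        System.out.println(createTableSql("testHiveDriverTable1"));
        try {
            createTableSql("t; drop table x");
        } catch (IllegalArgumentException e) {
            // prints rejected: unsafe table name: t; drop table x
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

An allow-list is used here instead of escaping because it is simpler to reason about: anything outside the pattern is rejected outright.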

Output:

Before executing the Java application, make sure all the Hadoop daemons are running, and then use the following command to start the Hive Thrift server (HiveServer2):

hive --service hiveserver2

Once the Hive Thrift server has started in the terminal, you can execute the Java application to get the output.
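If the Java application fails to connect, a quick first check is whether anything is listening on the HiveServer2 port at all. The sketch below tries a plain TCP connection with a timeout. It is a hypothetical helper: an open port only shows that some process is listening, not that it is HiveServer2, and the host and port are taken from the example program's JDBC URL.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical pre-flight check: is anything accepting TCP connections on
 * the given host and port? Does not verify that the listener is HiveServer2.
 */
public class PortCheck {
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // connection refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        // localhost:10000 is where the example program expects HiveServer2.
        boolean up = isListening("localhost", 10000, 2000);
        System.out.println(up ? "something is listening on localhost:10000"
                              : "nothing is listening on localhost:10000");
    }
}
```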

Once we execute the program, we can see in the image below that a new table named testHiveDriverTable1 has been created in the Hive default database through Apache Thrift. It has two columns, key and value, with data types int and string.

We hope this post has helped you understand the concept of the Thrift server in Hive. Keep visiting our website for more posts on Big Data and other technologies.