Querying HBase Using Apache Drill

July 14

In this post, we discuss how to query HBase tables using Apache Drill.

If you have not installed Apache Drill yet, we recommend going through our previous blog post on installing it before proceeding.

We start by launching a Drill session: move into the bin folder of the Drill installation directory and run ./drill-embedded.

Next, open your browser and go to localhost:8047 to open Drill’s Web UI.

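In the Storage tab, open the hbase storage plugin and make sure it is enabled. By default, the ZooKeeper port is 2181; if you have changed this port, enter the new port number in the plugin configuration (the hbase.zookeeper.property.clientPort setting); otherwise, leave the configuration as it is.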

Let’s create a table in HBase from the HBase shell using the command below:

Syntax: create '<table-name>', '<column-family1>', '<column-family2>', ...

hbase(main):021:0> create 'customers','address','order'
0 row(s) in 0.4030 seconds
=> Hbase::Table - customers
hbase(main):022:0> list
TABLE
customers
1 row(s) in 0.0110 seconds
=> ["customers"]

Insert some data into HBase:

hbase(main):026:0> put 'customers','john','address:city','Boston'
0 row(s) in 0.0290 seconds
hbase(main):027:0> put 'customers','john','address:state','Massachusetts'
0 row(s) in 0.0060 seconds
hbase(main):028:0> put 'customers','john','address:street','street1'
0 row(s) in 0.0130 seconds
hbase(main):029:0> put 'customers','john','order:number','ORD-15'
0 row(s) in 0.0260 seconds
hbase(main):030:0> put 'customers','john','order:amount','15'
0 row(s) in 0.0120 seconds

Checking the Data in HBase Table:

hbase(main):011:0* scan 'customers'
ROW COLUMN+CELL
john column=address:city, timestamp=1463487168899, value=Boston
john column=address:state, timestamp=1463487190136, value=Massachusetts
john column=address:street, timestamp=1463487198088, value=street1
john column=order:amount, timestamp=1463487218363, value=15
john column=order:number, timestamp=1463487213095, value=ORD-15
1 row(s) in 0.0250 seconds

Next, go back to the Drill session and run use hbase; to switch the default storage plugin to HBase.

With HBase as the default storage, you can run Drill queries on HBase tables directly.
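
The switch to HBase can be sketched as the short sequence of Drill commands below. This is a minimal illustration, not output captured from the session in this post; it assumes the hbase storage plugin is enabled and that the customers table created earlier exists.

```sql
-- List the available schemas; hbase should appear if the plugin is enabled.
SHOW DATABASES;

-- Make the HBase storage plugin the default schema for this session.
USE hbase;

-- List the HBase tables visible to Drill; customers should be among them.
SHOW TABLES;
```

Running SHOW TABLES before querying is a quick way to confirm that Drill can reach HBase through ZooKeeper.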

Let’s view the contents of the customers table with the command select * from customers;.

The output is not readable. HBase stores all of its data as byte arrays, and Drill returns those bytes as they are rather than as UTF8 text. Therefore, we need to apply the CONVERT_FROM function to view the output in a readable format.

Now, let’s write the query to convert the byte arrays to UTF8 format. The query is shown below:

SELECT CONVERT_FROM(row_key, 'UTF8') AS customer_name,
       CONVERT_FROM(customers.address.city, 'UTF8') AS city,
       CONVERT_FROM(customers.address.state, 'UTF8') AS state,
       CONVERT_FROM(customers.address.street, 'UTF8') AS street,
       CONVERT_FROM(customers.`order`.amount, 'UTF8') AS order_value
FROM customers;

The output now shows the name, city, state, street, and order amount of each customer in readable form.

Note: To access the data in an HBase table through a Drill query, use the dot operator. The syntax is tablename.columnfamilyname.columnname. A name that is also an SQL reserved word, such as order, must be enclosed in backticks.
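
As a further illustration of the dot notation, a filtered query might look like the sketch below. It assumes the customers table created above and that use hbase; has already been run. The CAST to INT assumes the amount was stored as a text string, which is the case for values inserted through the HBase shell.

```sql
-- Sketch only: assumes the customers table from this post and USE hbase.
SELECT CONVERT_FROM(row_key, 'UTF8') AS customer_name,
       -- Backticks keep the reserved word order from being parsed as a keyword.
       CONVERT_FROM(customers.`order`.`number`, 'UTF8') AS order_number,
       -- Decode the bytes to text first, then cast the text to an integer.
       CAST(CONVERT_FROM(customers.`order`.amount, 'UTF8') AS INT) AS order_value
FROM customers
WHERE CONVERT_FROM(customers.address.city, 'UTF8') = 'Boston';
```

The conversion is applied inside the WHERE clause as well, since the comparison is otherwise made against raw bytes rather than text.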

We hope this post has been helpful in understanding how to configure Drill to query HBase tables. If you have any questions, feel free to comment below and we will get back to you at the earliest.

Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
