Big Data Hadoop & Spark

Querying HBase Using Apache Drill

In this blog we will be discussing the ways to perform query on HBase using Apache Drill.

We recommend users to go through our previous blog to install Apache Drill before proceeding ahead in this post.

100% Free Course On Big Data Essentials

Subscribe to our blog and get access to this course ABSOLUTELY FREE.

We will start our discussion with starting of Drill session which is done moving into the bin folder of our Drill installed directory and by typing ./drill-embedded.

Next, open your browser and type localhost:8047 to enter into Drill’s Web UI.

Drill web ui

Now, click on ‘Storage’ and enable HBase.

Drill storage plugins

Now, click on ‘update’ and you can see the default configurations of HBase.

Hbase drill configuration

By default, the Zookeeper port will be 2181; if you have changed this port, you need to give the new port number of Zookeeper here or else leave the configurations as it is.

Let’s create a table in HBase using the below command:

Syntax: create ‘<table-name>’,’<column-family1>’ ,’<column-family2>’ …….

Hadoop

hbase(main):021:0> create 'customers','address','order'
0 row(s) in 0.4030 seconds
=> Hbase::Table - customers
hbase(main):022:0> list
TABLE
customers
1 row(s) in 0.0110 seconds
=> ["customers"]

Insert some data into HBase

hbase(main):026:0> put 'customers','john','address:city','Boston'
0 row(s) in 0.0290 seconds
hbase(main):027:0> put 'customers','john','address:state','Mashitushes'
0 row(s) in 0.0060 seconds
hbase(main):028:0> put 'customers','john','address:street','street1'
0 row(s) in 0.0130 seconds
hbase(main):029:0> put 'customers','john','order:number','ORD-15'
0 row(s) in 0.0260 seconds
hbase(main):030:0> put 'customers','john','order:amount','15'
0 row(s) in 0.0120 seconds

Checking the Data in HBase Table:

hbase(main):011:0* scan 'customers'
ROW COLUMN+CELL
john column=address:city, timestamp=1463487168899, value=Boston
john column=address:state, timestamp=1463487190136, value=Mashitushes
john column=address:street, timestamp=1463487198088, value=street1
john column=order:amount, timestamp=1463487218363, value=15
john column=order:number, timestamp=1463487213095, value=ORD-15
1 row(s) in 0.0250 seconds

Now, open the Drill session and type use hbase to change the storage to HBase.

selecting hbase storage in drill

Now, the default storage has been changed to HBase. You can now perform Drill queries on HBase tables.

displaying hbase tables from drill

Now, let’s see the contents of the table customers, using the command select * form customers.

You can see that we did not get the output in proper format. Drill uses UTF8 text format but HBase stores its data in byte arrays. Therefore, we need to add CONVERT_FROM statement to view the output in proper format.

Now, let’s write the query to convert the byte arrays to UTF8 format. The query looks as shown below:

SELECT CONVERT_FROM(row_key, 'UTF8') AS customer_name, CONVERT_FROM(customers.address.city, 'UTF8') AS city, CONVERT_FROM(customers.address.state, 'UTF8') AS state, CONVERT_FROM(customers.address.street, 'UTF8') AS street, CONVERT_FROM(customers.order.amount, 'UTF8') AS order_value FROM customers;

converting bytes to utf8 in drill

From the above screenshot, you can see it has shown the name, city, state and street of the customers.

Note: To access the data in an HBase table through a Drill query, we need to access it through the dot operator. The syntax you need to follow is tablename.columnfamilyname.columnnname.

Hope this post has been helpful in understanding how to configure Drill to query on tables HBase. In case of any questions, feel free to comment below and we will get back to you at the earliest.

Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.

Hadoop

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Related Articles

Close
Close