In this blog we will be discussing the ways to perform query on HBase using Apache Drill.
We recommend users to go through our previous blog to install Apache Drill before proceeding ahead in this post.
We will start our discussion with starting of Drill session which is done moving into the bin folder of our Drill installed directory and by typing ./drill-embedded.
Next, open your browser and type localhost:8047 to enter into Drill’s Web UI.
By default, the Zookeeper port will be 2181; if you have changed this port, you need to give the new port number of Zookeeper here or else leave the configurations as it is.
Let’s create a table in HBase using the below command:
Syntax: create ‘<table-name>’,’<column-family1>’ ,’<column-family2>’ …….
hbase(main):021:0> create 'customers','address','order' 0 row(s) in 0.4030 seconds => Hbase::Table - customers hbase(main):022:0> list TABLE customers 1 row(s) in 0.0110 seconds => ["customers"]
Insert some data into HBase
hbase(main):026:0> put 'customers','john','address:city','Boston' 0 row(s) in 0.0290 seconds hbase(main):027:0> put 'customers','john','address:state','Mashitushes' 0 row(s) in 0.0060 seconds hbase(main):028:0> put 'customers','john','address:street','street1' 0 row(s) in 0.0130 seconds hbase(main):029:0> put 'customers','john','order:number','ORD-15' 0 row(s) in 0.0260 seconds hbase(main):030:0> put 'customers','john','order:amount','15' 0 row(s) in 0.0120 seconds
Checking the Data in HBase Table:
hbase(main):011:0* scan 'customers' ROW COLUMN+CELL john column=address:city, timestamp=1463487168899, value=Boston john column=address:state, timestamp=1463487190136, value=Mashitushes john column=address:street, timestamp=1463487198088, value=street1 john column=order:amount, timestamp=1463487218363, value=15 john column=order:number, timestamp=1463487213095, value=ORD-15 1 row(s) in 0.0250 seconds
Now, open the Drill session and type use hbase to change the storage to HBase.
Now, the default storage has been changed to HBase. You can now perform Drill queries on HBase tables.
Now, let’s see the contents of the table customers, using the command select * form customers.
You can see that we did not get the output in proper format. Drill uses UTF8 text format but HBase stores its data in byte arrays. Therefore, we need to add CONVERT_FROM statement to view the output in proper format.
Now, let’s write the query to convert the byte arrays to UTF8 format. The query looks as shown below:
SELECT CONVERT_FROM(row_key, 'UTF8') AS customer_name, CONVERT_FROM(customers.address.city, 'UTF8') AS city, CONVERT_FROM(customers.address.state, 'UTF8') AS state, CONVERT_FROM(customers.address.street, 'UTF8') AS street, CONVERT_FROM(customers.order.amount, 'UTF8') AS order_value FROM customers;
From the above screenshot, you can see it has shown the name, city, state and street of the customers.
Note: To access the data in an HBase table through a Drill query, we need to access it through the dot operator. The syntax you need to follow is tablename.columnfamilyname.columnnname.
Hope this post has been helpful in understanding how to configure Drill to query on tables HBase. In case of any questions, feel free to comment below and we will get back to you at the earliest.
Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.