Querying HBase using Apache Spark


In this blog, we will see how to access and query HBase tables using Apache Spark.

Spark can work with data from many sources, such as the local filesystem, HDFS, Cassandra, HBase, and MongoDB.

For a basic understanding of HBase, refer to our Beginners Guide to HBase.

Now we will walk through the steps for accessing HBase tables through Spark.

First, start HMaster.
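On a typical local installation, HMaster is launched by the HBase start script (a sketch, assuming HBASE_HOME points at your HBase installation):

$HBASE_HOME/bin/start-hbase.sh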

Create an HBASE_PATH environment variable to store the HBase classpath (the locations of all the HBase jars).
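One way to populate it, as a sketch, is to let HBase print its own classpath using the standard hbase classpath subcommand:

export HBASE_PATH=`$HBASE_HOME/bin/hbase classpath`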

Start the Spark shell, passing the HBASE_PATH variable so that all the HBase jars are included.
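For example, the variable can be passed through spark-shell's --driver-class-path option (a sketch; --jars also works if you list individual jar files):

$SPARK_HOME/bin/spark-shell --driver-class-path $HBASE_PATH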

Now that HBase and Spark are running, we will create a connection to HBase from the Spark shell.

Import the required libraries as given below:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.{HTableDescriptor, HColumnDescriptor}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.{Put, HTable}

// create hbase configuration object

val conf = HBaseConfiguration.create()
val tablename = "Acadgild_spark_Hbase"

// create Admin instance and set input format

conf.set(TableInputFormat.INPUT_TABLE,tablename)
val admin = new HBaseAdmin(conf)

//Create table

if (!admin.isTableAvailable(tablename)) {
  print("creating table: " + tablename + "\t")
  val tableDescription = new HTableDescriptor(tablename)
  tableDescription.addFamily(new HColumnDescriptor("cf".getBytes()))
  admin.createTable(tableDescription)
} else {
  print("table already exists")
}

// Check whether the created table exists

admin.isTableAvailable(tablename)

If the table exists, this returns true.

Now we will put some data into it:

val table = new HTable(conf, tablename)
for (x <- 1 to 10) {
  val p = new Put(("row" + x).getBytes())
  // The column family must match the one the table was created with ("cf")
  p.add("cf".getBytes(), "column1".getBytes(), ("value" + x).getBytes())
  table.put(p)
}

Now we can create an RDD over the data in HBase using SparkContext's newAPIHadoopRDD method, passing the configuration, the InputFormat, and the key and value classes.
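A minimal sketch of that call (ImmutableBytesWritable and Result are the key and value types produced by TableInputFormat; hBaseRDD is just a name chosen here):

import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable

// Each record in the RDD is a (row key, Result) pair read from the table
val hBaseRDD = sc.newAPIHadoopRDD(conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])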

We can then perform all the usual transformations and actions on the RDD we created.
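For example, a short sketch that counts the rows and prints the "cf:column1" values written above:

// Count the rows, then print each row key with its "column1" value
println("row count: " + hBaseRDD.count())
hBaseRDD.map { case (key, result) =>
  Bytes.toString(key.get()) + " -> " +
    Bytes.toString(result.getValue("cf".getBytes(), "column1".getBytes()))
}.collect().foreach(println)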

We hope this blog helped you understand Spark and HBase integration. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
