Querying HBase using Apache Spark

July 19

In this post we will see how to access HBase tables from Apache Spark.

Spark can work with data from multiple sources, such as HDFS, Cassandra, HBase, and MongoDB.

For a basic understanding of HBase, refer to our Beginner's Guide to HBase.

According to the Spark documentation, “RDDs can be created from Hadoop InputFormats.” In Hadoop, an InputFormat is the abstraction for any data source that a MapReduce job can read. HBase provides a TableInputFormat, which makes it easy to use Spark with HBase.

Now we will walk through the steps for accessing HBase tables through Spark.

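First, start the HBase server, for example by running the start-hbase.sh script from HBase's bin directory.

Next, create an HBASE_PATH environment variable to store the HBase classpath. One common way to do this is to set it to the output of the hbase classpath command.

Then start the Spark shell, passing the HBASE_PATH variable through the --driver-class-path option so that all the HBase JARs are included.

Now that HBase and Spark are both running, we will create the connection to HBase from the Spark shell.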

Import the required libraries
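The import list from the original post is not preserved here. As a rough sketch, assuming an HBase 1.x or 2.x client on the Spark shell classpath, the steps in this walkthrough rely on classes such as these:

```scala
// HBase configuration, table/column descriptors, and table names
import org.apache.hadoop.hbase.{HBaseConfiguration, HColumnDescriptor, HTableDescriptor, TableName}
// Client-side connection, admin, and read/write types
import org.apache.hadoop.hbase.client.{Admin, Connection, ConnectionFactory, Put, Result}
// Key type produced by TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
// InputFormat that lets Spark scan an HBase table
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
// Helpers for converting between Strings and byte arrays
import org.apache.hadoop.hbase.util.Bytes
```

In HBase 2.x, TableInputFormat lives in the separate hbase-mapreduce module, so that JAR also has to be on the classpath.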

 

Create the HBase configuration object.
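A minimal sketch of this step, assuming hbase-site.xml is reachable on the classpath that was passed to the Spark shell:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration

// Loads hbase-default.xml and hbase-site.xml from the classpath
val conf = HBaseConfiguration.create()

// Optional sanity check: show which ZooKeeper quorum the client will contact
println(conf.get("hbase.zookeeper.quorum"))
```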

 

Create an Admin instance and set the input table for the input format.
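One way this step might look is shown below. The table name sparktable is a hypothetical placeholder, and the sketch uses the Connection API available since HBase 1.0 rather than whatever the original post used:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.ConnectionFactory
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val tableName = "sparktable"   // hypothetical table name, chosen for illustration

val conf = HBaseConfiguration.create()
// Tell TableInputFormat which table Spark should scan later on
conf.set(TableInputFormat.INPUT_TABLE, tableName)

// Open a connection and obtain an Admin handle for DDL operations
val connection = ConnectionFactory.createConnection(conf)
val admin = connection.getAdmin

// Call admin.close() and connection.close() once the session is finished
```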

 

Create the table.
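Table creation could be sketched as a small helper like the one below. The helper name createTableIfMissing is hypothetical. HTableDescriptor and HColumnDescriptor are used because they exist in both HBase 1.x and 2.x, although 2.x deprecates them in favor of TableDescriptorBuilder and ColumnFamilyDescriptorBuilder.

```scala
import org.apache.hadoop.hbase.{HColumnDescriptor, HTableDescriptor, TableName}
import org.apache.hadoop.hbase.client.Admin

// Creates the table with a single column family unless it already exists.
def createTableIfMissing(admin: Admin, table: String, family: String): Unit = {
  val name = TableName.valueOf(table)
  if (admin.tableExists(name)) {
    println(s"Table $table already exists")
  } else {
    val descriptor = new HTableDescriptor(name)
    descriptor.addFamily(new HColumnDescriptor(family))
    admin.createTable(descriptor)
    println(s"Created table $table")
  }
}
```

Given an Admin obtained from an open connection, a call such as createTableIfMissing(admin, "sparktable", "cf") would create a table with one column family named cf.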

 

Check whether the created table exists.
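The check can be sketched with two Admin calls. The helper name tableIsReady is hypothetical:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.Admin

// Returns true only if the table exists and all of its regions are available.
def tableIsReady(admin: Admin, table: String): Boolean = {
  val name = TableName.valueOf(table)
  admin.tableExists(name) && admin.isTableAvailable(name)
}
```

Checking tableExists first avoids asking about the availability of a table that was never created.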

 

 

Now that the table has been created, we will put some data into it.
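As an illustration, the helper below writes a handful of sample rows. The helper name putSampleRows, the row keys, the qualifier column1, and the values are all made up for the example:

```scala
import org.apache.hadoop.hbase.TableName
import org.apache.hadoop.hbase.client.{Connection, Put}
import org.apache.hadoop.hbase.util.Bytes

// Writes rows row1..rowN, each with one cell family:column1 = valueN.
def putSampleRows(connection: Connection, table: String, family: String, rows: Int): Unit = {
  val hTable = connection.getTable(TableName.valueOf(table))
  try {
    for (i <- 1 to rows) {
      val put = new Put(Bytes.toBytes(s"row$i"))
      put.addColumn(Bytes.toBytes(family), Bytes.toBytes("column1"), Bytes.toBytes(s"value$i"))
      hTable.put(put)
    }
  } finally {
    hTable.close()   // release the table handle even if a put fails
  }
}
```

For example, putSampleRows(connection, "sparktable", "cf", 10) would insert ten rows, assuming connection is an open HBase Connection and the table has a column family named cf.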

 

Now we can create an RDD from the data in HBase using newAPIHadoopRDD, passing it the configuration, the InputFormat class, and the key and value classes.
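A sketch of this call, assuming it is run inside the Spark shell (where sc is the predefined SparkContext) and that the table is named sparktable, which is a placeholder:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

val scanConf = HBaseConfiguration.create()
scanConf.set(TableInputFormat.INPUT_TABLE, "sparktable")   // hypothetical table name

// Each record is a (row key, Result) pair read through TableInputFormat
val hBaseRDD = sc.newAPIHadoopRDD(
  scanConf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])
```

The resulting RDD has element type (ImmutableBytesWritable, Result), where the key is the row key and the value holds the cells of that row.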

 

We can perform all the usual transformations and actions on the resulting RDD.
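For instance, a transformation might turn the raw HBase records into plain strings before any action runs. The helper name toRowValuePairs is hypothetical, and the family and qualifier are whatever was used when the data was written:

```scala
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.rdd.RDD

// Maps each (rowKey, Result) record to a (row key, cell value) pair of Strings.
def toRowValuePairs(
    rdd: RDD[(ImmutableBytesWritable, Result)],
    family: String,
    qualifier: String): RDD[(String, String)] =
  rdd.map { case (_, result) =>
    val row = Bytes.toString(result.getRow)
    val value = Bytes.toString(result.getValue(Bytes.toBytes(family), Bytes.toBytes(qualifier)))
    (row, value)
  }
```

With an RDD created through newAPIHadoopRDD, an action such as count() returns the number of rows scanned, and toRowValuePairs(rdd, "cf", "column1").take(5).foreach(println) would print a few row and value pairs. Converting to strings early is a deliberate choice, since Result objects are not Java-serializable and cannot be collected to the driver directly with the default serializer.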

 

We hope this blog helped you understand how Spark integrates with HBase. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
