Big Data Hadoop & Spark

Loading Data Into HBase Using PIG Scripts.

In this blog, we discuss how to load data into HBase using Pig scripts.

Before going further, you can recall the basic concepts of Pig and HBase from our beginner blogs on both.

Links to the HBase and Pig blogs:

Beginners-Guide-for-HBase

Beginners-Guide-for-Pig

To implement the concepts discussed in this blog, you are expected to have a Hadoop cluster with Pig and HBase running on it.

Note: You need to download compatible versions of Hadoop, HBase and Pig to implement the steps discussed here. The classpath used later in this blog refers to HBase 0.98.4 (hadoop2 build).

Let us now go step by step through transferring data into HBase using Pig.

We use a sample student data set, which will be loaded into HBase. A screenshot accompanies every step for better understanding.

You can download this sample data set for your own practice from the link below.

DATASET

The data set contains seven columns, named:

StudentName, sector, DOB, qualification, score, state, randomName.

We first copy the data set into HDFS, from where it will be loaded into HBase.
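The copy step can be sketched as follows. The local file name, the HDFS directory and the comma delimiter are assumptions made for illustration (the blog does not state them), and `count_bad_rows` is a hypothetical helper that reports lines without exactly seven fields before anything is uploaded.

```shell
# Hypothetical names: adjust both to your environment.
LOCAL_FILE="${LOCAL_FILE:-student_data.csv}"     # assumed local file name
HDFS_DIR="${HDFS_DIR:-/user/hadoop/student}"     # assumed HDFS target directory

# Print the number of lines that do not have exactly seven
# comma-separated fields (the comma delimiter is an assumption).
count_bad_rows() {
    awk -F',' 'NF != 7 { bad++ } END { print bad + 0 }' "$1"
}

if [ -f "$LOCAL_FILE" ] && command -v hdfs >/dev/null 2>&1; then
    echo "rows with unexpected column count: $(count_bad_rows "$LOCAL_FILE")"
    hdfs dfs -mkdir -p "$HDFS_DIR"               # create the target directory
    hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/"  # copy, overwriting an old copy
    hdfs dfs -ls "$HDFS_DIR"                     # confirm the file is there
else
    echo "skipping upload: need $LOCAL_FILE and the hdfs client on this machine"
fi
```

Checking the column count first is optional, but a row with a missing field would shift every later value into the wrong HBase column.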

We then add a few HBase jar files to the Pig classpath.

export PIG_CLASSPATH=/home/hadoop/HADOOP/hbase-0.98.4-hadoop2/lib/hbase-server-0.98.4-hadoop2.jar:/home/hadoop/HADOOP/hbase-0.98.4-hadoop2/lib/hbase-*.jar
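The shell does not expand a pattern such as `hbase-*.jar` inside a variable assignment, and Java itself only understands the whole-directory form `lib/*`. One cautious alternative, shown here as a sketch, is to expand the jar list explicitly. `build_pig_classpath` is a hypothetical helper; the directory is the one used in the line above.

```shell
# Join every hbase-*.jar in the given directory into a colon-separated list.
build_pig_classpath() {
    lib_dir="$1"
    cp=""
    for jar in "$lib_dir"/hbase-*.jar; do
        [ -e "$jar" ] || continue        # skip the raw pattern when nothing matches
        cp="${cp:+$cp:}$jar"             # add ':' only between entries
    done
    printf '%s\n' "$cp"
}

PIG_CLASSPATH="$(build_pig_classpath /home/hadoop/HADOOP/hbase-0.98.4-hadoop2/lib)"
export PIG_CLASSPATH
echo "PIG_CLASSPATH=$PIG_CLASSPATH"
```

If the directory does not exist, the variable simply ends up empty, which makes a wrong path easy to spot in the echoed line.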

We now start the HBase shell and create a table.

We need this table only as a skeleton, so that Pig can store data in it by referring to the table name.
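As a sketch of this step, the commands below create the skeleton table from a small command file. The table name `acadstudent` and the column family `studentdata` are taken from the reader comment at the end of this post, not from the original screenshots, so treat them as assumptions; the file name `create_table.hbase` is hypothetical.

```shell
# Write the HBase shell commands to a file so they can be reviewed first.
cat > create_table.hbase <<'EOF'
create 'acadstudent', 'studentdata'
describe 'acadstudent'
exit
EOF

if command -v hbase >/dev/null 2>&1; then
    hbase shell create_table.hbase       # run the commands non-interactively
else
    echo "hbase not found; run 'hbase shell create_table.hbase' on the cluster"
fi
```

Only the column family has to exist in advance. Individual column qualifiers such as `studentdata:score` are created when the rows are written.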

We can leave the HBase shell by typing exit and then switch to the Pig grunt shell.


Once we are inside the grunt shell, we can load the data from HDFS into a relation (alias).
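A minimal sketch of the load step, written as a Pig script file rather than typed into grunt. The HDFS path, the comma delimiter and the all-`chararray` schema are assumptions; the field names come from the data set description, and the alias `rawD` is the one used in the reader comment at the end of this post.

```shell
# Write a small Pig script that loads the file and shows what was read.
cat > load_student.pig <<'EOF'
-- The HDFS path and the ',' delimiter are assumptions.
rawD = LOAD '/user/hadoop/student/student_data.csv' USING PigStorage(',')
       AS (StudentName:chararray, sector:chararray, DOB:chararray,
           qualification:chararray, score:chararray, state:chararray,
           randomName:chararray);
DESCRIBE rawD;
firstRows = LIMIT rawD 5;
DUMP firstRows;
EOF

if command -v pig >/dev/null 2>&1; then
    pig -f load_student.pig
else
    echo "pig not found; run 'pig -f load_student.pig' on the cluster"
fi
```

Dumping a few rows before storing anything is a cheap way to confirm that each value sits under the field name you expect.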

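Now we can transfer the data into HBase with the STORE command.

We need to make sure we give the exact name of the table created in HBase. The column parameters also need care: HBaseStorage takes the first field of each tuple as the row key, so the column list should name only the remaining fields, in order.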
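The mapping between Pig fields and HBase columns can be sketched like this. `hbase_columns` is a hypothetical helper that builds the HBaseStorage column list from the field names while skipping the first field, which HBaseStorage uses as the row key. The table and column family names follow the reader comment at the end of this post; the HDFS path and delimiter are assumptions.

```shell
# Build the HBaseStorage column list: pass the column family, then all fields.
hbase_columns() {
    family="$1"
    shift 2                      # drop the family and the first (row key) field
    cols=""
    for field in "$@"; do
        cols="${cols:+$cols }$family:$field"
    done
    printf '%s\n' "$cols"
}

HBASE_COLS="$(hbase_columns studentdata StudentName sector DOB qualification score state randomName)"

# Unquoted EOF so that $HBASE_COLS is expanded inside the script.
cat > store_student.pig <<EOF
-- The HDFS path and the ',' delimiter are assumptions.
rawD = LOAD '/user/hadoop/student/student_data.csv' USING PigStorage(',')
       AS (StudentName:chararray, sector:chararray, DOB:chararray,
           qualification:chararray, score:chararray, state:chararray,
           randomName:chararray);
-- StudentName (first field) becomes the row key; the other six fields
-- map, in order, to the six columns listed below.
STORE rawD INTO 'hbase://acadstudent'
      USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('$HBASE_COLS');
EOF

if command -v pig >/dev/null 2>&1; then
    pig -f store_student.pig
else
    echo "pig not found; run 'pig -f store_student.pig' on the cluster"
fi
```

Seven fields in the relation therefore pair with six entries in the column list. Listing seven columns would shift every value by one position, which is the mismatch described in the comments below.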

Once the success message appears, it confirms that our data has been loaded into HBase.

The result can be displayed with the scan command followed by the table name in single quotes (' ').
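A small sketch of the verification step, again using a command file. The table name `acadstudent` is taken from the reader comment below and the file name `scan_table.hbase` is hypothetical; the `LIMIT` option just keeps the output short.

```shell
# Write the verification commands for the HBase shell.
cat > scan_table.hbase <<'EOF'
count 'acadstudent'
scan 'acadstudent', {LIMIT => 5}
exit
EOF

if command -v hbase >/dev/null 2>&1; then
    hbase shell scan_table.hbase
else
    echo "hbase not found; run 'hbase shell scan_table.hbase' on the cluster"
fi
```

Each row key in the scan output should be a student name, with one `studentdata:` cell per remaining field.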

Keep visiting our website Acadgild for more updates on Big Data and other technologies.


2 Comments

  1. Just to point out, the columns defined for the HBase table are not in sync with the actual data. For example, look at the value for DOB.
    I think the student name is treated as the row key, so there is no need to mention that column in the STORE command in the grunt shell.

  2. If you look at the example, Student_data:DOB has the value MBBS, which is incorrect; similarly, student_data:Name has the value corresponding to sector.
    Instead, the STORE query should be
    grunt> STORE rawD INTO 'hbase://acadstudent' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('studentdata:sector,studentdata:dob,studentdata:qualification,studentdata:score,studentdata:state,studentdata:randomName');
    MOCK column=studentdata:dob, timestamp=1506875384143, value=20-10-2000
    MOCK column=studentdata:qualification, timestamp=1506875384143, value=BCOM
    MOCK column=studentdata:randomName, timestamp=1506875384143, value=madison
    MOCK column=studentdata:score, timestamp=1506875384143, value=100
    MOCK column=studentdata:sector, timestamp=1506875384143, value=goverenment
    MOCK column=studentdata:state, timestamp=1506875384143, value=alabama
    Please rectify the example.
