All CategoriesBig Data Hadoop & Spark - Advanced

Data Migration from SQL to HBase Using MapReduce

Data Migration from SQL to NoSQL

Data migration is the process of transferring data from one system to another by changing the storage or database or the application. In this tutorial, let us learn how to migrate the data present in MySQL to HBase which is a NoSQL database using Mapreduce.

MySQL is one of the most widely used Relational Database systems. But due to the rapid growth of data nowadays people are searching for better alternatives to store and process their data. This is how Hbase came into existence which is a Hadoop database capable of storing a huge amount of data in the clusters and can scale massively.

Let us now see how to migrate the data present in MySQL to HBase using Hadoop’s map reduce.

Here, for reading the data in MySQL, we will be using DBInputFormat which is as follows:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.ResultSet;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
public class DBInputWritable implements Writable, DBWritable
{
	private int id;
	private String name;
	//read() is used when reading from Database
	public void readFields(DataInput in) throws IOException { }
	public void readFields(ResultSet rs) throws SQLException
	 //Resultset object represents the data returned from a SQL statement
	 {
	 id = rs.getInt(1);
	 name = rs.getString(2);
	 }
	//write() is required when saving to Database
	public void write(DataOutput out) throws IOException { }
	public void write(PreparedStatement ps) throws SQLException
	 {
	 ps.setInt(1, id);
	 ps.setString(2, name);
	 }
	public int getId()
	 {
	 return id;
	 }
	public String getName()
	 {
	 return name;
	 }
}

Using DBInput format, our MapReduce code will be able to read the data from MYSQL. In the table which we are using in this example, we have two fields emp_id & emp_name. So we will take the two fields from MYSQL table and store them in HDFS.

Here is our data present in MYSQL. In the database Acadgild we have employee table and in that table, we have two columns emp_id & emp_name as shown in the below screenshot.

Our DBInputFormat will read this data, so this will be the input of our mapper class. To store this data in Hbase, we need to create a table in Hbase. You can use the below Hbase command to create a table.

create 'employee','emp_info'

In the above screenshot, you can see that employee table has been created in HBase. emp_info is the column family which contains the information of the employee.

The mapper class which reads the input from MySQL table is as follows.

import java.io.IOException;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
public class DBInputFormatMap extends Mapper<LongWritable, DBInputWritable, ImmutableBytesWritable, Text>
{
   protected void map(LongWritable id, DBInputWritable value, Context context)
   {
     try
     {
    	 String line = value.getName();
    	 String cd = value.getId()+"";
    	 context.write(new ImmutableBytesWritable(Bytes.toBytes(cd)),new Text(line));
     }
	 catch(IOException e)
     {
        e.printStackTrace();
     }
	 catch(InterruptedException e)
     {
        e.printStackTrace();
     }
   }
}

Above is the Mapper class implementation which can read the data from a MySQL table. The output of this Mapper class is the emp_id as key and emp_name as value. HBase writes data into it as bytes, so we need to take the key as ImmutableBytesWritable.

So from this mapper class MySQL table data is read and the data is kept as key and value. Key will be the same in both MySQL and HBase.

Now the key and the rest of the columns of the MySQL table will be sent to the reducer. For writing data into the HBase a reducer class called TableReducer is called. Using TableReducer class we need to write the MySQL data into Hbase and is as follows:

Hadoop

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
public class Reduce extends TableReducer<
ImmutableBytesWritable, Text, ImmutableBytesWritable> {
public void reduce(ImmutableBytesWritable key, Iterable<Text> values,
 Context context) throws IOException, InterruptedException {
 String name=null;
 for(Text val : values)
 {
 name =val.toString();
 }
 // Put to HBase
 Put put = new Put(key.get());
 put.add(Bytes.toBytes("emp_info"), Bytes.toBytes("name"),Bytes.toBytes(name));
 context.write(key,  put);
 }
}

This Tablereducer receives key and the rest of the columns as values. Now we need to write a for-each loop to iterate the rest of the columns. In this particular table, we have only two columns so we have taken a variable name which stores the emp_name and using Put class provided by HBase we write the data into HBase column family.

Put put = new Put(key.get());
put.add(Bytes.toBytes("emp_info"), Bytes.toBytes("name"),Bytes.toBytes(name));

The above two lines will write the data into our HBase table. emp_info is the column family here and below is the driver class implementation of this program.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.io.Text;
public class RDBMSToHDFS
	{
	 public static void main(String[] args) throws Exception
	 {
	 Configuration conf = new Configuration();
	 DBConfiguration.configureDB(conf,   //Mysql user information
	 "com.mysql.jdbc.Driver",
	 "jdbc:mysql://localhost/Acadgild",   //Mysql database URI
	 "root",					//Mysql User_name
	 "root_usr_password");			//Mysql user password
	 Job job = new Job(conf);
	 job.getConfiguration().setInt("mapred.map.tasks", 1);
	 job.setJarByClass(RDBMSToHDFS.class);
	 job.setMapperClass(DBInputFormatMap.class);
	 TableMapReduceUtil.initTableReducerJob("employee",Reduce.class, job);
	 job.setMapOutputKeyClass(ImmutableBytesWritable.class);
	 job.setMapOutputValueClass(Text.class);
	 TableMapReduceUtil.initTableReducerJob("employee",Reduce.class, job);
	 job.setInputFormatClass(DBInputFormat.class);
	  job.setOutputFormatClass(TableOutputFormat.class);
	 job.setInputFormatClass(DBInputFormat.class);
	 job.setNumReduceTasks(1);
	 DBInputFormat.setInput(
	 job,
	 DBInputWritable.class,
	 "employee",  //input table name
	 null,
	 null,
	 new String[] { "emp_id", "emp_name"} // table columns
	 );
	 System.exit(job.waitForCompletion(true) ? 0 : 1);
	 }
}

We have built this program using Maven and here are the Maven dependencies for this program.

<dependencies>
  <!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-server -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-server</artifactId>
    <version>1.1.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-client -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-client</artifactId>
    <version>1.1.2</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.hbase/hbase-common -->
<dependency>
    <groupId>org.apache.hbase</groupId>
    <artifactId>hbase-common</artifactId>
    <version>1.1.2</version>
</dependency>
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>5.1.36</version>
</dependency>
</dependencies>

We can build an executable jar with these dependencies by adding Maven assembly plugin into the pom.xml file. The plugin is as follows:

<plugin>
      <artifactId>maven-assembly-plugin</artifactId>
      <configuration>
        <archive>
          <manifest>
            <mainClass>Mysql_to_Hbase.RDBMSToHDFS</mainClass>
          </manifest>
        </archive>
        <descriptorRefs>
          <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
      </configuration>
</plugin>

Now we can use the command mvn clean compile assembly:single to build an executable jar for this program. After running this command, you need to get the success message as shown in the below screenshot.

In your project directory, inside target folder, you will be able to see the jar file created.

We will now run this jar as a normal Hadoop jar. After which we can check for the output in HBase table.

In the above screenshot, you can see that the job has been completed successfully. Let us now check for the output in our HBase table.

In the above screenshot, you can see the data in HBase table after running the jar file. We have successfully migrated the data present in MySQL table to HBase table using MapReduce.

Hope this blog helped you in understanding how to transfer data present in MYSQL to HBase using MapReduce. Keep visiting our site www.acadgild.com for more updates on big data and other technologies.

Hadoop

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

Related Articles

Close