Big Data Hadoop & Spark

Exporting Files From HDFS To MySQL Using SQOOP

Apache Sqoop is a tool designed to efficiently transfer bulk data between Hadoop and structured datastores such as relational databases.
In this blog, we will see how to export data from HDFS to MySQL using Sqoop, with employee records as an example.

Getting Started

Before you proceed, we recommend going through the blogs mentioned below, which discuss importing and exporting data into HDFS using Hadoop shell commands:
HDFS commands for beginners
Integrating MySQL and Sqoop in Hadoop
If you wish to import data from MySQL to HDFS, go through this.


Steps to Export Data from HDFS to MySQL

Follow the steps below to transfer data from HDFS to a MySQL table:
Step 1:
Create a new database in the MySQL instance.
CREATE DATABASE db1;
NOTE: It is not mandatory to create a new database every time. You can 'use' a preexisting database as well.
Step 2:
Create a table named acad.
USE db1;
CREATE TABLE acad (
emp_id int(2),
emp_name varchar(10),
emp_sal int(10),
date date);
The image below indicates that the table inside MySQL is empty.

Figure 1

Run the command DESCRIBE <table name>; to display the table's fields and their types.
This will help in comparing the schema against the data present inside HDFS, which is ready to be mapped.

Figure 2

The files inside HDFS must have the same format as the MySQL table to enable mapping of the data.
Refer to the screenshot below to see two files that are ready to be mapped into MySQL.
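As a sketch, a comma-separated input file matching the acad schema (emp_id, emp_name, emp_sal, date) can be created and copied into the /sqoop_msql HDFS directory used in Step 3 like this (the hdfs command assumes a running cluster, so it is shown commented out):

```shell
# Sample records in the same column order as the acad table:
# emp_id, emp_name, emp_sal, date
cat > input.txt <<'EOF'
5,HADOOP,50000,2011-03-21
6,SPARK,600000,2011-03-22
7,JAVA,700000,2011-03-23
EOF

# Copy the file into the HDFS directory used as --export-dir.
# (Requires a running Hadoop cluster; adjust the path to your setup.)
# hdfs dfs -put input.txt /sqoop_msql/
```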

Figure 3

Step 3:
Export the input.txt and input2.txt files from HDFS to MySQL:
sqoop export --connect jdbc:mysql://localhost/db1 --username sqoop --password root --table acad --export-dir /sqoop_msql/ -m 1
where -m denotes the number of mappers you want to run.
NOTE: The target table must already exist in MySQL.
To control how input fields are parsed, we can use the following options:
--input-fields-terminated-by '\t' --mysql-delimiters
where '\t' denotes a tab.
Once the table inside MySQL and the data inside HDFS are ready to be mapped, we can execute the export command. Refer to the screenshot below:

Figure 4

Once you run the export command, a job-completion statement should be displayed.

Figure 5

Note that only the map job needs to complete successfully. Other warning messages may be displayed because of software version incompatibilities; these can be ignored.

How it Works

Sqoop loads the JDBC driver specified in the --connect argument from the machine where Sqoop is installed. The --username and --password options authenticate the user, and Sqoop internally issues the corresponding commands against the MySQL instance.
The --table argument names the MySQL table that will receive the data from HDFS. This table must be created prior to running the export command. Sqoop uses the number of columns, their types, and the table's metadata to validate the data inserted from the HDFS directory. When the export statement is executed, it reads the input files and generates INSERT statements in MySQL. For example, the export job will read each line of the input.txt file from HDFS and produce the following intermediate statements:

INSERT INTO acad VALUES (5,'HADOOP',50000,'2011-03-21');
INSERT INTO acad VALUES (6,'SPARK',600000,'2011-03-22');
INSERT INTO acad VALUES (7,'JAVA',700000,'2011-03-23');
By default, Sqoop export generates INSERT statements. If the --update-key argument is specified, UPDATE statements will be generated instead.
The -m argument sets the number of map tasks that read the file splits from HDFS. Each mapper opens its own connection to the MySQL server.
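As a rough illustration (this is not Sqoop's actual code), the transformation each mapper performs, splitting a line on the field delimiter and wrapping the values in an INSERT statement, can be sketched with awk:

```shell
# Mimic, very roughly, how a Sqoop export mapper turns each
# comma-separated HDFS line into an INSERT statement for acad.
# \047 is the octal escape for a single quote inside the awk program.
printf '%s\n' \
  '5,HADOOP,50000,2011-03-21' \
  '6,SPARK,600000,2011-03-22' \
| awk -F',' '{printf "INSERT INTO acad VALUES (%s,\047%s\047,%s,\047%s\047);\n", $1, $2, $3, $4}'
```

The real export also batches statements and validates each field against the table's column types before inserting.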
Now, querying the table in MySQL shows that all the data has been mapped into the table.

Figure 6

Hope this blog was useful in understanding the process of exporting data from HDFS to MySQL using Sqoop. Keep visiting our website, Acadgild, for more updates on Big Data and other technologies. Click here to learn Big Data Hadoop Development.


Prateek

An alumnus of the NIE-Institute Of Technology, Mysore, Prateek is an ardent Data Science enthusiast. He has been working at Acadgild as a Data Engineer for the past 3 years. He is a Subject-matter expert in the field of Big Data, Hadoop ecosystem, and Spark.
