Big Data Hadoop & Spark

Exporting Files From HDFS To MySQL Using SQOOP

Apache Sqoop is a tool designed to efficiently transfer bulk data between Hadoop and structured datastores such as relational databases.
In this blog, we will see how to export data from HDFS to MySQL using sqoop, with weblog entry as an example.

Getting Started

Before you proceed, we recommend you to go through the blogs mentioned below which discuss importing and exporting data into HDFS using Hadoop shell commands:
HDFS commands for beginners
Integrating MySQL and Sqoop in Hadoop
If you wish to import data from MySQL to HDFS, go through this.

Steps to Export Data from HDFS to MySQL

Follow below steps to transfer data from HDFS to MySQL table:
Create a new database in the MySQL instance.
NOTE: It is not mandatory to create a new database all the time. You can ‘use’ preexisting databases as well.
Step 2:
Create a table named acad.
USE db1;
emp_id int(2),
emp_name varchar(10),
emp_sal int(10),
date date);
The image below indicates that the table inside MySQL is empty.

Figure 1

Give a command, describe <table name>, to show the various fields and types of it.
This will help in comparing the type of data present inside HDFS which is ready to be mapped.

Figure 2

The files inside HDFS must have the same format as that of MySQL table, to enable the mapping of the data.
Refer the screenshot below to see two files which are ready to be mapped inside MySQL.

Figure 3

Export the input.txt and input2.txt file from HDFS to MySQL
sqoop export  –connect jdbc:mysql://localhost/db1 –username sqoop –password root –table acad –export-dir /sqoop_msql/ -m 1
Where: -m denotes the number of mapper you want to run
NOTE: The target table must exist in MySQL.
To obtain a filtered map, we can use the following option:
–input-fields-terminated-by ‘/t’ –MySQL-delimiters
Where ‘/t’ denotes tab.
Once the table inside MySQL and data inside HDFS is ready to be mapped, we can execute the export command. Refer the screenshot below:

Figure 4

Once you give the export command, the job completion statement should be displayed.

Figure 5

Note that only Map job needs to be completed. Other error messages will be displayed because of software version compatibility. These errors can be ignored.

How it Works

Sqoop calls the JDBC driver written in the –connect statement from the location where Sqoop is installed. The –username and –password options are used to authenticate the user and Sqoop, internally generates the same command against the MySQL instance.
The –table argument defines the MySQL table name, that will receive the data from HDFS. This table must be created prior to running the export command. Sqoop uses the number of columns, their types, and the metadata of the table to validate the data inserted from the HDFS directory. When the export statement is executed, it initiates and creates INSERT statements in MySQl. For example, the export-job will read each line of the input.txt file from HDFS and produces the following intermediate statements.

INSERT INTO acad VALUES (5,”HADOOP”,50000,’2011-03-21′);
INSERT INTO acad VALUES(6,”SPARK”,600000,’2011-03-22′);
INSERT INTO acad VALUES(7,”JAVA”,700000,’2011-03-23′);
By default, Sqoop export creates INSERT statements. If the –update-key argument is stated, UPDATE statements will be created instead.
The -m argument sets the number of map jobs for reading the file splits from HDFS. Each mapper will have its own connection to the MySQL Server.
Now, on querying inside MySQL, we see that all the data is mapped inside the table.

Figure 6

Hope this Sqoop export tutorial was useful in understanding the process of exporting data from HDFS to MySQL. Keep visiting our website Acadgild for more updates on Big Data and other technologies. Click here to learn Big Data Hadoop Development.

Related Popular Courses:








An alumnus of the NIE-Institute Of Technology, Mysore, Prateek is an ardent Data Science enthusiast. He has been working at Acadgild as a Data Engineer for the past 3 years. He is a Subject-matter expert in the field of Big Data, Hadoop ecosystem, and Spark.

One Comment

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Related Articles