Sqoop Tutorial: A Beginner's Guide

In this tutorial, we will discuss the basics of Sqoop. You will also learn how to import data from an RDBMS into HDFS and how to export data from HDFS back into an RDBMS using Sqoop.

Note: We assume that Sqoop is already installed on your system and that all the required connectors and JAR files are present in the lib folder of your Sqoop installation.
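If you want to confirm the installation before proceeding, running Sqoop's version tool is a quick sanity check; it simply prints the installed Sqoop version:

sqoop version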

Importing a Table from RDBMS to HDFS

Here we are using some sample MySQL data and importing it into HDFS using Sqoop. Let us first see how to create a table in MySQL.

  • Log in to your MySQL shell.
  • Create a database by using the below command:
create database Acadgild;
  • To switch to the newly created database, use the following command:
use Acadgild;

Now we are in the Acadgild database.
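At any point, you can confirm which database is active with MySQL's built-in database() function:

select database();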

  • Create a table employee by using the below command:
create table employee(emp_id INT NOT NULL AUTO_INCREMENT, emp_name VARCHAR(100), emp_sal INT, PRIMARY KEY(emp_id));

A table named employee has been created with the columns emp_id, emp_name, and emp_sal. The schema of this table can be checked using the following command:

describe employee;
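On a typical MySQL 5.x installation, the output should look roughly like this (the integer display widths may differ on your version):

+----------+--------------+------+-----+---------+----------------+
| Field    | Type         | Null | Key | Default | Extra          |
+----------+--------------+------+-----+---------+----------------+
| emp_id   | int(11)      | NO   | PRI | NULL    | auto_increment |
| emp_name | varchar(100) | YES  |     | NULL    |                |
| emp_sal  | int(11)      | YES  |     | NULL    |                |
+----------+--------------+------+-----+---------+----------------+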
  • Let's now insert some sample data into the table by using the below command:
INSERT INTO employee(emp_id, emp_name, emp_sal) VALUES (1,'kk',50000), (4,'onk',6000), (7,'mj',70000);
  • We can check the inserted data using the following command:
select * from employee;
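If the inserts succeeded, the query should return the three rows we just added, formatted roughly as below (the exact layout depends on your MySQL client):

+--------+----------+---------+
| emp_id | emp_name | emp_sal |
+--------+----------+---------+
|      1 | kk       |   50000 |
|      4 | onk      |    6000 |
|      7 | mj       |   70000 |
+--------+----------+---------+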

We have successfully created a table in MySQL, and we will now import it into HDFS by using Sqoop.
The following command can be used to import the table into HDFS.
Syntax:

sqoop import --connect jdbc:mysql://localhost/db_name --username $user_name --password $password --table tbl_name -m 1

Here we are connecting to MySQL through the JDBC connector and using the database Acadgild. It is necessary to specify the MySQL username, the password, and the table name.
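If you want to confirm that the JDBC connection works before running the import, Sqoop's list-tables tool prints the tables in the given database:

sqoop list-tables --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541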
In our case, the import command will be as shown below:

sqoop import --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table employee -m 1

Here '-m' specifies the number of map tasks that run in parallel, and '-m 1' means that only one map task will run.
Note: If we do not use -m 1 at the end of the statement, Sqoop falls back to its default of four map tasks and splits the table across several part files in HDFS instead of producing a single file.
The data in the RDBMS has now been successfully imported into HDFS. By default, the output is stored under /user/$user_name/tbl_name/ in a file named part-m-00000.
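To verify the import, you can print the part file directly from HDFS. Sqoop writes comma-delimited text files by default, so for our three rows the file should contain:

hdfs dfs -cat /user/$user_name/employee/part-m-00000
1,kk,50000
4,onk,6000
7,mj,70000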
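The reverse direction works in much the same way. As a minimal sketch of the export mentioned at the start of this tutorial, assuming a hypothetical empty target table employee_copy with the same schema (created with: create table employee_copy like employee;), the files we just imported can be pushed back into MySQL like this:

sqoop export --connect jdbc:mysql://localhost/Acadgild --username root --password kiran.0541 --table employee_copy --export-dir /user/$user_name/employee -m 1

Note that exporting into the original employee table would fail with duplicate-key errors, since the same primary keys are already present there.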