
How to Export Data From Hive to MongoDB

In this blog, we will discuss the steps to export table contents from Hive to MongoDB. Before moving further, let us take a quick look at Hive, the difference between SQL and NoSQL, and MongoDB.

What is Hive in Hadoop?


Hive is an open-source, SQL-based data warehousing tool that works on top of the Hadoop Distributed File System (HDFS). Hive was created to make it possible for analysts with strong SQL skills (but meager Java programming skills) to run queries on the huge volumes of data stored in HDFS. Today, Hive is a successful Apache project used by many organizations as a general-purpose, scalable data processing platform.
For more information on Hive and its workings, you can refer to our blogs below:
Hive Beginners Guide
Partitioning in Hive
Bucketing in Hive
Pokemon Data Analysis Using Hive
Transactions in Hive

Difference between SQL and NoSQL

SQL is not ideal for every big data problem: it is a poor fit for building complex machine-learning algorithms and for many kinds of analysis. As a result, organizations are moving toward storing and processing data in next-generation databases known as NoSQL databases. A few important differences between SQL and NoSQL databases are summarized below.

SQL vs. NoSQL:
- Type: SQL databases are known as relational databases (RDBMS), while NoSQL databases are primarily called non-relational or distributed databases.
- Data model: SQL databases are table-based; they represent data as tables consisting of rows, with a predefined schema. NoSQL databases are collections of key-value pairs, documents, graphs, or wide-column stores, with no standard schema definition to adhere to.
- Query language: SQL databases use SQL (Structured Query Language) for defining and manipulating data, which is very powerful, but the schema-based approach of relational databases is a poor fit for semi-structured and unstructured data. NoSQL databases have a dynamic schema for unstructured data; queries focus on collections of documents, and the query language is sometimes called UnQL (Unstructured Query Language), whose syntax varies from database to database. NoSQL databases can handle semi-structured and unstructured data as well.

Now that we know the advantages of NoSQL and the differences between SQL and NoSQL databases, let us look at what MongoDB is and the procedure to transfer data from Hive to MongoDB.

MongoDB

MongoDB is one of the most popular NoSQL databases. Data records are stored as documents in a JSON (JavaScript Object Notation)-like syntax. It was developed by the founders of DoubleClick and is written in C++. The following are some of MongoDB's benefits and strengths:
Scalability: It is horizontally scalable, i.e., you can reduce the workload by increasing the number of nodes in your resource pool instead of relying on a standalone node.
Speed: For simple queries, it gives good performance, as all the related data is stored in a single document, which eliminates join operations.
Manageable: It is easy to use for both developers and administrators. It also gives the ability to shard the database.
Dynamic Schema: It provides flexibility to evolve the data schema without modifying the existing data.
For more information on MongoDB and its workings, you can refer to our blogs below:
Introduction to MongoDB
MongoDB GUI
Now let us see the procedure to transfer data from Hive to MongoDB.
Start Hive:
Let us start the Hive shell first by using the hive command in the terminal.
hive
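Alternatively, if your cluster runs HiveServer2, you can connect through Beeline instead; the host and default port 10000 below are assumptions, so adjust them to your setup.
beeline -u jdbc:hive2://localhost:10000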

Table Academp:
In our example, we will be using an existing table Academp from the Hive default database.
We can use the below command to display the contents of the table Academp.
select * from Academp;
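If you do not already have such a table, here is a minimal sketch to create and populate one. The (id, name, sal) schema is inferred from the column mapping used in the external table later in this post, and the sample rows are purely illustrative.
-- Minimal sketch (assumes Hive 0.14+ for INSERT ... VALUES);
-- the schema matches the mapping used in the external table below,
-- and the rows are illustrative sample data.
create table if not exists Academp(id int, name string, sal float);
insert into table Academp values (1, 'Alice', 50000.0), (2, 'Bob', 45000.0);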

ADD JARS:
To integrate Hive with MongoDB, we need three jars: mongo-hadoop-core-1.5.2.jar, mongo-hadoop-hive-1.5.2.jar, and mongo-java-driver-3.2.1.jar.
You can download these jars from the link below. Copy them into your Hive installation directory (in this example, /usr/local/hive), then register them in the Hive shell:

Hive to Mongodb Jars

ADD JAR /usr/local/hive/mongo-hadoop-core-1.5.2.jar;

ADD JAR /usr/local/hive/mongo-hadoop-hive-1.5.2.jar;

ADD JAR /usr/local/hive/mongo-java-driver-3.2.1.jar;
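To confirm that the jars were registered in the current Hive session, you can list the session's resources with the standard Hive CLI command:
list jars;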

Start MongoDB:
Now, after copying the above jars into the Hive directory, use the below command in a new terminal to start the MongoDB service.
sudo service mongod start

Once the MongoDB service has started, use the below command to enter the MongoDB shell.
mongo

Create External Table:
Now, use the below command to create an external table Academp_mongo, which transfers the contents of the Hive table Academp into the MongoDB collection Academp.
create external table Academp_mongo(id int, name string, sal float)
stored by 'com.mongodb.hadoop.hive.MongoStorageHandler'
with serdeproperties('mongo.columns.mapping'='{"id":"id","name":"name","sal":"sal"}')
tblproperties('mongo.uri'='mongodb://localhost:27017/default.Academp');
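In the mongo.uri property, the path segment default.Academp names the target database (default) and collection (Academp). If your MongoDB deployment requires authentication, credentials can be embedded in the URI following the standard MongoDB connection-string format; the username and password below are placeholders, not values from this walkthrough.
'mongo.uri'='mongodb://username:password@localhost:27017/default.Academp'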

Insert into Table Academp_mongo:

Let us store all the contents of the table Academp into the table Academp_mongo using the below command.
Insert into table Academp_mongo select * from Academp;
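Because Academp_mongo is backed directly by the MongoDB collection, a quick sanity check from the Hive side (not part of the original walkthrough) is to compare row counts after the insert:
select count(*) from Academp_mongo;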

Show databases:
Now, use the below command in the MongoDB shell to list the databases in MongoDB.
show databases

Use database:
Our collection (table) Academp is created in the database default. Let us use the below command to switch to the default database.
use default

Show collections:
We can use the below command to list the collections present in the default database.
show collections

As we can observe from the output above, the Academp collection has been created in the database.

Query Data:

Finally, to query data from a MongoDB collection, we use MongoDB's find() method.
Now let us use the find() method to list the records available in our collection Academp.
db.Academp.find()
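As a readability tip, appending .pretty() to the query, i.e. db.Academp.find().pretty(), prints each document across multiple indented lines.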

We can observe from the output above that we have successfully exported the contents of a Hive table to a MongoDB collection. For more updates on Big Data, Hadoop, and other technologies, visit the Acadgild blog section.


Comments

  1. Hi,
    It's a really nice article.
    I followed exactly the same steps to export data from Hive to MongoDB, but I get the following error on the create external table command:
    hive> create external table Academp_mongo(id int, name string, sal float) stored by "com.mongodb.hadoop.hive.MongoStorageHandler" with serdeproperties("mongo.columns.mapping"="{"id":"id","name":"name","sal":"sal"}") tblproperties("mongo.uri"="mongodb://localhost:27017/default.Academp");
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/util/JSON
    I am trying this on the Hortonworks Sandbox 2.4. Hive and MongoDB run fine separately, but I can't transfer the data because of the above error.
    Please let me know how to proceed on this.
    Thanks

  2. create external table app.app_test(
    fund_code string
    )
    STORED BY
    'com.mongodb.hadoop.hive.MongoStorageHandler'
    WITH SERDEPROPERTIES (
    'mongo.columns.mapping'='{"fund_code":"id"}')
    TBLPROPERTIES (
    'mongo.uri'='mongodb://localhost:6666/mongo_test.test_data');
    The external table was created, and I can get the data using "select * from app.app_test".
    But when I execute the insert script "insert into table app.app_fund_recommend_full_test select '400027';", it throws errors:
    Error: java.lang.RuntimeException: Hive Runtime Error while closing operators
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:217)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
    Caused by: com.mongodb.BulkWriteException: Bulk write operation error on server 10.10.203.128:6666. Write errors: [[email protected]].
    at com.mongodb.BulkWriteHelper.translateBulkWriteException(BulkWriteHelper.java:57)
    at com.mongodb.DBCollection.executeBulkWriteOperation(DBCollection.java:2202)
    at com.mongodb.DBCollection.executeBulkWriteOperation(DBCollection.java:2188)
    at com.mongodb.BulkWriteOperation.execute(BulkWriteOperation.java:121)
    at com.mongodb.hadoop.output.MongoOutputCommitter.commitTask(MongoOutputCommitter.java:167)
    at com.mongodb.hadoop.hive.output.HiveMongoOutputFormat$HiveMongoRecordWriter.close(HiveMongoOutputFormat.java:94)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator$FSPaths.closeWriters(FileSinkOperator.java:185)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:960)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:598)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:610)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.close(ExecMapper.java:199)
    … 8 more
