
Merging Files in HDFS

In this blog, we will discuss merging files in HDFS into a single file. Before proceeding further, we recommend that you refer to our blogs on HDFS. The links are provided below:
Beginners-Guide-For-HDFS
HDFS-Commands-For-Beginners
Merging multiple files is useful when you want to retrieve the output of a MapReduce job that ran with multiple reducers, because each reducer writes its own part of the output as a separate file in the output directory.
The HDFS getmerge command copies the files in a given HDFS path into a single concatenated file on the local filesystem. For example:
hadoop fs -getmerge /user/hadoop/demo_files merged.txt
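As a sketch of the reducer use case, assume a job wrote its output to a hypothetical HDFS directory such as /user/hadoop/wordcount_output. A MapReduce output directory typically also contains an empty _SUCCESS marker file, so the helper below restricts the source to a part-* glob and merges only the reducer part files. The function name merge_reducer_output and the paths are illustrative assumptions. The glob is quoted so that the local shell passes it through unchanged and the Hadoop FS shell expands it against HDFS; check that your Hadoop version accepts a glob as the getmerge source.

```shell
#!/bin/sh
# Hypothetical helper: merge only the reducer part files (part-r-00000,
# part-r-00001, ...) from an HDFS job output directory into one local file.
# Usage: merge_reducer_output <hdfs_output_dir> <local_file>
merge_reducer_output() {
  if [ "$#" -ne 2 ]; then
    echo "usage: merge_reducer_output <hdfs_output_dir> <local_file>" >&2
    return 2
  fi
  # Quoted glob: the local shell does not expand it; Hadoop expands it in HDFS.
  hadoop fs -getmerge "$1/part-*" "$2"
}

# Example call (placeholder paths):
# merge_reducer_output /user/hadoop/wordcount_output wordcount_result.txt
```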
The getmerge command has the following syntax:
hadoop fs -getmerge [-nl] <src> <localdst>
The getmerge command takes two required arguments and one optional flag:

  • <src> is the HDFS path to the directory that contains the files to be concatenated
  • <localdst> is the local filename of the merged file
  • [-nl] is an optional flag that adds a newline character at the end of each file's content in the merged file.
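To make the effect of -nl concrete, the following sketch emulates the result of getmerge on an ordinary local directory using cat and printf. It only illustrates the output format under our own assumptions and is not how Hadoop implements the command: the function name getmerge_local is hypothetical, and it processes files in the local shell's glob order, whereas getmerge follows the order in which Hadoop lists the source directory.

```shell
#!/bin/sh
# Local-only emulation of the getmerge output format; it does not use HDFS.
# Usage: getmerge_local <src_dir> <dest_file> [nl]
getmerge_local() {
  src_dir=$1
  dest=$2
  mode=${3:-}
  : > "$dest"                      # create or truncate the destination file
  for f in "$src_dir"/*; do
    [ -f "$f" ] || continue        # skip subdirectories and an unmatched glob
    cat "$f" >> "$dest"            # append the file's bytes unchanged
    if [ "$mode" = nl ]; then
      printf '\n' >> "$dest"       # like -nl: one newline after each file
    fi
  done
}

# Demo with two files that do not end in a newline.
demo_dir=$(mktemp -d)
printf 'first file' > "$demo_dir/part-00000"
printf 'second file' > "$demo_dir/part-00001"
out_plain=$(mktemp)
out_nl=$(mktemp)
getmerge_local "$demo_dir" "$out_plain"
getmerge_local "$demo_dir" "$out_nl" nl
cat "$out_plain"; echo   # prints: first filesecond file
cat "$out_nl"            # prints each file's content on its own line
```

Because the demo files do not end with a newline, the plain merge glues them into a single line, while the nl variant keeps each file's content on a separate line. This is the situation in which -nl makes the merged file easier to read.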

Steps to merge the files
Step 1:
First, place more than one file inside an HDFS directory.
In the figure below, you can see that there are three files, named acadgild, hadoop and FlumeData, on which we will perform the merge operation.

The contents of the files are shown in the screenshot below.
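One way to prepare such a directory is sketched below, assuming the three files acadgild, hadoop and FlumeData exist in the current local directory and using /user/hadoop/demo_files (the directory from the first example) as an assumed target. The helper name stage_files is hypothetical; it only strings together the standard -mkdir -p, -put, -ls and -cat FS shell commands. Note that -put fails if a file with the same name already exists in the target directory.

```shell
#!/bin/sh
# Hypothetical helper: upload local files into an HDFS directory,
# then list the directory and print the contents of its files.
# Usage: stage_files <hdfs_dir> <local_file>...
stage_files() {
  if [ "$#" -lt 2 ]; then
    echo "usage: stage_files <hdfs_dir> <local_file>..." >&2
    return 2
  fi
  hdfs_dir=$1
  shift
  hadoop fs -mkdir -p "$hdfs_dir" || return   # create the directory if needed
  hadoop fs -put "$@" "$hdfs_dir/" || return  # copy all local files into it
  hadoop fs -ls "$hdfs_dir"                   # confirm the files are there
  hadoop fs -cat "$hdfs_dir/*"                # show their contents (HDFS glob)
}

# Example call (file names from this walkthrough, target directory assumed):
# stage_files /user/hadoop/demo_files acadgild hadoop FlumeData
```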

Step 2:
Next, run the getmerge command shown in the screenshot to merge the files.
We have used the optional -nl flag to add a newline after the content of each file.
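A sketch of the Step 2 command follows, assuming the files were uploaded to /user/hadoop/demo_files and that the result should be written as merged_file in the current local directory (the paths in the screenshot may differ). The wrapper name merge_with_newlines is hypothetical. It refuses to replace an existing local file; that check is our own precaution so the sketch does not depend on how a particular Hadoop version treats an existing destination, not a getmerge feature.

```shell
#!/bin/sh
# Hypothetical helper for Step 2: merge every file in an HDFS directory into
# one local file, adding a newline after each file's content (-nl).
# Usage: merge_with_newlines <hdfs_dir> <local_file>
merge_with_newlines() {
  if [ "$#" -ne 2 ]; then
    echo "usage: merge_with_newlines <hdfs_dir> <local_file>" >&2
    return 2
  fi
  # Our own precaution: never replace an existing local file.
  if [ -e "$2" ]; then
    echo "refusing to overwrite existing file: $2" >&2
    return 1
  fi
  hadoop fs -getmerge -nl "$1" "$2"
}

# Example call (paths assumed):
# merge_with_newlines /user/hadoop/demo_files merged_file
```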

A file containing the merged content is created at the local destination path given in the command. In this case, a new file named merged_file is created, containing the content of acadgild, hadoop and FlumeData.
You can open the file directly to see the merged content. Refer to the figure below.

From the figure above, you can see that a single file has been created by merging the content of the three individual files.
For more information, refer to the blog section on www.acadgild.com/blog.
