Big Data Hadoop & Spark

Merging Files in HDFS

In this blog, we will discuss merging files in HDFS into a single file. Before proceeding further, we recommend referring to our earlier blogs on HDFS.
Merging multiple files is useful when you want to retrieve the output of a MapReduce computation with multiple reducers, where each reducer produces a part of the output.
The HDFS getmerge command copies the files under a given HDFS path into a single concatenated file on the local filesystem. It has the following syntax:
hadoop fs -getmerge [-nl] <src> <localdst>
For example:
hadoop fs -getmerge /user/hadoop/demo_files merged.txt
The getmerge command has three parameters:

  • <src> is the HDFS path to the directory that contains the files to be concatenated
  • <localdst> is the local filename of the merged file
  • [-nl] is an optional flag that adds a newline at the end of each file in the result (see the example after this list)
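For instance, a minimal sketch of the -nl variant, assuming the same hypothetical directory /user/hadoop/demo_files and local file merged.txt as in the example above:
hadoop fs -getmerge -nl /user/hadoop/demo_files merged.txt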

Steps to merge the files
Step 1:
We need to place more than one file inside an HDFS directory.
In the figure below, you can see that there are three files named acadgild, hadoop, and FlumeData, on which we will perform the merge operation.
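If you want to reproduce this setup from the command line, the sketch below assumes the three files exist in your local working directory and that /user/hadoop/demo_files is the target HDFS directory (both paths are assumptions, not taken from the screenshots):
hadoop fs -mkdir -p /user/hadoop/demo_files
hadoop fs -put acadgild hadoop FlumeData /user/hadoop/demo_files
hadoop fs -ls /user/hadoop/demo_files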

The content of the files is shown in the screenshot below.
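The same contents can also be checked from the command line with hadoop fs -cat (again assuming the hypothetical /user/hadoop/demo_files directory):
hadoop fs -cat /user/hadoop/demo_files/*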

Step 2:
We now have to type the command shown in the screenshot to merge the files.
We have used -nl as an optional parameter to add an extra newline after the content of each file.
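The command would look roughly like the sketch below; the HDFS source directory and the local file name merged_file are assumed to match what the screenshots show:
hadoop fs -getmerge -nl /user/hadoop/demo_files merged_file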

A file will be created at the specified location on your local machine with the merged content. In this case, a new file named merged_file will be created, containing the content from acadgild, hadoop, and FlumeData.
You can open the file directly to see the merged content. Refer to the figure below.
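From a terminal, the same check can be done with cat, assuming the merged file was written to the current local directory:
cat merged_file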

From the above figure, you can see that a single file is created after merging the content of three individual files.
For more information and further updates on Big Data and other technologies, visit the blog section of our website, Acadgild. Click here to learn Big Data Hadoop Development.


An alumnus of the NIE-Institute Of Technology, Mysore, Prateek is an ardent Data Science enthusiast. He has been working at Acadgild as a Data Engineer for the past three years. He is a subject-matter expert in Big Data, the Hadoop ecosystem, and Spark.


  1. Hello,
    I have tried the following steps, but when I run this example, I get the following error:

    hadoop fs -getmerge /user/okan/data/* /user/okan/get/merged.txt

    getmerge: Mkdirs failed to create file:/user/okan/get (exists=false, cwd=file:/Users/okan)

    Please help me,

    1. Please check that the first path is your HDFS directory and that the second is a path on your local filesystem.
      Since you are using /user/okan/ in your syntax, I suspect you are not using the right directory path somewhere; one common fix is sketched below.
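      For example, a minimal sketch of one way to resolve the Mkdirs error, assuming the merged file should land under your local home directory (the local path /Users/okan/get is an assumption based on the cwd shown in the error message):
      mkdir -p /Users/okan/get
      hadoop fs -getmerge /user/okan/data /Users/okan/get/merged.txt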

