
Integrating Azure Data Lake with Hadoop 3.x


Hadoop 3.x comes with native support for using Microsoft Azure Data Lake Storage in place of HDFS as the storage system.

In this blog, we will discuss how to integrate your Azure Data Lake with Hadoop. Before going through this blog, we recommend going through our previous blogs in this series:

  • Introduction to Microsoft Azure
  • Introduction to Azure Data Lake Store
  • Hadoop 3.x installation guide

Azure Data Lake uses OAuth 2.0 to authenticate your requests. For that purpose, you need to create an application (service principal) in Azure Active Directory and grant it access to your Data Lake.

OAuth2 Support

Using Azure Data Lake Storage requires an OAuth2 bearer token to be present in the HTTPS headers, as per the OAuth2 specification. A valid bearer token must be obtained from Azure Active Directory for users who have access to the Azure Data Lake Storage account.

Azure Active Directory (Azure AD) is Microsoft's multi-tenant, cloud-based directory and identity management service.

Creating a service principal using Azure Active Directory

  1. Open your Azure portal and click on Azure Active Directory
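
    If you prefer the command line over the portal, a service principal can also be created with the Azure CLI. The sketch below is only illustrative: the application name is a placeholder, and you still need to grant the resulting principal access to your Data Lake Store as described earlier.

     # Create a service principal (application) in Azure AD; note the
     # appId (client ID), password (client secret) and tenant values it prints.
     az ad sp create-for-rbac --name hadoop-adls-demo

     # The OAuth 2.0 token endpoint for your tenant has the form:
     #   https://login.microsoftonline.com/<tenant-id>/oauth2/token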

    So, let us summarize what you have generated so far in terms of Hadoop 3 configuration:

    Application ID → Client ID

    OAuth 2.0 token endpoint → OAuth 2.0 refresh URL

    Key value → OAuth 2.0 credential (client secret)
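
    Before wiring these values into Hadoop, you can optionally verify them by requesting a bearer token directly from the token endpoint. Below is a minimal sketch using curl; the tenant ID, client ID and client secret are placeholders for the values generated above:

     curl -X POST "https://login.microsoftonline.com/<tenant-id>/oauth2/token" \
          -d "grant_type=client_credentials" \
          -d "client_id=<YOUR CLIENT ID>" \
          -d "client_secret=<YOUR CLIENT SECRET>" \
          -d "resource=https://datalake.azure.net/"
     # A JSON response containing an "access_token" field confirms that the
     # service principal credentials are valid.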

    Now you need to add these properties to your core-site.xml for the changes to take effect:

      <!-- Use client credentials (service principal) for OAuth2 -->
      <property>
          <name>dfs.adls.oauth2.access.token.provider.type</name>
          <value>ClientCredential</value>
      </property>
      <!-- OAuth 2.0 token endpoint of your Azure AD tenant -->
      <property>
          <name>dfs.adls.oauth2.refresh.url</name>
          <value>YOUR TOKEN ENDPOINT</value>
      </property>
      <!-- Application (client) ID of the service principal -->
      <property>
          <name>dfs.adls.oauth2.client.id</name>
          <value>YOUR CLIENT ID</value>
      </property>
      <!-- Key value (client secret) of the service principal -->
      <property>
          <name>dfs.adls.oauth2.credential</name>
          <value>YOUR CLIENT SECRET</value>
      </property>
      <!-- File system implementation classes for the adl:// scheme -->
      <property>
          <name>fs.adl.impl</name>
          <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
      </property>
      <property>
          <name>fs.AbstractFileSystem.adl.impl</name>
          <value>org.apache.hadoop.fs.adl.Adl</value>
      </property>

    After adding these properties, save and close the file. Now open your hadoop-env.sh file and add the Hadoop tools library to the classpath (Azure support comes from the Hadoop tools library):

    export HADOOP_CLASSPATH=$HADOOP_HOME/share/hadoop/tools/lib/*

    After adding the classpath entry, save and close the hadoop-env.sh file.

    Now, even without starting the HDFS daemons, you can interact with your ADL store and browse the data present in it, as shown below.
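
    For example, a directory listing can be taken directly against the adl:// URI scheme. A minimal sketch, assuming your Data Lake Store account is named <your-store>:

     # List the contents of the root of the Data Lake Store
     hadoop fs -ls adl://<your-store>.azuredatalakestore.net/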
