Azure Data Lake Store is an enterprise-wide, hyper-scale repository for big data analytic workloads. Azure Data Lake can capture data of any size, type, and ingestion speed in a single place for operations and analytics.
Azure Data Lake Store can be accessed from Hadoop 3.x and Microsoft's HDInsight clusters using WebHDFS-compatible REST APIs. It is specifically designed to enable analytics on the stored data and is tuned for performance in data analytics scenarios. It also has the enterprise-grade capabilities, such as scalability, security, reliability, manageability, and availability, that are essential for real-world enterprise use cases.
Before reading this blog, we recommend going through our previous blog on Introduction to Microsoft Azure.
The Azure Data Lake Store is an Apache Hadoop-compatible file system that works with the Hadoop Distributed File System (HDFS) and can be integrated with the Hadoop ecosystem. Your existing HDInsight applications or services that use the WebHDFS API can easily integrate with the Data Lake Store.
Data stored in the Data Lake Store can be easily analyzed using Hadoop analytic frameworks such as MapReduce or Hive, and Microsoft Azure HDInsight clusters can be configured to access the data in the Data Lake Store directly.
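Because the store is WebHDFS-compatible, it can also be reached with plain REST calls. The shell sketch below reuses the acdkiran account name from this walkthrough and assumes an Azure AD OAuth bearer token in $TOKEN; the curl command is echoed as a dry run rather than actually calling the service:

```shell
# Account name from this walkthrough; replace with your own store name.
ACCOUNT="acdkiran"

# Data Lake Store exposes a WebHDFS-compatible REST endpoint at this base URL.
WEBHDFS_BASE="https://${ACCOUNT}.azuredatalakestore.net/webhdfs/v1"

# List the root directory. Echoed as a dry run; a real call needs a valid
# Azure AD OAuth bearer token in $TOKEN.
echo "curl -i -H 'Authorization: Bearer \$TOKEN' '${WEBHDFS_BASE}/?op=LISTSTATUS'"
```

The path segment after `/webhdfs/v1` names the file or folder, and the `op` query parameter selects the WebHDFS operation (LISTSTATUS, OPEN, MKDIRS, and so on).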
- Sign in to the new Azure Portal.
- Click on More Services, navigate to Storage services, and click on Data Lake Store.
You will be redirected to the Data Lake Store page as shown below.
Here, click on Add; you will then be redirected to the Data Lake Store creation page as shown below.
Enter a name for your Data Lake Store; from now on, your resource will be identified by this name. Our Data Lake Store is named acdkiran, so its permanent URL is acdkiran.azuredatalakestore.net.
After this, enter the Resource Group name. A resource group is a container that holds related Azure resources. For example, if you create 5 Data Lake Stores, all 5 can be kept under a single resource group.
You can also choose the location of your Data Lake Store. As of now, only 3 locations are available: East US 2, Central US, and North Europe. You can select the location nearest to you.
You can also select a pricing tier from the various options listed under the Pricing field.
After filling in the details, click on Create. If you wish to pin the Data Lake Store to your dashboard, check the Pin to dashboard checkbox.
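If you prefer scripting over the portal, the same store can be created with the Azure CLI's `az dls` command group. This is a sketch only: the resource-group name is hypothetical, and the command is echoed as a dry run so it can be read without an Azure subscription; drop the `echo` to run it for real after `az login`.

```shell
ACCOUNT="acdkiran"             # Data Lake Store name from this walkthrough
RESOURCE_GROUP="adls-demo-rg"  # hypothetical resource group name
LOCATION="eastus2"             # one of the supported regions

# Create the Data Lake Store account (echoed as a dry run).
echo az dls account create \
    --account "$ACCOUNT" \
    --resource-group "$RESOURCE_GROUP" \
    --location "$LOCATION"
```

Creating the account through the CLI is handy when you need to stand up the same environment repeatedly, for example in test setups.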
Now, your Data Lake Store will start deploying and will be created within a few minutes. You can track its status in Notifications, as shown in the screenshot below:
After the deployment completes, you will see the message Deployments succeeded along with your Data Lake Store configuration page, as shown in the screenshot below:
This is how your Data Lake Store should look. You can see the Status and URL of your Data Lake Store.
If you click on Data Explorer, you can see the UI of your Data Lake Store and perform various storage operations there. This is how the UI should look:
You can create a New Folder by clicking on the New Folder option in the UI as shown below:
In the screenshot displayed below, you can see that the Datasets folder has been created.
And you can upload files into your Data Lake Store by clicking on the Upload option as shown in the screenshot below.
We have a file called Olympix_data.csv in our local file system, and we are uploading it into the Datasets folder that we created earlier. Click on Add Selected Files to upload.
Once you click on Add Selected Files, you can see the progress of upload as shown below:
Once the upload is completed, you can see the file in the specified folder.
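The folder-creation and upload steps above can also be scripted with the Azure CLI. The sketch below reuses the Datasets folder and Olympix_data.csv file from this walkthrough; as before, each command is echoed as a dry run, so remove the `echo` to execute it against your store:

```shell
ACCOUNT="acdkiran"
FOLDER="/Datasets"
LOCAL_FILE="./Olympix_data.csv"

# Create the Datasets folder (echoed as a dry run; drop echo to execute).
echo az dls fs create --account "$ACCOUNT" --path "$FOLDER" --folder

# Upload the local CSV file into the folder.
echo az dls fs upload --account "$ACCOUNT" \
    --source-path "$LOCAL_FILE" \
    --destination-path "$FOLDER/Olympix_data.csv"

# Verify the upload by listing the folder contents.
echo az dls fs list --account "$ACCOUNT" --path "$FOLDER"
```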
You can also set permissions for different users by clicking on the Access option in the UI as shown below:
You can also delete files or folders by clicking on the delete option in the menu.
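The Access and delete operations have CLI equivalents as well. In this sketch the user object ID is a placeholder you would replace with a real Azure AD object ID, and the commands are again echoed as a dry run:

```shell
ACCOUNT="acdkiran"
USER_ID="00000000-0000-0000-0000-000000000000"  # placeholder Azure AD object ID

# Grant a user read and execute permissions on the Datasets folder
# (echoed as a dry run; drop echo to execute).
echo az dls fs access set-entry --account "$ACCOUNT" \
    --path /Datasets --acl-spec "user:$USER_ID:r-x"

# Delete a file from the store.
echo az dls fs delete --account "$ACCOUNT" --path /Datasets/Olympix_data.csv
```

The `--acl-spec` string follows the POSIX-style `type:id:permissions` form used by Data Lake Store ACLs.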
You can see a few more options in the Quick Start menu as shown below:
This is how you can create a Data Lake Store in Azure and perform various storage operations using its UI. In the next blog, we will discuss how to install a Hadoop 3.x single-node cluster and how to integrate the Data Lake Store with Hadoop 3.x. Keep visiting our website.
Enroll for Big Data and Hadoop Certification conducted by Acadgild and become a successful big data developer.