In this blog, we will discuss how to integrate Cloudera Hive with Tableau to visualize data and results.
Before moving on to the procedure for integrating Hive with Tableau, we should be familiar with concepts such as data visualization and Tableau itself. Let us begin.
The concept of using pictures and graphs to understand data has been around for many years. As the volume of data grows day by day, it is a challenge to visualize this data and produce useful results in less time. Data visualization comes to the rescue by conveying concepts in a universal manner and by letting us experiment with different scenarios through slight adjustments.
Data visualization is the process of presenting information in a graphical or pictorial format, which helps decision makers analyze the data more easily.
- Data visualization not only makes data more attractive but also provides insight into complex data sets by communicating their key aspects in intuitive and meaningful ways.
- Helps in identifying areas that need attention or improvement.
- Clarifies which factors influence customer behavior.
- Helps to understand which fields to place where
- Helps to predict scenarios and more.
Now that we have had a glimpse of data visualization, let us see what Tableau is.
Tableau is a data visualization tool that helps create visually appealing, interactive visualizations such as graphs, charts, reports and dashboards on our existing data.
Tableau provides an easy-to-use drag-and-drop interface, so creating interactive visualizations on the data takes seconds or minutes rather than months or years.
Using Tableau, we can connect to a wide variety of data sources, including files, SQL databases, web data and cube (multidimensional) databases.
Tableau is also designed to support and visualize data stored on Hadoop platforms (Hadoop Cloudera Hive and Hadoop MapR Hive).
Thus, we can also apply visualization techniques to the databases and tables that reside in a Hadoop cluster.
Hive Environment setup:
You can follow our linked blogs below to understand how Hive works and how to install it.
Tableau Environment setup:
Tableau offers different products catering to diverse visualization needs for professionals and organizations, and they are:
Tableau Desktop: Made for individual use
Tableau Server: Collaboration for any organization
Tableau Online: Business Intelligence in the Cloud
Tableau Mobile: Get insight on your tablet or phone
Tableau Public: For journalists or anyone to publish interactive data online
Tableau Reader: Lets you read files saved in Tableau Desktop
Download Tableau –
In this blog, we will use Tableau Desktop, which is available as a free 14-day trial. Tableau Desktop runs only on Windows or macOS. Thus, we will download and install Tableau on a Windows machine and integrate it with the Cloudera Hadoop Hive metastore.
Once you are on the Tableau Desktop page, click the Try it for free button to download Tableau Desktop.
Register your email address to complete the download procedure.
Install Tableau –
Now, double-click the downloaded Tableau Desktop installer to start the installation process. A window will appear asking you to allow the installation. Accept it and click Run to continue the installation process.
A license agreement will then appear. Choose the “I have read and accept the terms of this license agreement” option, then click the “Install” button.
When the installation completes, the screen offers the option to start the trial now or later. You may select the Start trial now option.
Note: If you have purchased Tableau, you may enter the license key instead.
Enter all the required details in the registration window and then click the Register button.
Once registration is complete, a window will appear. Click the Continue button to start working with Tableau.
You have now successfully installed Tableau Desktop on your system, and you can use it for the next 14 days.
Integrate Hive with Tableau:
Let us now see how to integrate Hive tables with Tableau.
Tableau currently supports the Cloudera and MapR distributions of Hadoop. In this blog, we use the CDH VM to integrate Hive tables with Tableau.
First, we need to start the Hive Thrift server, which allows a remote client to submit requests to Hive using a variety of programming languages. You can use the command below to start the Hive Thrift server.
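As a minimal sketch of this step, assuming the `hive` launcher is on the PATH of the CDH VM and that HiveServer2 is the Thrift service in use (matching the HiveServer2 type selected later in Tableau), the service can be started with `hive --service hiveserver2`. The Python wrapper and its function names below are hypothetical and only illustrate how the command is assembled.

```python
import subprocess


def hiveserver2_command(port=10000):
    """Build the argument list that starts HiveServer2 on the given Thrift port."""
    return [
        "hive",
        "--service", "hiveserver2",
        # hive.server2.thrift.port defaults to 10000; it is set explicitly here
        # so the port Tableau connects to is visible in the command.
        "--hiveconf", f"hive.server2.thrift.port={port}",
    ]


def start_hiveserver2(port=10000):
    """Launch HiveServer2 as a child process (assumes `hive` is on the PATH)."""
    return subprocess.Popen(hiveserver2_command(port))


if __name__ == "__main__":
    # Print the equivalent shell command instead of launching anything.
    print(" ".join(hiveserver2_command()))
```

Running the printed command directly in a terminal on the VM has the same effect as calling `start_hiveserver2()`. Keep the process running while Tableau is connected.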
Once the Hive Thrift server is started, open Tableau by double-clicking the shortcut icon on the desktop.
Select the Continue trial option. The Tableau window then opens and lets the user connect to different data sources.
Now select More and then the Cloudera Hadoop option under the To a Server header to connect to the Cloudera Hadoop Hive database.
The Cloudera Hadoop window will now pop up. Enter the IP address of your CDH VM.
To find the IP address of your CDH VM, you can run the ifconfig command in its terminal.
In the Cloudera Hadoop connection window in Tableau, enter the details below to connect Tableau to the Cloudera Hadoop database.
By default, the Hive server port number is 10000. Enter the server IP address, set Type to HiveServer2, Authentication to Username, and Username to cloudera. Then click the Sign In button to establish a connection between Tableau and the Cloudera Hadoop database.
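If the sign-in fails, it can help to confirm that the Hive server port is reachable from the Windows machine at all. The following is a minimal sketch using only the Python standard library. The `can_reach` helper and the IP address are placeholders, and the check only tests TCP connectivity, not Hive authentication.

```python
import socket


def can_reach(host, port=10000, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address: replace it with the IP reported by ifconfig on the CDH VM.
    cdh_vm_ip = "192.168.0.10"
    print("Hive server port reachable:", can_reach(cdh_vm_ip, 10000))
```

A result of `False` usually points to the Thrift server not running, a wrong IP address, or the VM's network settings blocking the port.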
There are three basic steps involved in creating any Tableau data analysis report.
Connect to a data source: This involves locating the data and using an appropriate type of connection to read it.
Choose Dimensions and Measures: This involves selecting the required columns from the source data for analysis.
Apply Visualization technique: This involves applying required visualization methods like a specific chart or graph type to the data being analyzed.
Connect to a Data Source:
Once the Tableau window appears, we need to enter the details below to pull the required table contents from the Hive metastore:
Schema (Database Name): select the database you want to use.
Table: select the table from the chosen database on which to perform the visualization.
You can use the Starts with option to search for tables whose names start with particular letters.
Choose the Dimensions and Measures:
Next, we choose the data to be analyzed by deciding on the dimensions and measures. Dimensions are descriptive data, while measures are numeric data. Put together, they let us visualize how each dimension value performs with respect to the measures.
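To make the distinction concrete, here is a small sketch that totals a measure for each value of a dimension, which is roughly what a simple chart summarizes. The rows, the `dept` column and the `sum_measure_by_dimension` helper are all hypothetical and used only for illustration. Only the `id`, `name` and `sal` field names come from the employee table used in this post.

```python
from collections import defaultdict

# Hypothetical sample rows. "dept" is an invented dimension added for illustration.
rows = [
    {"id": 1, "name": "emp_a", "dept": "sales", "sal": 30000},
    {"id": 2, "name": "emp_b", "dept": "sales", "sal": 40000},
    {"id": 3, "name": "emp_c", "dept": "hr", "sal": 25000},
]


def sum_measure_by_dimension(rows, dimension, measure):
    """Total a numeric measure for each distinct value of a descriptive dimension."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[dimension]] += row[measure]
    return dict(totals)


if __name__ == "__main__":
    print(sum_measure_by_dimension(rows, "dept", "sal"))
```

Here `dept` plays the role of the dimension and `sal` the measure. Swapping in `name` as the dimension gives one total per employee instead.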
Apply Visualization Technique:
In the previous step, the data is available only as numbers, so we would have to read and compare each value to judge performance. Viewing the same values as graphs or charts with different colors lets us judge them much more quickly.
We drag and drop id and name into the Columns field and sal into the Rows field. The table showing the numeric salary of each employee then turns into a bar chart automatically.
As the image above shows, we have successfully integrated Hive with Tableau and performed a simple visualization step that displays the salary of each employee as a bar chart.
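For readers without Tableau at hand, the idea of turning salary numbers into bars can be approximated in plain text. This is only a rough sketch with made-up employee names and salaries. The `text_bar_chart` helper is hypothetical and does not reflect how Tableau renders charts.

```python
def text_bar_chart(salaries, width=40):
    """Return one line per employee with a bar proportional to the salary."""
    if not salaries:
        return []
    top = max(salaries.values())
    lines = []
    for name, sal in salaries.items():
        # Scale each bar relative to the largest salary; guard against top == 0.
        bar = "#" * round(sal / top * width) if top else ""
        lines.append(f"{name:<8} {bar} {sal}")
    return lines


if __name__ == "__main__":
    # Invented values for illustration only.
    sample = {"emp_a": 30000, "emp_b": 40000, "emp_c": 25000}
    for line in text_bar_chart(sample):
        print(line)
```

The longest bar always belongs to the highest salary, which is the same at-a-glance comparison the bar chart in Tableau provides.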