Data visualization is the representation of information in the form of graphs, charts, diagrams, and so on. Big Data analytics is about analyzing the large data sets you have and deriving valuable outcomes that help in all-round business development. The biggest challenge after Big Data analysis is how to visualize those outcomes. Zeppelin is one of the solutions for visualizing the outcomes of your analysis.
Big Data Visualization Using Zeppelin
Zeppelin is an open-source, multi-purpose notebook that offers the following features for your data:
Data Visualization and Collaboration
Apache Zeppelin works as a common platform that provides interpreters for many languages, so that you can run your code through Zeppelin itself and visualize the outcomes.
In our previous blogs, we have shown how to visualize the results of a Hive query and the results of Spark jobs.
In this blog, we will be giving a demo on how to visualize the output of Pig scripts using Zeppelin.
The Pig interpreter was added in Zeppelin 0.7.0. You can download the latest version of Zeppelin from here.
After downloading the Zeppelin tar file, untar it using the following command:
tar -xvzf zeppelin-0.7.0-bin-netinst.tgz
After untarring, open the conf directory inside zeppelin-0.7.0-bin-netinst and make a copy of zeppelin-env.sh.template named zeppelin-env.sh.
After creating a copy, open the zeppelin-env.sh file and add the following configurations:
export JAVA_HOME=/home/kiran/jdk1.8.0_65 # path to your JAVA_HOME
export ZEPPELIN_PORT=9900 # port number to run Zeppelin
export SPARK_HOME=/home/kiran/spark-1.5.1-bin-hadoop2.6 # path to your SPARK_HOME
export HADOOP_CONF_DIR=/home/kiran/hadoop-2.7.1/etc/hadoop # path to your HADOOP_CONF directory
After adding the above configurations, save and close the file.
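A wrong path in zeppelin-env.sh can be hard to trace once an interpreter fails to start. As a minimal sketch (the check_env_dirs helper is hypothetical and not part of Zeppelin), the following shell function reports any of the configured variables that is unset or points to a directory that does not exist:

```shell
# Hypothetical helper, not shipped with Zeppelin.
# Arguments: names of environment variables that should point to directories.
check_env_dirs() {
  status=0
  for name in "$@"; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name points to a missing directory: $value"
      status=1
    fi
  done
  return "$status"
}

# Example usage, run from the Zeppelin directory after editing the file:
#   . conf/zeppelin-env.sh
#   check_env_dirs JAVA_HOME SPARK_HOME HADOOP_CONF_DIR
```

The function prints one line per problem and returns a non-zero status if anything is wrong, so it can be used as a guard before starting Zeppelin.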
Before starting Zeppelin, you need to install the interpreters that run your programs or scripts.
To install the interpreters, move into the bin folder of the zeppelin-0.7.0-bin-netinst directory and run the following command:
./install-interpreter.sh --all
This command installs all the interpreters that are available for zeppelin-0.7.0.
Note: If you downloaded zeppelin-0.7.0-bin-all.tgz instead, you do not need to install the interpreters separately.
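With the interpreters in place, Zeppelin can be started. The following is a minimal sketch, assuming the zeppelin-daemon.sh script in the bin folder of the Zeppelin package and the ZEPPELIN_PORT value set in zeppelin-env.sh. The zeppelin_url and zeppelin_is_up helper names are hypothetical:

```shell
# Start the Zeppelin daemon (run from the bin folder of the Zeppelin directory):
#   ./zeppelin-daemon.sh start

# Hypothetical helper: build the web UI address from ZEPPELIN_PORT.
# Falls back to 8080, Zeppelin's default port, when the variable is unset.
zeppelin_url() {
  echo "http://localhost:${ZEPPELIN_PORT:-8080}"
}

# Hypothetical helper: succeed only if the web UI answers an HTTP request.
# Assumes curl is installed.
zeppelin_is_up() {
  curl --silent --fail --output /dev/null "$(zeppelin_url)"
}

# Example usage:
#   zeppelin_is_up && echo "Zeppelin is running at $(zeppelin_url)"
```

It may take a few seconds after the daemon starts before the web UI responds, so the check may need to be repeated.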
Now, we will see how to run Pig scripts using Zeppelin and how to visualize them.
Open the Zeppelin server web UI using the port number that you set in the configuration file. We have set the port number to 9900, so type the following in the web browser: localhost:9900
The Zeppelin UI will look like this:
All code and scripts are written in a notebook in Zeppelin. To create a Zeppelin notebook, click on create new note and give your notebook a name. Below the name, you can select the default interpreter for this notebook. We have selected pig as the default interpreter.
If you select a default interpreter, you do not need to specify the interpreter prefix (such as %pig) when running code with that interpreter.
Now a notebook will be created with the name Zeppelin_pig. Open the notebook and you can see an empty paragraph.
Let us run a sample Pig script using Zeppelin. For this, we use the Daily Show guest analysis. You can read more about it here.
The problem statement is: find the top five kinds of GoogleKnowlege_Occupation among people who were guests on the show in a particular time period.
We will copy the Pig script into this paragraph and run it to check the output.
%pig
A = load '/dialy_show_guests' using PigStorage(',') AS (year:chararray,occupation:chararray,date:chararray,group:chararray,gusetlist:chararray);
B = foreach A generate occupation,date;
C = foreach B generate occupation,ToDate(date,'MM/dd/yy') as date;
D = filter C by ((date> ToDate('1/11/99','MM/dd/yy')) AND (date<ToDate('6/11/99','MM/dd/yy')));
E = group D by occupation;
F = foreach E generate group, COUNT(D) as cnt;
G = order F by cnt desc;
H = limit G 5;
dump H;
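To cross-check the numbers that the Pig script prints, the same logic can be reproduced on a small local sample without Hadoop. This is only a sketch in plain shell and awk, not part of the Zeppelin workflow. The top_occupations function name and the sample rows in the usage example are made up for illustration, and the sketch assumes dates in M/d/yy form that all fall in the same century:

```shell
# Hypothetical helper: reads CSV rows (year,occupation,date,group,guest) on stdin
# and prints "count,occupation" for the five most frequent occupations whose
# date lies strictly between 1/11/99 and 6/11/99, like the Pig filter.
top_occupations() {
  awk -F',' '
    {
      split($3, d, "/")                                # d[1]=month, d[2]=day, d[3]=two-digit year
      key = sprintf("%02d%02d%02d", d[3], d[1], d[2])  # yymmdd sorts chronologically within a century
      if (key > "990111" && key < "990611") count[$2]++
    }
    END { for (o in count) print count[o] "," o }
  ' | sort -t',' -k1,1nr -k2,2 | head -n 5
}

# Example usage with made-up rows:
#   printf '%s\n' '1999,actor,1/12/99,Acting,Guest A' \
#                 '1999,actor,2/1/99,Acting,Guest B' \
#                 '1999,comedian,1/13/99,Comedy,Guest C' | top_occupations
```

Rows with equal counts are ordered by occupation name here, whereas Pig gives no guaranteed order for ties, so only the counts are directly comparable.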
In the above screenshot, the Pig script has run successfully through Zeppelin and produced its output, but the data is not yet visualized.
To visualize this data, run a foreach statement in another paragraph. Set the prefix to pig.query and pull the columns out of the relation with foreach, as shown below.
%pig.query foreach H generate $0,$1;
There are two columns in the output and the relation name is ‘H’, so the foreach statement generates those two columns as output.
Now let us run this paragraph. In the screenshot below, you can see that the results are visualized.
For the same results, you can see the pie chart displayed below:
This is how you can visualize the outcomes of Pig scripts using Zeppelin.
We hope this blog helped you in learning how to run Pig scripts and visualize the outcomes of Pig scripts using Zeppelin. Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.