Connect Hive with Beeline | Hive Installation with MySQL Metastore

Objective:

In our previous blog, we discussed the Apache Hive architecture in detail. This blog is a step-by-step guide to installing Apache Hive on Ubuntu with a MySQL metastore and connecting to Hive using Beeline. We assume that Java and Hadoop are already installed. If Hadoop still needs to be installed, refer to our blog hadoop-3-x-installation-guide.

What you will learn:

How To Install MySQL?

How To Configure MySQL?

How To Install Hive?

How To Configure The Hive Metastore?

How Is Beeline Used To Connect To Hive?

Prerequisites:

  1. Ubuntu (any version)
  2. Stable Internet Connection
  3. 200 GB HDD
  4. Processor – Dual-core or above

Let's get started with the first requirement for the Hive installation: MySQL.

Install MySQL

Step 1: Update the repositories

sudo apt-get update -y

Step 2: Install MySQL

sudo apt install mysql-server

Step 3: Configure MySQL

sudo mysql_secure_installation

To connect to MySQL as root with a password, you need to switch root's authentication method from auth_socket to mysql_native_password.

To do this, open the MySQL prompt from your terminal.

Step 4: Open MySQL Prompt

sudo mysql

On a fresh Ubuntu installation, the root user authenticates using the auth_socket plugin by default. To configure the root account to authenticate with a password, run the following ALTER USER command. Replace the password with a strong password of your choosing, and note that this command changes the root password.

Step 5: Change the root password

ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'your_strong_password';
FLUSH PRIVILEGES;
exit;
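
To confirm that the switch took effect, you can query the plugin column of the mysql.user table. The sketch below wraps that query in a small shell function. The function name root_auth_plugin is our own, and it assumes the mysql client is on your PATH.

```shell
# Print the authentication plugin used by 'root'@'localhost'.
# -p prompts for the root password, -N drops the column header,
# -B produces plain tab-separated output.
root_auth_plugin() {
  mysql -u root -p -N -B \
    -e "SELECT plugin FROM mysql.user WHERE user = 'root' AND host = 'localhost';"
}
```

Running root_auth_plugin after Step 5 should print mysql_native_password. If it still prints auth_socket, the ALTER USER command was not applied.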

Your MySQL root password is now changed and MySQL is configured.

Now we will start with the installation of Hive.

Step 1: Create a directory for Hive and download the Hive tarball from the link below.

mkdir hive
cd hive
wget http://mirrors.estointernet.in/apache/hive/hive-3.1.2/apache-hive-3.1.2-bin.tar.gz

Step 2: Extract the tarball

tar -zxvf apache-hive-3.1.2-bin.tar.gz
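
The tarball name and the directory it extracts to both follow from the Hive version, so it can help to derive them from a single variable instead of typing them by hand. A minimal sketch, where the variable names are our own:

```shell
# Derive the tarball and directory names from the Hive version used in this blog.
HIVE_VERSION="3.1.2"
HIVE_TARBALL="apache-hive-${HIVE_VERSION}-bin.tar.gz"
HIVE_DIR="${HIVE_TARBALL%.tar.gz}"   # strip the .tar.gz suffix

echo "Tarball:   ${HIVE_TARBALL}"
echo "Directory: ${HIVE_DIR}"

# If the tarball is already in the current directory, list its first entries
# to check that the download is a readable archive before extracting it.
if [ -f "${HIVE_TARBALL}" ]; then
  tar -tzf "${HIVE_TARBALL}" | head -n 5
fi
```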

Step 3: Next we have to update the .bashrc file, and for that we need the path where Hive is installed.

cd apache-hive-3.1.2-bin

pwd

This prints the path where Hive is installed. Copy that path.

Step 4: Open a new terminal and add the export statements below to the .bashrc file.

Open the .bashrc file from your home directory

vi ~/.bashrc

Add the statements below to the .bashrc file, replacing the HIVE_HOME value with the path you copied in Step 3

export HIVE_HOME=/home/hadoop/install/hive/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin

Press Esc, then type :wq! and press Enter to save and exit the .bashrc file.

Step 5: Now run the command below to reload the .bashrc file.

source ~/.bashrc
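
If you prefer not to edit .bashrc by hand, the two export lines can also be appended from the shell. The sketch below defines a helper, add_line_once (a name of our own), that appends a line only when that exact line is not already in the file, so running it twice does not duplicate the exports.

```shell
# Append a line to a file unless that exact line is already present.
add_line_once() {
  local file="$1" line="$2"
  touch "$file"
  # -x matches whole lines, -F treats the line as a fixed string, -q suppresses output
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Example (uses the HIVE_HOME path from this blog; adjust it to your own path):
#   add_line_once "$HOME/.bashrc" 'export HIVE_HOME=/home/hadoop/install/hive/apache-hive-3.1.2-bin'
#   add_line_once "$HOME/.bashrc" 'export PATH=$PATH:$HIVE_HOME/bin'
```

The single quotes in the example keep $PATH and $HIVE_HOME unexpanded, so the line is written to .bashrc exactly as shown.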

Changing Default Metastore Of Hive

Step 6: Download the hive-site.xml file from the link below and place it in Hive's conf directory.

Note: This assumes the file is in your Downloads directory.

Now go to the Downloads directory and copy hive-site.xml to the Hive conf directory

cd ~/Downloads
cp hive-site.xml /home/hadoop/install/hive/apache-hive-3.1.2-bin/conf/
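
For orientation, a hive-site.xml for a MySQL metastore usually sets four javax.jdo.option connection properties. The sketch below writes such a file from the shell. It is not the file referred to in this step: the function name write_hive_site, the database name metastore, and the user and password arguments are assumptions for illustration, and the driver class shown is the one shipped in the 5.1.x MySQL connector.

```shell
# Write a minimal hive-site.xml that points the metastore at a local MySQL server.
# Arguments: output path, MySQL user, MySQL password (all supplied by the caller).
# The values are written as-is, so avoid XML special characters such as & and <.
write_hive_site() {
  local out="$1" user="$2" pass="$3"
  cat > "$out" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${pass}</value>
  </property>
</configuration>
EOF
}

# Example: write_hive_site "$HIVE_HOME/conf/hive-site.xml" root 'your_strong_password'
```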

Step 7: Download the MySQL Connector JAR file from the link below and copy it to the Hive lib directory

Note: This assumes the connector file is in your Downloads directory.

cd ~/Downloads
cp mysql-connector-java-5.1.48.jar /home/hadoop/install/hive/apache-hive-3.1.2-bin/lib/

Because we have changed the metastore database to MySQL, we now have to initialize the metastore schema in MySQL.

Step 7 (continued): Initialize the schema

schematool -dbType mysql -initSchema
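
Running -initSchema a second time against an already initialized database typically fails, so it can be useful to check first. The sketch below uses schematool's -info option for that check. The function name is our own, and it assumes that -info exits with a non-zero status when no schema is present.

```shell
# Initialize the metastore schema only if schematool cannot find an existing one.
init_metastore_schema() {
  if schematool -dbType mysql -info >/dev/null 2>&1; then
    echo "Metastore schema already present, skipping -initSchema"
  else
    schematool -dbType mysql -initSchema
  fi
}
```

Call init_metastore_schema in place of the plain schematool command once HIVE_HOME/bin is on your PATH.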

If schematool finishes without errors, Hive is now installed with a MySQL metastore.

Now let us look at what Beeline is and why it is used.

The Hive CLI accesses data on HDFS and runs MapReduce jobs directly, without going through a service that can authenticate and authorize users.

For this security reason the Hive CLI is deprecated, and Beeline is used to access Hive instead.

Beeline is a Hive client that is included on the master nodes of your cluster. Beeline uses JDBC to connect to HiveServer2, a service hosted on your cluster. You can also use Beeline to access Hive remotely over the internet.

Step 8: Add the lines below to core-site.xml, which is in the Hadoop configuration directory (typically $HADOOP_HOME/etc/hadoop).

<property>
  <name>hadoop.proxyuser.ABC.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.ABC.hosts</name>
  <value>*</value>
</property>

Note: In the configuration above, ABC stands for the user. Replace it with your own username.
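
Since ABC has to be replaced in two property names, generating the block for a given username avoids typos. A minimal sketch follows; proxyuser_properties is a hypothetical helper, not part of Hadoop.

```shell
# Print the two hadoop.proxyuser properties for the given username.
proxyuser_properties() {
  local user="$1"
  cat <<EOF
<property>
  <name>hadoop.proxyuser.${user}.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.${user}.hosts</name>
  <value>*</value>
</property>
EOF
}

# Print the block for the current user; paste it inside <configuration> in core-site.xml.
proxyuser_properties "$(whoami)"
```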

Step 9: Start the hive server

hiveserver2 start

Note: Keep the hiveserver2 terminal open for as long as you want to use Hive. Closing it stops the server.

Note: Start your Hadoop services before you go to the next step.
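
HiveServer2 may need some time after launch before it accepts connections, so connecting immediately can fail. The sketch below polls a TCP port until it opens or a timeout expires. wait_for_port is our own helper; it relies on bash's /dev/tcp redirection, so it needs bash rather than plain sh.

```shell
# Poll until a TCP port accepts connections or the timeout (in seconds) expires.
# Returns 0 as soon as a connection succeeds, 1 if the timeout is reached.
wait_for_port() {
  local host="$1" port="$2" timeout="$3" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    # Try to open the port in a subshell; the descriptor closes when the subshell exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example, using the HiveServer2 port from the JDBC URL in this blog:
#   wait_for_port localhost 10000 120 && echo "HiveServer2 is accepting connections"
```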

Step 10: Connect to Hive using the following beeline command.

beeline -n hadoop -u jdbc:hive2://localhost:10000

Note: In the above command, hadoop is my username. Substitute your own username.
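
Beeline can also run a single statement and exit, which makes for a quick connection check. The sketch below builds the JDBC URL from its parts and shows the command in a comment. hive2_url is a hypothetical helper, and falling back to the database named default is an assumption.

```shell
# Build a HiveServer2 JDBC URL from host, port, and an optional database name.
hive2_url() {
  local host="$1" port="$2" db="${3:-default}"
  printf 'jdbc:hive2://%s:%s/%s\n' "$host" "$port" "$db"
}

URL="$(hive2_url localhost 10000)"
echo "$URL"

# With HiveServer2 running, -e executes one statement and exits:
#   beeline -n "$(whoami)" -u "$URL" -e 'SHOW DATABASES;'
```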

When the beeline prompt appears, you are connected to Hive through the beeline client.

Conclusion:

We have installed Hive and configured the Hive metastore to use a MySQL database. We then connected to Hive using the beeline client.

I hope this blog helps you in the future while installing Hive, configuring a MySQL metastore for Hive, and using beeline to connect and execute queries through HiveServer2.

If you have any questions, feel free to comment below. Happy Learning.

Ajit Khutal

Ajit Khutal has been working with AcadGild as an Associate Big Data Analyst with expertise in Big Data technologies like Hadoop, Spark, Kafka, and NiFi. He is a Python enthusiast and has been associated with the implementation of many analytics projects in domains like e-commerce, banking, and education.
