In our previous blog, we discussed the Apache Hive architecture in detail. This blog gives you a step-by-step guide to installing Apache Hive on Ubuntu and connecting to Hive using Beeline. We assume that Java and Hadoop are already installed. If you need to install Hadoop, refer to our blog hadoop-3-x-installation-guide.
What you will learn:
- How to install MySQL
- How to configure MySQL
- How to install Hive
- How to configure the Hive metastore
- How to use Beeline to connect to Hive
Prerequisites:
- Ubuntu (any version)
- Stable internet connection
- 200 GB HDD
- Processor: dual-core or above
Let's get started. The first step is to install MySQL, which Hive will use as its metastore database.
Step 1: Update the repositories
sudo apt-get update -y
Step 2: Install MySQL
sudo apt install mysql-server
Step 3: Configure MySQL
To use a password when connecting to MySQL as root, you need to switch root's authentication method from auth_socket to mysql_native_password.
To do this, open the MySQL prompt from your terminal.
Step 4: Open MySQL Prompt
sudo mysql
Inside the prompt, check which authentication plugin each account uses:
SELECT user, plugin, host FROM mysql.user;
In the output, you can see that the root user does, in fact, authenticate using the auth_socket plugin. To configure the root account to authenticate with a password, run the following ALTER USER command. Be sure to change the password to a strong password of your choosing, and note that this command will change the root password.
Step 5: Change the root password
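ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'your_password';
FLUSH PRIVILEGES;
exit;
Replace your_password with the password you chose. The MySQL root password is now changed and MySQL is configured. From now on, log in as root with mysql -u root -p instead of sudo mysql.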
Now let's start with the Hive installation.
Step 1: Create a directory for Hive and download the Hive tarball from the below link.
mkdir hive
cd hive
Step 2: Extract the tarball
tar -zxvf <filename>
Step 3: To update the .bashrc file, we first need the path where Hive is installed.
cd apache-hive-3.1.2-bin
pwd
The pwd command prints the path where Hive is installed. Copy that path.
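Steps 2 and 3 can also be done in one go. The following is a minimal sketch, not part of the original setup. extract_hive is a hypothetical Bash function that extracts the tarball into a directory you choose. It then prints the absolute path of the extracted apache-hive-*-bin folder, which is the value you will use for HIVE_HOME. It assumes the tarball uses the usual apache-hive-<version>-bin top-level folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an official Hive tool): extract a Hive tarball into
# a directory and print the absolute path of the extracted apache-hive-*-bin
# folder, i.e. the value to use for HIVE_HOME.
extract_hive() {
  local tarball="$1"
  local dest="${2:-.}"

  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest" || return 1

  # Print the first apache-hive-*-bin directory found at the top level.
  local dir
  for dir in "$dest"/apache-hive-*-bin; do
    if [ -d "$dir" ]; then
      (cd "$dir" && pwd)
      return 0
    fi
  done

  echo "no apache-hive-*-bin directory found in $dest" >&2
  return 1
}
```

For example, extract_hive apache-hive-3.1.2-bin.tar.gz ~/install/hive extracts the archive and prints the path to paste into .bashrc.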
Step 4: Open a new terminal and add the export statements below to the .bashrc file.
Open the .bashrc file in your home directory:
vi ~/.bashrc
Add the following statements to the .bashrc file, setting HIVE_HOME to the path you copied in Step 3:
export HIVE_HOME=/home/hadoop/install/hive/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
Press Esc, type :wq! and press Enter to save and exit the .bashrc file.
Step 5: Now run the command below to reload the .bashrc file.
source ~/.bashrc
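Editing .bashrc by hand works, but repeating these steps can leave duplicate export lines. As a sketch, the hypothetical add_hive_env function below appends the two exports from Step 4 only when no HIVE_HOME export is present yet. It assumes Bash. The rc file is an optional second argument that defaults to ~/.bashrc.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the HIVE_HOME and PATH exports to an rc file,
# but only if the file does not already contain an "export HIVE_HOME=" line.
add_hive_env() {
  local hive_home="$1"
  local rc_file="${2:-$HOME/.bashrc}"

  # Make sure the file exists so grep has something to read.
  touch "$rc_file"

  if grep -q '^export HIVE_HOME=' "$rc_file"; then
    echo "HIVE_HOME already set in $rc_file; leaving it unchanged"
    return 0
  fi

  # The single-quoted PATH line keeps $PATH and $HIVE_HOME unexpanded in the
  # file, so they are resolved each time a new shell reads it.
  {
    echo "export HIVE_HOME=$hive_home"
    echo 'export PATH=$PATH:$HIVE_HOME/bin'
  } >> "$rc_file"
  echo "Added Hive exports to $rc_file"
}
```

For example, run add_hive_env /home/hadoop/install/hive/apache-hive-3.1.2-bin and then source ~/.bashrc. The check only looks for an existing export HIVE_HOME= line, so if you later move Hive, edit that line by hand.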
Changing Default Metastore Of Hive
Step 6: Download the hive-site.xml file from the below link and place it in Hive's conf directory.
Note: This assumes the file was saved to your Downloads directory.
Go to the Downloads directory and copy hive-site.xml to the Hive conf directory:
cd ~/Downloads
cp hive-site.xml /home/hadoop/install/hive/apache-hive-3.1.2-bin/conf/
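If the download link is not available, you can write a minimal hive-site.xml yourself. The hypothetical write_hive_site function below is a sketch under these assumptions:
- MySQL runs on localhost.
- The metastore database is called metastore. The Connector/J option createDatabaseIfNotExist=true creates it on first connection.
- Hive connects as the root account configured earlier.
- The driver class is com.mysql.jdbc.Driver, the class name used by Connector/J 5.1.x, such as the 5.1.48 jar in the next step.

The javax.jdo.option.* names are standard Hive metastore properties. The values are assumptions to adapt to your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal hive-site.xml that points the Hive
# metastore at MySQL. Assumed: MySQL on localhost, a database named
# "metastore", the root account, and the Connector/J 5.1.x driver class.
write_hive_site() {
  local conf_dir="$1"
  local db_password="$2"

  mkdir -p "$conf_dir"
  cat > "$conf_dir/hive-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${db_password}</value>
  </property>
</configuration>
EOF
}
```

Call it with your Hive conf directory and the MySQL root password, for example write_hive_site "$HIVE_HOME/conf" 'your_password'. The password is inserted into the XML as-is, so characters such as < or & would need escaping.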
Next, download the MySQL Connector/J jar file from the below link and copy it to the Hive lib directory.
Note: This assumes the connector jar was saved to your Downloads directory.
cd ~/Downloads
cp mysql-connector-java-5.1.48.jar /home/hadoop/install/hive/apache-hive-3.1.2-bin/lib/
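Before initializing the schema, it can help to confirm that both files ended up where Hive looks for them. The following is a minimal sketch that assumes Bash. The hypothetical check_metastore_files function takes the Hive home directory, defaulting to $HIVE_HOME. It reports whether conf/hive-site.xml exists and whether a mysql-connector-java-*.jar is present in lib/, and returns non-zero if either is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that hive-site.xml and the MySQL connector jar
# are in the conf/ and lib/ directories under the given Hive home.
check_metastore_files() {
  local hive_home="${1:-$HIVE_HOME}"
  local status=0

  if [ -f "$hive_home/conf/hive-site.xml" ]; then
    echo "found $hive_home/conf/hive-site.xml"
  else
    echo "missing $hive_home/conf/hive-site.xml" >&2
    status=1
  fi

  # ls succeeds only if the glob matched at least one existing file.
  if ls "$hive_home"/lib/mysql-connector-java-*.jar > /dev/null 2>&1; then
    echo "found MySQL connector jar in $hive_home/lib"
  else
    echo "missing mysql-connector-java-*.jar in $hive_home/lib" >&2
    status=1
  fi

  return "$status"
}
```

Run check_metastore_files after sourcing .bashrc. A non-zero exit status means one of the two copy steps needs to be repeated.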
Because we have changed the metastore database to MySQL, we now have to initialize the metastore schema in MySQL.
Step 7: Initialize Schema
schematool -dbType mysql -initSchema
If schematool finishes with the message schemaTool completed, Hive is successfully installed with a MySQL metastore.
Next, let's look at what Beeline is and why we use it.
The Hive CLI is being deprecated because it cannot authenticate and authorize users when it accesses Hive directly. Direct access to data on HDFS or to MapReduce jobs is discouraged for security reasons, so Beeline is used to access Hive instead.
Beeline is a Hive client that is included on the master nodes of your cluster. Beeline uses JDBC to connect to HiveServer2, a service hosted on your cluster. You can also use Beeline to access Hive remotely over the internet.
Step 8: Add the following properties inside the <configuration> element of core-site.xml, which is in the Hadoop conf directory.
<property>
  <name>hadoop.proxyuser.ABC.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.ABC.hosts</name>
  <value>*</value>
</property>
Note: In the configuration above, ABC stands for the user, so replace it with your username. If your Hadoop services are already running, restart them so the change takes effect.
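Instead of replacing ABC by hand, you can print the two properties for your own user. The hypothetical proxyuser_properties function below is a sketch. It uses whoami when no user name is given. Paste its output inside the <configuration> element of core-site.xml, which in Hadoop 3 is typically $HADOOP_HOME/etc/hadoop/core-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the hadoop.proxyuser.<user>.groups/.hosts
# properties for the given user (defaults to the current login name).
proxyuser_properties() {
  local user="${1:-$(whoami)}"
  # The here-document is not subject to globbing, so * is printed literally.
  cat <<EOF
<property>
  <name>hadoop.proxyuser.${user}.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.${user}.hosts</name>
  <value>*</value>
</property>
EOF
}
```

The * values let that user impersonate members of any group from any host. That suits a single-node setup, but on a shared cluster you would normally narrow them.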
Step 9: Start HiveServer2
hiveserver2
Note: Keep the HiveServer2 terminal open for as long as you want to use Hive. Closing it stops the server.
Note: Make sure your Hadoop services are running before you go to the next step.
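HiveServer2 can take some time after startup before it accepts connections, so running beeline too early may fail. The following is a sketch that assumes Bash, because it relies on Bash's /dev/tcp redirection. The hypothetical wait_for_port function polls a host and port once per second until a TCP connection is accepted or a timeout expires. It defaults to localhost:10000, the address used in the beeline command of the next step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until host:port accepts TCP connections
# (default localhost:10000), checking once per second up to a timeout.
# Relies on Bash's /dev/tcp/<host>/<port> redirection.
wait_for_port() {
  local host="${1:-localhost}"
  local port="${2:-10000}"
  local timeout="${3:-120}"   # seconds to wait before giving up
  local waited=0

  while [ "$waited" -lt "$timeout" ]; do
    # Opening /dev/tcp/host/port succeeds only if the connection is accepted;
    # the subshell closes the descriptor again right away.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "$host:$port is accepting connections"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done

  echo "timed out after ${timeout}s waiting for $host:$port" >&2
  return 1
}
```

For example, run wait_for_port localhost 10000 180 in a second terminal after starting hiveserver2, and run beeline once it reports success.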
Step 10: Connect to Hive using the following beeline command.
beeline -n hadoop -u jdbc:hive2://localhost:10000
Note: In the above command, hadoop is my username, so replace it with your own username.
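Beeline can also run a single statement and exit, which is handy for checking the setup from a script. The hypothetical beeline_smoke_test function below is a sketch. It passes your user name with -n, the JDBC URL with -u, and SHOW DATABASES; with -e. The BEELINE_BIN variable is an assumption added so the binary can be swapped out. By default it uses beeline from your PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one HiveQL statement through Beeline and exit.
# BEELINE_BIN (an assumption, not a Beeline setting) selects the command to
# run; it defaults to the beeline found on PATH.
beeline_smoke_test() {
  local user="${1:-$(whoami)}"
  local url="${2:-jdbc:hive2://localhost:10000}"
  local beeline_bin="${BEELINE_BIN:-beeline}"

  # -n: user name, -u: JDBC URL, -e: execute the statement, then exit.
  "$beeline_bin" -n "$user" -u "$url" -e 'SHOW DATABASES;'
}
```

For example, beeline_smoke_test hadoop should list at least the default database if HiveServer2 is running.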
When the connection succeeds, Beeline shows its jdbc:hive2 prompt, and you are connected to Hive through the Beeline client.
We have installed Hive, configured the Hive metastore to use a MySQL database, and connected to Hive using the Beeline client.
I hope this blog helps you the next time you install Hive, configure a MySQL metastore for it, or use Beeline to connect and execute queries through HiveServer2.
If you have any questions, feel free to comment below. Happy learning!