
Troubleshooting Hive Metadata

All the metadata for Hive tables and partitions is accessed through the Hive Metastore. Metadata is persisted via the DataNucleus (formerly JPOX) ORM layer, so any database supported by it can be used by Hive. Most commercial relational databases and many open-source databases are supported.

By default, Hive uses Derby as the metastore storage engine, but in production we always use an external database such as MySQL. Although any supported database server can be used, MySQL is the one most commonly integrated with Hive.
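Before diving into the errors, it helps to know where this configuration lives. The sketch below only inspects the relevant JDO properties; it assumes a standard installation with HIVE_HOME set and hive-site.xml under $HIVE_HOME/conf (an assumption, not something stated above).

# Print the JDBC URL, driver, and user the metastore is configured with.
# Typical values for a MySQL-backed metastore look like
# jdbc:mysql://<host>:3306/<metastore_db> and com.mysql.jdbc.Driver.
grep -A1 -E 'javax\.jdo\.option\.Connection(URL|DriverName|UserName)' "$HIVE_HOME/conf/hive-site.xml"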

Let’s look at some common metastore errors.

Case 1: The MySQL JDBC driver is missing. The property “javax.jdo.option.ConnectionDriverName” specifies the driver used to connect to MySQL. The MySQL driver jar, “mysql-connector-java-5.1.39-bin.jar”, must be present under the “hive/lib” directory.


Error: Without the driver jar on the classpath, the Hive shell fails at startup with a DataNucleus exception reporting that the specified datastore driver (“com.mysql.jdbc.Driver”) could not be found in the CLASSPATH.

Solution: To resolve the above error, we need to copy “mysql-connector-java-5.1.39-bin.jar” into “hive/lib”.
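A quick sketch of the copy, assuming the connector jar has already been downloaded to the current directory and Hive is installed under $HIVE_HOME (both assumptions):

# Put the MySQL JDBC driver where DataNucleus can load it, then verify it is there.
cp mysql-connector-java-5.1.39-bin.jar "$HIVE_HOME/lib/"
ls "$HIVE_HOME/lib/" | grep mysql-connector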


Case 2: The Hive prompt simply hangs and waits for a timeout before reporting any error.


Reason/Solution: The MySQL server cannot be reached because a firewall is blocking the connection. Test connectivity using telnet, as shown below.
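For example, assuming the metastore database runs on a host called mysql-host (a placeholder) on MySQL's default port 3306:

# A successful telnet means the port is reachable; a hang or "connection refused"
# points to a firewall rule or to MySQL not listening on that port.
telnet mysql-host 3306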


Once the firewall allows the connection, the Hive prompt comes up normally.

Case 3: The Hive prompt can also hang because of a wrong server name, a wrong port, or a typo in the MySQL address.

Solution: This is caused by an incorrect MySQL server or port in the metastore configuration. Check the JDBC connection URL (the property “javax.jdo.option.ConnectionURL” in hive-site.xml) for the correct host name and port.

Notice that a wrong port will make the connection hang; the correct default MySQL port is “3306”.
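A sketch of what to check, with placeholder host and database names:

# The metastore URL in hive-site.xml must use the right host and port.
# A correct MySQL URL looks like:
#   jdbc:mysql://mysql-host:3306/hivedb?createDatabaseIfNotExist=true
grep -A1 'javax.jdo.option.ConnectionURL' "$HIVE_HOME/conf/hive-site.xml"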

Once these corrections are made, we can see the Hive prompt.

Case 4: MySQL has incorrect grants for the Hive user on the metastore database.

Solution: Make sure the user running Hive (“hadoop” in this example) has permission to read and modify the metastore database (“HiveDB” here).
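A hedged sketch of granting those permissions from the command line, assuming the metastore database is named HiveDB (as above), the MySQL user hadoop already exists, and the '%' host pattern is acceptable in your environment:

# Run as a MySQL admin user; the user name, host pattern, and database are placeholders.
mysql -u root -p -e "GRANT ALL PRIVILEGES ON HiveDB.* TO 'hadoop'@'%'; FLUSH PRIVILEGES;"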

Case 5: Hive error “Too many counters”.

Solution: This occurs when a Hive job emits more counters than the configured limit, which is 120 by default. We can raise the limit by changing “mapreduce.job.counters.max” in mapred-site.xml.
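A sketch of the change, assuming mapred-site.xml lives under /etc/hadoop/conf (a common but assumed location) and that 500 is an acceptable new limit:

# Check whether the property is already set; if it is absent, the default of 120 applies.
grep -A1 'mapreduce.job.counters.max' /etc/hadoop/conf/mapred-site.xml

# Then add or edit this property inside <configuration> in mapred-site.xml
# and resubmit the failing Hive query:
#   <property>
#     <name>mapreduce.job.counters.max</name>
#     <value>500</value>
#   </property>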

Case 6: Sometimes, when we set our own classpath in “.bashrc”, it conflicts with the jars that ship with Hadoop.

Solution: We can get rid of this error by changing the classpath precedence with “export HADOOP_USER_CLASSPATH_FIRST=true”, which makes Hadoop load the user-supplied classpath entries first.
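A minimal sketch of making the setting persistent, assuming a per-user .bashrc is the right place in your setup:

# Append to ~/.bashrc so every new shell picks it up, then reload the profile.
echo 'export HADOOP_USER_CLASSPATH_FIRST=true' >> ~/.bashrc
source ~/.bashrc

# Verify it is set before retrying the failing Hive command.
echo "$HADOOP_USER_CLASSPATH_FIRST"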

We hope this blog helped you understand how to troubleshoot Hive metadata issues. Keep visiting our site www.acadgild.com for more details on Big Data and other technologies.
