
ACID Transactions – Transactions in Hive

In this blog post, we explain the row-level transactions available in Hive. This post will give you a good idea of how to implement row-level transactions on a Hive table.

Before beginning with the transactions in Hive, let’s look at the ACID properties, which are vital for any transaction.


What is ACID? What are the ACID properties of a transaction?

ACID stands for Atomicity, Consistency, Isolation, and Durability.

Atomicity means a transaction should either complete successfully or fail completely, i.e. it should not be left partially done. Consistency ensures that any transaction brings the database from one valid state to another valid state. Isolation states that every transaction should be independent of the others, i.e. one transaction should not affect another. And Durability states that once a transaction is completed, it should be preserved in the database even if the machine crashes or a system failure occurs.

These ACID properties are essential for a transaction and every transaction should ensure that these properties are met.

ACID Transactions in Hive

Transactions were introduced in Hive 0.13, but they only partially fulfilled the ACID properties (atomicity, consistency, and durability), and only at the partition level. Isolation could be provided by turning on one of the available locking mechanisms, backed by ZooKeeper or held in memory.

But in Hive 0.14, new APIs were added to completely fulfill the ACID properties while performing any transaction.

Transactions are provided at the row-level in Hive 0.14. The different row-level transactions available in Hive 0.14 are as follows:

  1. Insert
  2. Delete
  3. Update

There are several limitations with the transactions available in Hive 0.14. ORC is the only file format supported for Hive transactions, so the table must be stored as ORC. The table also needs to be bucketed in order to support transactions.

Row-level Transactions Available in Hive 0.14

Let’s perform some row-level transactions available in Hive 0.14. Before creating a Hive table that supports transactions, the transaction features in Hive need to be turned on, as they are turned off by default.

The below properties need to be set in the Hive shell, in the given order, to work with transactions in Hive:

hive>set hive.support.concurrency = true;
hive>set hive.enforce.bucketing = true;
hive>set hive.exec.dynamic.partition.mode = nonstrict;
hive>set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
hive>set hive.compactor.initiator.on = true;
hive>set hive.compactor.worker.threads = 1;

(The last property must be set to a positive number on at least one instance of the Thrift metastore service; 1 is used here as an example.)

If the above properties are not set properly, the ‘Insert’ operation will work, but ‘Update’ and ‘Delete’ will not, and you will receive the following error:

FAILED: SemanticException [Error 10294]: Attempt to do update or delete using transaction manager that does not support these operations.
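Before attempting an Update or Delete, it is worth confirming that the properties actually took effect in the current session. In the Hive shell, issuing SET with just a property name prints its current value:

```sql
-- Print the current values of the transaction-related properties.
-- If hive.txn.manager still shows the default transaction manager,
-- Update and Delete will fail with the error shown earlier.
SET hive.txn.manager;
SET hive.support.concurrency;
SET hive.compactor.initiator.on;
```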

Creating a Table That Supports ACID Hive Transactions

CREATE TABLE college (clg_id int, clg_name string, clg_loc string)
CLUSTERED BY (clg_id) INTO 5 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

The above statement creates a table named ‘college’ with the columns ‘clg_id’, ‘clg_name’, and ‘clg_loc’. We bucket the table by ‘clg_id’ and store it in the ORC format, and we enable transactions on the table by specifying ‘transactional’=’true’ inside TBLPROPERTIES.

We have successfully created a table with name ‘college’ which supports row-level transactions of Hive.

The created table can be checked using the command show tables.
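Note that show tables only confirms the table exists. To verify that transactions are actually enabled on it, the table properties can be inspected as well:

```sql
-- Lists all tables in the current database; 'college' should appear.
SHOW TABLES;

-- Shows the detailed table metadata; look for transactional=true
-- under Table Parameters in the output.
DESCRIBE FORMATTED college;
```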

Inserting Data into a Hive Table

INSERT INTO table college values(1,'nec','nlr'),(2,'vit','vlr'),(3,'srm','chen'),(4,'lpu','del'),(5,'stanford','uk'),(6,'JNTUA','atp'),(7,'cambridge','us');

The above command is used to insert row-wise data into the Hive table. Here, each row is enclosed in parentheses ‘( )’ and separated by commas.

Now, we have successfully inserted the data into the Hive table.

The contents of the table can be viewed using the command select * from college.

From the output, we can see that the data has been inserted successfully into the table.

Now, if we try to re-insert the same data, it will be appended to the previous data.
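A quick way to confirm the duplication (assuming the same seven-row insert was run twice, as described) is to compare the total row count with the distinct count:

```sql
-- After inserting the same seven rows twice, each clg_id appears twice,
-- so the two counts differ.
SELECT COUNT(*) FROM college;                 -- 14 rows in total
SELECT COUNT(DISTINCT clg_id) FROM college;   -- 7 distinct ids
```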

Updating the Data in Hive Table

UPDATE college set clg_id = 8 where clg_id = 7;

The above command is used to update a row in Hive table.

Running this command gives us an error message. This is because the Update command is not supported on columns that are bucketed.

In this table, we have bucketed the ‘clg_id’ column and we are performing the Update operation on that same column, so we get the error:

FAILED: SemanticException [Error 10302]: Updating values of bucketing columns is not supported. Column clg_id.

Now let’s perform the Update operation on a non-bucketed column:

UPDATE college set clg_name = 'IIT' where clg_id = 6;

We have successfully updated the data.

The updated data can be checked using the command select * from college.

We can see that the data has been updated successfully.
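Update is not limited to a single column: any number of non-bucketing columns can be changed in one statement, as long as the bucketing column stays out of the SET clause. A sketch (the values ‘MIT’ and ‘boston’ here are purely illustrative):

```sql
-- Updating two non-bucketing columns at once is allowed;
-- only clg_id (the bucketing column) must not appear in SET.
UPDATE college SET clg_name = 'MIT', clg_loc = 'boston' WHERE clg_id = 2;
```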

Now let’s perform the Delete operation on the same table.

Deleting a Row from Hive Table

delete from college where clg_id=5;

The above command will delete a single row in the Hive table.

We have now successfully deleted a row from the Hive table. This can be checked using the command select * from college.


We can see that there is no row with clg_id = 5. This means that we have successfully deleted the row from the Hive table.

This is how the transactions or row-wise operations are performed in Hive.
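One operational detail worth knowing: every Insert, Update, and Delete on a transactional table writes delta files next to the base ORC data, and the compactor processes enabled earlier (hive.compactor.initiator.on and hive.compactor.worker.threads) merge them in the background. Compaction can also be requested manually:

```sql
-- Request a major compaction to merge delta files into a new base,
-- then monitor its progress.
ALTER TABLE college COMPACT 'major';
SHOW COMPACTIONS;
```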


Got a question for us? Please mention it in the comments section and we will address it as soon as possible.



10 Comments

  1. Thanks a lot for your information, it’s very useful. Can you provide any information regarding Apache Oozie with a real-time scenario?

  2. Hello Kiran,
    After setting the below properties:
    hive>set hive.support.concurrency = true;
    hive>set hive.enforce.bucketing = true;
    hive>set hive.exec.dynamic.partition.mode = nonstrict;
    hive>set hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    hive>set hive.compactor.initiator.on = true;
    hive>set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;
    Hive gets hung and no operation can be done after that. Can you please help with it?
    I am using the AcadGild VM on 64-bit.

    1. Hi Amol,
      set hive.compactor.worker.threads = a positive number on at least one instance of the Thrift metastore service;
      For this, you need to give any positive value, for example:
      set hive.compactor.worker.threads = 5;

  3. I did that; I gave 2.
    After setting all those properties, whatever Hive statement I give next hangs forever.
    I am trying with hive-site.xml now; meanwhile, if you have any inputs please let me know.

  4. Hello Satyam,
    Thanks for the quick help. Setting the properties from the command prompt didn’t work for me;
    as I mentioned, it just hangs forever.
    I tried another way and it is working for me:
    I updated all the above properties in hive-site.xml, plus one extra property I added,
    then restarted the service (just shut down the VM and restarted), and it’s working now.
    My hive-site.xml looks like below (may be useful for others who get stuck with the same error I got):

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
      <description>metadata is stored in a MySQL server</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>MySQL JDBC driver class</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hiveuser</value>
      <description>user name for connecting to mysql server</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>password</value>
      <description>password for connecting to mysql server</description>
    </property>
    <property>
      <name>hive.support.concurrency</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.enforce.bucketing</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.exec.dynamic.partition.mode</name>
      <value>nonstrict</value>
    </property>
    <property>
      <name>hive.txn.manager</name>
      <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
    </property>
    <property>
      <name>hive.compactor.initiator.on</name>
      <value>true</value>
    </property>
    <property>
      <name>hive.compactor.worker.threads</name>
      <value>2</value>
    </property>
    <property>
      <name>hive.in.test</name>
      <value>true</value>
    </property>
