It is important for learners to understand the components of any architecture. In this post, we will be looking at HBase and its components, and what happens when a client machine sends a request.
It’s best to read the posts on the Beginner's Guide to HBase and on DML/CRUD Operations before continuing with this post.
Let’s now look at the step-by-step procedure that takes place within the HBase architecture and allows it to complete its basic operations.
Components of HBase:
The three major components of HBase that take part in an operation are as follows:
● HMaster
● Region Server
● ZooKeeper
These three components work together to make HBase a fully functional and efficient database.
The journey of an operation starts with the client sending a request to HBase. The following points describe what each component does along the way.
HMaster:
● HMaster gives instructions to the Region Servers.
● HMaster gets information from ZooKeeper when a service fails to respond.
● HMaster is the component that receives and responds to administrative requests.
● On startup, HMaster assigns regions and then coordinates and monitors the Region Servers.
● HMaster creates regions and tells the Region Servers which regions to store data in.
● HMaster is responsible for creating a table.
● HMaster is responsible for load balancing.
● HMaster monitors nodes to discover all available Region Servers, and also monitors these nodes for server failures.
● HMaster reassigns the regions from a crashed server to active Region Servers.
● There can be multiple HMasters; the additional ones stay inactive and serve as backups.
● The active HMaster sends heartbeats to ZooKeeper, and the inactive HMaster listens for notifications of the active HMaster's failure.
● HMaster splits the WAL belonging to the crashed Region Server into separate files and stores these files in the new Region Servers’ data nodes.
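The reassignment of regions from a crashed server can be pictured with a small toy model. This is only a sketch under stated assumptions: the function `reassign_regions`, the server names, and the "least-loaded server first" rule are hypothetical illustrations, not HBase's actual balancer or API.

```python
def reassign_regions(assignments, crashed, live_servers):
    """Move every region of `crashed` to the least-loaded live server.

    assignments: dict mapping server name -> list of region names.
    Returns a new dict covering only the live servers; the input is not modified.
    """
    if not live_servers:
        raise ValueError("no live region servers to take over")
    # Start from what each live server already holds.
    new_assignments = {s: list(assignments.get(s, [])) for s in live_servers}
    for region in assignments.get(crashed, []):
        # Assumed rule: pick the live server currently holding the fewest regions.
        target = min(live_servers, key=lambda s: len(new_assignments[s]))
        new_assignments[target].append(region)
    return new_assignments


cluster = {
    "rs1": ["r1", "r2"],
    "rs2": ["r3"],
    "rs3": ["r4", "r5", "r6"],
}
after = reassign_regions(cluster, crashed="rs3", live_servers=["rs1", "rs2"])
print(after)
```

In this toy run, the three regions of `rs3` end up spread over `rs1` and `rs2`, so both surviving servers hold the same number of regions afterwards.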
Many such requests may reach the HMaster at the same time, making it too busy to perform all this work by itself, which is why the data itself is handled by the Region Servers.
Region Server:
● The Region Server is what actually stores the data.
● The Region Server handles both reading data from and writing data to the table.
● The Write Ahead Log (WAL), also known as the HLog, provides fault tolerance.
● A region of a table is served to the client by a Region Server.
● A Region Server can serve about 1,000 regions (which may belong to the same table or different tables).
● When accessing data, the clients communicate with HBase Region Servers directly.
● The Region Server has a BlockCache, which is a read cache that keeps frequently read data in memory. The least recently used data is removed when the cache is full.
● The Region Server has a MemStore, which is the write cache. It keeps the new data that has not yet been written to disk. There is one MemStore per column family per region.
● HFiles store the rows as sorted KeyValues on disk.
● Multiple HFiles make up one region.
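The write path described in the list above (WAL, then MemStore, then HFiles) can be sketched as a toy model. The class `ToyRegionStore` and its `flush_threshold` parameter are hypothetical names made up for illustration; real HBase flushes by size in bytes, persists everything to HDFS, and cleans up old WAL entries, none of which is modeled here.

```python
class ToyRegionStore:
    """Toy model of one column family in one region: a WAL, a MemStore, HFiles."""

    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only log of every write (never trimmed here)
        self.memstore = {}     # write cache: row key -> latest value
        self.hfiles = []       # each "HFile" is a list of (key, value) sorted by key
        self.flush_threshold = flush_threshold  # assumed: flush after N distinct keys

    def put(self, key, value):
        self.wal.append((key, value))   # log the write first, for recovery
        self.memstore[key] = value      # then keep it in the in-memory write cache
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the MemStore out as one sorted, immutable HFile and clear it."""
        if self.memstore:
            self.hfiles.append(sorted(self.memstore.items()))
            self.memstore = {}

    def get(self, key):
        """Check the MemStore first, then HFiles from newest to oldest."""
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for k, v in hfile:
                if k == key:
                    return v
        return None


store = ToyRegionStore(flush_threshold=3)
store.put("row3", "c")
store.put("row1", "a")
store.put("row2", "b")      # third key triggers a flush to a sorted HFile
store.put("row1", "a2")     # newer value stays in the MemStore for now
print(store.hfiles)
print(store.get("row1"), store.get("row2"))
```

Note how the rows were written out of order but land sorted in the flushed file, and how a read prefers the newer value in the MemStore over the older one on "disk".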
The Region Servers run on different nodes of the distributed system and act as slave nodes. The master node is known as the HMaster.
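The "least recently used data is removed" behavior of the BlockCache mentioned in the Region Server list can be illustrated with a few lines of code. This is a minimal sketch: `ToyBlockCache` is a hypothetical class, its capacity is counted in blocks rather than bytes, and the real BlockCache is more elaborate than a plain LRU map.

```python
from collections import OrderedDict


class ToyBlockCache:
    """Toy read cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity      # assumed: maximum number of cached blocks
        self.blocks = OrderedDict()   # oldest access first, newest access last

    def get(self, block_id):
        if block_id not in self.blocks:
            return None               # cache miss: caller would read from disk
        self.blocks.move_to_end(block_id)   # mark as most recently used
        return self.blocks[block_id]

    def put(self, block_id, data):
        self.blocks[block_id] = data
        self.blocks.move_to_end(block_id)
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block


cache = ToyBlockCache(capacity=2)
cache.put("b1", "x")
cache.put("b2", "y")
cache.get("b1")          # b1 is now more recently used than b2
cache.put("b3", "z")     # cache is full, so b2 is evicted
print(list(cache.blocks))
```

Reading `b1` before inserting `b3` is what saves it from eviction; without that read, `b1` would have been the block removed.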
ZooKeeper:
● Manage configuration across nodes – If you have dozens or hundreds of nodes, it becomes hard to keep configuration in sync across nodes and quickly make changes. ZooKeeper helps you to quickly push the configuration changes.
● Implement reliable messaging – With ZooKeeper, you can easily implement a producer/consumer queue that guarantees delivery, even if some consumers or even one of the ZooKeeper servers fails.
● Implement redundant services – With ZooKeeper, a group of identical nodes (e.g. database servers) can elect a leader/master and let ZooKeeper refer all clients to that master server. If the master fails, ZooKeeper will assign a new leader and notify all clients.
● Synchronize process execution – With ZooKeeper, multiple nodes can coordinate the start and end of a process or calculation. This ensures that any follow-up processing is done only after all nodes have finished their calculations.
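The "redundant services" idea above, which is also how an inactive HMaster can take over from a failed active one, can be sketched without a real ZooKeeper cluster. Everything here is an assumption made for illustration: `ToyCoordinator` is a hypothetical in-memory stand-in, not the ZooKeeper client API, and "first registered candidate is the leader" is a simplified election rule.

```python
class ToyCoordinator:
    """In-memory stand-in for a coordination service (not the ZooKeeper API)."""

    def __init__(self):
        self.candidates = []   # live candidates, in registration order
        self.watchers = []     # callbacks to notify when the leader changes

    def register(self, name):
        self.candidates.append(name)

    def watch(self, callback):
        self.watchers.append(callback)

    def leader(self):
        # Assumed rule: the longest-registered live candidate is the leader.
        return self.candidates[0] if self.candidates else None

    def session_expired(self, name):
        """Called when a candidate stops sending heartbeats."""
        old_leader = self.leader()
        self.candidates.remove(name)
        new_leader = self.leader()
        if new_leader != old_leader:
            for callback in self.watchers:
                callback(new_leader)   # tell every watcher who leads now


events = []
zk = ToyCoordinator()
zk.watch(events.append)
zk.register("master-a")   # becomes the active master
zk.register("master-b")   # stays inactive as a backup
first_leader = zk.leader()
zk.session_expired("master-a")   # active master fails
print(first_leader, zk.leader(), events)
```

The watchers are notified only when the leader actually changes, so losing a backup node does not disturb the clients.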
We hope this post has been helpful in understanding the components of HBase and their roles in various operations. In case of any queries, feel free to comment below and we will get back to you at the earliest.