ZooKeeper is open source software that provides a distributed configuration service, synchronization service, and naming registry for large distributed systems. It began as a sub-project of Hadoop but is now a top-level project in its own right.
It is used by many companies, including Rackspace, Yahoo!, Odnoklassniki, and eBay, as well as by open source enterprise search systems such as Solr.
ZooKeeper makes several common distributed-system tasks easier. For example, it helps you:
Manage configuration across nodes. If you have dozens or hundreds of nodes, it becomes hard to keep configuration in sync across them and to make changes quickly. ZooKeeper helps you push configuration changes to every node quickly.
Implement reliable messaging. With ZooKeeper, you can easily implement a producer/consumer queue that guarantees delivery, even if some consumers or one of the servers fail.
Implement redundant services. With ZooKeeper, a group of identical nodes (e.g. database servers) can elect a leader/master and let ZooKeeper refer all clients to that master server. If the master fails, ZooKeeper will assign a new leader and notify all clients.
Synchronize process execution. With ZooKeeper, multiple nodes can coordinate the start and end of a process or calculation. This ensures that any follow-up processing is done only after all nodes have finished their calculations.
The interface provided by ZooKeeper is quite low-level. However, ZooKeeper will ensure all clients are notified reliably and the order of configuration messages is maintained.
The functionality provided by ZooKeeper is often developed as part of Hadoop applications.
Starting the ZooKeeper daemon in standalone mode on a single-node Hadoop cluster
Prerequisite: Hadoop daemons up and running.
Download the tar file from the link given below.
Extract the file to a location where you won't move or alter it.
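The extraction step can be sketched as a small shell helper. The function name and both arguments are illustrative placeholders, not values from this post; pass the archive you actually downloaded and the directory you want to keep it in.

```shell
# extract_zookeeper TARBALL DEST
# Unpacks a downloaded ZooKeeper archive into DEST.
# Both arguments are placeholders supplied by the caller.
extract_zookeeper() {
    tarball="$1"
    dest="$2"
    # Create the target directory if it does not exist yet.
    mkdir -p "$dest" || return 1
    # -x extract, -z gunzip, -f archive file, -C target directory
    tar -xzf "$tarball" -C "$dest"
}
```

Call it as `extract_zookeeper <downloaded-archive>.tar.gz <target-directory>`. The extracted folder is what the later steps treat as the ZooKeeper home directory.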
Go to the conf directory inside the extracted folder, where you will find zoo_sample.cfg.
Make a copy of the file and name it zoo.cfg. Why make a copy? ZooKeeper reads conf/zoo.cfg by default, and by putting every configuration change into the copy we leave the original sample file intact as a reference.
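A minimal sketch of the copy step, assuming the extracted folder is passed in as an argument (the function name and the argument are hypothetical, chosen only for illustration):

```shell
# copy_sample_config ZK_HOME
# ZK_HOME is the extracted ZooKeeper folder (a placeholder path).
copy_sample_config() {
    zk_home="$1"
    # cp rather than mv: zoo_sample.cfg stays behind as an untouched reference.
    cp "$zk_home/conf/zoo_sample.cfg" "$zk_home/conf/zoo.cfg"
}
```

After `copy_sample_config <extracted-folder>`, both files sit side by side in conf, and all further edits go into zoo.cfg only.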
We will change the dataDir path from the temporary location to a permanent one where ZooKeeper will keep its data.
On your system, create a directory named zookeeper-data in a place where it will not be moved, and set dataDir in zoo.cfg to its path.
Also check that these three parameters (tickTime, clientPort, syncLimit) are set as shown in the figure above.
NOTE: Create the zookeeper-data directory in a location where permissions are not restricted for the owner and group, as they will need to access it.
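The dataDir change can be sketched with two small helpers. The function names, the arguments, and the 770 permission mode are assumptions made for illustration. The stock zoo_sample.cfg typically ships with tickTime=2000, syncLimit=5, clientPort=2181 and dataDir=/tmp/zookeeper, but compare against your own copy rather than taking those values as given.

```shell
# set_data_dir CFG DATA_DIR
# Creates DATA_DIR and points the dataDir entry of CFG at it.
set_data_dir() {
    cfg="$1"
    data_dir="$2"
    mkdir -p "$data_dir" || return 1
    # Assumption: full access for owner and group, per the note on permissions.
    chmod 770 "$data_dir" || return 1
    # Rewrite the dataDir line through a temp file, which avoids
    # the differences between GNU and BSD "sed -i".
    # DATA_DIR must not contain "|" or "&", which sed would misread.
    sed "s|^dataDir=.*|dataDir=$data_dir|" "$cfg" > "$cfg.tmp" && mv "$cfg.tmp" "$cfg"
}

# show_core_settings CFG
# Prints the three parameters worth double-checking.
show_core_settings() {
    grep -E '^(tickTime|clientPort|syncLimit)=' "$1"
}
```

For example, `set_data_dir <extracted-folder>/conf/zoo.cfg <path-to>/zookeeper-data` followed by `show_core_settings <extracted-folder>/conf/zoo.cfg` updates the path and lists the three values so you can compare them by eye.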
Now we start the ZooKeeper daemon.
From a terminal, change to the bin directory and locate the zkServer.sh script.
Type the following command to start the ZooKeeper daemon.
Note: The command is case-sensitive.
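As a sketch, starting the daemon comes down to calling zkServer.sh with the start argument. The wrapper function and its argument are hypothetical; only the script name and the start subcommand come from ZooKeeper itself.

```shell
# start_zookeeper ZK_HOME
# ZK_HOME is the extracted ZooKeeper folder (a placeholder path).
start_zookeeper() {
    zk_home="$1"
    # "start" must be lowercase: the script matches its argument exactly.
    "$zk_home/bin/zkServer.sh" start
}
```

If you are already inside the bin directory, the equivalent is `./zkServer.sh start`.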
To check whether the daemon is running, use the command below.
We can also check it with the jps command.
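Both checks can be sketched as follows. The function names are hypothetical. The status subcommand of zkServer.sh and the QuorumPeerMain class name reported by jps are the parts that come from ZooKeeper; a standalone server should report its mode as standalone.

```shell
# zookeeper_status ZK_HOME
# Asks the server script for the daemon's current status.
zookeeper_status() {
    "$1/bin/zkServer.sh" status
}

# has_zookeeper_process
# Reads jps output on stdin and succeeds if the ZooKeeper
# server's main class, QuorumPeerMain, is listed.
has_zookeeper_process() {
    grep -q 'QuorumPeerMain'
}
```

A typical check would be `jps | has_zookeeper_process && echo "ZooKeeper is running"`, with `zookeeper_status <extracted-folder>` for the more detailed report.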
In the next blog post, we will discuss how to synchronize a cluster with ZooKeeper in pseudo-distributed mode.