
ZooKeeper Installation Guide

ZooKeeper is open source software that provides a distributed configuration service, a synchronization service, and a naming registry for large distributed systems. It was a sub-project of Hadoop but is now a top-level project in its own right.

It is used by many companies, including Rackspace, Yahoo!, Odnoklassniki and eBay, as well as by open source enterprise search systems like Solr.

For example, it makes it easier to:

  • Manage configuration across nodes. If you have dozens or hundreds of nodes, it becomes hard to keep their configuration in sync and to make changes quickly. ZooKeeper helps you push configuration changes to all of them.

  • Implement reliable messaging. With ZooKeeper, you can implement a producer/consumer queue that guarantees delivery, even if some consumers or one of the servers fail.

  • Implement redundant services. With ZooKeeper, a group of identical nodes (e.g. database servers) can elect a leader/master and let ZooKeeper refer all clients to that master server. If the master fails, ZooKeeper will assign a new leader and notify all clients.

  • Synchronize process execution. With ZooKeeper, multiple nodes can coordinate the start and end of a process or calculation. This ensures that any follow-up processing is done only after all nodes have finished their calculations.

The interface provided by ZooKeeper is quite low-level. However, ZooKeeper will ensure all clients are notified reliably and the order of configuration messages is maintained.

The functionality provided by ZooKeeper is often developed as part of Hadoop applications.

Starting the ZooKeeper daemon in standalone mode on a single-node Hadoop cluster.

Prerequisites:

Hadoop daemons up and running

Download the tar file from the link given below:

https://drive.google.com/open?id=0B1QaXx7tpw3SSGp0bEJPUU9RM0U

Extract the file and save it to a location where you won’t alter it.
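The extraction step can be sketched as a small shell function. This is only an illustration: the function name extract_zookeeper and both paths are made up for the example, and the real archive name depends on what the link above delivers.

```shell
# Hypothetical helper: unpack a ZooKeeper tar archive into a chosen directory.
# Usage: extract_zookeeper <archive> <destination-dir>
extract_zookeeper() {
  archive="$1"
  dest="$2"
  # Create the destination if it does not exist yet.
  mkdir -p "$dest" || return 1
  # -x extract, -f archive file, -C directory to unpack into.
  # GNU tar and bsdtar detect gzip compression on their own when reading.
  tar -xf "$archive" -C "$dest"
}
```

A call might look like extract_zookeeper ~/Downloads/zookeeper.tar.gz ~/opt, where both paths are placeholders for your own.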

Go to the conf directory inside the extracted folder, where you will find zoo_sample.cfg.

Make a copy of the file and name it zoo.cfg. Why make a copy? Because we are going to change the configuration, and whenever we do that we leave the original file untouched and make the changes in the copy.
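A minimal sketch of the copy step, assuming the extracted folder is passed in as an argument. The function name make_zoo_cfg is hypothetical; the function refuses to overwrite a zoo.cfg that already exists, so earlier edits are not lost.

```shell
# Hypothetical helper: create conf/zoo.cfg from the shipped sample file.
# Usage: make_zoo_cfg <zookeeper-home>
make_zoo_cfg() {
  conf_dir="$1/conf"
  # Leave an existing zoo.cfg alone instead of overwriting it.
  if [ -e "$conf_dir/zoo.cfg" ]; then
    echo "zoo.cfg already exists, leaving it untouched" >&2
    return 0
  fi
  # The sample file itself is never modified.
  cp "$conf_dir/zoo_sample.cfg" "$conf_dir/zoo.cfg"
}
```

For example, make_zoo_cfg ~/opt/zookeeper, with the path replaced by wherever you extracted the archive.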

We will change the dataDir path from a temporary location to a permanent one, where ZooKeeper will keep its data.

[Figure: zoo.cfg with dataDir, tickTime, clientPort and syncLimit set]

On your system, create a directory named zookeeper-data in a place from which it will not be moved. Set its path as the value of dataDir in the zoo.cfg file.
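One possible way to script this step is shown below. The function name set_data_dir and its arguments are assumptions made for the example. It rewrites the file through a temporary copy rather than editing in place, and it also grants the owner and group full access to the new directory.

```shell
# Hypothetical helper: create the data directory and point dataDir at it.
# Usage: set_data_dir <zookeeper-home> <data-dir>
set_data_dir() {
  cfg="$1/conf/zoo.cfg"
  data_dir="$2"
  [ -f "$cfg" ] || { echo "missing $cfg" >&2; return 1; }
  mkdir -p "$data_dir" || return 1
  # Owner and group get read, write and execute on the directory.
  chmod ug+rwx "$data_dir" || return 1
  # Copy every line except an existing dataDir entry, then append the new one.
  grep -v '^dataDir=' "$cfg" > "$cfg.tmp" || true
  printf 'dataDir=%s\n' "$data_dir" >> "$cfg.tmp"
  mv "$cfg.tmp" "$cfg"
}
```

A call such as set_data_dir ~/opt/zookeeper "$HOME/zookeeper-data" would do it; both paths are placeholders, and an absolute path is the safer choice for dataDir.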

Also check that these three parameters (tickTime, clientPort, syncLimit) are set as shown in the figure above.
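To see what the three parameters are currently set to, a quick check like the following may help. The function name show_zk_settings is hypothetical. For reference, the zoo_sample.cfg shipped with ZooKeeper 3.x normally contains tickTime=2000, clientPort=2181 and syncLimit=5, but treat the values in this guide's figure as the ones to match.

```shell
# Hypothetical helper: print the active (uncommented) values of the three settings.
# Usage: show_zk_settings <path-to-zoo.cfg>
show_zk_settings() {
  # Only lines that start with one of the three keys are printed,
  # so commented-out entries (beginning with #) are ignored.
  grep -E '^(tickTime|clientPort|syncLimit)=' "$1"
}
```

If a parameter is missing from the output, it is either commented out or not present in the file.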

NOTE: Create the zookeeper-data directory in a location where the owner and group have unrestricted permissions, as they will be accessing the directory.

Now we start the ZooKeeper daemon.

From a terminal, change to the bin directory and find the zkServer.sh script.

Type the following command to start the ZooKeeper daemon.

zkServer.sh start
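If the bin directory is not on your PATH, the shell will not find zkServer.sh by name alone, and the script has to be run as ./zkServer.sh from inside bin. A small wrapper, sketched here under the assumption that you pass in the extracted folder, does that for you. The function name zk_server is hypothetical; start, status and stop are actions the script accepts.

```shell
# Hypothetical wrapper: run zkServer.sh from its own bin directory.
# Usage: zk_server <zookeeper-home> <start|status|stop>
zk_server() {
  # The subshell keeps the caller's working directory unchanged.
  ( cd "$1/bin" && ./zkServer.sh "$2" )
}
```

For example, zk_server ~/opt/zookeeper start, with the path replaced by your own installation folder.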

Note: the command is case-sensitive.

To check whether the daemon is running, use the command below.

zkServer.sh status

We can also check it with the jps command, which lists the ZooKeeper server process as QuorumPeerMain.
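The jps check can be turned into a yes/no test with a short filter. This is a sketch: the function name zk_in_jps is made up for the example, and it reads a jps listing from standard input and looks for QuorumPeerMain, the main class of the ZooKeeper server.

```shell
# Hypothetical helper: succeed if the jps listing on stdin contains
# the ZooKeeper server process (reported by jps as QuorumPeerMain).
zk_in_jps() {
  grep -q 'QuorumPeerMain'
}
```

It could be used as jps | zk_in_jps && echo "ZooKeeper is running".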

In the next blog, we will discuss how to synchronize a cluster with ZooKeeper in pseudo-distributed mode.

Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
