Beginner's Guide for Apache Storm

In this post, we discuss Apache Storm and walk through installing it on a single-node Hadoop cluster.

Let’s start with what Apache Storm is.

What is Apache Storm?

Apache Storm is an open-source distributed system for real-time processing. It can reliably process unbounded streams of big data. Storm can be used with any language because at its core is a Thrift definition for defining and submitting topologies. Since Thrift has bindings for many languages, topologies can be defined and submitted from any of them.

Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, and guarantees that your data will be processed. It is also easy to set up and operate.

Storm can integrate with the queuing and database technologies already in use. A Storm topology consumes streams of data and processes them in arbitrarily complex ways, repartitioning the streams between each stage of the computation.

You can read more about it in this tutorial.

Installing Storm in a Single Node Cluster

To install Storm, you need Zookeeper running on your single-node cluster. So, let’s start with the installation of Zookeeper.

Zookeeper Installation:

Step 1: Download Zookeeper from the link below:

http://mirror.fibergrid.in/apache/zookeeper/zookeeper-3.4.6/

Step 2: After downloading, untar it by using the command tar -xvzf zookeeper-3.4.6.tar.gz.

Extracting zookeeper

Step 3: Now, create a directory for storing the Zookeeper data; among other things, Zookeeper keeps the PID of its running process there. Here we have created a folder named zookeeper inside the zookeeper-3.4.6 directory.

Folder to store zookeeper data

Step 4: Next, move into the conf folder of the installed Zookeeper and copy the zoo_sample.cfg file as zoo.cfg using the command cp zoo_sample.cfg zoo.cfg.

Now, open zoo.cfg and set the dataDir property to the path of the directory created for storing Zookeeper’s data, for example dataDir=/home/kiran/zookeeper-3.4.6/zookeeper.
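Steps 3 and 4 can also be scripted. The following is a minimal sketch, not an official Zookeeper tool: configure_zookeeper is a hypothetical helper name, and it assumes that zoo_sample.cfg contains a dataDir= line and that the data folder is named zookeeper inside the install directory, as in this post.

```shell
# Sketch: create the data directory and produce zoo.cfg in one step.
# configure_zookeeper is a hypothetical helper, not part of Zookeeper.
configure_zookeeper() {
    zk_home="$1"                    # e.g. /home/kiran/zookeeper-3.4.6
    data_dir="$zk_home/zookeeper"   # data folder from Step 3

    mkdir -p "$data_dir" || return 1

    # Copy zoo_sample.cfg to zoo.cfg, rewriting the dataDir line on the way.
    # Assumes the path contains no '|' or '&' characters, which are special
    # in this sed expression.
    sed "s|^dataDir=.*|dataDir=$data_dir|" \
        "$zk_home/conf/zoo_sample.cfg" > "$zk_home/conf/zoo.cfg"
}
```

For the paths used in this post, the call would be configure_zookeeper /home/kiran/zookeeper-3.4.6. Writing zoo.cfg from the sample in a single sed pass avoids in-place editing, whose flags differ between sed implementations.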


We are done! Zookeeper is now installed on your single-node Hadoop cluster.

Let’s now export the Zookeeper path in the .bashrc file. Move into your home directory using the cd command, open the file using the command gedit .bashrc, and add the lines below to it.

#set Zookeeper home (replace the path with the location of your installed zookeeper directory)
export ZOOKEEPER_HOME=/home/kiran/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin

After adding the lines, close and save the file.

Now, source the file using the command source .bashrc.

source bashrc
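If you prefer not to edit the file by hand, the same lines can be appended from the shell. This is a sketch under assumptions: add_home_to_bashrc is a hypothetical helper name, and the third argument (the file to modify) defaults to ~/.bashrc.

```shell
# add_home_to_bashrc is a hypothetical helper: it appends the export lines
# for a tool to a bashrc-style file unless the variable is already exported.
add_home_to_bashrc() {
    var_name="$1"                   # e.g. ZOOKEEPER_HOME
    install_dir="$2"                # e.g. /home/kiran/zookeeper-3.4.6
    rc_file="${3:-$HOME/.bashrc}"   # file to modify

    # Do nothing if the variable is already exported in the file.
    if grep -q "^export $var_name=" "$rc_file" 2>/dev/null; then
        return 0
    fi

    {
        printf '%s\n' "#set $var_name"
        printf '%s\n' "export $var_name=$install_dir"
        printf '%s\n' "export PATH=\$PATH:\$$var_name/bin"
    } >> "$rc_file"
}
```

A call such as add_home_to_bashrc ZOOKEEPER_HOME /home/kiran/zookeeper-3.4.6 followed by source ~/.bashrc has the same effect as the manual edit. The existence check keeps the file from accumulating duplicate entries when the helper is run more than once, and the same helper works with STORM_HOME later in this post.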

Step 5: Start the Zookeeper service by typing the command zkServer.sh start (note the lowercase z; the script name is case-sensitive).

If the output ends with STARTED, your Zookeeper service has been started successfully.
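The start message only shows that the process was launched. To confirm the server is actually answering, you can run zkServer.sh status, or send Zookeeper's ruok four-letter command to its client port. The sketch below does the latter; zk_is_up is a hypothetical helper name, and it assumes netcat (nc) is installed and that the client port is the default 2181.

```shell
# zk_is_up is a hypothetical helper. It sends the "ruok" four-letter
# command and succeeds only if the server replies "imok".
# Assumes nc (netcat) is available; host and port default to localhost:2181.
zk_is_up() {
    host="${1:-localhost}"
    port="${2:-2181}"
    reply=$(printf 'ruok' | nc -w 2 "$host" "$port" 2>/dev/null)
    [ "$reply" = "imok" ]
}
```

For example, zk_is_up && echo "Zookeeper is running" prints the message only when the server responds.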

Now, let’s move on to the installation of Storm.

Storm Installation:

Step 1: Download Storm from the link below:

http://www.apache.org/dyn/closer.lua/storm/apache-storm-1.0.1/apache-storm-1.0.1.tar.gz

Step 2: After downloading, untar it by using the below command:

tar -xvzf apache-storm-1.0.1.tar.gz

extracting storm

Step 3: Now, create a folder in the location of your choice, for storing the Storm data. We have created a folder in the apache-storm-1.0.1 directory itself.

Folder to store storm data

Step 4: Now, move into the conf folder, open the storm.yaml file, and add the properties below. Note that storm.local.dir must be set to the path of the directory you created for storing Storm data.

storm.zookeeper.servers:
- "localhost"
# storm.local.dir: path of the data directory created in Step 3
storm.local.dir: "/home/kiran/apache-storm-1.0.1/data"
# nimbus.host still works in Storm 1.0.1 but is deprecated in favor of nimbus.seeds: ["localhost"]
nimbus.host: "localhost"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703

After adding these properties, close and save the file.
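The same configuration can be generated from the shell, which avoids indentation mistakes when typing YAML by hand. This is only a sketch: write_storm_yaml is a hypothetical helper name, it assumes the data folder is called data inside the Storm directory (as in Step 3), and it overwrites any existing conf/storm.yaml.

```shell
# write_storm_yaml is a hypothetical helper that writes the Step 4 settings
# for a given Storm install directory.
# Warning: it overwrites an existing conf/storm.yaml.
write_storm_yaml() {
    storm_home="$1"               # e.g. /home/kiran/apache-storm-1.0.1
    data_dir="$storm_home/data"   # data folder from Step 3

    mkdir -p "$data_dir" "$storm_home/conf" || return 1

    cat > "$storm_home/conf/storm.yaml" <<EOF
storm.zookeeper.servers:
  - "localhost"
storm.local.dir: "$data_dir"
nimbus.host: "localhost"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
EOF
}
```

Each port listed under supervisor.slots.ports gives the supervisor one worker slot, so this configuration allows up to four worker processes on the node.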

Step 5: Now open the .bashrc file in your home directory and export the path for Storm by adding the lines below.

#set Storm home
export STORM_HOME=/home/kiran/apache-storm-1.0.1
export PATH=$PATH:$STORM_HOME/bin

After adding the lines, close and save the file.

Next, source the file using the command source .bashrc.

sourcing bashrc

Step 6: Start the Storm services using the commands described below.

Like a Hadoop cluster, a Storm cluster has two kinds of nodes:

1. Master node

2. Worker nodes

The master node runs a daemon called “Nimbus” that is similar to Hadoop’s “JobTracker”. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.

To start Storm Nimbus, open a new terminal, move into the bin directory of the installed Storm, and type the command ./storm nimbus.

starting storm nimbus

Each worker node runs a daemon called the “Supervisor”. The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it.

To start the Storm supervisor, open a new terminal and move into the bin directory of the installed Storm and type the command ./storm supervisor.

Starting storm supervisor

Storm jobs can be monitored through its web interface, which is served on port 8080 by default.

To start the Storm UI, open a new terminal, move into the bin directory of the installed Storm, and type the command ./storm ui.

Starting storm UI
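Each of the three commands above occupies its own terminal. As an alternative, the daemons can be started in the background from a single shell. The sketch below is one way to do that; start_storm_daemons is a hypothetical helper name, and STORM_BIN and LOG_DIR are assumed variables, not Storm settings. By default it expects the storm launcher to be on your PATH, which is the case after Step 5.

```shell
# start_storm_daemons is a hypothetical helper. STORM_BIN (launcher path)
# and LOG_DIR (where console output goes) are assumed variables.
start_storm_daemons() {
    storm_bin="${STORM_BIN:-storm}"
    log_dir="${LOG_DIR:-$HOME/storm-daemon-logs}"
    mkdir -p "$log_dir" || return 1

    # Start Nimbus first, then the supervisor, then the UI.
    for daemon in nimbus supervisor ui; do
        # nohup keeps the daemon alive after the terminal is closed.
        nohup "$storm_bin" "$daemon" \
            < /dev/null > "$log_dir/$daemon.out" 2>&1 &
        echo "started $daemon (pid $!)"
    done
}
```

The printed PIDs belong to the launcher processes, so they are handy for a quick kill during experiments but are not a substitute for proper service supervision.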

Now, you can check that the Storm services are running by using the jps command. You can refer to the screenshot below for the same.

Checking the storm daemons using jps
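The jps check can be automated with a small filter. This sketch rests on an assumption worth verifying on your own machine: with Storm 1.0.x, the daemons usually appear in jps as nimbus, supervisor, and core (the UI), and Zookeeper as QuorumPeerMain; other versions may use different names. missing_storm_daemons is a hypothetical helper name.

```shell
# missing_storm_daemons is a hypothetical helper. It reads jps output on
# stdin and prints each expected process name that is absent.
# Assumption: Storm 1.0.x names (nimbus, supervisor, core) and Zookeeper's
# QuorumPeerMain; adjust the list for other versions.
missing_storm_daemons() {
    jps_output=$(cat)
    for name in QuorumPeerMain nimbus supervisor core; do
        # jps prints "<pid> <name>", so match the name as the last field.
        if ! printf '%s\n' "$jps_output" | grep -q " $name\$"; then
            echo "$name"
        fi
    done
}
```

Run it as jps | missing_storm_daemons. No output means all four processes were found.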

You can also check the status of your Storm cluster through the web UI. Open any web browser and go to localhost:8080, where 8080 is the port on which the Storm UI listens. You can refer to the screenshot below for the same.

Storm web ui

In the above screen shot, we can see the web UI of Storm in a single node cluster.
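The information shown in the web UI is also available from the command line through the Storm UI REST API. The sketch below queries the cluster summary endpoint, which returns JSON; storm_cluster_summary is a hypothetical helper name, and it assumes curl is installed and the UI is on its default port.

```shell
# storm_cluster_summary is a hypothetical helper. It fetches the cluster
# summary (JSON) from the Storm UI REST API.
# Assumes curl is installed; host and port default to localhost:8080.
storm_cluster_summary() {
    host="${1:-localhost}"
    port="${2:-8080}"
    # -f makes curl return a non-zero status on HTTP errors.
    curl -fsS --max-time 5 "http://$host:$port/api/v1/cluster/summary"
}
```

Calling storm_cluster_summary with no arguments targets the single-node setup from this post and fails with a non-zero exit status if the UI is not reachable.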

We hope this post has been helpful in understanding Storm and installing it in a single node cluster. In case of any queries, feel free to comment below and we will get back to you at the earliest.

Keep visiting our site www.acadgild.com for more updates on Big Data and other technologies.
