In this post, we will discuss Apache Storm and its installation on a single-node Hadoop cluster.
Let’s start our discussion with what Apache Storm is.
What is Apache Storm?
Apache Storm is an open-source distributed system for real-time processing. It can process unbounded streams of big data very elegantly. Storm can be used with any language because at its core is a Thrift definition for defining and submitting topologies; since Thrift has bindings for virtually any language, topologies can be defined and submitted from any language.
Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees that your data will be processed, and is easy to set up and operate.
Storm can integrate with the queuing and database technologies already in use. A Storm topology consumes streams of data and processes them in arbitrarily complex ways, repartitioning the streams between each stage of the computation.
You can read more about Storm in the official tutorial.
Installing Storm in a Single Node Cluster
To install Storm, you first need Zookeeper in your single-node cluster. So, let’s start with the installation of Zookeeper.
Step 1: Download Zookeeper from the below link:
Step 2: After downloading, untar it using the command tar -xvzf zookeeper-3.4.6.tar.gz.
Step 3: Now, create a directory for storing the Zookeeper data, as it needs a place for its snapshots, transaction logs, and the PID file of the running server. Here we have created a folder named zookeeper inside the zookeeper-3.4.6 directory.
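The step above comes down to a single command (the path below matches our example layout; adjust it to where you extracted Zookeeper):

```shell
# Create a data directory for Zookeeper inside the extracted folder
# (zookeeper-3.4.6 is the version used in this post; adjust to yours)
mkdir -p zookeeper-3.4.6/zookeeper
```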
Step 4: Next, move into the conf folder of the installed Zookeeper and copy zoo_sample.cfg to zoo.cfg using the command cp zoo_sample.cfg zoo.cfg.
Now, open zoo.cfg and set dataDir to the path of the directory created for storing Zookeeper’s data. You can refer to the below screenshot for the same.
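For reference, the relevant line in zoo.cfg looks like the following (the path is our example from Step 3; substitute the directory you created):

```
# zoo.cfg — point dataDir at the Zookeeper data directory created in Step 3
dataDir=/home/kiran/zookeeper-3.4.6/zookeeper
```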
We are done! We have successfully installed the Zookeeper service in our single-node Hadoop cluster.
Let’s now export the path of Zookeeper in the .bashrc file. Move into your home directory using the cd command, open the file using the command gedit .bashrc, and add the below lines.
#set Zookeeper home
export ZOOKEEPER_HOME=/home/kiran/zookeeper-3.4.6 [Here you should give the path of the installed Zookeeper directory]
export PATH=$PATH:$ZOOKEEPER_HOME/bin
After adding the lines, close and save the file.
Now, source the file using the command source .bashrc.
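If you want to sanity-check the exports before relying on them, you can reproduce the two lines in a throwaway shell and confirm the bin directory landed on PATH (the path is our example; use yours):

```shell
# Reproduce the two bashrc lines, then verify bin is on PATH
export ZOOKEEPER_HOME=/home/kiran/zookeeper-3.4.6   # example path
export PATH=$PATH:$ZOOKEEPER_HOME/bin
echo "$PATH" | grep -o 'zookeeper-3.4.6/bin'
```

If the grep prints the bin suffix, commands like zkServer.sh will resolve from any directory.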
Step 5: Start the Zookeeper service by typing the command zkServer.sh start.
If you get the above message, your Zookeeper service has started successfully.
Now, let’s move on to the installation of Storm.
Step 1: Download Storm from the below link:
Step 2: After downloading, untar it by using the below command:
tar -xvzf apache-storm-1.0.1.tar.gz
Step 3: Now, create a folder in a location of your choice for storing the Storm data. We have created a data folder inside the apache-storm-1.0.1 directory itself.
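As with Zookeeper, one command creates it (our example places it inside the extracted directory):

```shell
# Create the data directory that storm.local.dir will point to in Step 4
mkdir -p apache-storm-1.0.1/data
```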
Step 4: Now, move into the conf folder, open the storm.yaml file, and add the below properties. Note that storm.local.dir must point to the directory created for storing Storm data.
storm.zookeeper.servers:
  - "localhost"
storm.local.dir: "/home/kiran/apache-storm-1.0.1/data" [Here you need to give the path of the data directory created in Step 3]
nimbus.host: "localhost"
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703
After adding these properties, close and save the file.
Step 5: Now, open the .bashrc file from your home directory and export the path for Storm by adding the below lines.
#set Storm home
export STORM_HOME=/home/kiran/apache-storm-1.0.1
export PATH=$PATH:$STORM_HOME/bin
After adding the lines, close and save the file.
Next, source the file using the command source .bashrc.
Step 6: Start the Storm services using the below commands:
Like a Hadoop cluster, a Storm cluster has two kinds of nodes: a master node and worker nodes.
The master node runs a daemon called “Nimbus” that is similar to Hadoop’s “JobTracker”. Nimbus is responsible for distributing code around the cluster, assigning tasks to machines, and monitoring for failures.
To start Storm Nimbus, open a new terminal, move into the bin directory of the installed Storm, and type the command ./storm nimbus.
Each worker node runs a daemon called the “Supervisor”. The supervisor listens for work assigned to its machine and starts and stops worker processes as necessary based on what Nimbus has assigned to it.
To start the Storm Supervisor, open a new terminal, move into the bin directory of the installed Storm, and type the command ./storm supervisor.
Storm jobs can be tracked using its web interface. Storm serves a web UI on port 8080 by default.
To start the Storm UI, open a new terminal, move into the bin directory of the installed Storm, and type the command ./storm ui.
Now, you can check the Storm services running by using the jps command. You can refer to the below screenshot for the same.
You can also check the status of your Storm cluster through the web UI. Open any web browser and go to localhost:8080, where 8080 is the port the Storm UI is running on. You can refer to the below screenshot for the same.
In the above screenshot, we can see the web UI of Storm in a single-node cluster.
We hope this post has been helpful in understanding Storm and installing it in a single node cluster. In case of any queries, feel free to comment below and we will get back to you at the earliest.