Cassandra Installation on CentOS

If you need scalability and high availability without compromising performance, Apache Cassandra is the right choice for you. Cassandra is a distributed database that was initially developed at Facebook and later became an Apache Software Foundation project.

In this post, we cover the basics: an introduction to Cassandra and its installation. Advanced topics will follow in our next post.

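Let’s start with what makes Cassandra so popular:

  • Highly scalable, high-performance distributed database designed to manage huge amounts of data.
  • Provides high availability with no single point of failure.
  • It is a NoSQL database.
  • It is fault-tolerant, and its consistency is tunable: each read or write can specify how many replicas must respond.
  • Its data model is based on Google’s Bigtable.
  • Offers limited transaction support: writes within a single partition are atomic and isolated, and lightweight transactions (compare-and-set) are available, but full multi-row ACID transactions are not supported.
  • Performs very fast writes without sacrificing read efficiency.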

According to the Apache Cassandra website, Cassandra is used by Constant Contact, CERN, Comcast, eBay, GitHub, GoDaddy, Hulu, Instagram, Intuit, Netflix, Reddit, The Weather Channel, and over 1500 other companies with large, active data sets.

Cassandra can be accessed using cqlsh (the Cassandra Query Language shell) as well as drivers for various programming languages.

Let’s now look at the steps for installing Cassandra:

Step 1: Create a sudo user for Cassandra.

useradd acadgild

passwd acadgild

Then, give this user sudo privileges: as root, run visudo to edit the sudoers file and add a line for acadgild (for example, acadgild ALL=(ALL) ALL).
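If you would rather not edit /etc/sudoers interactively, you can grant the same privilege with a drop-in file. This is a minimal sketch. It assumes that your /etc/sudoers contains the #includedir /etc/sudoers.d line, as it does on a stock CentOS 7 install. add_sudoers_dropin is a hypothetical helper name, and it must be run as root:

```shell
# Hypothetical helper: grant USER full sudo rights via a drop-in file.
# Run as root. Assumes /etc/sudoers has "#includedir /etc/sudoers.d".
add_sudoers_dropin() {
  local user="$1"
  local dir="${2:-/etc/sudoers.d}"   # optional second argument overrides the directory
  local file="$dir/$user"

  # Same rule you would otherwise add by hand with visudo
  printf '%s ALL=(ALL) ALL\n' "$user" > "$file"

  # sudo expects drop-in files to be read-only; 0440 is the usual mode
  chmod 440 "$file"
}
```

Run add_sudoers_dropin acadgild as root. Then run visudo -c to check the syntax of the whole sudoers configuration before you leave the root shell.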

Step 2: Set up SSH using the command below.

sudo yum install openssh-server -y


Once it is installed, run the following command as the acadgild user to generate a public and private key pair:

ssh-keygen -t rsa

Go to the .ssh directory:

cd ~/.ssh

Then, append id_rsa.pub to the authorized_keys file:

cat id_rsa.pub >> authorized_keys


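Also restrict the permissions on this file. With its default StrictModes setting, sshd ignores an authorized_keys file that other users can write to:

chmod 600 ~/.ssh/authorized_keys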
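You can also script the key setup so that it runs without prompts and is safe to repeat. This is a minimal sketch. It assumes an empty passphrase is acceptable, because ssh-keygen -N "" is what makes the later login prompt-free. setup_passwordless_ssh is a hypothetical helper, and its optional argument overrides the default ~/.ssh directory:

```shell
# Hypothetical helper: create an RSA key (if missing) and authorize it for
# logins to this same account. Safe to run more than once.
setup_passwordless_ssh() {
  local ssh_dir="${1:-$HOME/.ssh}"
  local key="$ssh_dir/id_rsa"
  local auth="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"

  # -N "" sets an empty passphrase, -f the key path, -q suppresses output
  if [ ! -f "$key" ]; then
    ssh-keygen -t rsa -N "" -f "$key" -q || return 1
  fi

  # Append the public key only if that exact line is not already present
  touch "$auth"
  if ! grep -qxF -f "$key.pub" "$auth"; then
    cat "$key.pub" >> "$auth"
  fi

  chmod 600 "$auth"
}
```

Run it as the acadgild user, without sudo. Then test the login with ssh localhost as in the following steps.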

Now, restart the SSH service:

sudo service sshd restart

Run the following command to start the SSH service at system startup:

sudo chkconfig sshd on

You can also verify that the SSH service is running:

ssh localhost

If you can log in to localhost without being asked for a password, the SSH service is running and key-based login is working.
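In scripts, a check that fails instead of prompting is more useful than an interactive login. This is a minimal sketch using standard OpenSSH client options. check_passwordless_ssh is a hypothetical helper. Using StrictHostKeyChecking=no, which accepts an unknown host key automatically, is an assumption that is reasonable for localhost but not for untrusted hosts:

```shell
# Hypothetical helper: exit 0 only if key-based login to HOST works with no
# prompt. BatchMode=yes makes ssh fail rather than ask for a password.
check_passwordless_ssh() {
  local host="${1:-localhost}"
  ssh -o BatchMode=yes \
      -o StrictHostKeyChecking=no \
      -o ConnectTimeout=5 \
      "$host" true
}
```

For example: check_passwordless_ssh && echo "passwordless SSH works"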

Step 3: Install Java.

Use this link to download Java:

http://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html

The download page prompts you to select the required version. Select the Linux .tar.gz package for your system’s architecture.

The archive is downloaded and saved in your Downloads folder.

You can move it to any directory you like; here it is moved to the /home/acadgild directory.

Extract the archive with tar -xzf followed by the archive’s file name.

Run ls to confirm that the extracted jre directory is now in /home/acadgild.

To make Java available to all users, move the extracted jre directory to /usr/local/ (this requires sudo).
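You can combine the extract and move steps. This is a minimal sketch. It assumes the downloaded file is a gzip-compressed tarball with a single top-level directory. For Java 8u91 that directory is jre1.8.0_91, which matches the path used in the later steps. extract_jre is a hypothetical helper, and the archive name in the usage line below is an assumption:

```shell
# Hypothetical helper: extract a JRE tarball into the current directory and
# print the name of the top-level directory it created (e.g. jre1.8.0_91).
extract_jre() {
  local archive="$1" top
  # awk reads the whole listing, so tar never gets SIGPIPE from an early exit
  top=$(tar -tzf "$archive" | awk -F/ 'NR == 1 { print $1 }')
  tar -xzf "$archive" || return 1
  printf '%s\n' "$top"
}
```

For example, from /home/acadgild: sudo mv "$(extract_jre jre-8u91-linux-x64.tar.gz)" /usr/local/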

Then, install Java Native Access (JNA), which improves Cassandra’s memory usage, with the command below:

sudo yum install jna -y

Next, use the following command to add a symbolic link to the Oracle Java SE Runtime Environment installation, so that your system can use the Oracle JRE instead of the OpenJDK JRE:

sudo alternatives --install /usr/bin/java java /usr/local/jre1.8.0_91/bin/java 20000

Then, use the alternatives command to select it as the default Java:

sudo alternatives --config java

Verify the default Java version with:

java -version
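Cassandra 3.0 requires Java 8, so it can help to check the version in a script rather than by eye. This is a minimal sketch. java_major_version is a hypothetical helper that reads java -version output, which Java writes to stderr. It handles both the old 1.8.x numbering and the newer 11.x style:

```shell
# Hypothetical helper: print the Java major version from "java -version"
# output supplied on stdin, e.g. 'java version "1.8.0_91"' -> 8.
java_major_version() {
  local ver
  # Keep the first quoted string that follows the word version
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    1.*) printf '%s\n' "$ver" | cut -d. -f2 ;;  # legacy scheme: 1.8.0_91 -> 8
    *)   printf '%s\n' "$ver" | cut -d. -f1 ;;  # newer scheme: 11.0.2 -> 11
  esac
}
```

Running java -version 2>&1 | java_major_version should print 8. If you prefer to set the default non-interactively instead of using --config, sudo alternatives --set java /usr/local/jre1.8.0_91/bin/java makes the same selection.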

Step 4: Install Cassandra.

Create the DataStax repository file if it does not already exist:

sudo vi /etc/yum.repos.d/datastax.repo

Add the DataStax Community repository definition to this file.
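For the dsc30 package, DataStax’s installation documentation used the repository entry below. Treat it as historical: the DataStax Community packages are no longer maintained, so this repository may no longer be served. datastax_repo is a hypothetical helper that prints the entry, so you can write the file with sudo tee instead of vi:

```shell
# Hypothetical helper: print the DataStax Community yum repository entry.
# Assumption: matches DataStax's (historical) docs for the dsc30 package.
datastax_repo() {
  cat <<'EOF'
[datastax]
name = DataStax Repo for Apache Cassandra
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
EOF
}
```

For example: datastax_repo | sudo tee /etc/yum.repos.d/datastax.repo > /dev/null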

Now, run the following command to install Cassandra (dsc30 is the DataStax Community package for Cassandra 3.0):

sudo yum install dsc30

Step 5: Set the Java path in .bashrc.

Add the following lines to ~/.bashrc:

export JAVA_HOME=/usr/local/jre1.8.0_91/

export PATH=$PATH:/usr/local/jre1.8.0_91/bin/

Then, apply these changes to the current shell session:

source ~/.bashrc
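If you script this step, append the lines only when they are missing, so repeated runs do not duplicate them. This is a minimal sketch. add_java_env is a hypothetical helper whose defaults are ~/.bashrc and the /usr/local/jre1.8.0_91 path used in this step:

```shell
# Hypothetical helper: append JAVA_HOME and PATH exports to an rc file,
# unless that file already exports JAVA_HOME.
add_java_env() {
  local rc="${1:-$HOME/.bashrc}"
  local java_home="${2:-/usr/local/jre1.8.0_91}"

  if grep -q '^export JAVA_HOME=' "$rc" 2>/dev/null; then
    return 0  # already configured; leave the file unchanged
  fi

  # $PATH stays literal (single quotes) so it expands when the rc file is sourced
  printf 'export JAVA_HOME=%s\n' "$java_home" >> "$rc"
  printf 'export PATH=$PATH:%s/bin\n' "$java_home" >> "$rc"
}
```

For example: add_java_env && source ~/.bashrc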

Step 6: Running Cassandra.

For a single-node cluster, no configuration changes are needed. You can start the Cassandra service straight away with:

sudo service cassandra start

Next, start the CQL shell:

cqlsh
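cqlsh can fail with a connection error if you start it before Cassandra has finished booting, which can take a while after service cassandra start returns. This is a minimal sketch of a wait loop. It assumes Cassandra accepts CQL clients on its default native transport port, 9042, on localhost. wait_for_port is a hypothetical helper that relies on bash’s /dev/tcp redirection, so it must run under bash:

```shell
# Hypothetical helper: wait until HOST:PORT accepts TCP connections.
# Returns 0 once connected, 1 after TIMEOUT seconds (default 60).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connection
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example: wait_for_port 127.0.0.1 9042 120 && cqlsh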

You can also check the status of the Cassandra service with:

sudo service cassandra status

To check the status of the Cassandra node, use:

nodetool status
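In nodetool status output, each node line starts with a two-letter code. The first letter is the status (U for up, D for down) and the second is the state (N for normal, L for leaving, J for joining, M for moving). A healthy single-node cluster therefore shows one UN line. This is a minimal sketch that counts such nodes for use in scripts. count_up_normal_nodes is a hypothetical helper:

```shell
# Hypothetical helper: read "nodetool status" output on stdin and print how
# many nodes are Up/Normal, i.e. how many lines start with "UN ".
count_up_normal_nodes() {
  # grep -c prints 0 but exits 1 when nothing matches; keep the status at 0
  grep -c '^UN ' || true
}
```

For example: [ "$(nodetool status | count_up_normal_nodes)" -ge 1 ] && echo "node is up"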

We will cover cqlsh operations in our next post.

We hope this post has helped you understand the steps involved in installing Cassandra. If you have any questions, feel free to comment below and we will get back to you at the earliest.

For more resources on Big Data and other technologies, keep visiting www.acadgild.com
