How Hadoop is Used in Organizations

“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days.” Former Google CEO Eric Schmidt said this in 2010, and the background to his statement is worth understanding. Over the past decades, the volume and variety of recorded information have increased drastically, and the existing data storage and processing tools could not handle the large amounts of data created after the Internet revolution. This made Hadoop one of the preferred tools for companies that work with large volumes of data.

We recommend that readers go through our blogs on understanding Big Data and Big Data terminologies before proceeding with this blog.

Hadoop in Facebook:

Many data-driven companies use Hadoop at great scale. In this blog we discuss how Hadoop is implemented at companies such as Facebook and Yahoo and in sectors such as health care, telecom and finance. Messaging has been one of Facebook's popular features since its inception.

Other Facebook features, such as the Like button and status updates, are backed by a MySQL database, but applications such as the Facebook messaging system run on top of HBase, Hadoop's NoSQL database framework.

Facebook's data warehousing solution is Hive, which is built on top of HDFS.

Facebook's reporting needs are also met using Hive.

After 2011, as the volume of data grew and efficiency needed to improve, Facebook started implementing Corona, its own scheduling framework, which works much like the YARN framework.

Corona separates cluster resource management from job coordination.
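To make that separation concrete, here is a minimal toy sketch in plain Python. It is not Corona's or YARN's actual API: the class names ClusterManager and JobCoordinator, the node names and the task names are all hypothetical. The point is only that one component knows about free slots and nothing else, while each job has its own coordinator that tracks its tasks.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterManager:
    """Tracks free slots per node; knows nothing about job progress."""
    free_slots: dict

    def grant(self, wanted):
        """Hand out up to `wanted` slots, returning one node name per slot."""
        granted = []
        for node in sorted(self.free_slots):
            while self.free_slots[node] > 0 and len(granted) < wanted:
                self.free_slots[node] -= 1
                granted.append(node)
        return granted

    def release(self, node):
        """Return one slot on `node` to the free pool."""
        self.free_slots[node] += 1


@dataclass
class JobCoordinator:
    """One per job: tracks that job's tasks and asks the manager for slots."""
    name: str
    pending: list
    running: dict = field(default_factory=dict)  # task -> node
    done: list = field(default_factory=list)

    def schedule(self, manager):
        """Start as many pending tasks as the manager can grant slots for."""
        for node in manager.grant(len(self.pending)):
            self.running[self.pending.pop(0)] = node

    def finish(self, task, manager):
        """Mark a task complete and give its slot back to the manager."""
        manager.release(self.running.pop(task))
        self.done.append(task)


# Hypothetical cluster with three slots and a job with four tasks.
manager = ClusterManager({"node-a": 2, "node-b": 1})
job = JobCoordinator("daily-report", ["map-0", "map-1", "map-2", "map-3"])

job.schedule(manager)         # three tasks start, one stays pending
job.finish("map-0", manager)  # frees a slot on node-a
job.schedule(manager)         # the last task takes the freed slot
```

Because the manager never looks inside a job, many coordinators can share one manager without it becoming a bottleneck for job bookkeeping, which is the design idea the paragraph above describes.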

Hadoop in Yahoo:

When it comes to Hadoop cluster size, Yahoo beats all others, with 42,000 nodes in about 20 YARN (aka MapReduce 2.0) clusters and 600 petabytes of data on HDFS serving the company's mobile, search, advertising, personalization, media, and communication efforts.

Yahoo uses Hadoop to screen incoming mail and block around 20.5 billion messages from entering its email servers. Yahoo's spam detection abilities have improved many times over since it started using Hadoop.

Yahoo has been one of the major contributors to the ever-growing Hadoop family.

Yahoo has pioneered many new technologies that have since become part of the Hadoop ecosystem.

Notable technologies that Yahoo uses apart from MapReduce and HDFS include Apache Tez and Spark.

One of the main vehicles of Yahoo's Hadoop chariot is Pig, which started at Yahoo and still tops the chart: 50 to 60 percent of jobs are processed using Pig scripts.

Hadoop in Health care companies:

Hadoop in Cancer treatment:

Patients with the same type of cancer respond differently to the same cancer medicine, and this is because of each person's individual genome.

Each person's genome contains around 1.5 gigabytes of data. Understanding how a particular drug interacts with a particular genome requires storing the genomic data, combining it with other data such as demographics and trial outcomes, and then analyzing the result to learn which medicine suits which kind of genetic profile.

Many top cancer research institutes have applied Hadoop in this way to raise the success rate of their cancer treatments.
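The "combine, then analyze" step can be sketched on a tiny scale in plain Python. Everything in this example is invented for illustration: the patient IDs, variant labels, drug name and outcomes are placeholders, and a real pipeline would run a comparable join over far larger data on the cluster, typically with demographics as an additional grouping key.

```python
from collections import defaultdict

# Invented toy data: patient -> genetic variant (placeholder labels).
genomes = {"p1": "variant-A", "p2": "variant-A", "p3": "variant-B", "p4": "variant-B"}

# Invented toy trial outcomes: (patient, drug, responded to treatment?).
trials = [
    ("p1", "drug-X", True),
    ("p2", "drug-X", True),
    ("p3", "drug-X", False),
    ("p4", "drug-X", True),
]


def response_rates(genomes, trials):
    """Join outcomes to variants and return the response rate per (variant, drug)."""
    counts = defaultdict(lambda: [0, 0])  # (variant, drug) -> [responders, total]
    for patient, drug, responded in trials:
        key = (genomes[patient], drug)
        counts[key][0] += int(responded)
        counts[key][1] += 1
    return {key: responders / total for key, (responders, total) in counts.items()}


rates = response_rates(genomes, trials)
for (variant, drug), rate in sorted(rates.items()):
    print(f"{variant} on {drug}: {rate:.0%} responded")
```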

Hadoop in checking recurrence of heart attacks:

UC Irvine Health in the USA equips heart patients with a wireless scale when they are discharged, so that the weight they measure at home is transferred automatically and wirelessly to the Hadoop cluster set up in the hospital. An algorithm running on that cluster estimates the chance of a heart attack recurring by analyzing the risk factors associated with the received weight data.
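The hospital's actual algorithm is not described here, so the following is only a hypothetical illustration of what a rule over incoming weight readings could look like. The function name, the 7-day window and the 2 kg threshold are assumptions chosen for the example, not clinical guidance and not UC Irvine Health's method.

```python
def flag_rapid_gain(weights_kg, window_days=7, threshold_kg=2.0):
    """Return the indices of daily readings that exceed the lowest reading
    in the preceding `window_days` days by more than `threshold_kg`.

    The window and threshold defaults are illustrative assumptions.
    """
    flagged = []
    for i in range(1, len(weights_kg)):
        start = max(0, i - window_days)
        baseline = min(weights_kg[start:i])
        if weights_kg[i] - baseline > threshold_kg:
            flagged.append(i)
    return flagged


# Invented daily readings from one patient's scale, in kilograms.
readings = [80.0, 80.2, 80.1, 80.5, 81.0, 82.3, 82.6]
flagged_days = flag_rapid_gain(readings)
print(flagged_days)
```

A flagged day would then be a prompt for a clinician to follow up, rather than a diagnosis in itself.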

Hadoop in Telecom industries:

The telecommunication sector is one of the most data-driven industries.

Apart from processing millions of calls per second, it also provides services for web browsing, video, television, streaming music, movies, text messages and email.

All these sources have flooded telecom companies with a drastic increase in data, and the storage and processing overhead has grown many times over as a result.

Some case studies on implementing Hadoop in the telecom sector are discussed below:

Analyzing call data records

To reduce the rate of dropped calls and improve sound quality, the call details pouring into the company's database in real time have to be analyzed with maximum precision.

Telecom companies have been using tools like Flume to ingest millions of call records per second into Hadoop and then Apache Storm to process them in real time and identify troubling patterns.
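As a rough idea of what "identifying troubling patterns" can mean, here is a minimal sketch in plain Python that computes the dropped-call rate per tower and lists towers above a cutoff. It stands in for logic that would run inside a stream processor and does not use the Flume or Storm APIs. The record layout (tower ID, dropped flag), the 5 percent cutoff and the minimum call count are assumptions for the example.

```python
from collections import Counter


def towers_with_high_drop_rate(records, max_drop_rate=0.05, min_calls=3):
    """Return the sorted tower IDs whose dropped-call rate exceeds `max_drop_rate`.

    `records` is an iterable of (tower_id, dropped) pairs; towers with fewer
    than `min_calls` calls are skipped so that one bad call does not flag them.
    """
    calls, drops = Counter(), Counter()
    for tower, dropped in records:
        calls[tower] += 1
        drops[tower] += int(dropped)
    return sorted(
        tower
        for tower in calls
        if calls[tower] >= min_calls and drops[tower] / calls[tower] > max_drop_rate
    )


# Invented call detail records: (tower ID, was the call dropped?).
records = [
    ("T1", False), ("T1", False), ("T1", False), ("T1", False),
    ("T2", True), ("T2", False), ("T2", True), ("T2", False),
    ("T3", True),
]
problem_towers = towers_with_high_drop_rate(records)
print(problem_towers)
```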

Timely servicing of equipment

Replacing equipment on a telecom company's transmission towers costs far more than repairing it.

To determine an optimum maintenance schedule (not too early, not too late), companies have used Hadoop to store unstructured, sensor and streaming data.

Machine learning algorithms are applied to this data to reduce maintenance costs and to repair equipment in time, before it develops a problem.
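The models used in practice are not specified here. As a deliberately simple stand-in, the sketch below fits a least-squares straight line to daily sensor readings and extrapolates when the trend would cross a failure threshold, which is the kind of estimate a maintenance schedule could be built around. The function name, the readings and the threshold are hypothetical.

```python
def days_until_threshold(readings, threshold):
    """Fit a least-squares line to one-per-day readings and estimate how many
    days after the last reading the trend reaches `threshold`.

    Returns None when there are fewer than two readings or the trend is flat
    or falling, and 0.0 when the fitted line has already crossed the threshold.
    """
    n = len(readings)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - (n - 1))


# Invented vibration readings from one piece of tower equipment, one per day.
vibration = [1.0, 2.0, 3.0, 4.0]
days_left = days_until_threshold(vibration, threshold=10.0)
print(days_left)
```

Scheduling the repair a little before the estimated crossing day is one way to land in the "not too early, not too late" range mentioned above.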

Hadoop in Financial sectors:

Companies in the financial sector have been using Hadoop to analyze their data more deeply, both to improve operational margins and to detect malicious activities that go unnoticed in the normal course of business.

Some of the case studies in practice in the financial sector are as follows:

Anti-money-laundering practice

Before Hadoop, finance companies stored data selectively and discarded historical data because of storage limitations.

As a result, the sample data available for analytics was not sufficient to give foolproof results that could be used to check money laundering.

Companies now use the Hadoop framework for its greater storage and processing capacity, to determine the sources of black money and keep it out of the system.

Companies are now able to manage millions of customer names and their transactions in real time, and the rate of detecting suspicious transactions has increased drastically since they implemented the Hadoop ecosystem.

Hadoop in Banks

Many banks across the world have been using the Hadoop platform to collect and analyze all the data pertaining to their customers, such as daily transactional data, data from interactions at multiple customer touch points like call centers, home value data and merchant records.

Banks can analyze all this data to segment customers into one or more groups based on their needs in terms of banking products and services, and then tailor their sales, promotion and marketing accordingly.

Using a Big Data Hadoop architecture, many credit-card-issuing banks have been implementing fraud detection systems that detect suspicious activity by analyzing a cardholder's past history of spending patterns and trends, and have been disabling the cards of suspects.
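A very small version of "compare a transaction with past spending patterns" is a z-score check: how many standard deviations a new amount sits from the cardholder's historical mean. The sketch below is an assumption-laden illustration, not any bank's system. The function name, the cutoff of 3 and the sample amounts are invented, and production systems combine many more signals than the amount alone.

```python
from statistics import mean, stdev


def is_suspicious(history, amount, z_cutoff=3.0):
    """Flag `amount` if it lies more than `z_cutoff` standard deviations
    from the mean of the cardholder's past transaction amounts.

    With fewer than two past transactions there is no pattern to compare
    against, so nothing is flagged.
    """
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_cutoff


# Invented past card transactions for one customer.
past_amounts = [20.0, 25.0, 22.0, 30.0, 18.0, 27.0]

print(is_suspicious(past_amounts, 24.0))   # typical amount
print(is_suspicious(past_amounts, 400.0))  # far outside the usual range
```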

We hope this blog helped you understand the various use cases of Hadoop in different sectors. Keep visiting our website for more blogs on Big Data and other technologies.


Satyam Kumar

With more than 5 years of experience, Satyam Kumar is a Subject Matter Expert in Big Data Solutions and has used his depth of experience to help bring new Big Data technologies to production. He has worked on several projects involving Hadoop, HDFS, MapReduce, Kafka, Flume, Hive and Spark.
