This video is part of a short discussion about the problems faced with Big Data. There are some common challenges with Big Data, such as real-time analysis, traditional storage, processing, and computation.
Hadoop is an open-source framework for processing Big Data. These days, there are many Hadoop distributions to choose from, and one of them is the Apache Hadoop distribution, maintained by the Apache Software Foundation. This distribution is free and has a huge community behind it.
Let’s begin with the fundamentals of Big Data, the challenges it poses, and some case studies to understand them better.
The biggest problem with Big Data is that it is incomprehensible to humans at scale. We can’t get machines to help us enough, and yet Big Data keeps getting bigger and bigger. In essence, we are drowning in our own data.
The rise of ubiquitous computing, with ever more endpoints communicating in their own feedback loops with the cloud, keeps data growth going at double-digit rates. We are finding it hard to keep up.
- Real-Time Analysis – Real-time analysis is not achievable on the entire dataset if it is stored traditionally.
- Storage – Data analysis is not attainable on traditionally stored data. The traditional storage strategy is to archive old data and make only a portion of the whole dataset available for real-time analysis.
- Processing – The problem is that traditional processing tools do not make use of distributed processing. Even tools like Teradata and SQL-based systems fail to process petabytes of data: because an RDBMS relies on single-node processing, it cannot handle such huge amounts of data.
- Computing – Traditional client-server architecture is unable to meet the challenges of the real-time, complex data processing needed in Big Data scenarios.
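The distributed-processing idea that Hadoop popularized can be sketched with a toy map/reduce word count. This is a minimal single-machine simulation, not Hadoop's actual API: the chunk list and function names are illustrative assumptions, and in a real cluster each chunk would live on a different node and the map calls would run in parallel.

```python
from collections import defaultdict

def map_phase(chunk):
    # Map step: each node emits (word, 1) pairs for its own chunk of the data.
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Reduce step: counts for the same word are summed across all chunks.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Hypothetical input split; on a real cluster each chunk sits on a separate node.
chunks = ["big data keeps growing", "big data needs distributed processing"]

all_pairs = []
for chunk in chunks:  # these map calls would run in parallel on a cluster
    all_pairs.extend(map_phase(chunk))

print(reduce_phase(all_pairs))
```

The point of the split is that no single node ever has to hold or scan the whole dataset, which is exactly what single-node RDBMS processing cannot avoid.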