We live in what many call the digital age, a world that runs on data. To give you a sense of how BIG the data in this world is, here are a few figures. Approximately ninety per cent of all data in the world today was created within the past decade. Some estimates suggest that more data was created in the last two years than in the rest of human history. And data is expected to keep growing at this striking rate.
By 2020, every human being will create over 1.5 megabytes of data per second on average. By 2025, the sum of digital data will add up to 180 zettabytes, which is 180 trillion gigabytes. Considering these numbers, it seems like an understatement to say that data is only BIG. Data is huge. And it is changing the way modern societies and businesses function. This article outlines what you need to know about data analytics, and introduces you to some key software – R, Tableau and Excel – used to analyse data.
What is Data Analytics?
It is, quite simply, a scientific way of interpreting data (structured and unstructured) in ways that are meaningful to for-profit and nonprofit organizations. Structured data refers to highly organized information that is easily accessible using simple search algorithms and lends itself to relational databases – databases that recognize relations between items of information.
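To make the idea of "relations between items of information" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables, columns and rows are invented purely for illustration and are not drawn from any real data set.

```python
import sqlite3

# In-memory database; every table and row below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO orders VALUES (1, 1, 250.0), (2, 1, 100.0), (3, 2, 75.0);
""")

# The join follows the relation between each order and its customer,
# then totals the order amounts per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
conn.close()

print(totals)
```

Because each order row carries a customer_id, the database can relate the two tables with a simple query. This is the kind of easy access that unstructured data does not offer.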
Unstructured data is less organized and more difficult to work with. Its lack of structure makes compiling it a time- and energy-consuming task. People often confuse data analytics with data mining, which is the process of retrieving necessary information from a large volume of raw data. Data analytics is distinct, however, because it involves drawing inferences from the relevant information. In this sense, it goes a step beyond data mining.
Even “big data”, which data analytics uses, means different things to different people. Sectors and organizations use the term differently, and its definition depends on a number of factors: the job description of the person you speak with, the size of their organization, the context in which they acquire data, and how they intend to use it.
At the macro level, it is the total volume of data generated every day, roughly 2.5 quintillion bytes. Big data essentially refers to data flowing from various sources in real time (or close to real time) that don’t necessarily speak to each other but may be important for acquiring business intelligence. At the micro level, big data typically means bigger data: a larger data set than an organization has grappled with before.
Advantages of Data Analytics
Data analytics offers a number of opportunities for organizations to leverage the big data advantage. Any company or institution, irrespective of its size, can capitalize on these opportunities to gain critical insights and shape its future. By incorporating big data, companies can begin to predict outcomes rather than react to them, make informed decisions, and focus their energies better. Good use of big data can improve the overall performance of an organization by allowing it to:
– Improve operational processes by streamlining supply chains and increasing productivity.
– Detect fraud and flaws by keeping a close vigil.
– Refine financial processes by increasing visibility, providing insight and granting better control.
– Reduce risk by anticipating changes in the environment instead of reacting to them.
– Innovate and create new models for growth.
– Improve IT economics by increasing agility, flexibility and abilities of systems.
– Increase transparency of systems and processes in business.
– Reduce cost of managing systems and operations.
– Make quicker, cost-effective decisions.
– Create products and services based on insights.
Skills Required for Data Analysis
Digital companies have been capitalizing on the advantage that data has been giving them for quite some time now. Having seen its potential in the online world, even brick and mortar companies are now increasingly investing in big data and technology to stay ahead in the game.
Despite increased interest in data analytics, however, there is an acute shortage of professionals with good analytical skills. Only 0.5% of the data we produce is analyzed. Banks utilize only a third of the data that they sit on, and most organizations are still working towards making full use of their analytical potential. These figures are hardly surprising when we consider that the job of a data scientist did not even exist about a decade ago.
Big data is significantly different from traditional forms of data and therefore requires skills that traditional data analysis did not. To thrive in the relatively new and booming field of data science, the data scientist must possess a varied skill set that draws on disciplines such as computer science, statistics and business management. The skills a data scientist must possess are as follows:
– Programming Skills: Data scientists working with big data must know how to code. The reason is very simple – big data is still evolving. There are no standard systems yet that can deal with complex and large data sets. Part of the job of a data scientist, therefore, is to create and modify the processes that deal with structured and, more importantly, unstructured data. There are many languages that one can learn for coding. For the data scientist, Python, R and Java are only some of the languages that are important to know. The more languages one knows, the better a data scientist one can be.
– Technical Skills: Besides programming, data scientists have to be proficient in other technical programs and platforms like Hadoop, Hive, Spark, etc., that allow them to retrieve, manage, and analyze various forms of data from different sources.
– Warehousing Skills: Data warehousing is the process of gathering data from multiple sources for the purpose of analysis and inference. It is the process of relating different items of information. The process requires the data scientist to innovate and to possess good analytical skills. Any data scientist with good warehousing skills will be popular with employers, given the shortage of these skills in the market.
– Quantitative & Statistical Skills: While technology is a key component of big data analysis, the quantitative and statistical skills that are essential for traditional data analysis are also important for data analytics.
– Analytical & Interpretation Skills: Data analytics cannot be done without the know-how and knack to analyse and interpret data. Because big data is bigger, more complex and less organized than traditional data, it requires strong analytical and interpretive skills.
– Business Acumen: Lastly, the data scientist must possess the business acumen to use the data effectively and improve areas of the business such as operations, finance and productivity.
The Future for Data Analysts
Demand for data analysts is soaring worldwide. The profession is extremely popular among employers and job seekers alike. The increasing reliance on data for all business operations has made those capable of analyzing it valuable human resources. And this widespread demand has attracted more professionals than ever before to the booming field of data analytics.
“Data is useless without the skill to analyze it”, as Jeanne Harris (Senior Executive at Accenture Institute for High Performance) rightly pointed out. According to the McKinsey Global Institute, the US alone could face a shortage of 140,000 to 190,000 data analytics professionals. And a survey by Robert Half Technology revealed that more than half of the CIOs interviewed felt they were understaffed and not making full use of their analytical potential.
The shortage of data analysts is a problem in many parts of the world. In India too, a report by the Times of India suggests that there could be a shortage of around 200,000 data analytics professionals by 2020. This mismatch between demand and supply, together with the competition among organizations in all sectors for their services, means that data analysts have more options for employment and can command higher salaries than other professionals.
Srikanth Velamakanni, the CEO of Fractal Analytics, thinks that this is just the beginning. He predicts that the coming years will see the significance of data analytics grow over three times within the global IT industry. Studies conducted by QuinStreet Inc. and ‘Peer Research – Big Data Analytics’ also conclude that data analytics is a top priority for business organizations, and that it improves overall performance. Most organizations are already using data analytics (at least partially), or are in the process of adopting analytics for business operations.
The growing significance of analytics matters particularly for India. Despite its shortage of data analysts, the country is concentrating heavily on this surging field because companies around the globe are outsourcing analytical work to Indian professionals. Higher wage structures in developed countries make analytics more expensive there. In comparison, India is a more viable option for businesses looking to solve their analytical problems.
Talk About Moolah
Data analysts, as discussed above, can command higher salaries than most other professionals because of the shortage of such professionals in the market and the growing demand for them in the business world. A report published by the Institute of Analytics Professionals of Australia (IAPA) in 2015 placed the median annual salary of the data analysts surveyed at $130,000, which is 184% of the median full-time salary in the country.
Also in 2015, the Great Lakes Institute of Management in India produced a report on trends in the Indian analytics industry. According to the report, the average salary of analytics professionals had increased by 21% over the previous year, and 14% of the professionals earned more than Rs 15 lakhs per annum.
In the UK too, the median salary of data analysts increased from £55,000 in 2015 to £62,500 in early 2016, a rise of about 13.6%. There is little doubt that this upward trend in salaries for data analysts is here to stay for a while. Sooner rather than later, all organizations will have to turn to data simply to keep up with the competition. It is, as Velamakanni suggested, the beginning of the analytics revolution.
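The percentage rise in UK salaries quoted above follows directly from the two salary figures. A minimal sketch of the arithmetic, with a helper function named percent_change that is introduced here only for illustration:

```python
def percent_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100


# UK salary figures quoted in the article (2015 and early 2016).
uk_change = percent_change(55_000, 62_500)

print(f"{uk_change:.1f}%")
```

The same helper can be applied to any pair of before and after figures, such as year-on-year salary reports.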
As the significance of data grows in the business world, there will be a corresponding increase in significance of professionals working in analytics. This growing significance is already reflected in the varied positions data analysts occupy in organizations presently. The trend, once again, indicates that this list is still in the growing phase. Let us look, however, at some of the job profiles that are already popular:
- Data Analyst
- Analytics Consultant
- Business Analyst
- Analytics Manager
- Data Architect
- Metrics and Analytics Specialist
- Analytics Associate
These are only some of the job titles that data analysts hold in business organizations. The full list is presumably longer. Depending on their responsibilities and abilities, data analysts can work under any number of job designations to perform a range of analytical functions.
Introduction to R
R is a language and environment for statistical computing and graphics. It is a GNU project that is similar to S, a language developed at Bell Laboratories, and much code written for S runs unaltered in R. The open source platform offers a number of features such as linear and nonlinear modelling, time-series analysis, classification and clustering of data, and classical statistical tests, all of which are useful in statistical analysis and representation.
R is easy to use and allows you to produce well-designed, publication-quality plots, including mathematical symbols and formulae where necessary. R lets you pay attention to details: it provides many minor design options that give the user greater control over graphics. It is free software under the terms of the GNU General Public License, and it runs on a number of platforms and systems, including FreeBSD, Linux, Windows and macOS.
The R Advantage
R is an integrated suite of software facilities for manipulating data, performing calculations, and displaying results graphically. It handles and stores data effectively and offers a coherent, integrated collection of tools for data analysis and representation. Its programming language is simple and includes conditionals, loops, user-defined recursive functions, and input and output facilities. R is called an environment because it is a fully planned, coherent system, unlike other data analysis software, which often starts with specific, inflexible tools and then grows incrementally.
As in S, users can extend R by defining new functions. Much of the system is itself written in R, a dialect of S, which makes it easy for users to follow. For computationally intensive tasks, you can link and call code written in C, C++ or Fortran. Advanced users can write C code to manipulate R objects directly. Lastly, R is extensible: you can add packages supplied with the R distribution or available through the CRAN family of internet sites.
Introduction to Tableau
Tableau is becoming increasingly popular in data science circles. According to Fortune, it has “pioneered the concept of visual analytics”. Tableau is a software firm based in Seattle that went public in 2013. Its founders’ idea was to make spreadsheets, databases, and similar information sources easy for the layperson to use. To this end, the co-founders Christian Chabot (CEO), Chris Stolte (Chief Development Officer), and Pat Hanrahan (Chief Scientist) began working on a database visualization language called Visual Query Language (VizQL). The language laid the foundation for the company’s eponymous software, which can query relational databases, cloud databases and spreadsheets, and generate a variety of graph types. You can even collate these graphs on a dashboard and share them over a computer network or the internet.
Tableau has made data visualization, now something of an art, easy. It is the art of presenting data in a comprehensible and aesthetic fashion. When done correctly, it can produce masterpieces that communicate useful information and deliver key insights to help businesses make better decisions. For this reason, data visualization has become an important part of business analytics. With increased access to data, managers have turned to visualization to ease understanding and decision-making. Tableau and QlikView are the most popular data visualization software at the moment.
Clients and Products
Tableau’s popularity is reflected in its client base. It boasts over 28,000 user accounts, which include organizations like Coca-Cola, Exxon Mobil, Homeland Security, Nike, Adobe, and the World Bank, to name a few. Cornell University uses it for a number of purposes, such as managing contributor relations, visualizing statistics on faculty salaries, and recording the enrollment choices of its students. Zillow uses it to gather business intelligence. In 2015, Tableau was one of the “leaders” in Gartner’s annual report on business intelligence and analytics. The title, which it has earned three times in a row, is testament to the software’s capabilities. The company offers the following products:
– Desktop version: for users to collect data and perform queries without having to write code. You can also navigate and visualize the data easily.
– Tableau Server: for users of any organization to access data in real-time using a web browser or mobile phone.
– Online version: the hosted version of Tableau Server that doesn’t need to be set up.
– Tableau Reader: lets users access files saved in Tableau Desktop.
– Public version: to publish interactive data online.
*Only Tableau Reader and Public are free.
Introduction to Excel
Compared to R and Tableau, Excel might seem outdated. Having said that, it is still quite popular for two reasons – accessibility and usability. It also allows you to integrate with other components of MS Office to make presentations and reports easily. Considering how hugely popular the Office suite is, this is an advantage.
Excel is useful for managing, manipulating and presenting data. It offers a number of functions for statistical analysis, both descriptive and inferential, and provides a standard spreadsheet that you can use for a variety of tasks, such as updating databases or representing data graphically. Excel can also be useful for collating data that will be analysed in other software.
Four features of Excel are particularly useful for data analysis. First, Excel has a number of built-in statistical functions, including statistical tests. Second, it offers the Analysis ToolPak add-in, which you can use for more extensive procedures such as inferential testing. Third, charts, which became popular before the advent of advanced data visualization software, continue to be an easy and effective way to explore and explain data. And lastly, pivot tables offer an easy way to generate summaries and organize data for tasks. They are useful for carrying out cross-tabulations and for creating contingency tables, tables of means and other statistical summaries.
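As a rough illustration of what a pivot table computes, the sketch below builds a cross-tabulation and a table of means in plain Python. It is not how Excel implements pivot tables, and the records and the field names region, product and sales are made up for the example.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical records: (region, product, sales)
records = [
    ("North", "A", 120),
    ("North", "B", 80),
    ("South", "A", 200),
    ("South", "A", 100),
    ("South", "B", 50),
]

# Cross-tabulation: number of records for each (region, product) pair.
crosstab = Counter((region, product) for region, product, _ in records)

# Table of means: average sales per region.
sales_by_region = defaultdict(list)
for region, _, sales in records:
    sales_by_region[region].append(sales)
mean_sales = {region: mean(values) for region, values in sales_by_region.items()}

print(crosstab[("South", "A")])
print(mean_sales)
```

In Excel, the same summaries come from dragging fields into the rows, columns and values areas of a pivot table, with no code required.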
To conclude, data analytics is extremely popular among employers and job seekers alike. Organizations want analysts in order to make full use of their data, while professionals are keen to capitalize on the shortage of analytical talent in many parts of the world. India and other countries are focusing heavily on analytics because it allows them to earn revenue from domestic and international markets. The rising salaries of analytics professionals in different parts of the world also point to this emerging trend. This is only the beginning, however. The number of jobs for analytics professionals, and their significance in the data-driven world, look set to soar further.