
Soccer Data Analysis Using Apache Spark SQL (Use Case)

July 19

In this blog post, we will walk through a real-world industry scenario in which Spark SQL is used to analyze soccer data. Spark is currently the most active open-source big data tool and is reshaping the big data market: running in memory, it can be up to 100 times faster than Hadoop MapReduce, and up to 10 times faster when accessing data from disk.

Advantages of Spark for data analysis:

1) Inbuilt machine learning libraries (MLlib).

2) Efficient for interactive queries and iterative algorithms.

3) Provides highly reliable, fast in-memory computation.

4) Provides a processing platform for streaming data via Spark Streaming.

5) Fault tolerance, thanks to its immutable primary abstraction, the RDD.

6) Highly efficient for real-time analytics using Spark Streaming and Spark SQL.

Now let us start the analysis, which involves the following steps.

 

Step 1: Launch Spark shell

Code:

spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

Code Explanation:

Here we launch the Spark shell to write our application code. We use the spark-csv package from Databricks, which lets us read data from CSV files easily.

Output on the console:

 

 

Step 2: Import and create the SQLContext.

Code:

import org.apache.spark.sql.SQLContext
val sqlContext = new org.apache.spark.sql.SQLContext(sc)

Code explanation:

To use the functionality of Spark SQL, we import the SQLContext class and create an instance of it. The Spark context object is initialized by default with the name sc when we start the Spark shell.

Output on the console:

Step 3: Download the soccer dataset CSV file and create a DataFrame.

Download the Dataset from the below link.

https://s3.amazonaws.com/acadgildsite/wordpress_images/bigdatadeveloper/Olympics_Analysis/Soccer_Data_Set.docx.csv

Code:

val data = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("delimiter", ",").load("/home/acadgild/Desktop/Soccer_Data_Set.docx.csv")

Code explanation:

We create a variable called data, read the CSV file, and store its contents in that variable as a DataFrame.
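To see what the header and delimiter options are doing for us, here is a minimal plain-Scala sketch of header-aware CSV parsing. The sample rows are invented for illustration and this is not the spark-csv implementation, just the idea behind it:

```scala
// Sketch of what options ("header" -> "true", "delimiter" -> ",") imply:
// the first line supplies column names, each later line is split on the delimiter.
// Note: a naive split does not handle quoted fields containing commas.
val csvLines = Seq(
  "Country,Sport,Medal",   // header row: becomes the column names
  "USA,Football,Gold",
  "MEX,Football,Silver"
)

val columns = csvLines.head.split(",")     // Array(Country, Sport, Medal)
val rows = csvLines.tail.map { line =>
  columns.zip(line.split(",")).toMap       // one Map per row: column -> value
}
```

Each row becomes a column-name-to-value mapping, which is essentially what a DataFrame row gives you.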

Output on the console:

Step 4: Create a table from the data.

Code:

data.registerTempTable("olympics")

Code explanation:

As you can see, in the previous step we created a DataFrame from the CSV file. Here we create a temporary table from that DataFrame so that we can execute the queries we want against it.

Output on the console:

 

As a result, we now have a table against which we can fire a set of queries to get the desired output.
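Before firing the SQL, it may help to see what a WHERE filter combined with GROUP BY and COUNT actually computes. A plain-Scala sketch of that logic over a few invented rows (the real data, of course, comes from the CSV):

```scala
// Plain-Scala equivalent of:
//   select Country, count(Medal) from olympics
//   where Sport = 'Football' and Medal = 'Bronze' group by Country
// The rows below are invented for illustration.
case class MedalRow(country: String, sport: String, medal: String)

val medalRows = Seq(
  MedalRow("USA", "Football", "Bronze"),
  MedalRow("USA", "Football", "Bronze"),
  MedalRow("MEX", "Football", "Silver"),
  MedalRow("GER", "Football", "Bronze")
)

val bronzeByCountry = medalRows
  .filter(r => r.sport == "Football" && r.medal == "Bronze") // WHERE clause
  .groupBy(_.country)                                        // GROUP BY Country
  .map { case (country, rs) => country -> rs.size }          // COUNT(Medal)
```

Spark SQL performs the same filter-group-count, but distributed across the cluster rather than on a local collection.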

Now let us take some real-world problems and find their solutions. The problems are listed below, each with its solution.

 

1) Find the total number of bronze medals won by each country in Football.

Code: 

val result1 = sqlContext.sql("select Country, count(Medal) as Medal from olympics where Sport = 'Football' and Medal = 'Bronze' group by Country").show()

Output on the console:

 

 

2) Find the number of Medals won by the USA grouped by sport.

Code:

val result2 = sqlContext.sql("select count(Medal) as Medal, Country, Sport from olympics where Country = 'USA' group by Sport, Country").show()

Output on the console:

 

 

3) Find the total number of medals won by each country displayed by type of medal.

Code:

val result3 = sqlContext.sql("select Country, Medal, count(Medal) as Count from olympics group by Medal, Country").show()

Output on the console:

 

 

4) Find how many Silver medals have been given to Mexico and the year of each.

Code:

val result4 = sqlContext.sql("select Country, Medal, count(Medal) as Count, Year from olympics where Medal = 'Silver' and Country = 'MEX' group by Medal, Country, Year").show(false)

Output on the console:

 

 
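Query 4 combines a two-condition filter with a multi-column grouping. The same logic can be sketched in plain Scala over invented rows, grouping by the (country, medal, year) triple:

```scala
// Plain-Scala equivalent of:
//   select Country, Medal, count(Medal) as Count, Year from olympics
//   where Medal = 'Silver' and Country = 'MEX' group by Medal, Country, Year
// The rows below are invented for illustration.
case class OlympicsRow(country: String, medal: String, year: Int)

val olympicsRows = Seq(
  OlympicsRow("MEX", "Silver", 1984),
  OlympicsRow("MEX", "Silver", 1984),
  OlympicsRow("MEX", "Gold",   1992),
  OlympicsRow("MEX", "Silver", 2012)
)

val silverByYear = olympicsRows
  .filter(r => r.country == "MEX" && r.medal == "Silver")  // WHERE clause
  .groupBy(r => (r.country, r.medal, r.year))              // GROUP BY three columns
  .map { case ((c, m, y), rs) => (c, m, rs.size, y) }      // COUNT per group
  .toSet
```

Grouping by a tuple key is exactly what a multi-column GROUP BY does: one output row per distinct combination of the grouping columns.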

With that, we have completed our analysis and displayed the results on the Spark console.

 

We hope the above blog helped you understand how to analyze data using Spark SQL. Keep visiting our site for more updates on big data and other technologies. Click here to learn the Scala language, which is used in Spark.
