Free Shipping

Secure Payment

easy returns

24/7 support

  • Home
  • Blog
  • Spark SQL Use Case – Machine and Sensor Data Analysis

Spark SQL Use Case – Machine and Sensor Data Analysis

 July 19  | 0 Comments

In this post, we shall be discussing machine and sensor data analysis using Spark SQL. Here, we have the temperatures collected every minute, from 20 top buildings all over the world. After this analysis, we can conclude the building in which country has the most number of temperature variation.

For this data analysis, you can download the necessary dataset from this link.

In the above link there are two datasets; building.csv contains the details of the top 20 buildings all over the world and HVAC.csv contains the target temperature and the actual temperature along with the building Id.

HVAC (heating, ventilating/ventilation, and air conditioning) is the technology of indoor and vehicular environmental comfort. Its goal is to provide thermal comfort and acceptable indoor air quality. Through the HVAC sensors, we will get the temperature of the buildings.

Here are the columns that are present in the datasets:

Building.csv – BuildingID, BuildingMgr, BuildingAge, HVACproduct,Country

HVAC.csv – Date, Time, TargetTemp, ActualTemp, System, SystemAge, BuildingID

Now, let’s perform analysis on the HVAC dataset to obtain the temperature changes in the building. We are performing this analysis using Spark SQL. The following is the code for performing this analysis.


  • The first three lines of code will remove the header from the CSV file.

  • In the 4th line, we are writing a case class holding the schema of the dataset.

  • In the 5th line, we are splitting each row of the dataset with the delimiter ‘as’ and we are mapping the columns to our case class and finally, we are converting it into a data frame.

  • In the 6th line, we are creating a table HVAC for our dataframe.

  • In the 7th line, we are performing an SQL query on the table, which creates one new column tempchange, which will set to if there is a temperature change of either +5 or -5 between the actual_temperature and the target_temperature.

  • In the 8th line, we are registering that table as HVAC1.

Now, let’s create a table for the building dataset.


  • The first 3 lines of the dataset will remove the header from the dataset.

  • In the 4th line, we have defined a case class holding the schema of the building dataset.

  • In the 5th line, we are mapping the dataset to the case class which we have built.

  • In the 6th line, we are registering the build dataframe as a table building.

Now, we will join both the tables building and hvac1 as shown below.