Spark Use Case: Pokémon Go Data Analysis

In this article, we use Spark queries to analyze a dataset from Pokémon Go.

What is Pokémon Go?

Pokémon Go is a location-based augmented reality game from Niantic. It is available on iOS and Android devices. The game was released in select countries in July 2016, and is a free download. It features virtual creatures called Pokémon from the Pokémon World. The object of the game is to locate, capture, train, and battle with these evolving creatures. You can catch Pokémon using Pokéballs. To buy Pokéballs, you need PokéCoins, which can be purchased from the in-game store, or earned by meeting certain in-game criteria.


The dataset for analysis is available here. It consists of 11 columns with the following information:

  1. Pokémon ID number.

  2. Name of the Pokémon.

  3. Type 1: the Pokémon’s primary type. For example, Charmander is a fire type. A Pokémon is not limited to a single type: Bulbasaur is both a grass type and a poison type. There are 18 types in all.

  4. Type 2: the Pokémon’s secondary type, if it has one.

  5. Total (character points): the sum of the stats listed below (HP, attack, defense, special attack, special defense, and speed).

  6. Hit points (HP), which determine how much damage a Pokémon can take. It is the stat that changes most often. When a Pokémon’s HP drops to zero, it faints and loses the battle.

  7. Attack stat.

  8. Defense stat.

  9. Special attack stat.

  10. Special defense stat.

  11. Speed.

Type 1: This column represents the property of a Pokémon.

Type 2: This column represents the extended property of the same Pokémon.

With the current 18-type system, there are 324 possible ways to assign these types to Pokémon, along with 171 unique combinations.

As of Generation VI, 133 different type combinations have been used.
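The two figures above can be reproduced with a little arithmetic. The sketch below shows one reading that yields them, assuming 324 counts ordered (Type 1, Type 2) assignments and 171 counts combinations where order does not matter; the variable names are illustrative only.

```scala
// Counting type assignments under an 18-type system.
val types = 18

// Ordered (Type 1, Type 2) assignments: 18 single-type plus 18 * 17 ordered pairs = 18 * 18.
val orderedAssignments = types * types

// Unique combinations: 18 single-type plus C(18, 2) unordered dual-type pairs.
val singleType = types
val dualType = types * (types - 1) / 2
val uniqueCombinations = singleType + dualType

println(s"$orderedAssignments ordered assignments, $uniqueCombinations unique combinations")
```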

Data Analysis in Spark

Let’s begin by creating a table that can contain the data set.


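Source code:

// Assumes a spark-shell session (Spark 2.x or later), where sc, spark, and spark.implicits._ are already in scope,
// and a CSV file with no header row and no commas inside field values.
case class Pokemon(Number: Int, Name: String, Type_1: String, Type_2: String, Total: Int, HP: Int, Attack: Int, Defense: Int, Sp_Atk: Int, Sp_Def: Int, Speed: Int)

val pokemons = sc.textFile("file:///home/acadgild/prateek/Pokemon.csv").map(x => x.split(",")).map(x => Pokemon(x(0).toInt, x(1), x(2), x(3), x(4).toInt, x(5).toInt, x(6).toInt, x(7).toInt, x(8).toInt, x(9).toInt, x(10).toInt)).toDF

// Register the DataFrame as a temporary view so the SQL queries below can refer to it as "pokemon".
pokemons.createOrReplaceTempView("pokemon")

// Print the first rows to check that the data loaded correctly.
pokemons.show()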
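Splitting on commas and calling toInt directly will throw if the file contains a header row or a malformed line. The following is a minimal sketch of a more defensive parser in plain Scala, not the article's own method; the names PokemonRow and parseLine and the sample lines are made up for illustration.

```scala
import scala.util.Try

// Same shape as the article's case class, renamed here to avoid clashing with it.
// Type 2 is optional because single-type Pokémon have no secondary type.
case class PokemonRow(number: Int, name: String, type1: String, type2: Option[String],
                      total: Int, hp: Int, attack: Int, defense: Int,
                      spAtk: Int, spDef: Int, speed: Int)

// Returns None for a header row or any malformed line instead of throwing.
def parseLine(line: String): Option[PokemonRow] = {
  // limit = -1 keeps empty fields, so the column count stays at 11.
  val f = line.split(",", -1).map(_.trim)
  if (f.length != 11) None
  else Try {
    PokemonRow(f(0).toInt, f(1), f(2), Option(f(3)).filter(_.nonEmpty),
      f(4).toInt, f(5).toInt, f(6).toInt, f(7).toInt, f(8).toInt, f(9).toInt, f(10).toInt)
  }.toOption
}

// Hypothetical sample lines, invented for illustration (not taken from the dataset).
val sample = Seq(
  "Number,Name,Type_1,Type_2,Total,HP,Attack,Defense,Sp_Atk,Sp_Def,Speed",
  "1,Examplemon,Grass,Poison,300,50,50,50,50,50,50",
  "2,Samplemon,Fire,,310,40,55,45,60,50,60"
)

val parsed = sample.flatMap(line => parseLine(line))
parsed.foreach(println)
```

In a Spark job, the same function could be passed to flatMap on the RDD of lines so that unparseable rows are skipped rather than failing the whole load.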


1. Find the average HP (hit points) of all the Pokémon, using the following query.

val HP = spark.sql("select avg(HP) from pokemon").collect
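The query returns an Array[Row] rather than a bare number. Here is a minimal sketch of extracting the average as a Double, with the equivalent DataFrame API call alongside; it uses a tiny made-up view named pokemon_demo rather than the real dataset, and assumes Spark is on the classpath (in spark-shell, the existing session is reused).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

// getOrCreate() returns the running session when one already exists (e.g. in spark-shell).
val spark = SparkSession.builder().master("local[*]").appName("avg-hp-sketch").getOrCreate()
import spark.implicits._

// Hypothetical two-row table, invented for illustration.
val demo = Seq(("Examplemon", 40), ("Samplemon", 60)).toDF("Name", "HP")
demo.createOrReplaceTempView("pokemon_demo")

// SQL form: take the single result row and read its first column as a Double.
val avgSql: Double = spark.sql("select avg(HP) from pokemon_demo").first().getDouble(0)

// DataFrame API form of the same aggregate.
val avgApi: Double = demo.agg(avg("HP")).first().getDouble(0)

println(s"average HP: $avgSql")
```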

2. Create a new table “pokemon1” from the existing table “pokemon,” adding a column “power_rate” that labels each Pokémon as “powerful” or “Moderate” depending on whether its HP is above or below 69.25875, the average HP found in step 1.

val pok = spark.sql("create table pokemon1 as select *, IF(HP>69.25875, 'powerful', IF(HP<69.25875, 'Moderate', 'powerless')) AS power_rate from pokemon")

val pok1 = spark.sql("select * from pokemon1").collect
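The nested IF in the query can be read as a three-way comparison against the threshold. The plain-Scala sketch below mirrors that logic; the function name powerRate and the sample HP values are illustrative assumptions, not part of the article's code.

```scala
// Mirrors IF(HP > t, 'powerful', IF(HP < t, 'Moderate', 'powerless')).
def powerRate(hp: Int, threshold: Double): String =
  if (hp > threshold) "powerful"
  else if (hp < threshold) "Moderate"
  else "powerless" // reached only when hp equals the threshold exactly

val threshold = 69.25875 // the value used in the article's query

// Hypothetical HP values, chosen to sit on either side of the threshold.
val labels = Seq(45, 69, 70, 255).map(hp => hp -> powerRate(hp, threshold))
labels.foreach { case (hp, label) => println(s"HP $hp -> $label") }
```

Because HP is an integer and the threshold has a fractional part, no row can equal it, so the “powerless” branch never fires for this threshold.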

3. Count the powerful and moderate Pokémon in the data set with this query.

val num = spark.sql("select COUNT(name), power_rate from pokemon1 group by power_rate").collect

4. List the top 10 Pokémon (according to HP) using the following query.

val top10 = spark.sql("select Name, HP from pokemon1 order by HP desc limit 10").collect

5. List the top 10 Pokémon (according to attack stat) using the following query.

val top10_atk = spark.sql("select name, attack from pokemon1 order by attack desc limit 10").collect

6. Find the top 10 Pokémon (according to defense stat) using this query.

val top10_defense = spark.sql("select name, defense from pokemon1 order by defense desc limit 10").collect

7. Rank the top 10 Pokémon (according to total power) using the following query.

val top10_power = spark.sql("select name, total from pokemon1 order by total desc limit 10").collect

8. Find the 10 Pokémon with the biggest gap between their attack and special attack stats, using this query. The difference attack-sp_atk is sorted in ascending order, so the results are the Pokémon whose special attack most exceeds their attack.

val top10_diff = spark.sql("select name, (attack-sp_atk) as atk_diff from pokemon1 order by atk_diff limit 10").collect

9. Find the 10 Pokémon with the biggest gap between their defense and special defense stats, using this query. As in the previous step, the ascending sort returns the Pokémon whose special defense most exceeds their defense.

val top10_diff_defense = spark.sql("select name, (defense-sp_def) as def_diff from pokemon1 order by def_diff limit 10").collect

10. List the 10 fastest Pokémon using this query.

val top10_fast = spark.sql("select name, speed from pokemon order by speed desc limit 10").collect
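The two difference queries above sort in ascending order, which is not the same as ranking by the size of the gap. The plain-Scala sketch below contrasts the two orderings on a handful of made-up stats; the names and numbers are hypothetical, and the abs-based ranking is an alternative shown for comparison, not the article's query.

```scala
// Hypothetical stats, invented for illustration: (name, attack, spAtk).
val stats = Seq(
  ("A", 100, 20),  // attack far above special attack: diff = 80
  ("B", 30, 120),  // special attack far above attack: diff = -90
  ("C", 60, 55),   // nearly balanced: diff = 5
  ("D", 50, 50)    // balanced: diff = 0
)

// Equivalent of: order by (attack - sp_atk) limit 2   (ascending, as in the article)
val ascending = stats.sortBy { case (_, atk, spAtk) => atk - spAtk }.take(2).map(_._1)

// Equivalent of: order by abs(attack - sp_atk) desc limit 2
val largestGap = stats.sortBy { case (_, atk, spAtk) => -math.abs(atk - spAtk) }.take(2).map(_._1)

println(s"ascending difference: $ascending")
println(s"largest absolute gap: $largestGap")
```

The ascending sort surfaces only rows where special attack is the larger stat, while sorting by the absolute difference in descending order picks up large gaps in either direction.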

That brings us to the end of this blog article. Our next article will use data analytics to select a team for the Pokémon Fight League (PFL). Subscribe to our blog to receive notifications. Alternatively, you can visit our website for updates on the Big Data Ecosystem and other technologies.



