Big Data Hadoop & Spark

Running Spark in Eclipse Using SBT

This article aims to give a clear idea of how to run a Spark application in Eclipse using SBT. SBT stands for Simple Build Tool; it pulls in the dependencies required to build a project, much like Maven, Ant, or Gradle.

Before proceeding to run Spark, we recommend installing SBT. If you have not yet installed it, you can go through this link for clear guidance on how to install SBT.

Once SBT is installed, you need to add the sbteclipse plugin details globally, so that you don't have to repeat the same steps for every project you create. To do this, create a plugins directory inside the .sbt directory using the syntax below.

Steps to Create a Plugin Directory

mkdir -p .sbt/&lt;sbt-version&gt;/plugins

Example:

mkdir -p .sbt/0.13/plugins

Then, create a plugins.sbt file inside the .sbt/0.13/plugins directory and add the following details.

gedit .sbt/0.13/plugins/plugins.sbt
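As a sketch of what goes into this file: the sbteclipse plugin provides the `sbt eclipse` command used later in this post. The plugin version shown here is an assumption; check the sbteclipse releases for the version matching your SBT.

```scala
// plugins.sbt — registers the sbteclipse plugin globally so that the
// `sbt eclipse` command is available in every project.
// The version number is an assumption; adjust it to match your SBT release.
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")
```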

After this, you have to create a project directory that will contain the Spark code and the build.sbt file.

Also, make sure that the main directory has the sub-directory path src/main/scala, with the .scala file inside it.

mkdir sparkproj

Note: You can give any name to your project directory. Here, we have used sparkproj as the project directory.

Change the current working directory to sparkproj.

cd sparkproj

Initially, this project directory will be empty. Now, as stated earlier, let's create the sub-directories src/main/scala.

mkdir -p src/main/scala

Inside the scala directory, we need to create our Spark code (a .scala file). We have used the file name myspark.scala, and it will contain the logic.

gedit src/main/scala/myspark.scala

Let's put the logic below into the file. The code is self-explanatory and easy to understand: it counts the number of lines containing the words sumit and satyam in the input file.

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object myspark {
  def main(args: Array[String]) {
    // Location of the input file on the local file system.
    val inpfile = "/home/acadgild/Desktop/sparkinp.txt"
    val conf = new SparkConf().setAppName("myapp").setMaster("local[2]")
    val sc = new SparkContext(conf)
    val data = sc.textFile(inpfile, 2).cache()
    val sumitcount = data.filter(line => line.contains("sumit")).count()
    val satyamcount = data.filter(line => line.contains("satyam")).count()
    println("Lines with sumit: %s, Lines with satyam: %s".format(sumitcount, satyamcount))
  }
}
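The filter-and-count pattern above can be sketched in plain Scala without Spark, which makes the logic easy to verify on a few sample lines. The object name, helper method, and sample input below are hypothetical, standing in for the contents of sparkinp.txt.

```scala
// Plain-Scala sketch of the same filter-and-count logic (no Spark needed).
// FilterCountSketch and its sample lines are illustrative assumptions.
object FilterCountSketch {
  // Count how many lines contain the given word, mirroring
  // data.filter(line => line.contains(word)).count() from the Spark code.
  def countLinesContaining(lines: Seq[String], word: String): Long =
    lines.count(_.contains(word))

  def main(args: Array[String]): Unit = {
    val lines = Seq("sumit went home", "satyam and sumit", "nothing here")
    println(countLinesContaining(lines, "sumit"))   // prints 2
    println(countLinesContaining(lines, "satyam"))  // prints 1
  }
}
```

Spark's `filter` and `count` apply the same idea, only distributed across the RDD's partitions.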

Now, save and close the .scala file, and return to the main project directory to create a build.sbt file.

build.sbt will contain all the dependencies required by our project. As we have already written a simple Spark program, we just need to add the Spark dependencies. Below are the details that need to be present in the file.
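A minimal build.sbt sketch is shown here, assuming Scala 2.10.4 and Spark 1.6 (the versions used in the Acadgild Spark VM); the project name and version string are illustrative.

```scala
// build.sbt — minimal sketch for this project.
// Version numbers match the Acadgild Spark VM (Scala 2.10.4, Spark 1.6);
// adjust them to suit your own setup.
name := "sparkproj"

version := "1.0"

scalaVersion := "2.10.4"

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
```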

Running Spark

As the Scala and Spark versions in the Acadgild Spark VM are 2.10.4 and 1.6 respectively, the same details are used in the build.sbt file. If you are using your own setup, make sure to fill in the version details accordingly. Now we are ready to build the project using sbteclipse. Before we proceed, let's cross-check the project structure once again.

Next, go back to the main project (sparkproj) directory and run the command sbt eclipse from the terminal to build it.

Note: When you run SBT for the first time, it will take longer, as it has to download the dependencies.

Once the project is built successfully, it can be imported into Eclipse.

Open Eclipse → File → Import → General → Existing Projects into Workspace.

Next, click Browse and select the main project (sparkproj) directory. Once the project is imported, wait for a while so that Eclipse can build the workspace. When it is done, run the code inside Eclipse. After Spark runs, the results can be seen in the Eclipse output console.

We hope this post helps you understand how to run Spark in Eclipse using SBT. In case of any queries, feel free to comment below and we will get back to you at the earliest.
Keep visiting www.acadgild.com for more updates on the courses.
