Here is your very first Apache Spark program in Java: the equivalent of Kernighan and Ritchie’s “Hello, World”.

package net.jgp.labs.spark;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;

public class HelloSpark {

	public static void main(String[] args) {
		SparkConf conf = new SparkConf().setAppName("Hello Spark").setMaster("local");
		SparkContext sc = new SparkContext(conf); 
		System.out.println("Hello, Spark v." + sc.version());
	}

}

You can download it from GitHub:

https://github.com/JGPnet/net.jgp.labs.spark.git

The key is to create a configuration (our conf object) that names the application and points at a local master, then to create a context from it. Everything else goes through that context, including displaying the version number.

I used Maven and simply added this dependency to the pom.xml (the _2.10 suffix in the artifact name is the Scala version this Spark build targets):

  	<dependency>
  		<groupId>org.apache.spark</groupId>
  		<artifactId>spark-core_2.10</artifactId>
  		<version>1.6.1</version>
  	</dependency>

The output is pretty rough, as I left logging on:

16/06/26 20:00:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52832.
16/06/26 20:00:54 INFO NettyBlockTransferService: Server created on 52832
16/06/26 20:00:54 INFO BlockManagerMaster: Trying to register BlockManager
16/06/26 20:00:54 INFO BlockManagerMasterEndpoint: Registering block manager localhost:52832 with 1140.4 MB RAM, BlockManagerId(driver, localhost, 52832)
16/06/26 20:00:54 INFO BlockManagerMaster: Registered BlockManager
Hello, Spark v.1.6.1
16/06/26 20:00:54 INFO SparkContext: Invoking stop() from shutdown hook
16/06/26 20:00:54 INFO SparkUI: Stopped Spark web UI at http://10.0.100.100:4040
16/06/26 20:00:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/06/26 20:00:54 INFO MemoryStore: MemoryStore cleared
16/06/26 20:00:54 INFO BlockManager: BlockManager stopped
16/06/26 20:00:55 INFO BlockManagerMaster: BlockManagerMaster stopped
16/06/26 20:00:55 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/06/26 20:00:55 INFO SparkContext: Successfully stopped SparkContext
16/06/26 20:00:55 INFO ShutdownHookManager: Shutdown hook called
16/06/26 20:00:55 INFO ShutdownHookManager: Deleting directory /private/var/folders/vs/kl6qlcvx30707d07txrm_xnw0000gn/T/spark-c3f8f992-b75a-4d69-944b-e851658c75a2
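
If the INFO lines above get in the way, one option is to put a log4j.properties file on the classpath (with Maven, typically src/main/resources). This is a minimal sketch modeled on the template that ships with Spark 1.6, assuming the default log4j 1.2 logging; only the level on the first line differs from that template.

```properties
# Log only warnings and errors to the console (the template uses INFO here)
log4j.rootCategory=WARN, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
```

With this in place, the "Hello, Spark" line should stand out much more clearly.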

Note: if you get the following error:

16/06/26 19:59:08 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:401)
	at net.jgp.labs.spark.HelloSpark.main(HelloSpark.java:10)

It means that you forgot to specify where the master is. You can resolve this by calling setMaster("local") on your SparkConf, as in the listing at the top.
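
If you would rather not hard-code the master, here is a minimal sketch of one way to choose it at startup. The MasterUrl class and its resolve helper are hypothetical names of mine, not part of Spark; spark.master is Spark's own configuration key for the master URL. The helper takes the first command-line argument if there is one, then the spark.master system property, and otherwise falls back to "local".

```java
public class MasterUrl {

    /**
     * Picks a master URL: the first command-line argument if present,
     * else the spark.master system property, else "local".
     */
    public static String resolve(String[] args) {
        if (args != null && args.length > 0 && !args[0].trim().isEmpty()) {
            return args[0].trim();
        }
        return System.getProperty("spark.master", "local");
    }

    public static void main(String[] args) {
        // In HelloSpark, this value would be handed to setMaster(...).
        System.out.println("Master: " + resolve(args));
    }
}
```

In HelloSpark, you would then write setMaster(MasterUrl.resolve(args)) instead of setMaster("local"), so the program still runs locally when nothing else is specified.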
