Problem with configuring Scala Spark Application

I want to configure the executors' memory, the number of executor instances, and the number of cores they are allowed to use.

What works (and what doesn't)

val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://localhost:7077")
  .set("spark.executor.memory", "2G")

But only spark.executor.memory seems to take effect. Other settings, such as:

val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://localhost:7077")
  .set("spark.executor.cores", "2")
  .set("spark.executor.instances", "2")

do not seem to work.
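One alternative is to supply the same values at submit time instead of in the application code, so they are already in place when the executors are launched. A minimal sketch, assuming the application class is named WordCount and the jar is wordcount.jar (both hypothetical here); note that --num-executors is honored on YARN, while on a standalone master the total core count is bounded with --total-executor-cores (spark.cores.max):

```shell
# Pass the executor settings on the spark-submit command line,
# before any executor process is launched.
spark-submit \
  --class WordCount \
  --master spark://localhost:7077 \
  --executor-memory 2G \
  --executor-cores 2 \
  --total-executor-cores 4 \
  wordcount.jar
```

Values given this way end up in the same SparkConf the application sees, but they are available to the master when it schedules executors.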

Finding the answer

This is from the Spark Web UI. Pay attention to the timestamps.

[Image: scala-spark-configuration.png — Spark Web UI]

I checked the Spark logs (the default directory is ${SPARK_HOME}/logs, but you can always confirm the log location in ${SPARK_HOME}/conf/spark-env.sh). Here is what I found: somehow the configuration was set well before the application was running, so changing the configuration within the application code has no effect. For the run at 11:26, the memory was set to 2048M, i.e. 2G.
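The executor launch commands below can be pulled out of the worker log directly. A sketch, assuming a standard standalone installation where the worker writes to ${SPARK_HOME}/logs (the exact log file name pattern may differ on your machine):

```shell
# Show every executor launch command the worker has recorded,
# newest entries last; the -Xms/-Xmx and --cores values reveal
# which configuration was actually applied to each run.
grep "Launch command" "${SPARK_HOME}"/logs/spark-*Worker*.out
```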

16/07/19 11:26:57
INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-openjdk-amd64/bin/java" "-cp" "/usr/local/spark-1.6.1-bin-without-hadoop/conf/:/usr/local/spark-1.6.1-bin-without-hadoop/lib/spark-assembly-1.6.1-hadoop2.2.0.jar:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs/:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar"
"-Xms2048M" "-Xmx2048M"
"-Dspark.driver.port=33442" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.1.102:33442" "--executor-id" "0" "--hostname" "192.168.1.102" "--cores" "2" "--app-id" "app-20160719112657-0016" "--worker-url" "spark://Worker@192.168.1.102:39021"
16/07/19 11:23:59
INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-openjdk-amd64/bin/java" "-cp" "/usr/local/spark-1.6.1-bin-without-hadoop/conf/:/usr/local/spark-1.6.1-bin-without-hadoop/lib/spark-assembly-1.6.1-hadoop2.2.0.jar:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs/:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar"
"-Xms1024M" "-Xmx1024M"
"-Dspark.driver.port=33018" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.1.102:33018" "--executor-id" "0" "--hostname" "192.168.1.102" "--cores" "8" "--app-id" "app-20160719112359-0015" "--worker-url" "spark://Worker@192.168.1.102:39021"
