I want to pass configuration to my Spark application: the memory for the executors, the number of executor instances, and the number of cores they are allowed to use.
Working solutions
val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://localhost:7077")
  .set("spark.executor.memory", "2G")
But only spark.executor.memory seems to take effect. Other settings, such as:
val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://localhost:7077")
  .set("spark.executor.cores", "2")
  .set("spark.executor.instances", "2")
do not work.
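One way to check what the driver actually registered is to read the configuration back once the SparkContext exists. A minimal sketch using the standard SparkConf.getAll API, with the same settings as above:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://localhost:7077")
  .set("spark.executor.memory", "2G")
  .set("spark.executor.cores", "2")
  .set("spark.executor.instances", "2")
val sc = new SparkContext(conf)

// Print every property the driver holds; all three settings appear
// here even if the cluster manager ends up ignoring some of them.
sc.getConf.getAll.sorted.foreach { case (k, v) => println(s"$k=$v") }

If a value is missing here, the application never set it; if it is present here but absent from the executor launch command in the logs, the cluster manager dropped it.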
Finding the answer
This is a screenshot from the Spark Web UI; pay attention to the times.
I checked the Spark log (the default directory is ${SPARK_HOME}/logs, but you can always find the log location by checking ${SPARK_HOME}/conf/spark-env.sh). Here is what I found: somehow the configuration was fixed well before the application itself was running, so changing the configuration later, from within the application code, has no effect. In the launch command from the 11:26 run the memory was set to 2048M (2G), while the earlier run at 11:23 still used the default 1024M. Compare the two launch commands below.
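This is consistent with how SparkContext behaves: it takes its own copy of the SparkConf at construction time, and the executors are launched at that moment with whatever the conf then contains. A minimal sketch of the pitfall, assuming the same master URL as above:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://localhost:7077")

// Executors are requested, and their launch command fixed, right here.
val sc = new SparkContext(conf)

// Too late: the context already holds its own copy of the conf,
// so this setting never reaches the executors.
conf.set("spark.executor.memory", "2G")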
16/07/19 11:26:57 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-openjdk-amd64/bin/java" "-cp" "/usr/local/spark-1.6.1-bin-without-hadoop/conf/:/usr/local/spark-1.6.1-bin-without-hadoop/lib/spark-assembly-1.6.1-hadoop2.2.0.jar:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs/:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar" "-Xms2048M" "-Xmx2048M" "-Dspark.driver.port=33442" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.1.102:33442" "--executor-id" "0" "--hostname" "192.168.1.102" "--cores" "2" "--app-id" "app-20160719112657-0016" "--worker-url" "spark://Worker@192.168.1.102:39021"
16/07/19 11:23:59 INFO ExecutorRunner: Launch command: "/usr/lib/jvm/java-8-openjdk-amd64/bin/java" "-cp" "/usr/local/spark-1.6.1-bin-without-hadoop/conf/:/usr/local/spark-1.6.1-bin-without-hadoop/lib/spark-assembly-1.6.1-hadoop2.2.0.jar:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/etc/hadoop/:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs/:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/hadoop/share/hadoop/mapreduce/lib/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/contrib/capacity-scheduler/*.jar" "-Xms1024M" "-Xmx1024M" "-Dspark.driver.port=33018" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://CoarseGrainedScheduler@192.168.1.102:33018" "--executor-id" "0" "--hostname" "192.168.1.102" "--cores" "8" "--app-id" "app-20160719112359-0015" "--worker-url" "spark://Worker@192.168.1.102:39021"
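A hedged note on spark.executor.instances: that property is read by the YARN backend, so a standalone master like the one used here silently ignores it. On a standalone cluster the executor count instead falls out of spark.cores.max (the total cores the application may take) divided by spark.executor.cores. Assuming two executors with two cores each is the goal, a sketch:

import org.apache.spark.SparkConf

// Standalone mode: 2 cores per executor, capped at 4 cores in total,
// which yields at most two 2G executors.
val conf = new SparkConf()
  .setAppName("WordCount")
  .setMaster("spark://localhost:7077")
  .set("spark.executor.memory", "2G")
  .set("spark.executor.cores", "2")
  .set("spark.cores.max", "4")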