How to test Spark

Problem: how to verify that Spark is installed correctly.


Suppose everything for Spark is configured correctly in the directory $SPARK_HOME/conf. Now start spark-shell from the command line.

Suppose there is a file on the machine where the shell is running, at the location /home/hadoop/spark/. In spark-shell, run:

scala> val textFile = sc.textFile("file:/home/hadoop/spark/")
scala> textFile.count()
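
As a cross-check, the number returned by textFile.count() can be compared with a line count taken outside Spark. The following is a minimal sketch, assuming the path points at a single plain-text file on the local machine with ordinary newline line endings; count_lines is a hypothetical helper name, not part of Spark.

```shell
#!/bin/sh
# count_lines: hypothetical helper (not part of Spark) that prints the
# number of lines in a local text file.
# grep -c '' matches every line, including a final line that has no
# trailing newline, so the result should agree with textFile.count()
# for an ordinary newline-delimited text file.
count_lines() {
    grep -c '' "$1"
}

# Usage: pass the same local file that was given to sc.textFile, e.g.
#   count_lines /path/to/your/file
```

If the two numbers differ, the shell is probably reading a different file or a different file system than expected.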

For more functions to test, check out Spark Tutorial – Quick Start.


Next, follow the section Self-Contained Applications in the Quick Start. Remember to change the variable logFile so that it points to a file that exists on your system.

$ spark-submit
Lines with a: 60, lines with b: 29
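
The counts printed by the application can be verified independently of Spark. The sketch below assumes the application counts the lines containing the letter a and the lines containing the letter b, as the Quick Start example does, and that logFile is a local plain-text file; line_stats is a hypothetical helper name.

```shell
#!/bin/sh
# line_stats: hypothetical helper (not part of Spark) that prints the
# same two counts as the Quick Start application for a local text file.
line_stats() {
    # grep -c exits with status 1 when the count is 0, so "|| true"
    # keeps the function usable under "set -e".
    with_a=$(grep -c 'a' "$1" || true)
    with_b=$(grep -c 'b' "$1" || true)
    echo "Lines with a: $with_a, lines with b: $with_b"
}

# Usage: pass the file that logFile points to, e.g.
#   line_stats /path/to/your/logfile
```

Both the application and grep match case-sensitively, so the two outputs should be identical for the same file.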


Alternatively, you can run the example code that ships with Spark. The result will not be shown in the shell you are currently running. Run:

$ spark-submit $SPARK_HOME/examples/src/main/python/ 4

To find the result, check Spark's log directory. Its location is set by SPARK_LOG_DIR in the configuration under $SPARK_HOME/conf:

$ cat $SPARK_HOME/conf/
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70"
export SPARK_LOCAL_DIRS=/mnt/spark,/mnt1/spark,/mnt2/spark,/mnt3/spark
export SPARK_LOG_DIR=/mnt/var/log/apps
export SPARK_CLASSPATH="/home/hadoop/spark/conf:/home/hadoop/conf:/home/hadoop/spark/classpath/distsupplied/*:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar:/usr/share/aws/emr/auxlib/*"


So now go to the /mnt/var/log/apps directory and read the stdout file.
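
The lookup of the log directory can also be scripted. This is a minimal sketch, assuming the configuration file sets the variable on a line of the form export SPARK_LOG_DIR=..., as in the output above; spark_log_dir is a hypothetical helper name, and the file to pass in is the same one printed with cat.

```shell
#!/bin/sh
# spark_log_dir: hypothetical helper (not part of Spark) that prints the
# value of SPARK_LOG_DIR found in the configuration file given as $1.
# Assumes a line of the form: export SPARK_LOG_DIR=/some/dir
spark_log_dir() {
    # Print only the value part, drop optional double quotes, and keep
    # the last assignment, since that one wins when the file is sourced.
    sed -n 's/^export SPARK_LOG_DIR=//p' "$1" | tr -d '"' | tail -n 1
}

# Usage: list the newest entries in the log directory, e.g.
#   ls -lt "$(spark_log_dir /path/to/your/spark/config/file)"
```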




