Self-Contained Applications

Problem: how to verify that Spark is installed correctly.
Suppose we have set up everything for Spark correctly in the directory $SPARK_HOME/conf. Now invoke
spark-shell on the command line.
Suppose we have a file on the machine where we are running the shell, located at
/home/hadoop/spark/README.md. Then in the
spark-shell we invoke:
scala> val textFile = sc.textFile("file:/home/hadoop/spark/README.md")
scala> textFile.count()
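If you prefer Python, the same sanity check works from the PySpark shell ($SPARK_HOME/bin/pyspark), which also predefines sc. This is a minimal sketch, assuming the same README.md path as above:

>>> # sc is predefined in the PySpark shell, just like in spark-shell
>>> textFile = sc.textFile("file:/home/hadoop/spark/README.md")
>>> textFile.count()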
For more functions to test, you can check out the Spark Tutorial – Quick Start.
Follow the section Self-Contained Applications in the Quick Start. Remember to modify the logFile variable in SimpleApp.py so it points to your local README.md (see the sketch below), then submit the application:

$ spark-submit SimpleApp.py
Lines with a: 60, lines with b: 29
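For reference, here is a minimal sketch along the lines of the Quick Start's SimpleApp.py; the logFile path is the part you need to adapt to your machine:

from pyspark import SparkContext

# Adjust this path to wherever README.md lives on your machine.
logFile = "/home/hadoop/spark/README.md"
sc = SparkContext("local", "Simple App")
logData = sc.textFile(logFile).cache()

# Count the lines containing 'a' and the lines containing 'b'.
numAs = logData.filter(lambda s: 'a' in s).count()
numBs = logData.filter(lambda s: 'b' in s).count()

print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
sc.stop()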
Alternatively, you can run the example code that ships with Spark. The result will not appear in your currently running shell; it ends up in the Spark logs. Now let's execute:
$ spark-submit $SPARK_HOME/examples/src/main/python/pi.py 4
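Under the hood, pi.py estimates pi by Monte Carlo sampling: it throws random points into the square [-1, 1] x [-1, 1] and counts how many land inside the unit circle. The command-line argument (4 above) sets the number of partitions. The exact code varies with the Spark version, but it looks roughly like this sketch:

import sys
from random import random
from operator import add
from pyspark import SparkContext

sc = SparkContext(appName="PythonPi")
# First command-line argument is the number of partitions.
partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
n = 100000 * partitions

def inside(_):
    # Sample a point in [-1, 1] x [-1, 1];
    # return 1 if it falls inside the unit circle.
    x = random() * 2 - 1
    y = random() * 2 - 1
    return 1 if x ** 2 + y ** 2 <= 1 else 0

count = sc.parallelize(range(1, n + 1), partitions).map(inside).reduce(add)
# Area ratio (circle / square) is pi / 4.
print("Pi is roughly %f" % (4.0 * count / n))
sc.stop()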
You can find Spark's log directory with one of these commands:
$ cat $SPARK_HOME/conf/spark-env.sh
export SPARK_DAEMON_JAVA_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70"
export SPARK_LOCAL_DIRS=/mnt/spark,/mnt1/spark,/mnt2/spark,/mnt3/spark
export SPARK_LOG_DIR=/mnt/var/log/apps
export SPARK_CLASSPATH="/home/hadoop/spark/conf:/home/hadoop/conf:/home/hadoop/spark/classpath/distsupplied/*:/home/hadoop/spark/classpath/emr/*:/home/hadoop/spark/classpath/emrfs/*:/home/hadoop/share/hadoop/common/lib/*:/home/hadoop/share/hadoop/common/lib/hadoop-lzo.jar:/usr/share/aws/emr/auxlib/*"

$ echo $SPARK_LOG_DIR
/mnt/var/log/apps
Now go to the /mnt/var/log/apps directory and read the stdout file.
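For example (the exact subdirectory layout and file names depend on your setup):

$ cd /mnt/var/log/apps
$ ls
$ cat stdout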