Problem: how to run PySpark in a Jupyter notebook.
Some assumptions before starting:
- You have Anaconda installed.
- You have Spark installed. District Data Lab has an exceptional article on how to get started with Spark in Python. It’s long, but detailed.
- pyspark is in the path.
There are 2 solutions:
- The first one modifies the environment variables that pyspark reads. Running pyspark then starts a Jupyter/IPython notebook with the PySpark environment loaded instead of the pyspark console (see the sketch after this list).
- The second one is installing a separate Spark kernel for Jupyter. This way is more flexible, because the spark-kernel from IBM can run code in Scala, Python, Java, and SparkSQL (an install sketch follows below).
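A minimal sketch of the first solution, assuming a bash shell and that Spark's bin directory (which contains the pyspark launcher) is already on your path; the exact values may differ for your setup:

```bash
# Make the pyspark launcher start Jupyter as the driver Python
# instead of the plain Python shell.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS="notebook"

# Starting pyspark now opens a Jupyter notebook; the SparkContext (sc)
# is created for you, just as it is in the pyspark console.
pyspark
```

To make the change permanent, put the two export lines in your shell profile (e.g. ~/.bashrc).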
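For the second solution, the IBM spark-kernel project was later donated to Apache as Toree. A hedged sketch of installing it as a Jupyter kernel, assuming the toree package from PyPI and a placeholder Spark location:

```bash
# Install the Toree kernel launcher (formerly the IBM spark-kernel)
pip install toree

# Register the kernel with Jupyter, pointing it at your Spark installation
# (replace /path/to/spark with your actual SPARK_HOME)
jupyter toree install --spark_home=/path/to/spark
```

After restarting Jupyter, the new kernel appears in the notebook's kernel list and can run Scala, Python, and SparkSQL code.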