Message: Could not initialize class com.databricks.spark.csv.util.CompressionCodecs$
One-line answer: make sure the Scala version is the same for:
- the Scala used to compile Spark,
- the spark-csv module,
- the Spark running on your system.
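One way to keep the three in sync is to let sbt pick the artifact for your Scala version. Below is a minimal `build.sbt` sketch; the specific version numbers are illustrative assumptions, not prescriptions, and should match the Spark build on your cluster:

```scala
// build.sbt -- a minimal sketch; versions here are illustrative.
// The "%%" operator appends the Scala binary version (e.g. "_2.11") to the
// artifact name, so spark-csv resolves against the same Scala as Spark itself.
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.3" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.3" % "provided",
  "com.databricks"   %% "spark-csv"  % "1.5.0"
)
```

If you instead pin an artifact with a hard-coded suffix (e.g. `spark-csv_2.10` on a Scala 2.11 Spark), you get exactly the `Could not initialize class` failure above.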
Continue reading “Cannot write to csv with spark-csv in Scala”
Problem: how to run PySpark in Jupyter notebook.
Some assumptions before starting:
- You have Anaconda installed.
- You have Spark installed. District Data Lab has an excellent article on how to get started with Spark in Python. It's long, but detailed.
- pyspark is on your PATH.
There are 2 solutions:
- The first modifies the environment variables that pyspark reads, so that running pyspark starts a Jupyter/IPython notebook instead of the pyspark console.
- The second installs a separate Spark kernel for Jupyter. This way is more flexible: the spark-kernel from IBM can run code in Scala, Python, Java, and SparkSQL.
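The first approach can be sketched with a couple of environment variables. This is a config sketch, not a full setup; the `SPARK_HOME` path is an assumed example and should point at your own installation:

```shell
# Environment-variable approach (paths are illustrative; adjust SPARK_HOME
# to wherever Spark is installed on your machine).
export SPARK_HOME=/opt/spark
# Tell pyspark to use Jupyter as its driver Python, launched in notebook mode.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'

# With these set, this starts a Jupyter notebook server instead of the console:
"$SPARK_HOME/bin/pyspark"
```

These can go in your `~/.bashrc` (or shell equivalent) so every `pyspark` invocation picks them up.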
Continue reading “PySpark and Jupyter”