Cannot write to csv with spark-csv in Scala


Name: java.lang.NoClassDefFoundError
Message: Could not initialize class com.databricks.spark.csv.util.CompressionCodecs$


One-line answer: make sure the Scala version is the same for

  • the Scala used to compile Spark
  • the spark-csv module
  • the Spark installation running on your system
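
The check above can be sketched in a few lines of Python. This is only an illustration, not part of spark-csv or Spark: the helper names `scala_binary_version` and `check_scala_match` are hypothetical, and the sketch assumes the usual convention that Scala artifacts carry the Scala binary version as a suffix (for example `spark-csv_2.10`). The Scala version of your Spark build is printed in the `spark-shell` startup banner.

```python
import re


def scala_binary_version(artifact):
    """Return the Scala binary version suffix of an artifact, or None.

    Accepts a bare artifact id ("spark-csv_2.10") or a full
    coordinate ("com.databricks:spark-csv_2.11:1.5.0").
    """
    match = re.search(r"_(\d+\.\d+)(?::|$)", artifact)
    return match.group(1) if match else None


def check_scala_match(spark_scala_version, artifact):
    """True when the artifact targets the Scala version Spark was built with."""
    # Only major.minor matters for binary compatibility (2.10.5 -> 2.10).
    spark_binary = ".".join(spark_scala_version.split(".")[:2])
    return scala_binary_version(artifact) == spark_binary


# Hypothetical values: Spark built with Scala 2.10.5.
print(check_scala_match("2.10.5", "spark-csv_2.10"))
print(check_scala_match("2.10.5", "com.databricks:spark-csv_2.11:1.5.0"))
```

If the second kind of call returns False for your setup, the module was built for a different Scala version than your Spark, which is the mismatch described above.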


PySpark and Jupyter

Problem: how to run PySpark in a Jupyter notebook.

Some assumptions before starting:

  • You have Anaconda installed.
  • You have Spark installed. District Data Lab has an exceptional article on how to get started with Spark in Python. It’s long, but detailed.
  • pyspark is in the $PATH variable.

There are 2 solutions:

  1. The first one modifies the environment variables that pyspark reads, so that a Jupyter/IPython notebook with the PySpark environment is started instead of the pyspark console.
  2. The second one installs a separate Spark kernel for Jupyter. This way is more flexible, because the spark-kernel from IBM can run code in Scala, Python, Java, and SparkSQL.
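
The first solution can be sketched as follows. This is a minimal sketch, assuming a Spark version whose `pyspark` launcher honors the `PYSPARK_DRIVER_PYTHON` and `PYSPARK_DRIVER_PYTHON_OPTS` environment variables; the function names `jupyter_pyspark_env` and `launch_notebook` are made up for this example.

```python
import os
import subprocess


def jupyter_pyspark_env(base_env=None):
    """Return a copy of the environment that makes pyspark start Jupyter."""
    env = dict(os.environ if base_env is None else base_env)
    # Use jupyter as the driver program instead of the plain Python console.
    env["PYSPARK_DRIVER_PYTHON"] = "jupyter"
    # Argument passed to the driver program: open the notebook server.
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = "notebook"
    return env


def launch_notebook():
    """Start pyspark with the modified environment (pyspark must be in $PATH)."""
    return subprocess.call(["pyspark"], env=jupyter_pyspark_env())


# Usage: call launch_notebook() from a script or an interactive session.
```

Setting the variables only for the child process leaves the current shell untouched, so running `pyspark` directly still opens the normal console.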
