Cannot write to csv with spark-csv in Scala

Problem:

Name: java.lang.NoClassDefFoundError
Message: Could not initialize class com.databricks.spark.csv.util.CompressionCodecs$

[Screenshot: spark-scala-jupyter-csv-error.png]

One-line answer: make sure that the Scala version is the same for

  • the Scala that Spark was compiled with
  • the spark-csv module
  • the Scala installed on your system
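This rule can be checked mechanically. Below is a minimal shell sketch, assuming the package follows the usual `artifact_<scala version>` naming (as in `com.databricks:spark-csv_2.10:1.4.0`). The function names are made up for illustration and are not part of Spark or spark-csv.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Spark or spark-csv.

# Print the Scala binary version (major.minor) of a full version string,
# e.g. 2.10.6 -> 2.10.
scala_binary_version() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Print the Scala suffix embedded in a Maven coordinate of the form
# group:artifact_<scala>:version, e.g. com.databricks:spark-csv_2.10:1.4.0 -> 2.10.
artifact_scala_suffix() {
    printf '%s\n' "$1" | cut -d: -f2 | sed 's/.*_//'
}

# Succeed only when the Scala version ($1) and the coordinate ($2)
# refer to the same Scala binary version.
same_scala() {
    [ "$(scala_binary_version "$1")" = "$(artifact_scala_suffix "$2")" ]
}

if same_scala "2.10.6" "com.databricks:spark-csv_2.10:1.4.0"; then
    echo "match"
else
    echo "mismatch"
fi
```

Only the major.minor part is compared, because that is the part encoded in the artifact name.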

Configuration

  • I’m using the pre-built Spark 1.6.1 binary without Hadoop (I installed Hadoop myself). This Spark build was compiled with Scala 2.10.
  • I’m using Jupyter to run Scala notebooks with the Spark engine. I’ve already modified `/usr/local/share/jupyter/kernels/apache_toree_scala/kernel.json` to make sure that the `spark-csv` library is loaded. Pay attention to
    --packages com.databricks:spark-csv_2.10:1.4.0
# kernel.json
{
  "display_name": "Apache Toree - Scala",
  "language_info": { "name": "scala" },
  "argv": [
    ......
  ],
  "env": {
    "SPARK_OPTS": "--master=spark://localhost:7077 --driver-java-options=-Xms2048M --driver-java-options=-Xmx4096M --driver-java-options=-Dlog4j.logLevel=info --packages com.databricks:spark-csv_2.10:1.4.0",
    .....
  }
}
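To double-check which packages the kernel will request, the `--packages` value can be pulled out of a `SPARK_OPTS`-style string. This is a rough sketch only. `list_spark_packages` is a hypothetical helper name, and it assumes the options are separated by plain whitespace with no quoting inside the string.

```shell
#!/bin/sh
# Hypothetical helper: print each coordinate given via --packages,
# one per line. Accepts both "--packages a,b" and "--packages=a,b".
# The body runs in a subshell so that "set -f" (no globbing) and the
# rewritten positional parameters do not leak into the caller.
list_spark_packages() (
    set -f
    # Intentionally unquoted: split the option string on whitespace.
    set -- $1
    while [ "$#" -gt 0 ]; do
        case "$1" in
            --packages)
                if [ "$#" -ge 2 ]; then
                    printf '%s\n' "$2" | tr ',' '\n'
                fi
                ;;
            --packages=*)
                printf '%s\n' "${1#--packages=}" | tr ',' '\n'
                ;;
        esac
        shift
    done
)

SPARK_OPTS="--master=spark://localhost:7077 --packages com.databricks:spark-csv_2.10:1.4.0"
list_spark_packages "$SPARK_OPTS"
```

For the string above this prints the single coordinate `com.databricks:spark-csv_2.10:1.4.0`, whose `_2.10` suffix is the part to compare against the Scala version.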

You can also check whether the spark-csv module is loaded by opening (or creating) a Spark Scala notebook and then checking the console. The console output of my jupyter runnote ./ session is:

[Screenshot: spark-csv-jupyter-loaded.png]

You can see from the "modules in use" section that spark-csv_2.10 is loaded.

The reason for the error:

If you download the pre-built Spark, it’s compiled with Scala 2.10. If you look closely, the download page has a small note about Scala 2.11. Scala 2.10 and 2.11 are not binary compatible, so mixing them can fail at class-loading time with errors like the one above.

[Screenshot: scala2.10 vs 2.11.png]

Solution:

Check your Scala version. I had to uninstall Scala 2.11 and install Scala 2.10.

mbonaci provided a code snippet to install scala:

sudo wget www.scala-lang.org/files/archive/scala-2.10.4.deb
sudo dpkg -i scala-2.10.4.deb
# in case of unmet libjansi-java dependency fire:
# sudo apt-get install -f

After that, check that you have Scala 2.10:

$ scala -version
Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL
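If you want to script this check, the major.minor part can be cut out of the banner line. A small sketch, assuming the banner has the "version X.Y.Z" shape shown above; `scala_banner_binary_version` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: extract major.minor from a "scala -version" banner.
# Prints nothing if the line does not contain "version <digits>.<digits>".
scala_banner_binary_version() {
    printf '%s\n' "$1" | sed -n 's/.*version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Banner copied from the output above.
banner='Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL'
scala_banner_binary_version "$banner"
```

On a real machine the banner can be captured with `"$(scala -version 2>&1)"`. The `2>&1` is there because some Scala 2 releases print the banner to stderr rather than stdout.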

Check everything again

Now I can write a DataFrame to a CSV file and load that CSV file back into a DataFrame.

[Screenshot: spark-csv-write-successfully]
