Hive on Spark is not working

Problem: in the Hive CLI, even a simple query never returns a result.

Solution: make sure at least one worker (or slave) is registered with the Spark Master.

hive> select count(*) from subset1_data_stream_with_cgi;
……

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-06-30 15:09:54,526    Stage-0_0: 0/1    Stage-1_0: 0/1
2016-06-30 15:09:57,545    Stage-0_0: 0/1    Stage-1_0: 0/1
2016-06-30 15:10:00,561    Stage-0_0: 0/1    Stage-1_0: 0/1

Spark Master

Checking the Spark Master web UI at http://localhost:8080,

[Image: spark-master.png, the Spark Master web UI]

you can see that the job with Id 0 keeps running forever:

foreachAsync at RemoteHiveSparkClient.java:327
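You can also check this from the command line: the standalone master exposes its state as JSON (a quick sketch, assuming the default master UI port 8080 as in my setup; the exact output format may vary by Spark version):

curl -s http://localhost:8080/json
# An empty "workers" list in the response confirms that no worker is registered,
# which is why the job never gets any executors and hangs forever.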

Solution

Start a worker for the Spark Master. You can do it with the following command:

start-slave.sh spark://lamar:7077 # spark://lamar:7077 is the URL of my Spark Master (lamar is the hostname).

[Image: worker-for-spark-master.png, the master UI with the new worker registered]
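To double-check from the shell that the worker process is up (a small sketch; jps ships with the JDK and lists running Java processes):

jps | grep Worker   # the standalone worker shows up as "Worker", e.g. "12345 Worker"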

Then, executing the same query in Hive, I can see that Hive on Spark executes successfully (check the Stages: Succeeded/Total column):

[Image: hive-on-spark.png, the completed Hive on Spark job]
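For completeness, these are the Hive-side settings this setup relies on; hive.execution.engine and spark.master are the standard Hive on Spark properties, and spark://lamar:7077 is my master URL, so adjust it to yours:

hive> -- run queries on the Spark engine instead of MapReduce
hive> set hive.execution.engine=spark;
hive> -- point Hive at the standalone Spark Master
hive> set spark.master=spark://lamar:7077;
hive> select count(*) from subset1_data_stream_with_cgi;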
