Problem: in the Hive CLI, a simple query never returns a result.
Solution: make sure at least one worker (or slave) is registered with the Spark Master.
hive> select count(*) from subset1_data_stream_with_cgi;
...Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-06-30 15:09:54,526 Stage-0_0: 0/1 Stage-1_0: 0/1
2016-06-30 15:09:57,545 Stage-0_0: 0/1 Stage-1_0: 0/1
2016-06-30 15:10:00,561 Stage-0_0: 0/1 Stage-1_0: 0/1
Spark Master
Checking the Spark Master web UI at http://localhost:8080, you can see that Job Id = 0 keeps running forever:
foreachAsync at RemoteHiveSparkClient.java:327
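If you prefer the command line, the standalone master's web UI also serves its status as JSON (this assumes the default /json endpoint on port 8080; adjust host and port to your setup). An empty workers array confirms there is nothing for the job to run on:

curl http://localhost:8080/json
# Inspect the "workers" array in the response; if it is empty ([]),
# no worker is registered and Hive on Spark jobs will hang in Running state.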
Solution
Start a worker for the Spark Master. You can do it with this command:
start-slave.sh spark://lamar:7077 # spark://lamar:7077 is the URL of my Spark Master; lamar is its hostname.
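To confirm the worker actually started and registered (a quick sanity check; jps ships with the JDK and lists running JVM processes), you can run:

jps
# Expect a "Worker" process alongside the "Master" process.
# The new worker should also appear under "Workers" on http://localhost:8080.

Note that in Spark 3.1 and later this script was renamed to start-worker.sh; start-slave.sh is the name used by older releases.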
Then, executing the same query in Hive, I can see that the Hive on Spark job completes successfully (check the Stages: Succeeded/Total column):
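For reference, the Hive session settings that tie this together look like the following (a minimal sketch based on the standard Hive on Spark configuration properties; the master URL reflects my setup above and will differ on yours):

hive> set hive.execution.engine=spark;         -- run queries via Spark instead of MapReduce
hive> set spark.master=spark://lamar:7077;     -- point Hive at the standalone Spark Master
hive> select count(*) from subset1_data_stream_with_cgi;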