How to manage `steps` in Amazon EMR

Problem: we submit steps with aws emr command, and then we discovered that the step was failed. How can we trace back the error?


  1. Copy the Hive script into S3
  2. Run with AWS CLI
  3. Check for the log in Amazon EMR

Continue reading “How to manage `steps` in Amazon EMR”


Hive on Spark is not working

Problem: in Hive CLI, the simple command doesn’t return a result.

Solution: make sure you have at least one worker (or slave) for Spark Master

hive> select count(*) from subset1_data_stream_with_cgi;

Status: Running (Hive on Spark job[0])
Job Progress Format
CurrentTime StageId_StageAttemptId: SucceededTasksCount(+RunningTasksCount-FailedTasksCount)/TotalTasksCount [StageCost]
2016-06-30 15:09:54,526    Stage-0_0: 0/1    Stage-1_0: 0/1
2016-06-30 15:09:57,545    Stage-0_0: 0/1    Stage-1_0: 0/1
2016-06-30 15:10:00,561    Stage-0_0: 0/1    Stage-1_0: 0/1

Continue reading “Hive on Spark is not working”

Hive CLI doesn’t start

Problem: Hive CLI turned off suddenly, and I cannot start Hive CLI again

Error message:

java.sql.SQLException: Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=/mnt/storage/DATA/hadoop/metastore_db;create=true, username = APP

Diagnosis: since Derby database allow only 1 connection to its database, it creates a *.lck in the folder databaseName above. So to this folder, and delete those *.lck file.

After I deleted dbex.lck and db.lck, then hive can start as usual.