How to Improve HiveQL Efficiency

The efficiency of data queries influences the user experience extremely. When you’re creating a report by loading the data from the database, you must feel frustrated if it is very slow. I know that feeling and it is the daily life of analytics.

In this post, I’ll summarize four simple methods to improve HiveQL efficiency.

Change execution engine
- set hive.execution.engine=tez;
Set right storage formats
- Create TABLE A STORED AS ORC tblproperties("orc.compress"="SNAPPY") AS SELET * FROM A_Table
- Create TABLE A STORED AS Parquet AS SELET * FROM A_Table
Use vectorized query exectuion
- set hive.vectorized.execution.enabled = true;
Have a query execution plan
- set hive.cbo.enable=true;

Reference:

Shaw, Scott et al. Practical Hive : a Guide to Hadoop’s Data Warehouse System / by Scott Shaw, Andreas Francois Vermeulen, Ankur Gupta, David Kjerrumgaard. Berkeley, CA: Apress, 2016. Web.