The efficiency of data queries influences the user experience extremely. When you’re creating a report by loading the data from the database, you must feel frustrated if it is very slow. I know that feeling and it is the daily life of analytics.
In this post, I’ll summarize four simple methods to improve HiveQL efficiency.
- Change execution engine
set hive.execution.engine=tez;
- Set right storage formats
Create TABLE A STORED AS ORC tblproperties("orc.compress"="SNAPPY") AS SELET * FROM A_Table
Create TABLE A STORED AS Parquet AS SELET * FROM A_Table
- Use vectorized query exectuion
set hive.vectorized.execution.enabled = true;
- Have a query execution plan
set hive.cbo.enable=true;
Reference:
Shaw, Scott et al. Practical Hive : a Guide to Hadoop’s Data Warehouse System / by Scott Shaw, Andreas Francois Vermeulen, Ankur Gupta, David Kjerrumgaard. Berkeley, CA: Apress, 2016. Web.