Vectorization in Hive is a feature (available from Hive 0.13.0) that, when enabled, processes rows in batches of 1024 instead of reading one row at a time. This improves CPU efficiency for operations such as scans, filters, joins and aggregations.
Note that in Hive 0.13, vectorization is only available if the data is stored in ORC format; later releases extended support to other formats.
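To make the batching idea concrete, here is a minimal Python sketch. It is not Hive's actual code; it is only loosely modeled on the selection vector that Hive's vectorized row batches use. It splits a single product_name column into batches of 1024 rows and applies a LIKE '%Washington%'-style filter to each whole batch at once by narrowing a list of selected row indices, instead of pushing rows through the filter one by one. The names RowBatch, make_batches, filter_contains and vectorized_count are hypothetical, and the sample product names are made up.

```python
# Conceptual sketch only: RowBatch, make_batches, filter_contains and
# vectorized_count are hypothetical names, not Hive APIs.
from dataclasses import dataclass
from typing import Iterator, List

BATCH_SIZE = 1024  # rows per batch, matching Hive's default batch size


@dataclass
class RowBatch:
    """A column-oriented batch holding a single column."""
    product_name: List[str]  # column values for up to BATCH_SIZE rows
    selected: List[int]      # indices of rows that survived earlier filters


def make_batches(names: List[str], batch_size: int = BATCH_SIZE) -> Iterator[RowBatch]:
    """Split a column into fixed-size batches; the last one may be partial."""
    for start in range(0, len(names), batch_size):
        chunk = names[start:start + batch_size]
        yield RowBatch(product_name=chunk, selected=list(range(len(chunk))))


def filter_contains(batch: RowBatch, needle: str) -> None:
    """Filter a whole batch in one call by narrowing its selection vector,
    similar in spirit to product_name LIKE '%needle%' (case-sensitive)."""
    column = batch.product_name
    batch.selected = [i for i in batch.selected if needle in column[i]]


def vectorized_count(names: List[str], needle: str) -> int:
    """count(*) over rows matching the filter, computed batch by batch."""
    total = 0
    for batch in make_batches(names):
        filter_contains(batch, needle)
        total += len(batch.selected)
    return total


if __name__ == "__main__":
    # Made-up product names, not the foodmart data.
    names = ["Washington Apple Juice", "Golden Eggs", "Washington Cola"] * 520
    print(len(names))                              # 1560 rows
    print(len(list(make_batches(names))))          # 2 batches
    print(vectorized_count(names, "Washington"))   # 1040 matching rows
```

The filter is called once per batch rather than once per row, which is where the CPU savings of vectorized execution come from.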
How to Enable Vectorized Execution?
To enable vectorized execution:
set hive.vectorized.execution.enabled = true;
To disable vectorized execution:
set hive.vectorized.execution.enabled = false;
Difference Between Vectorized and Non-Vectorized Queries
I have a product table with 1560 rows, and I want to know how many products have "Washington" in their name.
Non-Vectorized Query
set hive.vectorized.execution.enabled = false;
select count(*) from foodmart.product where product.product_name like "%Washington%";
In the image below, you will notice that INPUT_RECORDS_PROCESSED is 1560, one for each row in the table.
Vectorized Query
set hive.vectorized.execution.enabled = true;
select count(*) from foodmart.product where product.product_name like "%Washington%";
In the image below, you will see that INPUT_RECORDS_PROCESSED is only 2. This is because, with vectorization enabled, Hive processes rows in batches of 1024 instead of one at a time, so the counter counts batches rather than rows. Dividing 1560 by 1024 gives about 1.52, and because a partial batch still counts as a whole batch, this rounds up to 2: one full batch of 1024 rows and one batch with the remaining 536 rows.
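The batch arithmetic can be checked with a short Python sketch. batch_sizes is a hypothetical helper, not a Hive function; it assumes Hive's default batch size of 1024 and uses divmod to show how the 1560 rows split across batches.

```python
import math
from typing import List

BATCH_SIZE = 1024  # Hive's default number of rows per vectorized batch


def batch_sizes(total_rows: int, batch_size: int = BATCH_SIZE) -> List[int]:
    """Rows in each batch: as many full batches as fit, then one partial batch."""
    full, remainder = divmod(total_rows, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


rows = 1560
sizes = batch_sizes(rows)
print(sizes)                         # [1024, 536]
print(len(sizes))                    # 2 batches
print(math.ceil(rows / BATCH_SIZE))  # 2, the same result via ceiling division
```

Plain integer division (1560 // 1024) would give 1 and drop the 536 leftover rows, which is why the batch count has to be rounded up.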