Hive is a append only database and so update and delete is not supported on hive external and managed table.
From hive version 0.14 the have started a new feature called transactional. Which allows to have ACID properties for a particular hive table and allows to delete and update. but let’s keep the transactional table for any other posts.
Here let’s discuss how to update hive table which is not transaction, either external or managed ( External table couldn’t be transactional).
Chances are if you have tried to update the hive table, external or managed (non transactional), you might have got below errors, depends on your hive version.
select * from temp.test_udpate; +-----------------+-------------------+--+ | test_update.id | test_update.name | +-----------------+-------------------+--+ | 1 | test user 1 | | 2 | test user 2 | | 2 | test user 3 | +-----------------+-------------------+--+ delete from temp.test1 where id=1; Error: Error while compiling statement: FAILED: SemanticException [Error 10297]: Attempt to do update or delete on table temp.test1 that does not use an AcidOutputFormat or is not bucketed (state=42000,code=10297)
Then the question is how to update or delete a record in hive table?
Deleting Records in Hive Table
Deleting rerecords is easy, you can use insert overwrite Syntax for this. Let’s says we want to delete a record from above hive table which has name as “test user 3”. Then we need to select all the records which does not have name as “test user 3” and overwrite into same table.
insert overwrite table temp.test_update \ select * from temp.test_update \ where name = "test user 3";
Update Records in Hive Table
updating the record consist of three steps as mentioned below. Let’s say in your test.update table we want to update the id to 3 for all the records which has name as “test user 3”
Create a temporary table which has updated record
create temporary table temp.test \ as select 3 as id, name from temp.test_update \ where name="test user 3";
Delete the records which you want to update from original table
insert overwrite table temp.test_update \ select * from temp.test_update \ where name="test user 3";
Insert the Updated Record(s)
insert into table temp.test_update select * from temp.test;
Thanks for this article
Contents are really useful, knowledgeable doc
Pls modify the overwrite queries :
Equality operator is used instead of inequality operator in the first insert overwrite query