
Hive – Convert JSON to complex Data Type

If you have a small, simple JSON file and need to create a corresponding Hive table, it's easy:

{
	"country":"Switzerland",
	"languages":["German","French","Italian"],
	"religions":
		{
			"catholic":[10,20],
			"protestant":[40,50]
		}
}
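As a rough illustration of the mapping involved, the JSON-to-Hive type inference can be sketched in plain Python. This is a simplified sketch, not Spark's actual inference logic; `hive_type` is a hypothetical helper introduced here for illustration.

```python
import json

def hive_type(value):
    """Infer a Hive type string from a parsed JSON value (simplified sketch)."""
    if isinstance(value, bool):        # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # assume homogeneous arrays; use the first element as representative
        return f"array<{hive_type(value[0])}>"
    if isinstance(value, dict):
        fields = ",".join(f"{k}:{hive_type(v)}" for k, v in value.items())
        return f"struct<{fields}>"
    return "string"

doc = json.loads("""
{
  "country": "Switzerland",
  "languages": ["German", "French", "Italian"],
  "religions": {"catholic": [10, 20], "protestant": [40, 50]}
}
""")

for name, value in doc.items():
    print(name, hive_type(value))
```

For the sample document above this prints `string`, `array<string>`, and a nested `struct<...>` type, matching the columns of the Hive table we build below.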

However, that's hardly the case in real life: we get JSON files with hundreds of nested fields, and manually mapping those to a Hive table is a tedious task.

To ease the work you can take the help of Spark. Don't worry, it's just two lines of code 🙂

First, put your file in an HDFS location:

hdfs dfs -put sample.json /tmp/

Fetch Schema for Hive Table 

>>> df = spark.read.json("/tmp/sample.json")
>>> df
DataFrame[country: string, languages: array<string>, religions: struct<catholic:array<bigint>,protestant:array<bigint>>]

Hive table

Your final Hive table will look like this, with minor modifications to the schema plus the JSON SerDe and other properties:

CREATE TABLE temp.test_json(
	  country string,
	  languages array<string>,
	  religions struct<catholic:array<bigint>,protestant:array<bigint>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/tmp/test_json/table/';
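Turning the inferred column types into the DDL above is plain string assembly; a minimal sketch follows. `build_ddl` is a hypothetical convenience function, not a Spark or Hive API, and the table name, SerDe class, and location simply mirror the example above.

```python
# Columns as (name, hive_type) pairs, matching what Spark inferred
# for the sample JSON file.
columns = [
    ("country", "string"),
    ("languages", "array<string>"),
    ("religions", "struct<catholic:array<bigint>,protestant:array<bigint>>"),
]

def build_ddl(table, columns, location):
    """Assemble a CREATE TABLE statement for a JSON-backed Hive table."""
    cols = ",\n  ".join(f"{name} {htype}" for name, htype in columns)
    return (
        f"CREATE TABLE {table}(\n  {cols})\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}';"
    )

print(build_ddl("temp.test_json", columns, "/tmp/test_json/table/"))
```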

If you don't like modifying the schema by hand, you can alternatively save the DataFrame as a Hive table and get the schema from that.

df.write.saveAsTable("temp.test_json")

And then run the following in Hive:

show create table temp.test_json;

Hive Data

Either way, this is how the data looks when queried.
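As a quick check of the complex types, nested fields can be queried in Hive with bracket and dot syntax (column names follow the table created above):

```sql
SELECT country,
       languages[0]          AS first_language,
       religions.catholic    AS catholic_counts
FROM temp.test_json;
```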

Published in Hadoop, Hive
