Depending on the data you load into Hive/HDFS, some of your fields might be empty. Having Hive interpret those empty fields as nulls can be very convenient. It is easy to do this in the table definition using the serialization.null.format table property.

Here is a an example from the Big Datums GitHub repo :

CREATE EXTERNAL TABLE people_v1 (
  id               INT,
  username         STRING,
  email_address    STRING,
  phone_number     STRING,
  first_name       STRING,
  last_name        STRING,
  middle_name      STRING,
  sex              STRING,
  birthdate        DATE,
  join_date        STRING,
  previous_logins  INT,
  last_ip          STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
LOCATION '/people_v1'
TBLPROPERTIES ('skip.header.line.count'='1','serialization.null.format' = ''); 

Leave a Reply

Set Empty Fields to Null in Hive