Home
The XML SerDe lets you map an XML schema to Hive data types through the Hive Data Definition Language (DDL), using a statement of the following form. Optional clauses are shown in square brackets.
CREATE [EXTERNAL] TABLE <table_name> (<column_specifications>)
ROW FORMAT SERDE "com.ibm.spss.hive.serde2.xml.XmlSerDe"
WITH SERDEPROPERTIES (
["xml.processor.class"="<xml_processor_class_name>",]
"column.xpath.<column_name>"="<xpath_query>",
...
["xml.map.specification.<element_name>"="<map_specification>"
...
]
)
STORED AS
INPUTFORMAT "com.ibm.spss.hive.serde2.xml.XmlInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat"
[LOCATION "<data_location>"]
TBLPROPERTIES (
"xmlinput.start"="<start_tag ",
"xmlinput.end"="</end_tag>"
);
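The xmlinput.start and xmlinput.end table properties tell the input format where each record begins and ends in the raw file. The Python sketch below is a simplified model of that splitting step, not the XmlInputFormat source: it assumes each record is found by searching for the start string and then taking everything up to the first end string after it. The function name split_records and the sample data are hypothetical.

```python
# Simplified model (an assumption, not the real XmlInputFormat code) of how a
# start/end string pair can carve records out of a raw XML text stream.


def split_records(text: str, start: str, end: str) -> list[str]:
    """Return every fragment that begins with `start` and ends with `end`."""
    records = []
    pos = 0
    while True:
        begin = text.find(start, pos)  # next occurrence of the start string
        if begin == -1:
            break
        finish = text.find(end, begin)  # first end string after that start
        if finish == -1:
            break  # incomplete trailing record: ignore it
        finish += len(end)
        records.append(text[begin:finish])
        pos = finish  # resume scanning after this record
    return records


# Hypothetical sample input: two records wrapped in a <records> root element.
data = ('<records><record customer_id="A"><x>1</x></record>'
        '<record customer_id="B"><x>2</x></record></records>')

good = split_records(data, "<record customer", "</record>")
print(good[0])  # <record customer_id="A"><x>1</x></record>

# "<record" is also a prefix of "<records>", so the first fragment
# swallows the enclosing root tag and is no longer a single record.
bad = split_records(data, "<record", "</record>")
print(bad[0])   # <records><record customer_id="A"><x>1</x></record>
```

The second call shows why the start value is usually given as an open-ended prefix that is specific enough to match only the record element, rather than just the bare element name.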
For example, the following XML...
<records>
  <record customer_id="0000-JTALA">
    <demographics>
      <gender>F</gender>
      <agecat>1</agecat>
      <edcat>1</edcat>
      <jobcat>2</jobcat>
      <empcat>2</empcat>
      <retire>0</retire>
      <jobsat>1</jobsat>
      <marital>1</marital>
      <spousedcat>1</spousedcat>
      <residecat>4</residecat>
      <homeown>0</homeown>
      <hometype>2</hometype>
      <addresscat>2</addresscat>
    </demographics>
    <financial>
      <income>18</income>
      <creddebt>1.003392</creddebt>
      <othdebt>2.740608</othdebt>
      <default>0</default>
    </financial>
  </record>
</records>
...would be represented by the following Hive DDL.
CREATE TABLE xml_bank(customer_id STRING, demographics map<string,string>, financial map<string,string>)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.customer_id"="/record/@customer_id",
"column.xpath.demographics"="/record/demographics/*",
"column.xpath.financial"="/record/financial/*"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
TBLPROPERTIES (
"xmlinput.start"="<record customer",
"xmlinput.end"="</record>"
);
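Once a record has been delimited, each column.xpath.* property is evaluated against that record fragment. The Python sketch below approximates this step for the xml_bank table using the standard library's xml.etree.ElementTree, whose limited XPath subset is enough for these simple paths. It assumes, rather than reproduces, the SerDe's behavior of turning each child element into a map entry keyed by the element name with the element text as the value; to_row and the shortened RECORD sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# One record fragment, shortened from the example above (hypothetical sample).
RECORD = """<record customer_id="0000-JTALA">
<demographics><gender>F</gender><agecat>1</agecat></demographics>
<financial><income>18</income><creddebt>1.003392</creddebt></financial>
</record>"""


def to_row(fragment: str) -> dict:
    """Approximate the three column.xpath.* mappings of xml_bank (assumed semantics)."""
    record = ET.fromstring(fragment)  # root element is <record>
    return {
        # customer_id attribute of <record> -> STRING column
        "customer_id": record.get("customer_id"),
        # children of <demographics> -> map<string,string>, element name -> text
        "demographics": {e.tag: e.text for e in record.findall("demographics/*")},
        # children of <financial> -> map<string,string>, element name -> text
        "financial": {e.tag: e.text for e in record.findall("financial/*")},
    }


row = to_row(RECORD)
print(row["customer_id"])             # 0000-JTALA
print(row["demographics"]["gender"])  # F
print(row["financial"]["income"])     # 18
```

Note that every map value stays a string, matching the map<string,string> column types in the DDL; numeric fields such as income would need an explicit cast in a Hive query.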
This page was edited by Divya Mahajan on May 5th, 2014.