
write to hbase from pyspark shell #34

Open
jyothirmai2309 opened this issue Jun 21, 2018 · 3 comments

Comments

@jyothirmai2309

While writing from PySpark to HBase, the data is stored in a hex/binary format. What should I do in the PySpark write command to store the data as integers?

@jyothirmai2309
Author

Is there any option to keep integer values as integers while writing a DataFrame to HBase through PySpark? By default, integer values are converted to byte arrays in the HBase table.

Below is the code:

```python
catalog2 = {
    "table": {"namespace": "default", "name": "trip_test1"},
    "rowkey": "key1",
    "columns": {
        "serial_no":    {"cf": "rowkey", "col": "key1",          "type": "string"},
        "payment_type": {"cf": "sales",  "col": "payment_type",  "type": "string"},
        "fare_amount":  {"cf": "sales",  "col": "fare_amount",   "type": "string"},
        "surcharge":    {"cf": "sales",  "col": "surcharge",     "type": "string"},
        "mta_tax":      {"cf": "sales",  "col": "mta_tax",       "type": "string"},
        "tip_amount":   {"cf": "sales",  "col": "tip_amount",    "type": "string"},
        "tolls_amount": {"cf": "sales",  "col": "tolls_amount",  "type": "string"},
        "total_amount": {"cf": "sales",  "col": "total_amount",  "type": "string"}
    }
}

import json
cat2 = json.dumps(catalog2)

df.write.option("catalog", cat2) \
    .option("newtable", "5") \
    .format("org.apache.spark.sql.execution.datasources.hbase") \
    .save()
```

Output:

```
\x00\x00\x03\xE7 column=sales:payment_type, timestamp=1529495930994, value=CSH
\x00\x00\x03\xE7 column=sales:surcharge, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tip_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:tolls_amount, timestamp=1529495930994, value=\x00\x00\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE7 column=sales:total_amount, timestamp=1529495930994, value=@!\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:fare_amount, timestamp=1529495930994, value=@\x18\x00\x00\x00\x00\x00\x00
\x00\x00\x03\xE8 column=sales:mta_tax, timestamp=1529495930994, value=?\xE0\x00\x00\x00\x00\x00\x00
```

Expected output:

```
999 column=sales:fare_amount, timestamp=1529392479358, value=8.0
999 column=sales:mta_tax, timestamp=1529392479358, value=0.5
999 column=sales:payment_type, timestamp=1529392479358, value=CSH
999 column=sales:surcharge, timestamp=1529392479358, value=0.0
999 column=sales:tip_amount, timestamp=1529392479358, value=0.0
999 column=sales:tolls_amount, timestamp=1529392479358, value=0.0
999 column=sales:total_amount, timestamp=1529392479358, value=8.5
```
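[Editor's note] The hex values in the scan output are consistent with HBase's standard big-endian encodings (4-byte ints, 8-byte IEEE 754 doubles, as produced by `Bytes.toBytes`). A quick check with Python's `struct` module, independent of Spark and HBase, confirms the mapping between the values above and the bytes shown:

```python
# Verify the hex values from the scan output: HBase encodes ints as 4-byte
# big-endian and doubles as 8-byte big-endian IEEE 754, which is exactly what
# struct produces with the ">" (big-endian) prefix.
import struct

# Row key 999 -> \x00\x00\x03\xE7 (4-byte big-endian int)
assert struct.pack(">i", 999) == b"\x00\x00\x03\xe7"

# total_amount 8.5 -> @!\x00... ('@' is 0x40, '!' is 0x21)
assert struct.pack(">d", 8.5) == b"@!\x00\x00\x00\x00\x00\x00"

# mta_tax 0.5 -> ?\xE0\x00... ('?' is 0x3F)
assert struct.pack(">d", 0.5) == b"?\xe0\x00\x00\x00\x00\x00\x00"

# Going the other way, decode a cell value straight from the scan output:
value = struct.unpack(">d", b"@!\x00\x00\x00\x00\x00\x00")[0]
print(value)  # 8.5
```

So nothing is corrupted; the shell is simply printing raw binary cell values as escaped hex.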

@bobomeng

If you are using our package here, integer values are converted to binary byte arrays while writing to HBase.

@nathamsr11

(Quotes jyothirmai2309's question, code, and output above verbatim.)

Thanks, I had the same problem.


3 participants