### What kind of issue is this?

Bug report. If you've found a bug, please provide a code snippet or test to reproduce it below. The easier it is to track down the bug, the faster it will be solved.
### Issue description
I need advice on handling the decimal datatype. I would prefer it to map to a float or double numeric type rather than a string, and I can accept losing precision if needed. I have to rely on automatic index creation and dynamic mapping, so I can't create an index mapping manually as a first step.
### Steps to reproduce
Try to read a decimal column from a Parquet file using Spark and store it in Elasticsearch using ES-Hadoop.
Code:
```
Dataset<Row> col = spark.sql("select money from parquetFile");
EsSparkSQL.saveToEs(col, "spark/docs");
```
Stack trace:
```
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: Decimal types are not supported by Elasticsearch
```
### Version Info
OS: OS X
JVM: JDK 1.8.0_92
Hadoop/Spark: Spark 2.0.0
ES-Hadoop: elasticsearch-spark-20_2.11:5.0.0-alpha5
ES: 2.3.5
Hello. We prefer that questions pertaining to troubleshooting or advice be asked on the forum instead of GitHub. GitHub issues are for confirmed bugs and actionable features only. Organization is key to success, and we thank you for your understanding and cooperation.
To answer the question though, while we're here:
Decimal types are not supported by the connector because there simply is no way to serialize them into Elasticsearch without losing some precision. In practice these types are usually used to represent monetary values, and losses in precision are generally unacceptable in that case. Instead of blindly accepting the precision loss, or transforming it into a type you may not have expected, we throw an error to indicate that you must make a choice on how to proceed.
If precision is not important for that column/field, we advise applying a transformation that casts it to either a string or a compatible numeric type. Casts from DecimalType to IntegralTypes in Spark are allowed when the decimal precision is lax enough that nothing is lost; casts to StringType, on the other hand, are always allowed. When a JSON string is indexed into a double field in Elasticsearch, the precision is lost at indexing time instead of throwing a casting error in Spark. To wit:
```
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.elasticsearch.spark.sql._ // brings saveToEs into scope

// sc is the active SparkContext and sqc the SQLContext.
val data = Seq(
  Row("1", Decimal(1200.00)),
  Row("2", Decimal(1400.00))
)
val schema = StructType(Array(
  StructField("id", StringType),
  StructField("number", new DecimalType(10, 2))
))
val conf = Map("es.mapping.id" -> "id")
val rdd = sc.makeRDD(data)
val df = sqc.createDataFrame(rdd, schema)

// Cast the decimal column to a string before writing; Elasticsearch
// parses the string into the mapped numeric type at index time.
df.select(df("id"), df("number").cast(StringType)).saveToEs("spark/decimalValues", conf)
```
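If you would rather end up with a numeric column on the Spark side (as the question asks), the same approach works with a numeric cast. This is a sketch under the same setup as above, explicitly accepting that the cast from DecimalType to DoubleType may lose precision:

```
// Variant of the example above: cast the decimal column to DoubleType
// instead of StringType, accepting possible precision loss up front.
df.select(df("id"), df("number").cast(DoubleType)).saveToEs("spark/decimalValues", conf)
```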
As for specifying a mapping for auto-created indices: I would create an Elasticsearch index template using the template APIs before executing the Spark job. You specify an index name pattern when making a template, and any index created with a name matching that pattern has the template's mappings applied to it automatically. You can specify in the template that the field in question should be mapped as a double; then, when ES-Hadoop creates an index whose name matches the pattern, Elasticsearch applies the mapping from the template without requiring any manual intervention. A sketch follows.
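For example, a minimal template might look like the following, using the 2.x template API. The template name `spark_template`, the `spark*` pattern, and the `number` field are illustrative, chosen to match the examples above:

```
curl -XPUT 'http://localhost:9200/_template/spark_template' -d '{
  "template": "spark*",
  "mappings": {
    "_default_": {
      "properties": {
        "number": { "type": "double" }
      }
    }
  }
}'
```

Any index that ES-Hadoop auto-creates with a name starting with `spark` will then map `number` as a double.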