Skip to content

Conversation

@wayneguow
Copy link
Contributor

@wayneguow wayneguow commented Aug 2, 2024

What changes were proposed in this pull request?

This PR aims to enhance comparison logic to avoid precision loss in decimal parts of Decimal in Avro data source. It refers to the related logic of convert Decimal type in Parquet data source #44513 .

Before:

  • As long as the length of the integer part(precision - scale) matches, it can be converted.

After:

  • Both the integer(precision - scale) and decimal part(scale) lengths must match.

Why are the changes needed?

Fixed the issue causing missing data accuracy.

Does this PR introduce any user-facing change?

Yes, stricter matching requirements for conversions between Spark DecimalType and Avro Decimal type.

Previously, decimal(12,10) -> decimal(5,3) was allowed, but there would be some loss of precision in the decimal part, such as: 13.1234567890 -> 13.123
After that, the exception avroIncompatibleReadError will be thrown, because 3 is not greater than or equal 10. Unless both the integer and the decimal part are greater than or equal. For example, decimal(15, 13) is OK because 13>=10 and 15-13>=12-10

But users can restore the legacy behavior, set spark.sql.legacy.avro.allowIncompatibleDecimalType to true.

How was this patch tested?

Pass GA and add new test cases.

Was this patch authored or co-authored using generative AI tooling?

No.

@wayneguow wayneguow changed the title [SPARK-49095][SQL] Update DecimalType compatible logic of Avro data source to avoid loss of decimal precision [SPARK-49095][SQL] Update DecimalTypeand Decimal compatible logic of Avro data source to avoid loss of decimal precision Aug 2, 2024
@wayneguow
Copy link
Contributor Author

cc @cloud-fan @LuciferYang when you have time :-)

@cloud-fan
Copy link
Contributor

Can we put more information into Does this PR introduce any user-facing change? Especially what kind of queries will be broken after this change.

@wayneguow
Copy link
Contributor Author

Can we put more information into Does this PR introduce any user-facing change? Especially what kind of queries will be broken after this change.

Updated it.

@allisonwang-db
Copy link
Contributor

@wayneguow, This seems to be a breaking change since the previous working query will now throw exceptions. I would recommend adding a config (by default disabled) and migration docs.

@github-actions github-actions bot added the DOCS label Aug 8, 2024
@wayneguow wayneguow changed the title [SPARK-49095][SQL] Update DecimalTypeand Decimal compatible logic of Avro data source to avoid loss of decimal precision [SPARK-49095][SQL] Update DecimalTypeand decimal fields compatible logic of Avro data source to avoid loss of decimal precision Aug 8, 2024
@wayneguow
Copy link
Contributor Author

@wayneguow, This seems to be a breaking change since the previous working query will now throw exceptions. I would recommend adding a config (by default disabled) and migration docs.

@allisonwang-db Thanks for your suggestions. I added a new legacy config that users can restore the old behavior. Also cc @cloud-fan

@github-actions
Copy link

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Nov 17, 2024
@github-actions github-actions bot closed this Nov 18, 2024
@wayneguow wayneguow deleted the avro branch February 11, 2025 04:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants