
Schema merge issue: Decimal type scale change during MERGE SQL #514

Open
dattawalake opened this issue Sep 9, 2020 · 6 comments
Labels
enhancement New feature or request

Comments

@dattawalake

Delta version 0.6.1
Spark 2.4.4

The MERGE SQL fails when the source DataFrame changes the scale of a Decimal column. Schema auto-merge does not seem to handle this; I get the exception below:
Failed to merge decimal types with incompatible scale 0 and 2;
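A plain-Python illustration (not Delta/Spark code; `fits` is a hypothetical helper) of why there is no common decimal type at precision 17: moving the scale from 0 to 2 leaves only 15 integer digits, so existing 17-digit values no longer fit, and a lossless common type would need more precision, e.g. decimal(19,2):

```python
# Illustration only: why decimal(17,0) and decimal(17,2) cannot be
# merged losslessly at precision 17.
from decimal import Decimal

def fits(value: Decimal, precision: int, scale: int) -> bool:
    """Check whether `value` is exactly representable as decimal(precision, scale)."""
    scaled = value.scaleb(scale)              # shift the decimal point right by `scale`
    if scaled != scaled.to_integral_value():  # would need rounding -> loses fractional digits
        return False
    return abs(scaled) < 10 ** precision      # unscaled value must fit in `precision` digits

big = Decimal("99999999999999999")            # 17 integer digits, fits decimal(17,0)
assert fits(big, 17, 0)
assert not fits(big, 17, 2)                   # only 15 integer digits remain at scale 2
assert fits(big, 19, 2)                       # decimal(19,2) holds values of both types
```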

Workaround: I apply the schema differences to the target table explicitly before the merge query, even though as of version 0.6.1 Delta is supposed to take care of auto schema merge into the target table.

Thanks

@tdas
Contributor

tdas commented Sep 9, 2020

Can you give more information on the query and the data types of the relevant columns? We would like to understand this well.

@dattawalake
Author

Here is the scenario, executed with delta-core version 0.6.1:

1. Day 1 through day n: runs with version 0.5.0
1.1 Source (input) schema:
id: string (nullable = true)
amount: decimal(17,0) (nullable = true)
1.2 Target (Delta) table schema, created after merging (MERGE SQL) with the source:
id: string (nullable = true)
amount: decimal(17,0) (nullable = true)

Result for days 1 to n: merges succeed for all n runs, since there are no type-level schema differences. Added and deleted columns are handled by our own code, which keeps the schemas in sync.

2. Day n+1: run with the jar updated to version 0.6.1
2.1 Source (input) schema:
id: string (nullable = true)
amount: decimal(17,2) (nullable = true)
2.2 Target (Delta) table schema before the merge:
id: string (nullable = true)
amount: decimal(17,0) (nullable = true)
2.3 Execute the MERGE SQL (standard Scala merge syntax, matching on the key)

Result: the run fails with the error: Failed to merge decimal types with incompatible scale 0 and 2;

I tried to sync the input file schema to the target (Delta table) location with .mode("append") and .option("mergeSchema", "true") before executing the merge, but it did not help; it failed with the same exception.

In the previous version (0.5.0) I handled schema merge myself: before merging, I computed the difference between the source and target (Delta table) schemas and, on any difference, wrote a blank file with the input schema to the target location.
It seems that data-type-level differences are not handled by mergeSchema, or I am missing something.

The workaround I followed for the above issue: rewrite the target to a temporary location with the column cast to decimal(17,2), then copy it back to the target (Delta table) location.
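The cast-and-rewrite workaround is only lossless when every stored value fits the widened scale. A plain-Python sketch of that safety check (`safe_to_cast` is a hypothetical helper for illustration, not Delta code):

```python
from decimal import Decimal

def safe_to_cast(value: Decimal, precision: int, scale: int) -> bool:
    # decimal(precision, scale) keeps `precision - scale` integer digits, so
    # widening the scale at a fixed precision shrinks the allowed integer range.
    return abs(value) < Decimal(10) ** (precision - scale)

# Values with at most 15 integer digits survive a cast to decimal(17,2)...
assert safe_to_cast(Decimal("123456789012345"), 17, 2)
# ...but a 17-digit value stored under decimal(17,0) would overflow.
assert not safe_to_cast(Decimal("99999999999999999"), 17, 2)
```

In the scenario above this check passes because the amount values happen to need at most 15 integer digits; had any row used all 17, the rewrite would have required a wider type such as decimal(19,2).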

Thanks

@tdas
Contributor

tdas commented Sep 10, 2020

It's possible that we have fixed a few corner cases of incorrect schema evolution since 0.5.0. @brkyvz, does any change come to mind?

@vkorukanti vkorukanti added the bug Something isn't working label Oct 7, 2021
@chaudharirohit2810

We are facing the same issue in version 2.1.0. Is there any solution for this?

Our scenario:
We have a decimal column with a lower scale, e.g. decimal(38,8), and we want to increase the scale, e.g. to decimal(38,13).
We tried to do it using the mergeSchema option, but it throws the error: decimal(38,8) cannot be cast to decimal(38,13).

Changing the type using ALTER TABLE doesn't work either.

The only solution we could find is to overwrite the entire table using the overwriteSchema option.

@zsxwing
Member

zsxwing commented Oct 25, 2022

Run result failed with error -> Failed to merge decimal types with incompatible scale 0 and 2;

For the original issue reported in this ticket: Delta currently doesn't support merging different decimal types. The technical challenge is that Parquet may store decimal values in several different physical formats ( https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#decimal ), and Delta today has no mechanism to read data across all the potential Parquet decimal formats.

decimal(38,8) cannot be cast to decimal(38,13)

This is expected. Casting a decimal(38,8) value to decimal(38,13) may lose information, because at a fixed precision of 38 the integer part shrinks from 30 digits to 25; hence we don't allow it.
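The arithmetic behind this can be checked in plain Python (illustration only, not Delta code): precision 38 with scale 8 leaves 30 integer digits, while scale 13 leaves only 25, so values that are legal in decimal(38,8) can overflow decimal(38,13):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # enough working precision for 38-digit values

int_digits_38_8 = 38 - 8    # 30 integer digits available at scale 8
int_digits_38_13 = 38 - 13  # only 25 remain after the scale change

# Largest value a decimal(38,8) column can hold vs. the integer bound of decimal(38,13).
largest_38_8 = Decimal(10) ** int_digits_38_8 - Decimal("0.00000001")
bound_38_13 = Decimal(10) ** int_digits_38_13

assert largest_38_8 >= bound_38_13  # the cast can overflow, so it is rejected
```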

@zsxwing zsxwing added enhancement New feature or request and removed bug Something isn't working labels Oct 25, 2022
tdas pushed a commit to tdas/delta that referenced this issue Jun 6, 2023
tdas pushed a commit to tdas/delta that referenced this issue Jun 8, 2023
@AnasKhchaf

Hello @chaudharirohit2810, @dattawalake,

Have you tried the spark.sql.decimalOperations.allowPrecisionLoss=false configuration, to avoid losing precision in the decimal data type?

Hope this helps!
