Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SDENT 180] support for fail fast type casting #201

Open
wants to merge 6 commits into
base: snappy/branch-2.1
Choose a base branch
from

Conversation

vatsalmevada
Copy link

@vatsalmevada vatsalmevada commented Mar 27, 2020

Changes proposed in this pull request

Apache Spark handles type casting failures in a fail safe manner. Instead of
failing the entire query, Spark populates null values whenever type casting
failure occurs. Snappydata inherits the same behaviour from Spark.

With this changes we are exposing a session level configuration to enable fail
fast behaviour for casting errors. i.e. instead of returning null values for
casting failures, the query will be failed altogether whenever first type cast
error occurs.

The configuration property is snappydata.failOnCastError which defaults
to false. Setting this property true will turn on fail fast casting behaviour.

Patch testing

  • added new tests

Other PRs

TIBCOSoftware/snappydata#1542

Vatsal Mevada added 3 commits March 24, 2020 15:23
conversions based on configuration property
`snappydata.failFastTypeCasting`.
- string to date
- string to decimal
- NaN, Infinite value of fractional numbers to timestamp
- string to boolean
- decimal precision overflow
- date to decimal

Enhancing string to numeric type casting failure to include the data
value for which casting failed. Also unifying the exception of string
to numeric conversion failure to match the exception of other casting
failures.
instead of using generic RuntimeException
- Adding tests for more scenarios
code flow is not reaching there due to analysis check performed by
`checkInputDataTypes` method.
@sumwale
Copy link

sumwale commented Mar 30, 2020

@vatsalmevada

This set of changes is large and will be difficult to port to later 2.x versions. Moreover, this will be handled cleanly in Spark 3.0 (https://issues.apache.org/jira/browse/SPARK-28989) so I do not think we should make this change as of now.

If really required for 2.x series, then

  1. The simpler and cleaner approach is to add an operator around CAST which will check for null
    result and throw an exception if the child result of CAST itself is not null. It should not require any
    changes to CAST itself, since the surrounding operator can doGenCode on CAST's child first
    and then evaluate CAST overall and check for both null or not. An analyzer rule can add that
    operator, and we should make a note to get rid of it when merging Spark 3.0

  2. the property name should be the same as in Spark 3.0 -- see the ticket above

@sumwale
Copy link

sumwale commented Mar 30, 2020

It should not require any changes to CAST itself, since the surrounding operator can doGenCode
on CAST's child first and then evaluate CAST overall and check for both null or not.

To clarify, the CAST operator's child will also need to be changed to a new expression that simply
picks up the result of evaluation from the surrounding operator. If there is any confusion about this
approach, I can paste some pseudo-code for the same.

@sumwale
Copy link

sumwale commented Mar 30, 2020

Another advantage of the mentioned approach is that it can be done entirely in SD layer without any Spark layer changes so will work for smart connector too. The only disadvantage will be that the exception message will have to be a generic one like "CAST operation from '...' to type ... failed" rather than specific ones but that is a reasonable compromise.

@sumwale sumwale force-pushed the snappy/branch-2.1 branch 4 times, most recently from 4ffe404 to 79ddd88 Compare June 13, 2021 14:16
@sumwale sumwale force-pushed the snappy/branch-2.1 branch 2 times, most recently from 52a9a47 to 763e1bb Compare June 17, 2021 15:26
@sumwale sumwale force-pushed the snappy/branch-2.1 branch 5 times, most recently from 1456d1f to 3d73189 Compare June 27, 2021 18:39
@sumwale sumwale force-pushed the snappy/branch-2.1 branch 4 times, most recently from d35dbba to e76c62d Compare April 6, 2022 22:34
@sumwale sumwale force-pushed the snappy/branch-2.1 branch 2 times, most recently from 1feaa36 to cdd6d29 Compare April 6, 2022 23:24
@sumwale sumwale force-pushed the snappy/branch-2.1 branch 3 times, most recently from cba053b to 3527357 Compare June 10, 2022 08:59
@sumwale sumwale force-pushed the snappy/branch-2.1 branch from 0169c3d to 1fb3673 Compare June 12, 2022 12:56
@sumwale sumwale force-pushed the snappy/branch-2.1 branch from 4a3161a to efa4458 Compare June 29, 2022 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants