-
Notifications
You must be signed in to change notification settings - Fork 28.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-40819][SQL][3.3] Timestamp nanos behaviour regression #39904
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM (Pending CIs).
cc @viirya since he is the release manager of Apache Spark 3.3.2.
val LEGACY_PARQUET_NANOS_AS_LONG = buildConf("spark.sql.legacy.parquet.nanosAsLong") | ||
.internal() | ||
.doc("When true, the Parquet's nanos precision timestamps are converted to SQL long values.") | ||
.version("3.2.3") | ||
.booleanConf | ||
.createWithDefault(false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this is worth documenting it in migration guide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look at adding to the guide
Merged to branch-3.3. |
As per HyukjinKwon request on #38312 to backport fix into 3.3 ### What changes were proposed in this pull request? Handle `TimeUnit.NANOS` for parquet `Timestamps` addressing a regression in behaviour since 3.2 ### Why are the changes needed? Since version 3.2 reading parquet files that contain attributes with type `TIMESTAMP(NANOS,true)` is not possible as ParquetSchemaConverter returns ``` Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true)) ``` https://issues.apache.org/jira/browse/SPARK-34661 introduced a change matching on the `LogicalTypeAnnotation` which only covers Timestamp cases for `TimeUnit.MILLIS` and `TimeUnit.MICROS` meaning `TimeUnit.NANOS` would return `illegalType()` Prior to 3.2 the matching used the `originalType` which for `TIMESTAMP(NANOS,true)` return `null` and therefore resulted to a `LongType`, the change proposed is too consider `TimeUnit.NANOS` and return `LongType` making behaviour the same as before. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? Added unit test covering this scenario. Internally deployed to read parquet files that contain `TIMESTAMP(NANOS,true)` Closes #39904 from awdavidson/ts-nanos-fix-3.3. Lead-authored-by: alfreddavidson <alfie.davidson9@gmail.com> Co-authored-by: awdavidson <54780428+awdavidson@users.noreply.github.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
As per @HyukjinKwon request on #38312 to backport fix into 3.3
What changes were proposed in this pull request?
Handle
TimeUnit.NANOS
for parquetTimestamps
addressing a regression in behaviour since 3.2Why are the changes needed?
Since version 3.2 reading parquet files that contain attributes with type
TIMESTAMP(NANOS,true)
is not possible as ParquetSchemaConverter returnshttps://issues.apache.org/jira/browse/SPARK-34661 introduced a change matching on the
LogicalTypeAnnotation
which only covers Timestamp cases forTimeUnit.MILLIS
andTimeUnit.MICROS
meaningTimeUnit.NANOS
would returnillegalType()
Prior to 3.2 the matching used the
originalType
which forTIMESTAMP(NANOS,true)
returnnull
and therefore resulted to aLongType
, the change proposed is too considerTimeUnit.NANOS
and returnLongType
making behaviour the same as before.Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit test covering this scenario.
Internally deployed to read parquet files that contain
TIMESTAMP(NANOS,true)