[SPARK-40819][SQL] Timestamp nanos behaviour regression #38312
Conversation
    } else {
      TimestampNTZType
    }
    case timestamp: TimestampLogicalTypeAnnotation if timestamp.getUnit == TimeUnit.NANOS =>
Ideally, this case would be merged with the case above, but that would require TimestampType and TimestampNTZType to support nanos, which is a bigger change.
This case deserves a comment that nanos are not supported as TimestampType but read as LongType, without any timezone awareness.
Supporting nanos as TimestampType in the future looks like a breaking change then (Spark 4.x?). Or another TimestampNSType, analogous to TimestampNTZType, could be introduced.
Comment has been added.
I agree, full support is a breaking change. My aim for now is to address the regression rather than introduce new functionality/support, as that would require further validation and testing of other components (e.g. time-based functions). However, it is definitely something that should be considered in future development.
Hmm, how is this sufficient? We should also handle the long (nanos) accordingly in the Parquet reader, is that right? E.g. ParquetVectorUpdaterFactory.
@sunchao so for a type such as

    message root {
      required int64 _1 (TIMESTAMP(NANOS,true));
    }

the descriptor.getPrimitiveType().getPrimitiveTypeName() is INT64; as the sparkType is LongType it returns new LongUpdater().
Therefore no change is required from this perspective, as it continues to be handled in the same way as prior to #31776.
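For illustration, a minimal hedged sketch of the dispatch being described (the helper name and return values are hypothetical; the real logic lives in ParquetVectorUpdaterFactory):

```scala
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName
import org.apache.spark.sql.types.{DataType, LongType}

// Hypothetical helper mirroring the dispatch described above: the Parquet physical type
// plus the resolved Spark type drive the updater choice, so INT64 + LongType stays on the
// plain long-reading path regardless of the TIMESTAMP(NANOS, ...) annotation.
def updaterKindFor(physical: PrimitiveTypeName, sparkType: DataType): String =
  (physical, sparkType) match {
    case (PrimitiveTypeName.INT64, LongType) => "LongUpdater" // nanos-as-long lands here
    case (p, t) => s"some other updater for ($p, $t)"
  }
```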
I see. It's a bit weird, though, that we read nanos as long but not as timestamp. I'm not sure whether this should be considered a regression, or whether the previous behavior before 3.2 was merely unintended.
I agree it's weird, and if it is unintended behaviour, as far as I know it has been unintended since Spark 2.3, so quite a long time (I have not checked versions before 2.3). However, I'll let you take the call on whether it should be considered a regression.
I assume the change to support nanos will be a breaking change; I am completely for the support of nanos, but it would be nice if the original behaviour still existed until that support is developed and released.
Yes, the earlier existing behaviour was useful and should be restored until nanos are properly supported as a typed nano timestamp, unless a workaround can be found that restores the earlier behaviour.
Does providing a schema (spark.read.schema(...)) with long type override the parquet timestamp type from the parquet file? Would that be a workaround?
@EnricoMi it is possible to use spark.read.schema(..) as a workaround; however, you lose functionality like mergeSchema, which automatically handles schema evolution, and you potentially need to know the entire schema up front if all columns are required. For other consumers/users, especially in the exploratory analysis space, it also requires a better understanding of the underlying data structure before they can use it, and this gets more difficult when the file is extremely wide.
I can imagine developers building other ways around the nuisance, which seems a bit crazy considering the functionality already exists.
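For concreteness, a minimal sketch of that workaround as discussed above, assuming a hypothetical path and column name:

```scala
import org.apache.spark.sql.types.{LongType, StructField, StructType}

// Declare the TIMESTAMP(NANOS,true) column up front as a raw long.
val explicitSchema = StructType(Seq(
  StructField("event_time_nanos", LongType, nullable = true)))

// Supplying the schema skips Parquet schema inference (and with it the illegal-type error),
// at the cost of mergeSchema/schema evolution and of knowing the full schema in advance.
val df = spark.read.schema(explicitSchema).parquet("/data/events.parquet")
```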
So this is not a valid workaround. @sunchao what is your feeling about restoring the earlier useful behaviour?
Makes sense. I'm OK with it.
@sunchao @HyukjinKwon @LuciferYang this is a regression introduced by #31776 since 3.2.0
Can one of the admins verify this patch?
sunchao left a comment
+1. cc @cloud-fan @sadikovi too.
      TimestampNTZType
    }
    // SPARK-40819: NANOS are not supported as a Timestamp, convert to LongType without
    // timezone awareness to address behaviour regression introduced by SPARK-34661
Can we do a truncation and still read it as timestamp type?
I do not think this is a good idea, as the precision would be lost, and precision is extremely important for high-frequency time series.
I haven't verified, but end users/developers would still be able to .cast(Timestamp), which I believe would truncate the timestamp; allowing developers to make that decision makes more sense than forcing the loss of precision.
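A hedged sketch of that caller-side decision, assuming the nanos column was read back as LongType (the column names are hypothetical; note a plain long-to-timestamp cast treats the value as seconds, so the scaling has to be explicit):

```scala
import org.apache.spark.sql.functions.expr

// Scale epoch-nanos down to epoch-micros with integer division ("div" keeps the arithmetic
// in longs; a floating-point division would lose precision at epoch-nanosecond magnitudes),
// then build a timestamp. Dropping the sub-microsecond digits is a visible, deliberate step.
val withTs = df.withColumn("event_ts", expr("timestamp_micros(event_time_nanos div 1000)"))
```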
Yes, the mere purpose of this exercise is to get access to the nano precision.
Shall we make it a general strategy? If we don't recognize the logical parquet type, just ignore it and read it as the underlying physical type. cc @sadikovi One problem I can see is bad backward compatibility: once we support a new logical type in the future, we must make a breaking change, as the reader will return a different data type. Personally, I think requiring a user-specified schema for unsupported logical types makes sense. It's error-prone if we allow schema merging during schema inference for unsupported logical types. For example, it doesn't make sense to merge …
Meaning once we make Spark 3.x return longs here, supporting nano Timestamps must be moved to Spark 4.x? But with a configuration flag like …
    }

    class ParquetSchemaInferenceSuite extends ParquetSchemaTest {
      testSchemaInference[Tuple1[Long]](
Could this case pass before Spark 3.2? My impression is that Parquet 1.10.1, used by Spark 3.1, does not support the nanos type, does it?
This particular case doesn't pass, and neither do similar tests for TIMESTAMP(MILLIS,true) etc. (https://github.com/apache/spark/blob/master/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaSuite.scala#L2234), due to `No enum constant org.apache.parquet.schema.OriginalType.TIMESTAMP` when trying to convert the message.
Then is this really a regression?
So I've been looking further into it: the message is different between 1.10.1 and 1.12.3, meaning the test would need to be different.

In 1.10.1 the message is

    message schema {
      required int64 attribute;
    }

whereas in 1.12.3 the message is the same as in the unit test:

    message schema {
      required int64 attribute (TIMESTAMP(NANOS,true));
    }

So in Spark 3.1.0 you end up hitting this block, which returns a LongType: https://github.com/apache/spark/blob/branch-3.1/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L146
whereas since 3.2 you hit https://github.com/apache/spark/blob/branch-3.2/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L174 because a case for TimeUnit.NANOS is not covered.
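In other words, the shape of the restored behaviour looks roughly like the following — a paraphrased sketch of the converter's INT64 handling with this PR's extra case and the legacy flag added later in the thread, not the verbatim Spark source:

```scala
import org.apache.parquet.schema.LogicalTypeAnnotation
import org.apache.parquet.schema.LogicalTypeAnnotation.{TimeUnit, TimestampLogicalTypeAnnotation}
import org.apache.spark.sql.types.{DataType, LongType}

def convertInt64(annotation: LogicalTypeAnnotation, nanosAsLong: Boolean): DataType =
  annotation match {
    // SPARK-40819: NANOS are not supported as a Timestamp; fall back to LongType without
    // timezone awareness, restoring the pre-3.2 result.
    case ts: TimestampLogicalTypeAnnotation
        if ts.getUnit == TimeUnit.NANOS && nanosAsLong =>
      LongType
    case other =>
      throw new IllegalArgumentException(s"Illegal Parquet type: INT64 ($other)")
  }
```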
@cloud-fan moving Parquet from 1.10.1 to 1.12.3 introduced this regression where Spark 3.1 returned LongType and Spark 3.2 fails on illegal type.
@cloud-fan @LuciferYang any update/response regarding this?

@cloud-fan where do we stand with this? Is this a regression? How do we proceed?
@awdavidson I would like to understand the use case a bit better. Was the parquet file written by an earlier Spark (version < 3.2), and does the error come when that parquet file is read back with a later Spark? If yes, this is clearly a regression. Still, in this case can you please show us how we can reproduce it manually (a small example code for write/read)? If it was written by another tool, can we get an example parquet file with sample data where the old version works and the new version fails?
@attilapiros the parquet file is written by another process. Spark uses this data to run aggregations and analysis over different time horizons where the nanosecond precision is required. Currently, when using earlier Spark versions (< 3.2), the nanosecond timestamps are read back as longs. Whilst I understand timestamps with nanosecond precision are not fully supported, this change in behaviour will prevent users from migrating to the latest Spark version.

Update: an example parquet file can be found here: https://github.com/awdavidson/file-upload/blob/main/users.parquet

Update: an additional test case has been added using the example parquet file.
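For reference, a minimal read-side repro along these lines, using the sample file linked above and the birthday column from the test below:

```scala
// The sample file's schema contains: required int64 birthday (TIMESTAMP(NANOS,true))
val df = spark.read.parquet("users.parquet")

// Spark < 3.2: birthday is inferred as LongType and reads fine.
// Spark 3.2.0+ (before this fix): schema conversion fails with
//   org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))
df.select("birthday").show(1)
```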
    }

    test("SPARK-40819 - ability to read parquet file with TIMESTAMP(NANOS, true)") {
      val testDataPath = getClass.getResource("/test-data/timestamp-nanos.parquet")
Can we reuse SQLTestUtils.testFile (spark/sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala, lines 462 to 467 at f01a8db)?

    /**
     * Returns full path to the given file in the resource folder
     */
    protected def testFile(fileName: String): String = {
      Thread.currentThread().getContextClassLoader.getResource(fileName).toString
    }
👍 updated
    val data = spark.read.parquet(testDataPath.toString).select("birthday")

    assert(data.schema.fields.head.dataType == LongType)
    assert(data.take(1).head.getAs[Long](0) == 1668537129000000000L)
Shall we sort the read dataframe to guarantee we compare the first row (unless all rows have the same timestamp)?
All rows have the same timestamp
Can we have true nanosecond values, to be sure they are not zeroed by the loader but loaded as in the file, e.g. 1665532812012345678L?
      }
    }

    test("SPARK-40819 - ability to read parquet file with TIMESTAMP(NANOS, true)") {
nit: other tests in this file follow the naming convention "SPARK-XXXXX: ..."

    - test("SPARK-40819 - ability to read parquet file with TIMESTAMP(NANOS, true)") {
    + test("SPARK-40819: ability to read parquet file with TIMESTAMP(NANOS, true)") {
EnricoMi left a comment
LGTM!
EnricoMi left a comment
Resolving conflicts isn't easy, but code should at least compile locally ;-)
    inferTimestampNTZ: Boolean = SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.defaultValue.get) {
    nanosAsLong: Boolean = SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.defaultValue.get) {
    - inferTimestampNTZ: Boolean = SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.defaultValue.get) {
    - nanosAsLong: Boolean = SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.defaultValue.get) {
    + inferTimestampNTZ: Boolean = SQLConf.PARQUET_INFER_TIMESTAMP_NTZ_ENABLED.defaultValue.get,
    + nanosAsLong: Boolean = SQLConf.LEGACY_PARQUET_NANOS_AS_LONG.defaultValue.get) {
    inferTimestampNTZ = inferTimestampNTZ)
    nanosAsLong = nanosAsLong)
    - inferTimestampNTZ = inferTimestampNTZ)
    - nanosAsLong = nanosAsLong)
    + inferTimestampNTZ = inferTimestampNTZ,
    + nanosAsLong = nanosAsLong)
    inferTimestampNTZ = inferTimestampNTZ)
    nanosAsLong = nanosAsLong)
    - inferTimestampNTZ = inferTimestampNTZ)
    - nanosAsLong = nanosAsLong)
    + inferTimestampNTZ = inferTimestampNTZ,
    + nanosAsLong = nanosAsLong)
### What changes were proposed in this pull request?
Handle `TimeUnit.NANOS` for parquet `Timestamps`, addressing a regression in behaviour since 3.2.

### Why are the changes needed?
Since version 3.2, reading parquet files that contain attributes with type `TIMESTAMP(NANOS,true)` is not possible, as ParquetSchemaConverter returns
```
Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))
```
https://issues.apache.org/jira/browse/SPARK-34661 introduced a change matching on the `LogicalTypeAnnotation`, which only covers Timestamp cases for `TimeUnit.MILLIS` and `TimeUnit.MICROS`, meaning `TimeUnit.NANOS` would return `illegalType()`.
Prior to 3.2 the matching used the `originalType`, which for `TIMESTAMP(NANOS,true)` returns `null` and therefore resulted in a `LongType`. The change proposed is to consider `TimeUnit.NANOS` and return `LongType`, making the behaviour the same as before.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit test covering this scenario. Internally deployed to read parquet files that contain `TIMESTAMP(NANOS,true)`.

Closes #38312 from awdavidson/ts-nanos-fix.

Lead-authored-by: alfreddavidson <alfie.davidson9@gmail.com>
Co-authored-by: Attila Zsolt Piros <2017933+attilapiros@users.noreply.github.com>
Co-authored-by: awdavidson <54780428+awdavidson@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit ceccda0)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Merged to master and branch-3.4. It has a conflict w/ branch-3.3 and branch-3.2. Would you mind creating backporting PRs?

Sure, I'll do this asap
As per HyukjinKwon's request on #38312, backport of the fix into 3.3 (same change and description as the merged commit above). Closes #39904 from awdavidson/ts-nanos-fix-3.3. Lead-authored-by: alfreddavidson <alfie.davidson9@gmail.com> Co-authored-by: awdavidson <54780428+awdavidson@users.noreply.github.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
As per HyukjinKwon's request on #38312, backport of the fix into 3.2 (same change and description as the merged commit above). Closes #39905 from awdavidson/ts-nanos-fix-3.2. Authored-by: alfreddavidson <alfie.davidson9@gmail.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
    val LEGACY_PARQUET_NANOS_AS_LONG = buildConf("spark.sql.legacy.parquet.nanosAsLong")
      .internal()
      .doc("When true, the Parquet's nanos precision timestamps are converted to SQL long values.")
      .version("3.2.3")
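A hedged usage sketch of the flag above, assuming it is set like any other SQL conf on the session (the path is hypothetical):

```scala
// Opt back into the pre-3.2 behaviour: TIMESTAMP(NANOS,*) columns are inferred as LongType
// instead of failing schema conversion.
spark.conf.set("spark.sql.legacy.parquet.nanosAsLong", "true")
val nanosDf = spark.read.parquet("/data/with-nanos.parquet")
```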
@awdavidson I realised that we already released Spark 3.2.3.
Can you make a PR to fix this to 3.2.4?
Sure
…onfiguration

As requested by HyukjinKwon in #38312. NB: this change needs to be backported.

### What changes were proposed in this pull request?
Update the version set for the "spark.sql.legacy.parquet.nanosAsLong" configuration in SQLConf. This update is required because the previous PR set the version to `3.2.3`, which has already been released. Updating to `3.2.4` will correctly reflect when this configuration element was added.

### Why are the changes needed?
Correctness, and to complete SPARK-40819.

### Does this PR introduce _any_ user-facing change?
No, this is merely so this configuration element has the correct version.

### How was this patch tested?
N/A

Closes #39943 from awdavidson/SPARK-40819_sql-conf.

Authored-by: awdavidson <54780428+awdavidson@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Can I use the option `spark.sql.legacy.parquet.nanosAsLong`?

This should be available in 3.2.4, 3.3.2, 3.4.0 and above.
What changes were proposed in this pull request?

Handle `TimeUnit.NANOS` for parquet `Timestamps`, addressing a regression in behaviour since 3.2.

Why are the changes needed?

Since version 3.2, reading parquet files that contain attributes with type `TIMESTAMP(NANOS,true)` is not possible, as ParquetSchemaConverter returns `Caused by: org.apache.spark.sql.AnalysisException: Illegal Parquet type: INT64 (TIMESTAMP(NANOS,true))`.

https://issues.apache.org/jira/browse/SPARK-34661 introduced a change matching on the `LogicalTypeAnnotation`, which only covers Timestamp cases for `TimeUnit.MILLIS` and `TimeUnit.MICROS`, meaning `TimeUnit.NANOS` would return `illegalType()`.

Prior to 3.2 the matching used the `originalType`, which for `TIMESTAMP(NANOS,true)` returns `null` and therefore resulted in a `LongType`. The change proposed is to consider `TimeUnit.NANOS` and return `LongType`, making the behaviour the same as before.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit test covering this scenario.
Internally deployed to read parquet files that contain `TIMESTAMP(NANOS,true)`