Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-35998][SQL] Make from_csv/to_csv to handle year-month intervals properly #33210

Closed
wants to merge 2 commits into from

Conversation

sarutak
Copy link
Member

@sarutak sarutak commented Jul 5, 2021

What changes were proposed in this pull request?

This PR fixes an issue that from_csv/to_csv doesn't handle year-month intervals properly.
from_csv throws exception if year-month interval types are given.

spark-sql> select from_csv("interval '1-2' year to month", "a interval year to month");
21/07/03 04:32:24 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1-2' year to month", "a interval year to month")]
java.lang.Exception: Unsupported type: interval year to month
	at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775)
	at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224)
	at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134)

Also, to_csv doesn't handle year-month interval types properly though any exception is thrown.
The result of to_csv for year-month interval types is not ANSI interval compliant form.

spark-sql> select to_csv(named_struct("a", interval '1-2' year to month));
14

The result above should be INTERVAL '1-2' YEAR TO MONTH.

Why are the changes needed?

Bug fix.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New tests.

@sarutak
Copy link
Member Author

sarutak commented Jul 5, 2021

cc: @MaxGekk

@github-actions github-actions bot added the SQL label Jul 5, 2021
@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45145/

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Kubernetes integration test status success
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/45145/

checkAnswer(toCsvDF, Row(toCsvExpected))

DataTypeTestUtils.yearMonthIntervalTypes.foreach { fromCsvDtype =>
val fromJsonDF = toCsvDF
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fromJsonDF -> fromCsvDF

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How embarrassing... Thanks.

Copy link
Member

@MaxGekk MaxGekk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM. The last commit is trivial, and tests passed on the previous one. Merging to master/3.2.
Thank you, @sarutak .

@MaxGekk MaxGekk closed this in f4237af Jul 5, 2021
MaxGekk pushed a commit that referenced this pull request Jul 5, 2021
…s properly

### What changes were proposed in this pull request?

This PR fixes an issue that `from_csv/to_csv` doesn't handle year-month intervals properly.
`from_csv` throws exception if year-month interval types are given.
```
spark-sql> select from_csv("interval '1-2' year to month", "a interval year to month");
21/07/03 04:32:24 ERROR SparkSQLDriver: Failed in [select from_csv("interval '1-2' year to month", "a interval year to month")]
java.lang.Exception: Unsupported type: interval year to month
	at org.apache.spark.sql.errors.QueryExecutionErrors$.unsupportedTypeError(QueryExecutionErrors.scala:775)
	at org.apache.spark.sql.catalyst.csv.UnivocityParser.makeConverter(UnivocityParser.scala:224)
	at org.apache.spark.sql.catalyst.csv.UnivocityParser.$anonfun$valueConverters$1(UnivocityParser.scala:134)
```

Also, `to_csv` doesn't handle year-month interval types properly though any exception is thrown.
The result of `to_csv` for year-month interval types is not ANSI interval compliant form.

```
spark-sql> select to_csv(named_struct("a", interval '1-2' year to month));
14
```
The result above should be `INTERVAL '1-2' YEAR TO MONTH`.

### Why are the changes needed?

Bug fix.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New tests.

Closes #33210 from sarutak/csv-yminterval.

Authored-by: Kousuke Saruta <sarutak@oss.nttdata.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit f4237af)
Signed-off-by: Max Gekk <max.gekk@gmail.com>
@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Test build #140632 has finished for PR 33210 at commit c8778cc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Jul 5, 2021

Test build #140655 has finished for PR 33210 at commit e39b5c3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants