Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update to arrow 36 #5685

Merged
merged 8 commits into from
Mar 28, 2023
Merged

Update to arrow 36 #5685

merged 8 commits into from
Mar 28, 2023

Conversation

tustvold
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@tustvold tustvold changed the title Update for arrow 36 Update to arrow 36 Mar 22, 2023
@tustvold tustvold mentioned this pull request Mar 22, 2023
@@ -3437,10 +3437,6 @@ mod tests {
ScalarValue::Decimal128(None, 10, 2),
ScalarValue::try_from_array(&array, 3).unwrap()
);
assert_eq!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #5441 (comment)

This test was wrong

@github-actions github-actions bot added core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions labels Mar 22, 2023
"| [] |",
"| [] |",
"| [] |",
"| [a] |",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug fix -- see apache/arrow-rs#3803

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Mar 24, 2023
@tustvold tustvold marked this pull request as ready for review March 28, 2023 13:02
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @tustvold - I think this PR .ooks good other than the changes in

datafusion/core/tests/sqllogictests/test_files/aggregate.slt

but I have a suggested way forward

@@ -1390,7 +1390,7 @@ as values
('2021-1-1T05:11:10.432', 'Row 3');


statement ok
statement error DataFusion error: Arrow error: Parser error: Error parsing timestamp from '2021-1-1T05:11:10.432': error parsing date
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change the data in this test so it passes

Also, it looks like a slight regression to me in that parsing 2021-1-1 used to parse and now doesn't.

So I suggest:

1. Change this test from

2021-1-1T05:11:10.432

to something like

2021-01-01T05:11:10.432

2 file an upstream ticket in arrow-rs to be more lenient

Aka to accept 2021-1-1T05:11:10.432 as valid -- I can do this if you like

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Change this test from

Will do

2 file an upstream ticket in arrow-rs to be more lenient

I would strongly request we do not do this, it will severely hurt performance for an incredibly esoteric use-case. It isn't actually documented that chrono supports this, and we have never intentionally supported it either

@@ -1387,7 +1387,7 @@ as values
('2018-11-13T17:11:10.011375885995', 'Row 0'),
('2011-12-13T11:13:10.12345', 'Row 1'),
(null, 'Row 2'),
('2021-1-1T05:11:10.432', 'Row 3');
('2021-01-01T05:11:10.432', 'Row 3');
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Arrow no longer supports timestamps of this shortened form, it has never been documented to support them, and it is an accidental undocumented quirk of chrono that it did. I would very strongly resist changing this, being able to predict the digit locations ahead of time is critical to avoiding a data dependency when parsing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed apache/arrow-rs#3969 to track (I don't think we necessarily need to change arrow to support these timestamps, it but it will reduce confusion to have the change documented)

@tustvold tustvold requested a review from alamb March 28, 2023 15:01
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great @tustvold -- thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants