feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. #704
Conversation
Thanks @akhilss99. Could you run
@andygrove, can you please run the CI build again? Thanks!
Triggered.
```diff
@@ -116,6 +116,7 @@ The following cast operations are generally compatible with Spark except for the
 | string | long | |
 | string | binary | |
 | string | date | Only supports years between 262143 BC and 262142 AD |
+| string | timestamp | |
```
To mark this cast as compatible, we would need to enable the StringType to TimestampType fuzz test in CometCastSuite, and this currently fails due to incompatibilities between Spark and Comet.
Got it, I mistook the fuzz tests for running the fuzz-test jar through spark-submit. I will revert the compatibility-guide changes.
```diff
     // write for all formats
     assert_eq!(
-        timestamp_parser("2020", EvalMode::Legacy).unwrap(),
+        timestamp_parser("2020", EvalMode::Legacy, tz).unwrap(),
```
Could you also add tests for 3-digit years?
I've covered the test cases for 3-digit years at L1877-L1904; is there anything I'm missing?
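For readers following along, a minimal sketch of what such assertions might look like, assuming the three-argument `timestamp_parser(value, eval_mode, tz)` signature shown in the diff above and assuming it returns a `Result` that can be checked with `is_ok()` (both assumptions, not taken from the PR):

```rust
// Hedged sketch, not copied from the PR: checks that 3- and 5-digit years
// parse without error in legacy mode, using the three-argument signature
// shown in the diff. Expected microsecond values are deliberately omitted.
assert!(timestamp_parser("869", EvalMode::Legacy, tz).is_ok());
assert!(timestamp_parser("12345", EvalMode::Legacy, tz).is_ok());
```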
This looks like a nice improvement to the existing code, but doesn't fully close the issue.
I suspect that we may need to move away from the regex approach and port Spark's parsing code over (we did this for string to date already) to get full compatibility. This will likely be more performant as well.
I think this PR would still be worth merging though.
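As an illustration of the suggested direction (not Comet's actual code), a hand-rolled parser reads the year digits directly instead of matching a regex, which is roughly the shape a port of Spark's parsing logic would take; the function and values below are hypothetical:

```rust
// Illustrative sketch only: extract a variable-length (1-6 digit) year by
// scanning characters directly, with no regex pass over the input.
fn parse_year_manual(s: &str) -> Option<i32> {
    let digits: String = s.chars().take_while(|c| c.is_ascii_digit()).collect();
    if digits.is_empty() || digits.len() > 6 {
        return None;
    }
    digits.parse().ok()
}

fn main() {
    // 3-, 4-, and 5-digit years are all handled by the same code path.
    assert_eq!(parse_year_manual("869-01-01"), Some(869));
    assert_eq!(parse_year_manual("2020-07-15"), Some(2020));
    assert_eq!(parse_year_manual("12345-01-01"), Some(12345));
}
```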
Codecov Report

All modified and coverable lines are covered by tests ✅

Additional details and impacted files

```
@@             Coverage Diff              @@
##               main     #704      +/-   ##
============================================
+ Coverage     33.57%   33.60%    +0.03%
  Complexity      830      830
============================================
  Files           110      110
  Lines         42608    42564       -44
  Branches       9352     9361        +9
============================================
- Hits          14306    14304        -2
+ Misses        25347    25300       -47
- Partials       2955     2960        +5
```

☔ View full report in Codecov by Sentry.
* timezone
* 3 & 5 digit years
Can anyone please run the build?
Triggered again.
Thanks @akhilss99
feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp (apache#704) (cherry picked from commit bd7834c)
Which issue does this PR close?
Addresses the timezone and 3 & 5 digit year compatibility gaps mentioned in #376.
Rationale for this change
What changes are included in this PR?
Added support for 3- and 5-digit years and for time zones.
How are these changes tested?
Through unit tests.
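To illustrate what the time-zone part of this change has to get right, here is a small standalone example using the chrono crate rather than Comet's internal parser (the crate choice and values are illustrative only):

```rust
// Illustrative only: a string with an explicit +05:30 offset denotes an
// instant 5.5 hours earlier than the same wall-clock time read as UTC.
use chrono::{DateTime, FixedOffset, Utc};

fn main() {
    let with_offset: DateTime<FixedOffset> =
        DateTime::parse_from_rfc3339("2020-01-01T00:00:00+05:30").unwrap();
    let as_utc: DateTime<Utc> = "2020-01-01T00:00:00Z".parse().unwrap();
    // 2020-01-01 00:00 at +05:30 is 2019-12-31 18:30 UTC, i.e. 330 minutes earlier.
    assert_eq!(
        (as_utc - with_offset.with_timezone(&Utc)).num_minutes(),
        330
    );
}
```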