Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. #704

Merged
merged 1 commit into from
Aug 1, 2024

Conversation

akhilss99
Copy link
Contributor

@akhilss99 akhilss99 commented Jul 22, 2024

Which issue does this PR close?

addresses timezone, 3 & 5 digit years' compatibility mentioned in #376

Rationale for this change

What changes are included in this PR?

Added 3 & 5 digit year, time-zone support.

How are these changes tested?

Through Unit tests.

@andygrove andygrove changed the title Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp, compatibility guide update. feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp, compatibility guide update. Jul 22, 2024
@andygrove
Copy link
Member

Thanks @akhilss99. Could you run make format to fix the code formatting?

@akhilss99
Copy link
Contributor Author

@andygrove, can you please run the CI build again? thanks!

@viirya
Copy link
Member

viirya commented Jul 24, 2024

Triggered.

@@ -116,6 +116,7 @@ The following cast operations are generally compatible with Spark except for the
| string | long | |
| string | binary | |
| string | date | Only supports years between 262143 BC and 262142 AD |
| string | timestamp | |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To mark this cast as compatible, we would need to enable the cast StringType to TimestampType fuzz test in CometCastSuite, and this currently fails due to incompatibilities between Spark and Comet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, I mistook fuzz tests as running fuzz-test jar through SPARK-SUBMIT. Will revert the compatibility-guide changes.

// write for all formats
assert_eq!(
timestamp_parser("2020", EvalMode::Legacy).unwrap(),
timestamp_parser("2020", EvalMode::Legacy, tz).unwrap(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also add tests for 3-digit year

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've covered the test-cases for 3 digit years from L1877 - 1904, is there anything that I'm missing out?

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice improvement to the existing code, but doesn't fully close the issue.

I suspect that we may need to move away from the regex approach and port Spark's parsing code over (we did this for string to date already) to get full compatibility. This will likely be more performant as well.

I think this PR would still be worth merging though.

@akhilss99 akhilss99 changed the title feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp, compatibility guide update. feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. Jul 25, 2024
@codecov-commenter
Copy link

codecov-commenter commented Jul 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.60%. Comparing base (ded3dd6) to head (bee83f3).
Report is 5 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #704      +/-   ##
============================================
+ Coverage     33.57%   33.60%   +0.03%     
  Complexity      830      830              
============================================
  Files           110      110              
  Lines         42608    42564      -44     
  Branches       9352     9361       +9     
============================================
- Hits          14306    14304       -2     
+ Misses        25347    25300      -47     
- Partials       2955     2960       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

    * timezone
    * 3 & 5 digit years
@akhilss99
Copy link
Contributor Author

Can anyone please run the build?
thanks

@viirya
Copy link
Member

viirya commented Jul 29, 2024

Triggered again.

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @akhilss99

@andygrove andygrove merged commit bd7834c into apache:main Aug 1, 2024
74 checks passed
himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants