-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SparkConnector] Add proper time column support for Spark connector segment writer #14556
[SparkConnector] Add proper time column support for Spark connector segment writer #14556
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #14556 +/- ##
============================================
+ Coverage 61.75% 63.96% +2.21%
- Complexity 207 1570 +1363
============================================
Files 2436 2683 +247
Lines 133233 147348 +14115
Branches 20636 22590 +1954
============================================
+ Hits 82274 94249 +11975
- Misses 44911 46145 +1234
- Partials 6048 6954 +906
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
@@ -94,6 +115,8 @@ class PinotDataWriter[InternalRow]( | |||
val variables = Map( | |||
"partitionId" -> partitionId, | |||
"table" -> tableName, | |||
"startTime" -> startTime, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can also update javadocs to show the new variables for segment name generation. But do call out that it only works for numeric time columns.
Adding proper time column support to the Spark writer to correctly create time information in segment metadata, which was missing before.
Full api for configuring the time column when writing out segments is as follows:
Which follows Pinot's time column config options
timeFormat
andtimeGranularity
.I also added a couple new template variables,
startTime
andendTime
which can be used for building segment name. This is useful for efficient purging of out-of-retention segments without having to open the metadata.Testing
Updated unit/integration tests.
I also tested and validated this behavior in a pre-production environment.
feature
bugfix