RealtimeSegmentConverter was using incorrect schema #13877
… incorrect schema. This has the side effect of ignoring column-based null handling.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #13877 +/- ##
============================================
- Coverage 61.75% 57.88% -3.88%
- Complexity 207 219 +12
============================================
Files 2436 2617 +181
Lines 133233 143479 +10246
Branches 20636 22031 +1395
============================================
+ Hits 82274 83046 +772
- Misses 44911 53925 +9014
- Partials 6048 6508 +460
Flags with carried forward coverage won't be shown.
Good catch!
Thanks @gortiz, good catch! I just had one minor comment.
`RealtimeSegmentConverter` cannot reuse the schema of the realtime segment because it contains some virtual columns. For example, `$segmentName` and `$docId` will be different in the sealed segment. Therefore `RealtimeSegmentConverter` copies the schema, removing the virtual columns in the process.

That copy was done manually in `RealtimeSegmentConverter`, and it was a partial copy. For example, the generated schema doesn't keep the schema name. By chance, the fact that this copy was partial didn't affect the sealing process. But when #11960 was added, the partial copy in `RealtimeSegmentConverter` had an important side effect: column-based null handling was lost.

That means that the realtime segment contains null value vectors, but once it is sealed these vectors are ignored.
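To make the failure mode concrete, here is a minimal, self-contained Java sketch. It does not use Pinot: `SchemaCopyDemo`, `Field`, `Schema`, `partialCopyWithoutVirtual` and `fullCopyWithoutVirtual` are hypothetical stand-ins, and the real `Schema` has many more properties. The partial copy mirrors the pattern described above (start from an empty schema and add only the non-virtual fields), so schema-level settings such as the schema name and the column-based null handling flag fall back to their defaults. The full copy goes through a copy constructor and only then drops the virtual columns.

```java
import java.util.ArrayList;
import java.util.List;

public class SchemaCopyDemo {

  /** Hypothetical column definition; virtual columns such as $docId are flagged. */
  record Field(String name, boolean virtual) {}

  /** Hypothetical, heavily simplified mutable schema (not Pinot's Schema class). */
  static final class Schema {
    String schemaName;
    boolean enableColumnBasedNullHandling;
    final List<Field> fields = new ArrayList<>();

    Schema() {}

    /** Copy constructor: copies every schema-level property plus the field list. */
    Schema(Schema other) {
      this.schemaName = other.schemaName;
      this.enableColumnBasedNullHandling = other.enableColumnBasedNullHandling;
      this.fields.addAll(other.fields);
    }
  }

  /** Buggy pattern: only the non-virtual fields are copied into a fresh schema. */
  static Schema partialCopyWithoutVirtual(Schema original) {
    Schema copy = new Schema();
    for (Field f : original.fields) {
      if (!f.virtual()) {
        copy.fields.add(f);
      }
    }
    // schemaName stays null and enableColumnBasedNullHandling stays false here.
    return copy;
  }

  /** Fixed pattern: copy everything, then remove the virtual columns. */
  static Schema fullCopyWithoutVirtual(Schema original) {
    Schema copy = new Schema(original);
    copy.fields.removeIf(Field::virtual);
    return copy;
  }

  public static void main(String[] args) {
    Schema realtime = new Schema();
    realtime.schemaName = "myTable";
    realtime.enableColumnBasedNullHandling = true;
    realtime.fields.add(new Field("userId", false));
    realtime.fields.add(new Field("$docId", true));
    realtime.fields.add(new Field("$segmentName", true));

    Schema partial = partialCopyWithoutVirtual(realtime);
    Schema full = fullCopyWithoutVirtual(realtime);

    // The partial copy reports name=null and nullHandling=false; the full copy keeps both.
    System.out.println("partial: name=" + partial.schemaName
        + ", nullHandling=" + partial.enableColumnBasedNullHandling
        + ", fields=" + partial.fields.size());
    System.out.println("full: name=" + full.schemaName
        + ", nullHandling=" + full.enableColumnBasedNullHandling
        + ", fields=" + full.fields.size());
  }
}
```

Keeping the copy logic in a constructor next to the fields only reduces the risk: a property added to the class later still has to be copied by hand, which is exactly the kind of mistake a mutable schema makes hard to rule out.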
This PR fixes that issue and adds some regression tests, but given that `Schema` is mutable, it is very difficult to verify that there are no more incorrect copies in the code. A refactor of the `Schema` class to make it safer to copy and share is needed.
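One possible direction for such a refactor, sketched under assumptions: an immutable schema in which every property is a constructor argument, so deriving a variant without virtual columns cannot silently drop a setting. `ImmutableSchemaDemo`, `ImmutableSchema` and `withoutVirtualColumns` are hypothetical names, not Pinot API, and the sketch relies on Java records (Java 16 or later).

```java
import java.util.List;

public class ImmutableSchemaDemo {

  /** Hypothetical column definition; records are shallowly immutable. */
  record Field(String name, boolean virtual) {}

  /** Hypothetical immutable schema: every property is a record component. */
  record ImmutableSchema(String schemaName, boolean enableColumnBasedNullHandling, List<Field> fields) {
    ImmutableSchema {
      // Unmodifiable defensive copy so nobody can mutate the field list after construction.
      fields = List.copyOf(fields);
    }

    /** Returns a new schema without virtual columns; every other property is carried over. */
    ImmutableSchema withoutVirtualColumns() {
      List<Field> physical = fields.stream().filter(f -> !f.virtual()).toList();
      return new ImmutableSchema(schemaName, enableColumnBasedNullHandling, physical);
    }
  }

  public static void main(String[] args) {
    ImmutableSchema realtime = new ImmutableSchema("myTable", true,
        List.of(new Field("userId", false), new Field("$docId", true)));
    ImmutableSchema sealed = realtime.withoutVirtualColumns();
    System.out.println(sealed);
  }
}
```

Because the derived schema is built through the record's canonical constructor, adding a new component later changes that constructor's signature, and the compiler flags every place that builds a schema without it. That is a guarantee that is hard to get from a mutable class that is copied field by field.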