Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RealtimeSegmentConverter was using incorrect schema #13877

Merged
merged 5 commits into from
Sep 18, 2024

Conversation

gortiz
Copy link
Contributor

@gortiz gortiz commented Aug 22, 2024

RealtimeSegmentConverter cannot use the realtime segment because it contains some virtual columns. For example the $segmentName and $docId will be different in the sealed segment. Therefore RealtimeSegmentConverter copies the schema, removing the virtual columns in the process.

That copy was done manually in RealtimeSegmentConverter and was a partial copy. For example, the generated schema doesn't keep the schema name. By chance the fact that this copy was partial didn't affect the sealing process. But when #11960 was added the partial copy in RealtimeSegmentConverter had an important side effect: column based null handling was lost.

That means that the realtime segment contains null columns, but once it is sealed these vectors are ignored.

This PR fixes that issue and adds some regression tests, but given Schema is mutable it is very difficult to verify that there are no more incorrect copies in the code. A refactor of the Schema class to make it more secure is needed.

… incorrect schema

This has the side effect of ignoring column based null handling
@codecov-commenter
Copy link

codecov-commenter commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.

Project coverage is 57.88%. Comparing base (59551e4) to head (af6605b).
Report is 1440 commits behind head on master.

Files with missing lines Patch % Lines
...rc/main/java/org/apache/pinot/spi/data/Schema.java 81.25% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #13877      +/-   ##
============================================
- Coverage     61.75%   57.88%   -3.88%     
- Complexity      207      219      +12     
============================================
  Files          2436     2617     +181     
  Lines        133233   143479   +10246     
  Branches      20636    22031    +1395     
============================================
+ Hits          82274    83046     +772     
- Misses        44911    53925    +9014     
- Partials       6048     6508     +460     
Flag Coverage Δ
custom-integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration <0.01% <0.00%> (-0.01%) ⬇️
integration1 <0.01% <0.00%> (-0.01%) ⬇️
integration2 0.00% <0.00%> (ø)
java-11 57.85% <82.35%> (-3.86%) ⬇️
java-21 57.77% <82.35%> (-3.86%) ⬇️
skip-bytebuffers-false 57.86% <82.35%> (-3.88%) ⬇️
skip-bytebuffers-true 57.75% <82.35%> (+30.02%) ⬆️
temurin 57.88% <82.35%> (-3.88%) ⬇️
unittests 57.87% <82.35%> (-3.88%) ⬇️
unittests1 40.72% <76.47%> (-6.18%) ⬇️
unittests2 27.97% <5.88%> (+0.24%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Collaborator

@yashmayya yashmayya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @gortiz, good catch! I just had one minor comment.

@gortiz gortiz merged commit dfc2cd0 into apache:master Sep 18, 2024
20 of 21 checks passed
@gortiz gortiz deleted the fix-realtime-seal-incorrect-schema branch September 18, 2024 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants