rowcontainer: fix hash row container for some types #49851
Merged
The explanation is that `HashDiskRowContainer` is implemented using
`DiskRowContainer`, with the equality columns (i.e. the columns to hash) of
the former being the ordering columns of the latter. Those ordering columns
are used to compute the keys of the rows (in `encodeRow`) so that we can
store the rows in sorted order. This is how we store the build (right) side
of the join, but for the probe (left) side we use `hashMemRowIterator` to
compute the key of the probing row. The key computation methods must be the
same in both places; otherwise, the results of the join can be incorrect. #45229
broke this synchronization by changing the key computation method in
`hashMemRowIterator.computeKey` to use `Fingerprint`. So we have to either
use `Fingerprint` in `encodeRow` or use `Encode` in `computeKey`. The first
choice doesn't seem to work because `Fingerprint` doesn't provide the
ordering we need in `DiskRowContainer`, so we need to use the second approach.

The ordering property is necessary because `DiskRowContainer` implements
"hash row container" by sorting all rows on the ordering (i.e. hash) columns
and using the ordering property to provide the "hashing" behavior (i.e. we
seek to the first row that has the same hash columns and then iterate
forward one row at a time while the hash columns remain the same). If we
don't have the ordering property, then the necessary invariant that all rows
with equal hash columns are contiguous is not maintained.
Release note: None