didn't flush changes to s3 when update row degree #2794

yuhao-su · 2022-05-24T20:43:01Z

I found a relevant bug: when we update the row degree in JoinEntry, we do not flush the changes to S3.

This does not affect inner joins, but it does affect outer joins and semi/anti joins, which explains why only those queries are affected.

As for the case of inner join, since it does not make use of row counts, perhaps we can simply not update it as the write is unnecessary...
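The failure mode described above can be illustrated with a small sketch. Everything here is hypothetical: `JoinRow`, `Store`, the `dirty` list, and the method names are stand-ins invented for illustration, not the actual JoinEntry code, and a plain `HashMap` plays the role of S3.

```rust
use std::collections::HashMap;

/// Hypothetical cached join row together with its degree.
#[derive(Debug, Clone, PartialEq)]
struct JoinRow {
    row: Vec<i64>,
    degree: u64,
}

/// Hypothetical stand-in for the persistent store (S3 in the issue).
#[derive(Default)]
struct Store {
    rows: HashMap<u64, JoinRow>,
}

/// Hypothetical cache entry: in-memory rows plus the keys with pending writes.
#[derive(Default)]
struct JoinEntry {
    cached: HashMap<u64, JoinRow>,
    dirty: Vec<u64>,
}

impl JoinEntry {
    /// Insert a new row with degree 0 and schedule it for writing.
    fn insert(&mut self, pk: u64, row: Vec<i64>) {
        self.cached.insert(pk, JoinRow { row, degree: 0 });
        self.dirty.push(pk);
    }

    /// The bug pattern: the degree changes in memory only,
    /// so the next flush has nothing to write for this row.
    fn inc_degree_unflushed(&mut self, pk: u64) {
        if let Some(r) = self.cached.get_mut(&pk) {
            r.degree += 1;
        }
    }

    /// The intended behavior: a degree change also schedules a write.
    fn inc_degree(&mut self, pk: u64) {
        if let Some(r) = self.cached.get_mut(&pk) {
            r.degree += 1;
            self.dirty.push(pk);
        }
    }

    /// Write every dirty row to the store and clear the dirty list.
    fn flush(&mut self, store: &mut Store) {
        for pk in self.dirty.drain(..) {
            if let Some(r) = self.cached.get(&pk) {
                store.rows.insert(pk, r.clone());
            }
        }
    }
}
```

In this sketch, a row whose degree was bumped through `inc_degree_unflushed` still has its old degree in `Store` after `flush`, so a later read from the store sees a stale degree. Only operators that read the degree (outer, semi, anti) would notice.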

Originally posted by @jon-chuang in #2495 (comment)

yuhao-su · 2022-05-24T20:54:01Z

Better to do this after #2795, to avoid adding code to parts that will soon be removed.

fuyufjh · 2022-06-09T07:36:16Z

Is it better to store the degree in memory only?

The motivation is that every incoming row already loads all rows with the same join key from the hash table on the other side, so we can calculate the degree in reasonable time, with no additional IO.

The advantage of this approach is that we don't need to introduce an additional degree column, which makes the hash table more consistent with the index of LookupJoin.

Alternatively, we can apply this for LookupJoin, but in this way there will be 2 different Join implementations.
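As a rough sketch of the in-memory idea, the degree can be derived by counting the rows with the same join key that are already cached for the other side. `SideCache`, `degree_for`, and the row representation are hypothetical names chosen for this example, not the actual hash join types.

```rust
use std::collections::HashMap;

type JoinKey = i64;
type Row = Vec<i64>;

/// Hypothetical in-memory hash table for one side of the join.
#[derive(Default)]
struct SideCache {
    rows_by_key: HashMap<JoinKey, Vec<Row>>,
}

impl SideCache {
    /// Cache a row under its join key.
    fn insert(&mut self, key: JoinKey, row: Row) {
        self.rows_by_key.entry(key).or_default().push(row);
    }

    /// Degree that a row with `key` on the opposite side would have:
    /// the number of cached rows here with that key that also pass `cond`.
    /// Pass `|_| true` when the join has no extra non-equi condition.
    fn degree_for<F>(&self, key: JoinKey, cond: F) -> usize
    where
        F: Fn(&Row) -> bool,
    {
        self.rows_by_key
            .get(&key)
            .map_or(0, |rows| rows.iter().filter(|r| cond(*r)).count())
    }
}
```

The count touches only rows that are already in memory, but it is recomputed on every call and assumes that all rows for the key are cached on this side.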

jon-chuang · 2022-06-10T13:50:42Z

To do so, we would need to populate both sides of the cache.

This is actually an additional I/O operation since we don't usually need to fetch the update side of the cache.

We actually need the degree data on the match side (which records how many matching rows currently exist on the update side) to decide whether the update is the first match or the only remaining match for that row.

Further, in the case of non-equi join conditions layered on top of an equi-join, this requires re-evaluating the predicate for each row on the update side.
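To make the first-match and only-remaining-match cases concrete, here is a minimal sketch of how a stored degree on the match side could drive the output of a left outer join. The `Out` enum, `MatchRow`, and both functions are hypothetical and simplified to integer row ids. The sketch assumes `matched` already contains only the match-side rows that satisfy the full join condition.

```rust
/// Hypothetical output change; `right: None` stands for a NULL-padded row.
#[derive(Debug, Clone, PartialEq)]
enum Out {
    Insert { left: i64, right: Option<i64> },
    Delete { left: i64, right: Option<i64> },
}

/// Hypothetical match-side row with its stored degree.
struct MatchRow {
    id: i64,
    degree: u64,
}

/// A row `right` is inserted on the update side.
fn on_update_insert(matched: &mut [MatchRow], right: i64) -> Vec<Out> {
    let mut out = Vec::new();
    for m in matched.iter_mut() {
        if m.degree == 0 {
            // First match: retract the NULL-padded row emitted earlier.
            out.push(Out::Delete { left: m.id, right: None });
        }
        out.push(Out::Insert { left: m.id, right: Some(right) });
        m.degree += 1;
    }
    out
}

/// A row `right` is deleted from the update side.
fn on_update_delete(matched: &mut [MatchRow], right: i64) -> Vec<Out> {
    let mut out = Vec::new();
    for m in matched.iter_mut() {
        out.push(Out::Delete { left: m.id, right: Some(right) });
        // Assumes the deleted row was counted before; saturate to avoid underflow.
        m.degree = m.degree.saturating_sub(1);
        if m.degree == 0 {
            // Only remaining match is gone: bring back the NULL-padded row.
            out.push(Out::Insert { left: m.id, right: None });
        }
    }
    out
}
```

With a stored degree, neither function needs to look at the other update-side rows. Without it, the same decision would require loading those rows and re-checking the predicate against each of them.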

yuhao-su added the type/bug Something isn't working label May 24, 2022

yuhao-su self-assigned this May 24, 2022

yuhao-su added a commit that referenced this issue Jun 14, 2022

try fix #2794

bbab2e7

This was referenced Jun 14, 2022

feat(streaming): apply StateTable to hash join #3085

Merged

Inefficient hash join degree updating #3254

Closed

yuhao-su closed this as completed in ffe54bb Jun 16, 2022
