Inefficient hash join degree updating #3254
Comments
Indeed, there is write amplification due to the row degree. It is bad for joins with many matches. The specific scenario that is bad for us is if one side is very large, and the other side is updated frequently.
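To make the write amplification concrete, here is a minimal sketch, not the actual join executor. `StoredRow`, `JoinSide`, and `bump_degrees` are hypothetical names invented for illustration; the only assumption taken from this issue is that each cached row carries a `degree` and that a new match on the other side rewrites every matched row.

```rust
use std::collections::HashMap;

/// A cached row plus its degree (number of matches on the other side).
/// Hypothetical type, for illustration only.
#[derive(Clone, Debug, PartialEq)]
struct StoredRow {
    payload: String,
    degree: u64,
}

/// One side of the hash join, keyed by join key.
#[derive(Default)]
struct JoinSide {
    rows: HashMap<i64, Vec<StoredRow>>,
}

impl JoinSide {
    fn insert(&mut self, key: i64, payload: &str, degree: u64) {
        self.rows.entry(key).or_default().push(StoredRow {
            payload: payload.to_string(),
            degree,
        });
    }

    /// Called when a row with `key` arrives on the opposite side.
    /// Returns the (old, new) pairs that have to be written back,
    /// one full-row update per matched row.
    fn bump_degrees(&mut self, key: i64) -> Vec<(StoredRow, StoredRow)> {
        let mut writes = Vec::new();
        if let Some(matched) = self.rows.get_mut(&key) {
            for row in matched.iter_mut() {
                let old = row.clone();
                row.degree += 1;
                writes.push((old, row.clone()));
            }
        }
        writes
    }
}
```

With 1,000 rows cached under one key, a single insert on the other side produces 1,000 full-row updates even though only the `degree` value changed in each of them.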
If I’m not wrong we already have an optimization for reducing outer join to inner join if any of the downstream predicates includes a column from the non-outer side. We would still need to handle situations where we cannot optimise the outer join away. Basically, I think as the first step, we can improve the situation by:
Basically, both of the above ideas reduce the generality of the current implementation into many specific cases and complicate the state representation. My sense is that we should not pursue this optimization unless we have a specific user workload we want to optimise and we are seeing a bottleneck due to the write amplification. However, idea 1 seems more likely to yield benefits for us: joins tend to be inner joins, so we can at least optimise for the majority case. We should probably have a category for such last-mile optimizations.
Both optimizations sound feasible to me! The optimization that uploads only incremental changes is a more general way to reduce the write amplification (no need to modify the join executor). It might be implemented later in
We solved #2794 in #3241 by updating all matched rows. This is inefficient.
One quick improvement would be to upload only the incremental changes of `Update((Row, Row))` from `Memtable` to S3. But this cannot avoid updating unnecessary datums in a row (since the only datum expected to change is the `degree` datum). We need other dirty hacks to achieve this.

Another possible solution is to investigate the approach in Materialized, which seems to be able to transform outer joins into inner joins, so that we have a chance to remove the `degree`. This approach could also simplify the shared arrangement implementation for join types other than inner.
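The outer-to-inner rewrite can be sketched as a small planner rule. This is an illustrative sketch, not the optimizer of this project or of Materialized: `JoinType`, `Side`, `NullRejectingRef`, `simplify`, and `needs_degree` are hypothetical names, and the sketch assumes the downstream predicate is null-rejecting (for example `col > 0`) on the referenced column.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum JoinType {
    Inner,
    LeftOuter,
    RightOuter,
    FullOuter,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Side {
    Left,
    Right,
}

/// A column reference in a downstream predicate that rejects NULL
/// (assumption: the caller has already checked null-rejection).
struct NullRejectingRef {
    side: Side,
}

/// Narrow the join type when NULL-padded rows cannot survive the filter.
fn simplify(join: JoinType, refs: &[NullRejectingRef]) -> JoinType {
    let rejects_left = refs.iter().any(|r| r.side == Side::Left);
    let rejects_right = refs.iter().any(|r| r.side == Side::Right);
    match join {
        // Unmatched left rows carry NULLs in right columns; they get filtered.
        JoinType::LeftOuter if rejects_right => JoinType::Inner,
        JoinType::RightOuter if rejects_left => JoinType::Inner,
        JoinType::FullOuter => match (rejects_left, rejects_right) {
            (true, true) => JoinType::Inner,
            // Rows with NULL left columns (unmatched right rows) are dropped.
            (true, false) => JoinType::LeftOuter,
            (false, true) => JoinType::RightOuter,
            (false, false) => JoinType::FullOuter,
        },
        other => other,
    }
}

/// Assumption for this sketch: only non-inner joins track a degree.
fn needs_degree(join: JoinType) -> bool {
    join != JoinType::Inner
}
```

For example, a left outer join followed by a null-rejecting filter on a right-side column simplifies to `JoinType::Inner`, for which `needs_degree` returns `false`. Joins that cannot be simplified still pay the degree-update cost.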