-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a limit to the number of columns in the CLUSTERED BY clause #13352
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Had one comment.
@@ -55,6 +57,15 @@ public static void validateQueryDef(final QueryDefinition queryDef) | |||
throw new ISE("Number of workers must be greater than 0"); | |||
} | |||
} | |||
|
|||
// Check if the number of columns in the query's CLUSTERED BY clause donot exceed the limit | |||
ClusterBy queryClusteredBy = queryDef.getFinalStageDefinition().getClusterBy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does only the final stage lead to an OOM? Wouldn't it be possible for more cluster by columns to be present in earlier stages than the final one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The cluster by columns in the earlier stages might not have a 1:1 correspondence with the query that the user has written therefore raising a cluster by error, in that case, shouldn't be actionable for the user IMO. Hence I only added the limit in the final stage (the original query that the user has written). Along with the Sequential merge mode on, I think that there should be enough guard rails in place to prevent an OOM.
However we can add a limit on the cluster by in the other stages if we rephrase the error message as something like "Enough grouping keys present in stage [xx], the query might OOM". Those cluster by keys can correspond to something present in the group by clause for example. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the TooManyColumnsFault, I think that we can also go ahead with the second proposition since that is also imposed at a per-stage level, which might not correspond to the final result that the user expects. (The wording might need to change though).
import java.util.Objects; | ||
|
||
@JsonTypeName(TooManyClusteredByColumnsFault.CODE) | ||
public class TooManyClusteredByColumnsFault extends BaseMSQFault |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's document this fault as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out, updated!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after resolving merge conflict!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Will merge post the conflicts are resolved.
Thanks @LakshSingla
|
||
import java.util.Objects; | ||
|
||
@JsonTypeName(TooManyClusteredByColumnsFault.CODE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might need to add this to MSQIndexingModule.java
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing it out, I added it to the module.
Test failures seem unrelated/flaky, can the second stage of the CI/CD be run again? |
Failure look unrelated. |
If there is a huge number of columns passed to the clustered by clause while ingesting via MSQ, then the Worker tasks can OOM. (With sequential merge in place, controller tasks shouldn't OOM).
This PR adds a limit to the number of clustered by columns that can be passed in a query and throws a fault in case they are exceeded.
Release note
There is a limit to the number of columns that can be passed in the CLUSTERED BY clause while ingesting via MSQ.
This PR has: