Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CALCITE-6852] Project's digest calculation error, should include FieldNames in rowType #4208

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

xiedeyantu
Copy link

No description provided.

@@ -362,7 +362,8 @@ protected boolean deepEquals0(@Nullable Object obj) {

@API(since = "1.24", status = API.Status.INTERNAL)
protected int deepHashCode0() {
return Objects.hash(traitSet, input.deepHashCode(), exps, hints);
return Objects.hash(traitSet, input.deepHashCode(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this match what happens for other operators?

Copy link
Author

@xiedeyantu xiedeyantu Feb 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this match what happens for other operators?

@mihaibudiu Perhaps other operators need to check the deepHashCode0. But the original deepHashCode0 of the project may lead to incorrect choices. This change may expose errors in two rules, but I have fixed them.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two reasons to check what is done in other similar places:

  • if this is a bug, we should try to find out whether the bug appears in other places so we can fix it. Not in this PR, but at least we can file a new issue about it.
  • if no other place checks field names, perhaps the intent of the digest is to ignore field names, and then maybe the bug is elsewhere

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I personally don't know what the intent of the digest is.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that I personally don't know what the intent of the digest is.

please see this issue: https://issues.apache.org/jira/projects/CALCITE/issues/CALCITE-6852

@mihaibudiu
Copy link
Contributor

I saw the issue, and I agree that there is a bug, but the question I am asking is whether this is the bug or the bug is elsewhere.
Perhaps someone who knows the intent of the digest can comment. I didn't try to investigate this myself by reading the code for digest in other classes.

@xiedeyantu
Copy link
Author

I saw the issue, and I agree that there is a bug, but the question I am asking is whether this is the bug or the bug is elsewhere. Perhaps someone who knows the intent of the digest can comment. I didn't try to investigate this myself by reading the code for digest in other classes.

@mihaibudiu OK, I think this is indeed a summary calculation problem of the project. The corresponding relationship between the operator and Vertex will be recorded in mapDigestToVertex, which makes it easy to retrieve the corresponding operator in findBestExp. If the Project loses the field name information during calculation, it will be deduplicated by map and replaced by a project with the same field but only a different alias. This may depend on the order of the put operators. As for other operators, the only one I can think of is agg, because only operators with column clipping may be involved in this problem, but generally agg will have a project on it, so the overall plan may be fine.

@xiedeyantu
Copy link
Author

I saw the issue, and I agree that there is a bug, but the question I am asking is whether this is the bug or the bug is elsewhere. Perhaps someone who knows the intent of the digest can comment. I didn't try to investigate this myself by reading the code for digest in other classes.

Hi, @mihaibudiu , I have added another part of the problem explanation. Please take a look at this CALCITE-6852 again

@rubenada
Copy link
Contributor

It seems to me that this comment in Project#explainTerms suggests that what we are discussing here was actually intended:

  @Override public RelWriter explainTerms(RelWriter pw) {
    ...
    // Skip writing field names so the optimizer can reuse the projects that differ in
    // field names only

@xiedeyantu
Copy link
Author

xiedeyantu commented Feb 23, 2025

It seems to me that this comment in Project#explainTerms suggests that what we are discussing here was actually intended:

  @Override public RelWriter explainTerms(RelWriter pw) {
    ...
    // Skip writing field names so the optimizer can reuse the projects that differ in
    // field names only

@rubenada Yes, the problem should arise here, missing aliases is a very serious problem. But we can't modify this, it will cause a big change in behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants