Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix](Nereids) should not replace slot by Alias when do NormalizeSlot #24928

Merged
merged 1 commit into from
Sep 28, 2023

Conversation

morrySnow
Copy link
Contributor

when we do NormalizeToSlot, we pushed complex expression and only remain slot of it. When we do this, we collect alias and their child and compute its child in bottom project, remain the result slot in current node. for example

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

But, in some cases, we remove some SlotReference by mistake, for example

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

we lost the SlotReference c1. This PR fix this problem. After this Pr, we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)

Proposed changes

Issue Number: close #xxx

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

@morrySnow
Copy link
Contributor Author

run buildall

@morrySnow
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.93 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17162548576 Bytes

@morrySnow morrySnow force-pushed the fix_normalize_window branch 3 times, most recently from c2c450d to 6054a65 Compare September 27, 2023 04:25
@morrySnow
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.3 seconds
stream load tsv: 557 seconds loaded 74807831229 Bytes, about 128 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17162296418 Bytes

@morrySnow morrySnow added dev/2.0.3 usercase Important user case type label labels Sep 27, 2023
starocean999
starocean999 previously approved these changes Sep 27, 2023
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 27, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

englefly
englefly previously approved these changes Sep 27, 2023
@morrySnow
Copy link
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Sep 27, 2023
@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.96 seconds
stream load tsv: 561 seconds loaded 74807831229 Bytes, about 127 MB/s
stream load json: 21 seconds loaded 2358488459 Bytes, about 107 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.4 seconds inserted 10000000 Rows, about 340K ops/s
storage size: 17162315389 Bytes

when we do NormalizeToSlot, we pushed complex expression and only remain
slot of it. When we do this, we collect alias and their child and
compute its child in bottom project, remain the result slot in current
node. for example

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

But, in some cases, we remove some SlotReference by mistake, for example

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

we lost the SlotReference c1. This PR fix this problem. After this Pr,
we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
@morrySnow
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.83 seconds
stream load tsv: 583 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 20 seconds loaded 2358488459 Bytes, about 112 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 31 seconds loaded 861443392 Bytes, about 26 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17162350474 Bytes

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Sep 28, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@englefly englefly merged commit b50c144 into apache:master Sep 28, 2023
23 of 25 checks passed
@morrySnow morrySnow deleted the fix_normalize_window branch September 28, 2023 06:52
xiaokang pushed a commit that referenced this pull request Oct 3, 2023
…#24928)

when we do NormalizeToSlot, we pushed complex expression and only remain
slot of it. When we do this, we collect alias and their child and
compute its child in bottom project, remain the result slot in current
node. for example

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

But, in some cases, we remove some SlotReference by mistake, for example

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

we lost the SlotReference c1. This PR fix this problem. After this Pr,
we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
vinlee19 pushed a commit to vinlee19/doris that referenced this pull request Oct 7, 2023
…apache#24928)

when we do NormalizeToSlot, we pushed complex expression and only remain
slot of it. When we do this, we collect alias and their child and
compute its child in bottom project, remain the result slot in current
node. for example

Window(max(...), c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

But, in some cases, we remove some SlotReference by mistake, for example

Window(max(...), c1, c1 as a1)

after normalization, we get

Window(max(...), a1)
+-- Project(..., c1 as a1)

we lost the SlotReference c1. This PR fix this problem. After this Pr,
we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. dev/2.0.3-merged merge_conflict reviewed usercase Important user case type label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants