[SPARK-24339][SQL]Add project for transform/map/reduce sql to prune column #21447
@gatorsmile Can you trigger this?
    // Add project.
    val namedExpressions = expressions.map {
      case e: NamedExpression => e
      case e: Expression => UnresolvedAlias(e)
nit: case e: _ => UnresolvedAlias(e)
The elements of expressions are of type Expression, so I think
case e: _ => UnresolvedAlias(e) and case e: Expression => UnresolvedAlias(e) are equivalent.
Is there another reason to change this?
just a style issue.
The style is updated; please review.
I tried it: "case e: _ => UnresolvedAlias(e)" fails with an "unbound wildcard type" compile error. I will revert to "case e: Expression => UnresolvedAlias(e)".
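For readers following the style discussion above, here is a minimal, self-contained sketch of the pattern match in question. The classes below are hypothetical stand-ins for Catalyst's Expression, NamedExpression, and UnresolvedAlias, not Spark's real definitions, and AliasSketch and toNamed are names invented for illustration.

```scala
// Hypothetical stand-ins for Catalyst's expression classes, so the sketch compiles on its own.
sealed trait Expression
trait NamedExpression extends Expression
final case class Attribute(name: String) extends NamedExpression
final case class Add(left: Expression, right: Expression) extends Expression
final case class UnresolvedAlias(child: Expression) extends NamedExpression

object AliasSketch {
  // Keep expressions that already have a name; wrap everything else in an alias.
  def toNamed(expressions: Seq[Expression]): Seq[NamedExpression] = expressions.map {
    case e: NamedExpression => e
    // `case e: _ => ...` was reported in this thread to fail with "unbound wildcard type".
    // An untyped `case e => UnresolvedAlias(e)` would match the same values here.
    case e: Expression => UnresolvedAlias(e)
  }
}
```

Because every element is already an Expression, the typed pattern in the second case acts as a catch-all; the type annotation only documents intent.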
Could you add the explain output with and without this PR to the description?
@maropu I updated the comment. In summary, this PR reduces the time spent scanning and assembling data. In our scenario, the relation (table) has 700 columns.
@maropu @gatorsmile Do you have any comments or suggestions for this PR? Thanks.
@maropu @gatorsmile @liancheng @HyukjinKwon Can you help me review this PR? Thanks.
Kindly pinging @gatorsmile @HyukjinKwon @ueshin
ok to test
Test build #92842 has finished for PR 21447 at commit
retest this please
Test build #92916 has started for PR 21447 at commit
The test build has finished (it was killed). Why is the result not shown here? retest this please!
Wait, why is it against branch-2.2 not master?
Test build #93103 has finished for PR 21447 at commit
@HyukjinKwon Our project is based on branch-2.2. If the PR were against master, we would have to merge the patch into our local branch manually, and we prefer that our local branch follow the community branch. If you don't like this approach, I will close this PR and open a new one against master.
Yea, usually we do it for master first and then backport it to other branches, which reduces the diff and ensures master has the fix. I would appreciate it if you go this way.
Sorry, the fix does not look good to me. We should let the optimizer add the project automatically.
@xdcjie, mind closing this one?
Ok!
@maropu Do you want to take this over and add such a project in
I want to submit a follow-up PR and cc @gatorsmile @maropu for a review.
What changes were proposed in this pull request?
A transform query does not have a Project node. For a query like:
select transform(a, b) using 'func' from e
the logical plan has no Project node, so the query scans all columns of the relation.
In this PR, I propose adding a Project node for transform queries so that only the required columns are scanned. For the transform query above, the logical plan then includes a Project node, and only two columns of the relation are scanned.
In summary, adding a Project node for transform queries reduces the time spent scanning and assembling data. (In our scenario, the relation (table) has 700 columns.)
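The effect described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the plan classes (Relation, Project, ScriptTransformation) are simplified, hypothetical stand-ins rather than Spark's Catalyst classes, and addProject and scannedColumns are helper names invented here, not code from this patch.

```scala
// Hypothetical, simplified stand-ins for logical plan nodes; not Spark's real classes.
sealed trait Plan
final case class Relation(name: String, columns: Seq[String]) extends Plan
final case class Project(columns: Seq[String], child: Plan) extends Plan
final case class ScriptTransformation(input: Seq[String], script: String, child: Plan) extends Plan

object TransformPruningSketch {
  // Insert a Project between the transform and its child that keeps only the script's inputs.
  def addProject(plan: Plan): Plan = plan match {
    case t @ ScriptTransformation(input, _, child) if !child.isInstanceOf[Project] =>
      t.copy(child = Project(input, child))
    case other => other
  }

  // Columns that would be read from the underlying relation in this toy model.
  def scannedColumns(plan: Plan): Seq[String] = plan match {
    case Relation(_, columns)              => columns
    case Project(columns, child)           => scannedColumns(child).filter(columns.contains)
    case ScriptTransformation(_, _, child) => scannedColumns(child)
  }

  def main(args: Array[String]): Unit = {
    // Models: select transform(a, b) using 'func' from e, where e has four columns.
    val e = Relation("e", Seq("a", "b", "c", "d"))
    val query = ScriptTransformation(Seq("a", "b"), "func", e)
    println(scannedColumns(query))             // every column of e
    println(scannedColumns(addProject(query))) // only the columns the script reads
  }
}
```

In this model the Project node is what lets a scan skip unused columns; with a 700-column table, as in the scenario above, the difference between the two cases is correspondingly larger.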
How was this patch tested?
Modified the existing test ("transform query spec").