Split EmptyExec into PlaceholderRowExec #8446

Merged
merged 5 commits into from
Dec 9, 2023

Conversation

razeghi71
Contributor

@razeghi71 razeghi71 commented Dec 7, 2023

Which issue does this PR close?

Closes #8355

Rationale for this change

Explained in the issue. Please note that I previously tried to do this using MemoryExec in #8412, but since MemoryExec doesn't have a serializer and implementing one wasn't trivial, I went with the approach originally proposed in the issue.

I named it PlaceHolderRowExec instead of OneRowExec because it is really more like a placeholder with null values for its columns until a projection is applied to it and transforms it into another format.

What changes are included in this PR?

In this PR I'm splitting the produce_one_row=true case of EmptyExec out into PlaceHolderRowExec. I did not split the same case in EmptyRelation; that is probably better done in a separate PR.
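
As a rough illustration of the split, here is a simplified, self-contained model. It is not the actual DataFusion code: the real operators implement the ExecutionPlan trait and emit Arrow record batches, and the `OldEmptyExec` name and `rows_produced` helper are hypothetical stand-ins. The new operator uses the spelling from the PR title.

```rust
// Simplified model of the refactor; these are NOT the real DataFusion types.

/// Before: a single operator whose behavior depends on a boolean flag.
#[derive(Debug)]
pub struct OldEmptyExec {
    pub produce_one_row: bool,
}

impl OldEmptyExec {
    /// Hypothetical helper: how many rows the operator emits.
    pub fn rows_produced(&self) -> usize {
        if self.produce_one_row {
            1
        } else {
            0
        }
    }
}

/// After: `EmptyExec` never produces any rows ...
#[derive(Debug)]
pub struct EmptyExec;

/// ... and `PlaceholderRowExec` always produces exactly one row.
#[derive(Debug)]
pub struct PlaceholderRowExec;

impl EmptyExec {
    pub fn rows_produced(&self) -> usize {
        0
    }
}

impl PlaceholderRowExec {
    pub fn rows_produced(&self) -> usize {
        1
    }
}
```

With two types, a caller chooses the behavior by constructing a different node rather than by passing a flag, so a plan's meaning is visible from the node name alone.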

Are these changes tested?

Yes. The tests can be found in the PlaceHolderRowExec file. The integration tests are also updated to reflect this change.

Are there any user-facing changes?

When showing the physical plan, instead of EmptyExec produce_one_row=value there are now two different cases: EmptyExec and PlaceHolderRowExec
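
To make the display change concrete, here is a minimal sketch with stand-in types. The `PlanNode` enum and `old_label` function are hypothetical, and the exact text that a real EXPLAIN prints may differ; the old label simply follows the `EmptyExec produce_one_row=value` shape quoted above.

```rust
use std::fmt;

/// Simplified stand-ins for the two physical operators
/// (hypothetical; not the real DataFusion types).
pub enum PlanNode {
    Empty,
    PlaceholderRow,
}

impl fmt::Display for PlanNode {
    /// After the change: each case is rendered under its own operator name.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PlanNode::Empty => write!(f, "EmptyExec"),
            PlanNode::PlaceholderRow => write!(f, "PlaceholderRowExec"),
        }
    }
}

/// Before the change: one operator name plus the flag value.
pub fn old_label(produce_one_row: bool) -> String {
    format!("EmptyExec produce_one_row={}", produce_one_row)
}
```

Anything that matches on the old `produce_one_row=...` text, such as expected plans in tests, would need updating to the new names.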

@github-actions github-actions bot added optimizer Optimizer rules core Core DataFusion crate sqllogictest SQL Logic Tests (.slt) labels Dec 7, 2023
@razeghi71 razeghi71 marked this pull request as ready for review December 7, 2023 02:52
Member

@waynexia waynexia left a comment

LGTM, thanks @razeghi71

I only have one naming question: which is preferred, placeholder/Placeholder or place_holder/PlaceHolder?

@razeghi71
Contributor Author

I tend to think of Placeholder as one word, but I'm totally okay with the other one. If you're also okay with Placeholder instead of PlaceHolder, I'll make another commit changing it.

@waynexia
Member

waynexia commented Dec 7, 2023

> I tend to think of Placeholder as one word, but I'm totally okay with the other one. If you're also okay with Placeholder instead of PlaceHolder, I'll make another commit changing it.

Great! Other places also use Placeholder, like Expr::Placeholder. I only found one occurrence of PlaceHolder, in a comment.

@razeghi71
Contributor Author

We do have PlaceHolderRegistry tho: 🤔

https://github.com/apache/arrow-datafusion/blob/33fc1104c199904fb0ee019546ac6587e7088316/datafusion/proto/src/bytes/mod.rs#L107

I'll rename PlaceHolderRowExec to PlaceholderRowExec now. We can rename that one in another PR as well if we want.

@github-actions github-actions bot added the logical-expr Logical plan and expressions label Dec 7, 2023
@razeghi71 razeghi71 changed the title Split EmptyExec into PlaceHolderRowExec Split EmptyExec into PlaceholderRowExec Dec 7, 2023
@alamb
Contributor

alamb commented Dec 7, 2023

Thank you @razeghi71 -- I hope to review this PR tomorrow

@alamb alamb added the api change Changes the API exposed to users of the crate label Dec 8, 2023
Contributor

@alamb alamb left a comment

Thank you @razeghi71 -- this PR looks good to me. I do think it is worth considering a different name for this node (even though I know you and @waynexia have been working to ensure consistent use of PlaceHolder). However, I don't think it is required, and this is an improvement.

Let me know what you think.


fn data(&self) -> Result<Vec<RecordBatch>> {
    Ok({
        let n_field = self.schema.fields.len();
Contributor

I wonder if the schema can ever have more than 0 fields 🤔

Contributor Author

It does. For example, in this case:

CREATE TABLE test (c INT) as values(0);
SELECT COUNT(*) from test;


/// Execution plan for empty relation with produce_one_row=true
#[derive(Debug)]
pub struct PlaceholderRowExec {
Contributor

What do you think about calling this OneRowExec? Perhaps that would make the use case of this node more clear than the name PlaceholderRowExec.

Contributor Author

I went with PlaceholderRow because it's always a row of null values, with columns named placeholder_{n} that get projected later. But if OneRowExec sounds better to you, I can definitely switch it up. Let me know what you think.
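
To illustrate the shape of that row, here is a minimal sketch under stated assumptions. `PlaceholderColumn` and `placeholder_row` are hypothetical names, `Option<i64>` stands in for a one-element nullable Arrow array, and numbering the columns from 0 is an assumption; the real node builds a RecordBatch.

```rust
/// One column of the placeholder row: a generated name and a null value.
/// `Option<i64>` is a stand-in for a nullable Arrow array of length one.
#[derive(Debug, PartialEq)]
pub struct PlaceholderColumn {
    pub name: String,
    pub value: Option<i64>,
}

/// Build a single all-null row with `n_field` columns named `placeholder_{n}`.
/// Starting the numbering at 0 is an assumption made for this sketch.
pub fn placeholder_row(n_field: usize) -> Vec<PlaceholderColumn> {
    (0..n_field)
        .map(|n| PlaceholderColumn {
            name: format!("placeholder_{}", n),
            value: None,
        })
        .collect()
}
```

For a schema with two fields this yields columns `placeholder_0` and `placeholder_1`, both null, which a later projection can replace with real expressions.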

Contributor

I think PlaceholderRowExec is fine.

@alamb
Contributor

alamb commented Dec 9, 2023

I took the liberty of merging up from main to resolve a conflict on this PR

@alamb alamb merged commit d091b55 into apache:main Dec 9, 2023
22 checks passed
@razeghi71 razeghi71 deleted the refactor/split-empty-exec branch December 9, 2023 19:03
appletreeisyellow pushed a commit to appletreeisyellow/datafusion that referenced this pull request Dec 15, 2023
* add PlaceHolderRowExec

* Change produce_one_row=true calls to use PlaceHolderRowExec

* remove produce_one_row from EmptyExec, changes in proto serializer, working tests

* PlaceHolder => Placeholder

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Labels
api change Changes the API exposed to users of the crate core Core DataFusion crate logical-expr Logical plan and expressions optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt)
Development

Successfully merging this pull request may close these issues.

Split EmptyExec into OneRowExec
3 participants