[WIP] Implement join #559

Closed

Conversation

leiysky
Contributor

@leiysky leiysky commented May 16, 2021

Summary

Initial tasks to implement join:

  • Support parsing joins
  • Support converting a join to a Pipeline
  • Support executing joins

Changelog

  • New Feature

Related Issues

Related to #319

Test Plan

Unit Tests
Stateless Tests

}

async fn execute(&self) -> Result<SendableDataBlockStream> {
let read_left_task = tokio::task::spawn(Self::read_from_sink(self.left.clone()));
Member

@BohuTANG BohuTANG May 16, 2021

tokio::task::spawn pushes the task onto the global tokio runtime.
To limit the resources used by one session, we should use context::execute_task instead:
https://github.com/datafuselabs/datafuse/blob/master/fusequery/query/src/pipelines/processors/processor_merge.rs#L64

It will bind all the threads used by the processors to the context's runtime.
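
To illustrate the idea of binding tasks to a session-owned runtime rather than a process-wide one, here is a minimal std-only sketch. `SessionRuntime` and `spawn_task` are hypothetical names invented for this example; this is not datafuse's `execute_task` and not tokio's API, only an assumption about the general shape of the technique: a fixed set of worker threads per session, so one query cannot take more threads than its session allows.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical per-session runtime: a fixed set of worker threads owned by one session.
pub struct SessionRuntime {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl SessionRuntime {
    /// Start `max_threads` workers; every task of this session runs on one of them.
    pub fn new(max_threads: usize) -> Self {
        let (sender, receiver) = channel::<Job>();
        let receiver = Arc::new(Mutex::new(receiver));
        let workers = (0..max_threads)
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The lock guard is dropped at the end of this statement,
                    // so the job itself runs without holding the lock.
                    let job = receiver.lock().unwrap().recv();
                    match job {
                        Ok(job) => job(),
                        Err(_) => break, // sender dropped: the session is finished
                    }
                })
            })
            .collect();
        SessionRuntime { sender: Some(sender), workers }
    }

    /// Queue a task on this session's workers and return a receiver for its result.
    pub fn spawn_task<T, F>(&self, task: F) -> Receiver<T>
    where
        T: Send + 'static,
        F: FnOnce() -> T + Send + 'static,
    {
        let (tx, rx) = channel();
        let job: Job = Box::new(move || {
            // The caller may have dropped the receiver; ignore that case.
            let _ = tx.send(task());
        });
        let sender = self.sender.as_ref().expect("sender lives until drop");
        sender.send(job).expect("workers outlive the sender");
        rx
    }
}

impl Drop for SessionRuntime {
    fn drop(&mut self) {
        // Closing the channel lets each worker leave its loop, then we join them.
        drop(self.sender.take());
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

In the join above, reading the left and right inputs would then become two `spawn_task` calls on the session's runtime instead of two global spawns.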

Contributor Author

Thanks for reminding me about this.

I intended to implement a really simple version first to help me understand the whole procedure, but so far I still haven't found a proper way to build the join transform into a Pipeline.

Member

planner -> transform:
https://github.com/datafuselabs/datafuse/blob/master/fusequery/query/src/pipelines/processors/pipeline_builder.rs

For a SQL query, the steps to build and run a pipeline are:

  1. parser -> planner
  2. optimizer
  3. planner -> transform, via the pipeline builder
  4. execute the pipeline
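
Step 3 above can be sketched roughly as follows. This is a simplified illustration, not the code in pipeline_builder.rs: `PlanNode`, `Pipeline`, `build_pipeline`, and the transform names are all hypothetical, and transforms are represented as plain strings. The builder visits the plan bottom-up, so the source ends up first in the pipeline.

```rust
/// Hypothetical logical plan in which every node has at most one input.
#[derive(Debug)]
pub enum PlanNode {
    ReadSource { table: String },
    Filter { predicate: String, input: Box<PlanNode> },
    Projection { columns: Vec<String>, input: Box<PlanNode> },
}

/// Hypothetical pipeline: transform descriptions in execution order (source first).
#[derive(Debug, Default, PartialEq)]
pub struct Pipeline {
    pub transforms: Vec<String>,
}

/// Turn a plan tree into a linear pipeline.
pub fn build_pipeline(plan: &PlanNode) -> Pipeline {
    let mut pipeline = Pipeline::default();
    visit(plan, &mut pipeline);
    pipeline
}

/// Visit the input first, then append the transform for the current node.
fn visit(plan: &PlanNode, pipeline: &mut Pipeline) {
    match plan {
        PlanNode::ReadSource { table } => {
            pipeline.transforms.push(format!("SourceTransform({})", table));
        }
        PlanNode::Filter { predicate, input } => {
            visit(input, pipeline);
            pipeline.transforms.push(format!("FilterTransform({})", predicate));
        }
        PlanNode::Projection { columns, input } => {
            visit(input, pipeline);
            pipeline
                .transforms
                .push(format!("ProjectionTransform({})", columns.join(", ")));
        }
    }
}
```

A builder of this shape works because each node has exactly one input to recurse into, which is the assumption a join breaks.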

Contributor Author

@BohuTANG Yes, but fusequery currently seems to have a strong assumption that a PlanNode has only one input, which doesn't fit join well.

I'm looking for a way to handle this properly.

Member

We can extend it to multiple inputs if needed; a processor can already have many inputs.
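
One possible way to lift the single-input assumption is sketched below. The `Plan` enum and its methods are hypothetical and do not reflect fusequery's actual PlanNode. The idea is that every node exposes its children through one `inputs()` accessor, so visitors never assume exactly one input and a join simply returns two.

```rust
/// Hypothetical plan type where a node may have zero, one, or two inputs.
#[derive(Debug)]
pub enum Plan {
    Scan { table: String },
    Filter { predicate: String, input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

impl Plan {
    /// All children of this node, in left-to-right order.
    pub fn inputs(&self) -> Vec<&Plan> {
        match self {
            Plan::Scan { .. } => vec![],
            Plan::Filter { input, .. } => vec![input.as_ref()],
            Plan::Join { left, right } => vec![left.as_ref(), right.as_ref()],
        }
    }

    /// Count the leaf scans under this node by walking every input.
    pub fn source_count(&self) -> usize {
        match self {
            Plan::Scan { .. } => 1,
            _ => self.inputs().iter().map(|p| p.source_count()).sum(),
        }
    }
}
```

A pipeline builder written against `inputs()` could build one sub-pipeline per child and feed them into a multi-input processor.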

@leiysky leiysky force-pushed the feature/support-parse-join branch from 946bd8b to 78b5503 Compare May 17, 2021 13:15
@github-actions github-actions bot added C-documentation Category: documentation A-query Area: databend query fusestore labels May 17, 2021
@leiysky leiysky force-pushed the feature/support-parse-join branch from 78b5503 to 4766d2d Compare May 17, 2021 13:19
@github-actions github-actions bot removed C-documentation Category: documentation fusestore labels May 17, 2021
@leiysky leiysky force-pushed the feature/support-parse-join branch from 4766d2d to b653b0b Compare May 19, 2021 14:21
@databend-bot databend-bot added the pr-feature this PR introduces a new feature to the codebase label May 22, 2021
@databend-bot
Member

Thanks for the contribution!
I have applied any labels matching special text in your title and description.

Please review the labels and make any necessary changes.

@leiysky
Contributor Author

leiysky commented May 23, 2021

image

We have a Cartesian product now.
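
For reference, a Cartesian product (cross join) pairs every left row with every right row. The sketch below is a row-based illustration only, with a hypothetical `Row` type and `cross_join` function; it is not the implementation in this PR, and a real block would be columnar.

```rust
/// Hypothetical row type used only for this illustration.
pub type Row = Vec<i64>;

/// Pair every left row with every right row, concatenating their columns.
pub fn cross_join(left: &[Row], right: &[Row]) -> Vec<Row> {
    let mut out = Vec::with_capacity(left.len() * right.len());
    for l in left {
        for r in right {
            let mut row = l.clone();
            row.extend_from_slice(r);
            out.push(row);
        }
    }
    out
}
```

The output has `left.len() * right.len()` rows, which is why both inputs must be fully read before the join can finish.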

@databendlabs databendlabs deleted a comment from CLAassistant Jun 4, 2021
@BohuTANG BohuTANG mentioned this pull request Jun 7, 2021
2 tasks
@leiysky
Contributor Author

leiysky commented Jun 14, 2021

I will finish the prerequisites for join in other PRs first, so this PR won't be ready for review in the foreseeable future.

@leiysky leiysky closed this Sep 13, 2021
@leiysky leiysky deleted the feature/support-parse-join branch May 3, 2022 09:16
Labels
A-query Area: databend query pr-feature this PR introduces a new feature to the codebase
3 participants