[SPARK-49560][SQL] Add SQL pipe syntax for the TABLESAMPLE operator #48168

Closed
wants to merge 14 commits into from

dtenedor
Contributor

@dtenedor dtenedor commented Sep 19, 2024

What changes were proposed in this pull request?

WIP

This PR adds SQL pipe syntax support for the TABLESAMPLE operator.

For example:

CREATE TABLE t(x INT, y STRING) USING CSV;
INSERT INTO t VALUES (0, 'abc'), (1, 'def');

TABLE t
|> TABLESAMPLE (100 PERCENT) REPEATABLE (0)
|> TABLESAMPLE (5 ROWS) REPEATABLE (0)
|> TABLESAMPLE (BUCKET 1 OUT OF 1) REPEATABLE (0);

0	abc
1	def
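
To make the result above easier to follow, here is a toy Python model of the three sampling forms applied in sequence. It is only a sketch under assumptions, not Spark's implementation: it assumes that `PERCENT` keeps each row with the given probability, that `ROWS` behaves like a row limit, and that `BUCKET x OUT OF y` samples with fraction x/y. All function names are hypothetical.

```python
import random
from typing import List, Tuple

Row = Tuple[int, str]


def sample_fraction(rows: List[Row], fraction: float, seed: int) -> List[Row]:
    """Keep each row independently with probability `fraction` (toy model)."""
    rng = random.Random(seed)  # a fixed seed plays the role of REPEATABLE (seed)
    # rng.random() is in [0.0, 1.0), so fraction == 1.0 keeps every row.
    return [row for row in rows if rng.random() < fraction]


def sample_percent(rows: List[Row], percent: float, seed: int) -> List[Row]:
    """Toy stand-in for TABLESAMPLE (p PERCENT) REPEATABLE (seed)."""
    return sample_fraction(rows, percent / 100.0, seed)


def sample_rows(rows: List[Row], n: int) -> List[Row]:
    """Toy stand-in for TABLESAMPLE (n ROWS): assumed to act like a limit."""
    return rows[:n]


def sample_bucket(rows: List[Row], numerator: int, denominator: int, seed: int) -> List[Row]:
    """Toy stand-in for TABLESAMPLE (BUCKET x OUT OF y): assumed fraction x / y."""
    return sample_fraction(rows, numerator / denominator, seed)


# The table from the example: t(x INT, y STRING).
t: List[Row] = [(0, "abc"), (1, "def")]

# Each pipe step consumes the output of the previous one.
step1 = sample_percent(t, 100, seed=0)
step2 = sample_rows(step1, 5)
step3 = sample_bucket(step2, 1, 1, seed=0)

for x, y in step3:
    print(f"{x}\t{y}")
```

Under these assumptions none of the three steps can drop a row from a two-row table, which matches the two rows shown in the example output.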

Why are the changes needed?

The SQL pipe operator syntax will let users compose queries in a more flexible fashion.

Does this PR introduce any user-facing change?

Yes, see above.

How was this patch tested?

This PR adds a few unit test cases but relies mostly on golden file test coverage. The golden files check that the query answers are correct as this feature is implemented, and they also record the analyzer output plans so we can confirm that those look right as well.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Sep 19, 2024
@dtenedor dtenedor changed the title from "[WIP][SPARK-49561][SQL] Add SQL pipe syntax for the TABLESAMPLE operator" to "[WIP][SPARK-49560][SQL] Add SQL pipe syntax for the TABLESAMPLE operator" Sep 25, 2024
@dtenedor dtenedor changed the title from "[WIP][SPARK-49560][SQL] Add SQL pipe syntax for the TABLESAMPLE operator" to "[SPARK-49560][SQL] Add SQL pipe syntax for the TABLESAMPLE operator" Sep 30, 2024
@dtenedor dtenedor marked this pull request as ready for review September 30, 2024 18:19
@dtenedor
Contributor Author

Alright @gengliangwang @cloud-fan, here's the TABLESAMPLE operator. It is super simple: we just add | sample to the list of parsing options and call withSample(c, left) on it from the AstBuilder. 🙏
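
For readers unfamiliar with this pattern, the comment above can be pictured with a small Python sketch. This is not Spark's Scala `AstBuilder` code; it is a hypothetical toy in which each pipe operator receives the plan built so far (`left`) and wraps it in a new node. The class names, the dictionary-based `ctx`, and the `where` branch are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Relation:
    """Leaf plan node: a named table (toy model)."""
    name: str


@dataclass(frozen=True)
class Filter:
    """Toy plan node for a pipe WHERE operator (included only for contrast)."""
    condition: str
    child: Any


@dataclass(frozen=True)
class Sample:
    """Toy plan node for a pipe TABLESAMPLE operator."""
    spec: str
    seed: int
    child: Any


def with_where(ctx: Dict[str, Any], left: Any) -> Filter:
    # Hypothetical helper: wrap the plan built so far in a Filter.
    return Filter(ctx["condition"], left)


def with_sample(ctx: Dict[str, Any], left: Any) -> Sample:
    # Hypothetical analogue of withSample(c, left): wrap `left` in a Sample.
    return Sample(ctx["spec"], ctx["seed"], left)


def visit_pipe_operator(ctx: Dict[str, Any], left: Any) -> Any:
    """Dispatch on the operator kind; 'sample' is the newly added option."""
    kind = ctx["kind"]
    if kind == "where":
        return with_where(ctx, left)
    if kind == "sample":
        return with_sample(ctx, left)
    raise ValueError(f"unsupported pipe operator: {kind}")


def build_plan(source: str, operators: List[Dict[str, Any]]) -> Any:
    """Fold the pipe operators left to right over the source relation."""
    plan: Any = Relation(source)
    for ctx in operators:
        plan = visit_pipe_operator(ctx, plan)
    return plan


plan = build_plan("t", [
    {"kind": "sample", "spec": "100 PERCENT", "seed": 0},
    {"kind": "sample", "spec": "5 ROWS", "seed": 0},
])
print(plan)
```

In this toy, supporting a new operator amounts to one extra branch in the dispatcher plus a helper that wraps `left`, which is roughly the shape of the change the comment describes.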

@gengliangwang
Member

Thanks, merging to master

attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
### What changes were proposed in this pull request?

WIP

This PR adds SQL pipe syntax support for the TABLESAMPLE operator.

For example:

```
CREATE TABLE t(x INT, y STRING) USING CSV;
INSERT INTO t VALUES (0, 'abc'), (1, 'def');

TABLE t
|> TABLESAMPLE (100 PERCENT) REPEATABLE (0)
|> TABLESAMPLE (5 ROWS) REPEATABLE (0)
|> TABLESAMPLE (BUCKET 1 OUT OF 1) REPEATABLE (0);

0	abc
1	def
```

### Why are the changes needed?

The SQL pipe operator syntax will let users compose queries in a more flexible fashion.

### Does this PR introduce _any_ user-facing change?

Yes, see above.

### How was this patch tested?

This PR adds a few unit test cases but relies mostly on golden file test coverage. The golden files check that the query answers are correct as this feature is implemented, and they also record the analyzer output plans so we can confirm that those look right as well.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#48168 from dtenedor/pipe-tablesample.

Authored-by: Daniel Tenedorio <daniel.tenedorio@databricks.com>
Signed-off-by: Gengliang Wang <gengliang@apache.org>
himadripal pushed a commit to himadripal/spark that referenced this pull request Oct 19, 2024