
Add Postgres TABLESAMPLE support. #3921

Merged: 7 commits, merged Mar 7, 2024


@dvogel commented Feb 2, 2024

These changes add support for queries like so:

SELECT * FROM my_table TABLESAMPLE BERNOULLI(10)

The syntax, excerpted from the PostgreSQL docs:

TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ]
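To make the grammar concrete, here is a minimal Rust sketch that renders the clause as a string. It is not Diesel's implementation: the function name render_tablesample and the choice to emit every argument and the seed as PostgreSQL-style $n bind placeholders are assumptions for illustration.

```rust
/// Hypothetical helper (not part of Diesel): renders
/// `TABLESAMPLE sampling_method ( argument [, ...] ) [ REPEATABLE ( seed ) ]`
/// with every argument and the optional seed emitted as a `$n` bind placeholder.
/// Placeholders are numbered from `first_bind` so the clause can be spliced into
/// a larger statement that already has binds.
fn render_tablesample(method: &str, n_args: usize, has_seed: bool, first_bind: usize) -> String {
    // The grammar requires at least one argument inside the parentheses.
    assert!(n_args >= 1, "TABLESAMPLE needs at least one argument");
    let mut next = first_bind;
    let mut args = Vec::with_capacity(n_args);
    for _ in 0..n_args {
        args.push(format!("${next}"));
        next += 1;
    }
    let mut sql = format!(" TABLESAMPLE {method}({})", args.join(", "));
    if has_seed {
        // REPEATABLE takes the next placeholder after the arguments.
        sql.push_str(&format!(" REPEATABLE(${next})"));
    }
    sql
}
```

For example, format!("SELECT * FROM my_table{}", render_tablesample("BERNOULLI", 1, false, 1)) yields SELECT * FROM my_table TABLESAMPLE BERNOULLI($1), with the value 10 sent separately as the bind for $1.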

I'm not confident these changes are ready to merge. The basic functionality works for me in a side project; however, I have not been able to test it as thoroughly as I'd like.

Note the TODO comment regarding the AppearsInFromClause associated Count type. It is set to Once, but I struggled to understand how this is used. If it means once per joinable fragment, then Once is correct. However, PostgreSQL does allow a separate TABLESAMPLE clause on each joinable fragment. Since the count seems meant to pertain to the underlying table, perhaps it should be derived from the S type parameter? If so, I'm afraid this is beyond my current understanding of the diesel_derives code :)

I'm not sure whether I've exported the TablesampleDsl trait correctly. In my application code, where I call .tablesample(...), I also need to import diesel::dsl::TablesampleDsl. That felt a little unexpected. I looked into re-exporting it from the prelude, but I didn't see other backend-specific DSL traits re-exported there, so I backed away.

It seems most of the DSL tests are doctests. I tried to follow that pattern but wasn't able to run them: naively running cargo rustdoc --features postgres -- --test made all tests fail. I'd very much appreciate any pointers on how to run those tests.

@weiznich (Member) left a comment


Thanks for working on this 👍 That's really helpful and the PR is a good starting point to iterate on.

> Note the TODO comment regarding the AppearsInFromClause associated Count type. It is set to Once but I struggled to understand how this is used. If this means once per joinable fragment then it should be Once. However PostgreSQL does allow separate TABLESAMPLE clauses in each joinable fragment. Since it seems meant to pertain to the underlying table, perhaps it should be derived from the S type parameter? If so I'm afraid this is beyond my current understanding of the diesel_derives code :)

AppearsInFromClause::Count there indicates how often a query source appears in your from clause. Once says that the query source given to AppearsInFromClause as its generic argument appears exactly once in your from clause, which later allows columns in select/filter/etc. to refer to that table. The added impl essentially just says that Tablesample is somewhat transparent there and can be treated just like the table itself.
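The counting idea can be illustrated with a small, self-contained model. This is a simplified sketch, not Diesel's real definitions: the names mirror the ones discussed here (Never, Once, MoreThanOnce, Plus, AppearsInFromClause, Join, Tablesample), but Users, Posts, Bernoulli, CountName, and count_of are hypothetical stand-ins, and the wrapper simply delegates to its inner table to express the "transparent" behaviour.

```rust
use std::marker::PhantomData;

// Type-level counts (simplified stand-ins for Diesel's markers).
struct Never;
struct Once;
struct MoreThanOnce;

// Lets us read a count back at runtime for testing.
trait CountName {
    const NAME: &'static str;
}
impl CountName for Never { const NAME: &'static str = "Never"; }
impl CountName for Once { const NAME: &'static str = "Once"; }
impl CountName for MoreThanOnce { const NAME: &'static str = "MoreThanOnce"; }

// Type-level addition of counts.
trait Plus<Rhs> {
    type Output: CountName;
}
impl Plus<Never> for Never { type Output = Never; }
impl Plus<Once> for Never { type Output = Once; }
impl Plus<MoreThanOnce> for Never { type Output = MoreThanOnce; }
impl Plus<Never> for Once { type Output = Once; }
impl Plus<Once> for Once { type Output = MoreThanOnce; }
impl Plus<MoreThanOnce> for Once { type Output = MoreThanOnce; }
impl<Rhs> Plus<Rhs> for MoreThanOnce { type Output = MoreThanOnce; }

// How often the query source `QS` appears in `Self`.
trait AppearsInFromClause<QS> {
    type Count: CountName;
}

// Hypothetical tables, sampling method, wrapper and join.
struct Users;
struct Posts;
struct Bernoulli;
struct Tablesample<S, M>(PhantomData<(S, M)>);
struct Join<L, R>(PhantomData<(L, R)>);

impl AppearsInFromClause<Users> for Users { type Count = Once; }
impl AppearsInFromClause<Posts> for Users { type Count = Never; }
impl AppearsInFromClause<Users> for Posts { type Count = Never; }
impl AppearsInFromClause<Posts> for Posts { type Count = Once; }

// "Transparent" wrapper: a sampled table counts exactly like the table it wraps.
impl<S, M, QS> AppearsInFromClause<QS> for Tablesample<S, M>
where
    S: AppearsInFromClause<QS>,
{
    type Count = S::Count;
}

// A join's count is the sum of both sides' counts.
impl<L, R, QS> AppearsInFromClause<QS> for Join<L, R>
where
    L: AppearsInFromClause<QS>,
    R: AppearsInFromClause<QS>,
    L::Count: Plus<R::Count>,
{
    type Count = <L::Count as Plus<R::Count>>::Output;
}

/// Returns the name of the count for `QS` inside `Source`.
fn count_of<Source, QS>() -> &'static str
where
    Source: AppearsInFromClause<QS>,
{
    <Source::Count as CountName>::NAME
}
```

In this model, count_of::<Join<Tablesample<Users, Bernoulli>, Posts>, Posts>() is "Once", while joining Users with a sampled Users yields "MoreThanOnce", the situation a "must appear once" requirement on columns would reject.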

> I'm not sure whether I've properly exported the TablesampleDsl trait correctly. In my application code, where I call .tablesample(...) I also need to use diesel::dsl::TablesampleDsl. That felt a little unexpected. I looked into re-exporting it into the prelude but I didn't see other backend-specific DSL traits re-exported there so I backed away.

The prelude re-exports the methods from expression_methods: https://github.com/diesel-rs/diesel/blob/master/diesel/src/lib.rs#L648
which in turn re-exports the backend-specific variants: https://github.com/diesel-rs/diesel/blob/master/diesel/src/expression_methods/mod.rs. That said, this trait does not really fit into the expression_methods module, as it's something different. I would likely re-export it directly from diesel::prelude (and add OnlyDsl there as well).
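A minimal sketch of the suggested re-export, assuming a simplified crate layout: the module paths, the Users stand-in, and the tablesample_bernoulli method are hypothetical and only illustrate how a glob import of a prelude brings a DSL trait's methods into scope.

```rust
mod pg {
    pub mod query_builder {
        /// Hypothetical stand-in for a generated table type.
        pub struct Users;

        /// Hypothetical wrapper recording which table is sampled and by how much.
        pub struct Tablesample<S> {
            pub source: S,
            pub percent: i16,
        }
    }

    pub mod dsl {
        use super::query_builder::{Tablesample, Users};

        /// Backend-specific DSL trait; its methods are only callable when the
        /// trait itself is in scope.
        pub trait TablesampleDsl: Sized {
            fn tablesample_bernoulli(self, percent: i16) -> Tablesample<Self> {
                Tablesample { source: self, percent }
            }
        }

        impl TablesampleDsl for Users {}
    }
}

/// Re-exporting the trait here means `use prelude::*;` is enough for callers;
/// no separate `use pg::dsl::TablesampleDsl;` is needed.
mod prelude {
    pub use crate::pg::dsl::TablesampleDsl;
}

use prelude::*;
```

With the glob import in place, pg::query_builder::Users.tablesample_bernoulli(10) resolves without naming the trait's own module.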

> It seems most of the DSL tests are in the doctests. I tried to follow that pattern but I wasn't able to run those. A naive cargo rustdoc --features postgres -- --test led to all tests failing. I'd very much appreciate any direction you can provide to run those tests.

There are also some tests in diesel_tests, but it should be fine to use doc tests for this functionality. After all, we only care that it's covered somewhere.

For running the tests you need to:

  • Set the DATABASE_URL environment variable to a database that should be used for tests
  • Run tests via cargo test --no-default-features --features "postgres"

In addition to the doc tests, I would appreciate having at least one test for this in diesel_compile_tests that shows this can only be used with the PostgreSQL backend.
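The idea behind such a compile test can be sketched without Diesel: if the trait that turns a fragment into SQL is implemented only for the PostgreSQL backend marker, using the fragment with any other backend fails to type-check. Pg, Sqlite, ToSql, and build below are hypothetical simplifications of a backend-generic QueryFragment<DB> style design, not Diesel's actual API.

```rust
/// Hypothetical backend markers.
pub struct Pg;
pub struct Sqlite;

/// Simplified stand-in for a backend-generic "render to SQL" trait.
pub trait ToSql<DB> {
    fn to_sql(&self, out: &mut String);
}

/// A sampled table; the percentage is inlined here only to keep the sketch short.
pub struct Tablesample {
    pub table: &'static str,
    pub percent: i16,
}

// Implemented for `Pg` only, so `Tablesample: ToSql<Sqlite>` never holds.
impl ToSql<Pg> for Tablesample {
    fn to_sql(&self, out: &mut String) {
        out.push_str(self.table);
        out.push_str(" TABLESAMPLE BERNOULLI(");
        out.push_str(&self.percent.to_string());
        out.push(')');
    }
}

/// Renders any fragment for the backend chosen by the caller.
pub fn build<DB, Q: ToSql<DB>>(fragment: &Q) -> String {
    let mut sql = String::new();
    fragment.to_sql(&mut sql);
    sql
}

// A compile-fail test would contain a call such as
//     build::<Sqlite, _>(&Tablesample { table: "users", percent: 10 });
// and assert that rustc reports `Tablesample: ToSql<Sqlite>` as unsatisfied.
```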

Review threads (all outdated and resolved):
  • diesel/src/pg/query_builder/tablesample.rs (4)
  • diesel_derives/src/table.rs (2)
@dvogel (Author) commented Feb 6, 2024

Thank you for all of your feedback. I found it quite helpful! I have updated this branch to:

  • add a compile test
  • use bind params
  • export the DSL symbols more ergonomically
  • encode the SQL specialization (the sampling method) into type parameters.
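A minimal sketch combining the last two bullets: the sampling method is chosen through a marker type parameter (compare the BernoulliMethod and TablesampleMethod names used in this thread), while the portion and the optional seed are collected as bind values instead of being spliced into the SQL. Everything here, including SystemMethod, the Bind enum, and with_seed, is a hypothetical illustration rather than the PR's code; f32 and f64 are assumed to match PostgreSQL's real-valued sampling argument and floating-point seed.

```rust
use std::marker::PhantomData;

/// Marker trait: each sampling method is a zero-sized type that knows its SQL keyword.
pub trait TablesampleMethod {
    const KEYWORD: &'static str;
}

pub struct BernoulliMethod;
pub struct SystemMethod;

impl TablesampleMethod for BernoulliMethod {
    const KEYWORD: &'static str = "BERNOULLI";
}
impl TablesampleMethod for SystemMethod {
    const KEYWORD: &'static str = "SYSTEM";
}

/// Hypothetical bind value collected alongside the SQL text.
#[derive(Debug, PartialEq)]
pub enum Bind {
    Real(f32),
    Double(f64),
}

/// The method lives only in the type; no runtime field stores it.
pub struct Tablesample<TSM: TablesampleMethod> {
    table: &'static str,
    portion: f32,
    seed: Option<f64>,
    _method: PhantomData<TSM>,
}

impl<TSM: TablesampleMethod> Tablesample<TSM> {
    pub fn new(table: &'static str, portion: f32) -> Self {
        Tablesample { table, portion, seed: None, _method: PhantomData }
    }

    /// Builder-style setter for the optional REPEATABLE seed.
    pub fn with_seed(mut self, seed: f64) -> Self {
        self.seed = Some(seed);
        self
    }

    /// Appends bind values to `binds` and returns SQL that refers to them as `$n`.
    pub fn to_sql(&self, binds: &mut Vec<Bind>) -> String {
        binds.push(Bind::Real(self.portion));
        let mut sql = format!("{} TABLESAMPLE {}(${})", self.table, TSM::KEYWORD, binds.len());
        if let Some(seed) = self.seed {
            binds.push(Bind::Double(seed));
            sql.push_str(&format!(" REPEATABLE(${})", binds.len()));
        }
        sql
    }
}
```

Because the keyword is an associated constant of the marker type, an unsupported method cannot even be named, while the numeric values still travel as parameters.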

Regarding the type parameters, I'd appreciate feedback on how close my approach is to your suggestion. I don't use this approach much in my own code, so I'm likely missing some of the nuance in the existing DSL code. The code works in my own project; however, the second of the two doctests fails to compile, and I'm struggling to figure out why. The error:

---- diesel/src/pg/expression/extensions/tablesample_dsl.rs - pg::expression::extensions::tablesample_dsl::TablesampleDsl (line 63) stdout ----
error[E0277]: the trait bound `query_source::joins::Join<Tablesample<users::table, diesel::dsl::BernoulliMethod>, posts::table, Inner>: AppearsInFromClause<posts::table>` is not satisfied
   --> diesel/src/pg/expression/extensions/tablesample_dsl.rs:73:6
    |
12  |     .inner_join(posts::table)
    |      ^^^^^^^^^^ the trait `AppearsInFromClause<posts::table>` is not implemented for `query_source::joins::Join<Tablesample<users::table, diesel::dsl::BernoulliMethod>, posts::table, Inner>`
    |
    = help: the trait `AppearsInFromClause<T>` is implemented for `query_source::joins::Join<Left, Right, Kind>`
note: required for `posts::columns::id` to implement `AppearsOnTable<query_source::joins::Join<Tablesample<users::table, diesel::dsl::BernoulliMethod>, posts::table, Inner>>`
   --> diesel/src/pg/expression/extensions/../../../doctest_setup.rs:285:13
    |
285 |             id -> Integer,
    |             ^^
    = note: 2 redundant requirements hidden
    = note: required for `((users::columns::id, users::columns::name), (posts::columns::id, posts::columns::user_id, posts::columns::title))` to implement `AppearsOnTable<query_source::joins::Join<Tablesample<users::table, diesel::dsl::BernoulliMethod>, posts::table, Inner>>`
    = note: required for `query_source::joins::Join<Tablesample<users::table, diesel::dsl::BernoulliMethod>, posts::table, Inner>` to implement `QuerySource`
    = note: 1 redundant requirement hidden
    = note: required for `JoinOn<Join<Tablesample<table, BernoulliMethod>, table, Inner>, Grouped<Eq<Nullable<user_id>, Nullable<id>>>>` to implement `QuerySource`
    = note: the full type name has been written to '/tmp/rustdoctestMSz9fp/rust_out.long-type-14004784566645490992.txt'
    = note: required for `SelectStatement<FromClause<Tablesample<users::table, diesel::dsl::BernoulliMethod>>>` to implement `InternalJoinDsl<posts::table, Inner, diesel::expression::grouped::Grouped<diesel::expression::operators::Eq<NullableExpression<posts::columns::user_id>, NullableExpression<users::columns::id>>>>`

As the error indicates, the difference between the two examples is the join. However, I'm having trouble understanding how the type constraints are being applied here. In diesel/src/query_source/joins.rs there is an impl for AppearsInFromClause:

impl<T, Left, Right, Kind> AppearsInFromClause<T> for Join<Left, Right, Kind>
where
    Left: AppearsInFromClause<T> + QuerySource,
    Right: AppearsInFromClause<T> + QuerySource,
    Left::Count: Plus<Right::Count>,
{
    type Count = <Left::Count as Plus<Right::Count>>::Output;
}

Since this impl targets Join, one of the constraints on its type parameters must be unsatisfied:

Left = Tablesample<users::table, diesel::dsl::BernoulliMethod>
Left: AppearsInFromClause<T> + QuerySource

In diesel_derives/src/table.rs I added:

impl<TSM> diesel::query_source::AppearsInFromClause<diesel::query_builder::Tablesample<table, TSM>>
    for table
where
    TSM: diesel::query_builder::TablesampleMethod,
{
    type Count = diesel::query_source::Once;
}

Since BernoulliMethod implements TablesampleMethod, that constraint is satisfied. Next, let's look at QuerySource. In src/pg/query_builder/tablesample.rs I added:

impl<S, TSM> QuerySource for Tablesample<S, TSM>
where
    S: Table + Clone,
    TSM: TablesampleMethod,
    <S as QuerySource>::DefaultSelection:
        ValidGrouping<()> + SelectableExpression<Tablesample<S, TSM>>,
{
    type FromClause = Self;
    type DefaultSelection = <S as QuerySource>::DefaultSelection;

    // from_clause() and default_selection() omitted from this excerpt
}

users::table clearly implements Table. In diesel_derives/src/table.rs I added:

impl<TSM> diesel::SelectableExpression<diesel::query_builder::Tablesample<super::table, TSM>>
    for #column_name
where
    TSM: diesel::query_builder::TablesampleMethod,
{
}

So my mistake is most likely that this constraint is not satisfied:

    <S as QuerySource>::DefaultSelection:
        ValidGrouping<()> + SelectableExpression<Tablesample<S, TSM>>

However, with the exception of the TSM type parameter, this follows the Only impl of QuerySource that works with joins.

@weiznich (Member) left a comment


Thanks for the update. This is now much closer to what I had in mind 👍

I've left a bunch of comments suggesting some changes to the exposed API, but the implementation itself, based on the generic marker types, looks fine.

As for the errors with the joins: it looks like you might be missing the equivalent of these impls:

impl $crate::query_source::TableNotEqual<$left::table>
for $crate::query_builder::Only<$right::table>
{
}
impl $crate::query_source::TableNotEqual<$right::table>
for $crate::query_builder::Only<$left::table>
{
}
impl $crate::query_source::TableNotEqual<$crate::query_builder::Only<$left::table>>
for $right::table
{
}
impl $crate::query_source::TableNotEqual<$crate::query_builder::Only<$right::table>>
for $left::table
{
}
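Why both directions matter can be shown with a stripped-down model. This is not how Diesel wires TableNotEqual into its join checks: here a hypothetical inner_join simply requires TableNotEqual in both directions, and Users, Posts, Tablesample, SourceName, and the SQL strings are illustrative stand-ins. Removing either of the two wrapper impls makes inner_join::<Tablesample<Users>, Posts>() fail to compile, which is analogous to the unsatisfied bound in the doctest error above.

```rust
use std::marker::PhantomData;

/// Marker: `Self` and `T` are different query sources.
trait TableNotEqual<T> {}

/// Hypothetical way to render a source's name into SQL.
trait SourceName {
    fn name() -> String;
}

struct Users;
struct Posts;
/// Sampled wrapper around a table `S`.
struct Tablesample<S>(PhantomData<S>);

impl SourceName for Users {
    fn name() -> String { "users".to_string() }
}
impl SourceName for Posts {
    fn name() -> String { "posts".to_string() }
}
impl<S: SourceName> SourceName for Tablesample<S> {
    fn name() -> String { format!("{} TABLESAMPLE BERNOULLI($1)", S::name()) }
}

// Impls assumed to exist for the plain tables.
impl TableNotEqual<Posts> for Users {}
impl TableNotEqual<Users> for Posts {}

// The equivalents for the wrapper, mirroring the Only impls above.
impl TableNotEqual<Posts> for Tablesample<Users> {}
impl TableNotEqual<Tablesample<Users>> for Posts {}

/// Hypothetical join that only type-checks when the sources are known to differ.
struct Join<L, R>(PhantomData<(L, R)>);

fn inner_join<L, R>() -> Join<L, R>
where
    L: SourceName + TableNotEqual<R>,
    R: SourceName + TableNotEqual<L>,
{
    Join(PhantomData)
}

impl<L: SourceName, R: SourceName> Join<L, R> {
    fn sql(&self) -> String {
        format!("{} INNER JOIN {}", L::name(), R::name())
    }
}
```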

Also, you will likely need to update the expected output of the other compile tests as well, since some of that error output is quite sensitive to what rustc emits (it lists a number of types that implement a certain trait, and that list will change here).

@dvogel (Author) commented Feb 10, 2024

Thanks for pointing me toward macros/mod.rs! Adding those trait impls did allow my doctest to compile. The latest changes include the builder pattern approach you suggested. I believe all of the remaining test failures are unrelated to my changes; I have a CI run here, with a few extra unrelated fixes, that shows the PostgreSQL tablesample tests all passing.

@weiznich (Member)

The fixes for the failing CI tests have been merged, so if you rebase your PR, I can do a (hopefully) final review.

@dvogel (Author) commented Feb 21, 2024

Thanks for the heads up. I've rebased this and marked it as ready for review.

@dvogel marked this pull request as ready for review on February 21, 2024, 05:30
@weiznich (Member) left a comment


Thanks for the update 👍

I'm sorry, but I have another round of mostly minor comments about documentation and what to make public where. It would be great if you could address them; otherwise I might fix them myself in the next week.

The other thing that would be great to have is an entry in the Changelog.md file mentioning this new feature.

Review threads (all resolved):
  • diesel/src/pg/expression/extensions/tablesample_dsl.rs (1, outdated)
  • diesel/src/pg/query_builder/tablesample.rs (4, of which 3 outdated)
  • diesel/src/query_builder/mod.rs (1)
@dvogel (Author) commented Feb 23, 2024

> Thanks for the update 👍
>
> I'm sorry but I have another round of mostly minor comments about documentation and what to make public where. It would be great if you could address them, otherwise I might fix them myself in the next week.
>
> The other thing that would be great to have is an entry into the Changelog.md file mentioning this new feature.

From what I can tell reading through these on my phone, they look like appropriate changes, and I'd be happy to make them. I'll be away from my computer until March 1st, though.

@weiznich (Member)

No worries, take the time you need for this. I would not have time to do this on my own in the next week or so anyway.

@weiznich (Member) left a comment


Thanks for the update. It looks good now 👍

I will take care of the CI failure tomorrow. It seems to be unrelated (it's caused by a new deprecation in a new chrono version).

@weiznich enabled auto-merge on March 7, 2024, 11:24
@weiznich added this pull request to the merge queue on Mar 7, 2024
Merged via the queue into diesel-rs:master with commit e5fc2a8 Mar 7, 2024
48 checks passed

2 participants