Conversation

qstommyshu
Contributor

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Migrated tests in datafusion/core/tests/subtrait to use insta.

Are these changes tested?

Yes, I manually tested the before/after changes.

Are there any user-facing changes?

No.

@github-actions github-actions bot added the substrait (Changes to the substrait crate) label Mar 28, 2025
@qstommyshu
Contributor Author

Hi @alamb and @blaginin

Part 2 of the substrait tests migration is done as well. Please take a look when you have time :)

The only tests that cannot be changed to insta are simple_intersect_table_reuse() and simple_intersect() in this file. I don't think it is possible to migrate them, because their expected_plan_str is built by formatting a variable into a string. Rust does not support interpolating a variable into a raw string literal, and an insta inline snapshot requires a string literal as its argument. So I left those tests out.
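
To make the limitation concrete, here is a minimal sketch of the pattern being described, using only the standard library. The helper name `expected_plan_str`, the variable `syntax`, and the plan text are hypothetical stand-ins, not the actual test code: the point is only that the expected text is computed at run time, whereas an inline snapshot is a literal written into the source file.

```rust
// Hypothetical helper: builds the expected plan text for one query variant.
// The plan text is invented for illustration; it is not the real expectation.
fn expected_plan_str(syntax: &str) -> String {
    format!(
        "Aggregate: groupBy=[[]], aggr=[[{}]]\n  TableScan: data",
        syntax
    )
}

fn main() {
    // The expected text differs on every iteration, so a single string
    // literal in the source cannot stand in for all of the variants.
    for syntax in ["count(*)", "count(1)"] {
        println!("{}\n", expected_plan_str(syntax));
    }
}
```

One way around this, if it were ever wanted, would be to write one literal snapshot per variant, at the cost of duplicating the surrounding plan text.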

Thanks

Contributor

@alamb alamb left a comment

Thank you @qstommyshu -- this looks great to me. I had one minor comment, but it is not required.

Thanks again 🚀

@blaginin
Contributor

blaginin commented Mar 28, 2025

can you merge main into this branch please? to remove extra diff

@qstommyshu
Contributor Author

I'll do a last commit to resolve those comments

@alamb
Contributor

alamb commented Mar 28, 2025

🌶️

Member

@xudong963 xudong963 left a comment

Thank you!!

@xudong963
Member

cc @blaginin

@blaginin
Contributor

Hey again, thanks for working on that 🙏

> can you merge main into this branch please? to remove extra diff

Just to explain, the current PR diff is quite large because it includes code that's already merged. Merging or rebasing would make the diff much smaller, making it easier to review.

Closes #15398.

There might potentially be a few more tests to update. For example, fn simple_intersect (with allow_duplicates?).

}

async fn assert_expected_plan_unoptimized(
async fn assert_and_generate_plan(
Contributor

Two small nitpicks:

  1. In generate_plan_from_substrait, you return a LogicalPlan, which is asserted later. I personally really like that approach because you convert data to string as soon as possible. Maybe we could do the same thing here?
  2. I find the name assert_and_generate_plan a bit misleading, because you don't actually assert the plan here (that happens in the test itself).

Contributor Author

@qstommyshu qstommyshu Mar 30, 2025

  1. Updated assert_and_generate_plan() to return a LogicalPlan as well. I'm not entirely clear on what you mean by "convert data to string as soon as possible"; I assume it means we can return a LogicalPlan, since assert_snapshot converts it to a String internally?

  2. I renamed this function to generate_plan_from_sql() to indicate that it generates a logical plan from SQL.

The assert_schema parameter determines if it does schema assertion internally, and the optimized parameter determines if we want it to generate an optimized plan.
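
As a rough illustration of the helper shape described here, the following standard-library sketch uses stand-in types throughout. `Plan`, `plan_sql`, and `round_trip` are hypothetical placeholders (the real helper works with DataFusion's LogicalPlan, and the round trip through Substrait is an assumption about what happens between planning and the schema check). Only the function name and the two flags come from the comment above.

```rust
use std::fmt;

// Hypothetical stand-in for a logical plan: rendered text plus column names.
#[derive(Debug, Clone, PartialEq)]
struct Plan {
    text: String,
    schema: Vec<String>,
}

impl fmt::Display for Plan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.text)
    }
}

// Stand-in planner: not a real SQL planner, it only tags the plan text.
fn plan_sql(sql: &str, optimized: bool) -> Plan {
    let prefix = if optimized { "Optimized" } else { "Unoptimized" };
    Plan {
        text: format!("{}: {}", prefix, sql),
        schema: vec!["a".to_string()],
    }
}

// Stand-in for an assumed serialize/deserialize round trip; here a plain clone.
fn round_trip(plan: &Plan) -> Plan {
    plan.clone()
}

// Returns the plan itself; the caller decides how to assert on it.
fn generate_plan_from_sql(sql: &str, assert_schema: bool, optimized: bool) -> Plan {
    let original = plan_sql(sql, optimized);
    let restored = round_trip(&original);
    if assert_schema {
        // Optional internal check that the schema survived the round trip.
        assert_eq!(original.schema, restored.schema);
    }
    restored
}

fn main() {
    let plan = generate_plan_from_sql("SELECT a FROM data", true, false);
    // In a test, this is where the plan would be compared against a snapshot.
    println!("{}", plan);
}
```

Because the helper returns a value that implements Display, the plan assertion stays in the test body, which matches the rename away from assert_and_generate_plan.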

Hope that resolves the comments. Please let me know if anything is still unresolved.

@qstommyshu
Contributor Author

> Hey again, thanks for working on this 🙏
>
> > can you merge main into this branch please? to remove extra diff
>
> Just to explain, the current PR diff is quite large because it includes code that's already merged. Merging or rebasing would make the diff much smaller, making it easier to review.
>
> Closes #15398.
>
> There might potentially be a few more tests to update. For example, fn simple_intersect (with allow_duplicates?).

Hi @blaginin, I'm not sure what exactly you mean by merge it to main? I see there are no conflicts with the base branch, so it probably means GitHub can fast-forward it?

@blaginin
Contributor

> Hi @blaginin, I'm not sure what exactly you mean by merge it to main? I see there are no conflicts with the base branch, so it probably means GitHub can fast-forward it?

That's the GitHub diff as I see it: almost all of the files are actually part of the previous PR, which is already merged, so they don't need to be reviewed here.

[Screenshot of the GitHub diff view: Zen Browser 2025-03-29 23 38 22]

To remove them from the diff, you could merge main into your branch or do a rebase

@qstommyshu
Contributor Author

Got it, thanks for pointing that out. Just cleared up the diff tree.

Contributor

@blaginin blaginin left a comment

👍

@qstommyshu
Contributor Author

qstommyshu commented Mar 30, 2025

> There might potentially be a few more tests to update. For example, fn simple_intersect (with allow_duplicates?).

I'm not sure if simple_intersect() can be updated to use insta.

It uses a parameter syntax to construct the expected_plan_str, and I don't think we can create a raw string (for the inline assertion) from a variable.

As I understand it, allow_duplicates allows a snapshot to be used multiple times in a test (useful when testing in a loop), so it probably won't help much in the case of simple_intersect() and simple_intersect_table_reuse().

It was mentioned in a previous comment:

> The only tests that cannot be changed to insta are simple_intersect_table_reuse() and simple_intersect() in this file. I don't think it is possible to migrate them, because their expected_plan_str is built by formatting a variable into a string. Rust does not support interpolating a variable into a raw string literal, and an insta inline snapshot requires a string literal as its argument. So I left those tests out.

I could be wrong as well, please let me know if there is a better way to refactor these tests.

@alamb alamb merged commit 102f879 into apache:main Mar 30, 2025
27 checks passed
@alamb
Contributor

alamb commented Mar 30, 2025

Thanks again @qstommyshu -- I don't have any more suggestions on how to refactor the tests

nirnayroy pushed a commit to nirnayroy/datafusion that referenced this pull request May 2, 2025
* add `cargo insta` to dev dependencies

* migrate `consumer_intergration.rs` tests to `insta`

* Revert "migrate `consumer_intergration.rs` tests to `insta`"

This reverts commit c3be2eb.

* migrate `consumer_integration.rs` to `insta` inline snapshot

* migrate logical plans tests to use `insta` snapshots

* migrate emit_kind_tests to use `insta` snapshots

* migrate function_test to use `insta` snapshots for assertions

* migrate substrait_validations tests to use insta snapshots, missing `insta` mapping to `assert!`

* revert `handle_emit_as_project_without_volatile_exprs` back to `assert_eq!` and remove `format!` for `assert_snapshot!`

* migrate function and validation tests to use plan directly in assert_snapshot!

* migrate serialize tests to use insta snapshots for assertions

* migrate logical_plans test to use insta snapshots for assertions

* WIP

* migrate `assert_expected_plan_substrait`

* refactor tests to use assert_and_generate_plan and assert_snapshot! for improved clarity and consistency

* remove println!

* migrate tests to use generate_plan_from_sql for improved clarity

Labels

substrait Changes to the substrait crate

Development

Successfully merging this pull request may close these issues.

Migrate subtrait tests to insta

4 participants