Collaborator

@blaginin blaginin commented Mar 1, 2025

Which issue does this PR close?

Rationale for this change

I feel like there should be a way to apply the swap even when the file is partitioned, but I'm submitting a hotfix since this is a release blocker.

What changes are included in this PR?

Are these changes tested?

Added a test

Are there any user-facing changes?

No

Collaborator Author

blaginin commented Mar 1, 2025

uv run pytest 
================================================== 472 passed, 4 skipped, 47 deselected, 90 warnings in 23.16s ===================================================

Contributor

@alamb alamb left a comment

Thanks for working on this @blaginin

It would also be great to find a reproducer somehow. I don't have any more time this morning to help but I can try to find some later today or tomorrow morning

- Ok(all_alias_free_columns(projection.expr()).then(|| {
+ Ok((all_alias_free_columns(projection.expr())
+     && self.table_partition_cols.is_empty())
Contributor

It seems like the check should be whether any of the needed columns are in table_partition_cols (rather than whether there are any partition columns at all) 🤔 Or something like that
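
To make the difference between the two checks concrete, here is a minimal sketch in plain Rust. It does not use DataFusion's types: `Scan`, `can_swap_coarse`, and `can_swap_narrow` are hypothetical names invented for illustration, and the sketch assumes that partition columns are appended after the file columns, so any projected index at or beyond the number of file columns refers to a partition column.

```rust
/// Hypothetical stand-in for a scan over partitioned files (not a DataFusion type).
/// `file_columns` are stored inside the files; `partition_columns` are derived
/// from the directory layout and, by assumption, appended after the file columns.
struct Scan {
    file_columns: Vec<&'static str>,
    partition_columns: Vec<&'static str>,
}

impl Scan {
    /// Coarse check: refuse to swap whenever the table has any partition column.
    fn can_swap_coarse(&self) -> bool {
        self.partition_columns.is_empty()
    }

    /// Narrowed check: refuse to swap only when the projection actually
    /// references a partition column, i.e. an index past the file columns.
    fn can_swap_narrow(&self, projected: &[usize]) -> bool {
        projected.iter().all(|&idx| idx < self.file_columns.len())
    }
}
```

With two file columns and one partition column, a projection of indices `[0, 1]` is rejected by the coarse check but accepted by the narrowed one, while `[0, 2]` is rejected by both. Index 2 does not exist among the two file columns, which is the kind of lookup that would fail with "the len is 2 but the index is 2".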

Collaborator Author

great point 🤗

@blaginin blaginin force-pushed the bugfix/do-not-swap-proj-for-paritions branch from 88695a2 to 55f8b30 Compare March 1, 2025 21:37
@github-actions github-actions bot added the core Core DataFusion crate label Mar 1, 2025
@blaginin blaginin marked this pull request as ready for review March 1, 2025 23:13
Contributor

@alamb alamb left a comment

Thank you @blaginin -- this is great. I was worried that this didn't handle the case when the pushed projection had an expression, but I wrote a test (I will make a follow on PR) and it seems to work

Nice job

@alamb alamb merged commit 5e27008 into apache:main Mar 2, 2025
25 checks passed
Contributor

alamb commented Mar 2, 2025

I made a small follow on:

alamb pushed a commit to alamb/datafusion that referenced this pull request Mar 2, 2025
* Do not swap with projection when file is partitioned

* Narrow the case when not swapping

* Add test
alamb added a commit that referenced this pull request Mar 2, 2025
* Do not swap with projection when file is partitioned

* Narrow the case when not swapping

* Add test

Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me>

Successfully merging this pull request may close these issues.

index out of bounds: the len is 2 but the index is 2 in some data sources
