Filtering on duplicate columns after lazy right-join giving incorrect results #21142
Closed
2 tasks done
Labels: A-optimizer (Area: plan optimization), accepted (Ready for implementation), bug (Something isn't working), P-high (Priority: high), python (Related to Python Polars), regression (Issue introduced by a new release)
Checks
Reproducible example
When filtering on a duplicate column name after a lazy join, it looks like the wrong column is used, producing incorrect results. When collecting right after the join and then filtering, the results are correct.
To reproduce, we can start with 2 DataFrames to be joined together on `join`, and that have a duplicate column name `duplicate`:
Example 1: Right join with filtering on left column = wrong result
Example 2: Left join with filtering on right column = correct result
When filtering on the right column, the issue does not seem to occur:
Different filters also give wrong results
The above examples only use `.is_null()`, but other filters give the same issue:
Log output
Issue description
When filtering on a duplicate column name after a lazy join, it looks like the wrong column is used, producing incorrect results. When collecting right after the join and then filtering, the results are correct.
Expected behavior
The result of a filter on a LazyFrame should be the same as for a DataFrame.
Installed versions