
batches_to_sort_string differing from similar implementation in assert_batches_sorted_eq #15312

@Shreyaskr1409

Description


Describe the bug

While migrating tests to insta in PR #15248, I ran into a problem. For the same expected output in a test, the snapshot assertion built on batches_to_sort_string reported a mismatch between the old and new snapshots, whereas assert_batches_sorted_eq passed.

Edit: See also the following PR comment in #15288 (comment), which reports a related problem found so far.

Previous code (using assert_batches_sorted_eq):

        let expected = [
            "+---+---+---+----+---+---+",
            "| a | b | c | a  | b | c |",
            "+---+---+---+----+---+---+",
            "|   |   |   | 30 | 3 | 6 |",
            "|   |   |   | 40 | 4 | 4 |",
            "| 2 | 7 | 9 | 10 | 2 | 7 |",
            "| 2 | 7 | 9 | 20 | 2 | 5 |",
            "| 0 | 4 | 7 |    |   |   |",
            "| 1 | 5 | 8 |    |   |   |",
            "| 2 | 8 | 1 |    |   |   |",
            "+---+---+---+----+---+---+",
        ];
        assert_batches_sorted_eq!(expected, &batches);
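The old assertion does not care about row order. Below is a minimal sketch of that kind of comparison. It assumes, without verifying it here, that assert_batches_sorted_eq sorts the data rows of both the expected lines and the pretty-printed actual output before comparing them. The helpers sort_body_rows and sorted_tables_eq are hypothetical. They operate on already pretty-printed text rather than on RecordBatches.

```rust
/// Hypothetical helper: sorts the data rows of a pretty-printed table in place,
/// leaving the top border, header row, separator and bottom border untouched.
fn sort_body_rows(lines: &mut [String]) {
    let n = lines.len();
    // Only sort when there is at least one data row
    // (border, header, separator, row(s), border).
    if n > 4 {
        lines[3..n - 1].sort_unstable();
    }
}

/// Hypothetical order-insensitive comparison: BOTH sides have their data rows
/// sorted before the line-by-line comparison, so the row order written in
/// `expected` does not matter.
fn sorted_tables_eq(expected: &[&str], actual: &str) -> bool {
    let mut expected: Vec<String> = expected.iter().map(|s| s.to_string()).collect();
    let mut actual: Vec<String> = actual.trim().lines().map(str::to_string).collect();
    sort_body_rows(&mut expected);
    sort_body_rows(&mut actual);
    expected == actual
}

fn main() {
    // Rows in the order used in this issue's `expected` array.
    let expected = [
        "+---+---+---+----+---+---+",
        "| a | b | c | a  | b | c |",
        "+---+---+---+----+---+---+",
        "|   |   |   | 30 | 3 | 6 |",
        "|   |   |   | 40 | 4 | 4 |",
        "| 2 | 7 | 9 | 10 | 2 | 7 |",
        "| 2 | 7 | 9 | 20 | 2 | 5 |",
        "| 0 | 4 | 7 |    |   |   |",
        "| 1 | 5 | 8 |    |   |   |",
        "| 2 | 8 | 1 |    |   |   |",
        "+---+---+---+----+---+---+",
    ];
    // The same rows in a different order, as actual output might produce them.
    let actual = "\
+---+---+---+----+---+---+
| a | b | c | a  | b | c |
+---+---+---+----+---+---+
|   |   |   | 30 | 3 | 6 |
|   |   |   | 40 | 4 | 4 |
| 0 | 4 | 7 |    |   |   |
| 1 | 5 | 8 |    |   |   |
| 2 | 7 | 9 | 10 | 2 | 7 |
| 2 | 7 | 9 | 20 | 2 | 5 |
| 2 | 8 | 1 |    |   |   |
+---+---+---+----+---+---+
";
    assert!(sorted_tables_eq(&expected, actual));
    println!("rows match regardless of order");
}
```

Because both sides are normalized, any permutation of the data rows in the expected array passes under this model.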

New code (using batches_to_sort_string):

        allow_duplicates! {
            assert_snapshot!(batches_to_sort_string(&batches), @r#"
            +---+---+---+----+---+---+
            | a | b | c | a  | b | c |
            +---+---+---+----+---+---+
            |   |   |   | 30 | 3 | 6 |
            |   |   |   | 40 | 4 | 4 |
            | 2 | 7 | 9 | 10 | 2 | 7 |
            | 2 | 7 | 9 | 20 | 2 | 5 |
            | 0 | 4 | 7 |    |   |   |
            | 1 | 5 | 8 |    |   |   |
            | 2 | 8 | 1 |    |   |   |
            +---+---+---+----+---+---+
                "#)
        }

In both cases, I double-checked several times that the expected output is identical.

With the new code, I get the following output:
[Image: screenshot of the failing snapshot output]
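One possible explanation, offered as an assumption: batches_to_sort_string sorts the data rows of the actual output only, while the inline snapshot is compared verbatim. If so, the snapshot rows would have to be written in sorted order. The sketch below models this with a hypothetical sort_rows_to_string function. It works on an already pretty-printed table, not on RecordBatches, and is applied to the table from this issue.

```rust
/// Hypothetical stand-in for a helper like batches_to_sort_string: it takes an
/// already pretty-printed table and sorts only its data rows.
fn sort_rows_to_string(table: &str) -> String {
    let mut lines: Vec<&str> = table.trim().lines().collect();
    let n = lines.len();
    // Keep the top border, header row, separator (lines 0..3) and the bottom
    // border in place; sort only when there is at least one data row.
    if n > 4 {
        lines[3..n - 1].sort_unstable();
    }
    lines.join("\n")
}

fn main() {
    // Rows in the order they were written in the inline snapshot.
    let snapshot = "\
+---+---+---+----+---+---+
| a | b | c | a  | b | c |
+---+---+---+----+---+---+
|   |   |   | 30 | 3 | 6 |
|   |   |   | 40 | 4 | 4 |
| 2 | 7 | 9 | 10 | 2 | 7 |
| 2 | 7 | 9 | 20 | 2 | 5 |
| 0 | 4 | 7 |    |   |   |
| 1 | 5 | 8 |    |   |   |
| 2 | 8 | 1 |    |   |   |
+---+---+---+----+---+---+";

    // The same rows in byte-wise sorted order: a space sorts before any digit.
    let expected_sorted = "\
+---+---+---+----+---+---+
| a | b | c | a  | b | c |
+---+---+---+----+---+---+
|   |   |   | 30 | 3 | 6 |
|   |   |   | 40 | 4 | 4 |
| 0 | 4 | 7 |    |   |   |
| 1 | 5 | 8 |    |   |   |
| 2 | 7 | 9 | 10 | 2 | 7 |
| 2 | 7 | 9 | 20 | 2 | 5 |
| 2 | 8 | 1 |    |   |   |
+---+---+---+----+---+---+";

    let sorted = sort_rows_to_string(snapshot);
    assert_eq!(sorted, expected_sorted);
    // A verbatim comparison against the unsorted snapshot literal fails.
    assert_ne!(sorted, snapshot);
    println!("{sorted}");
}
```

Under this assumption, the rows "| 0 | 4 | 7 |" and "| 1 | 5 | 8 |" come before "| 2 | 7 | 9 | ...". The snapshot in this issue lists them after, which would be enough to produce the reported mismatch.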

To Reproduce

In /datafusion/physical-plan/src/joins/hash_join.rs, replace the following part of async fn join_full_with_filter(batch_size: usize) -> Result<()>:

        let expected = [
            "+---+---+---+----+---+---+",
            "| a | b | c | a  | b | c |",
            "+---+---+---+----+---+---+",
            "|   |   |   | 30 | 3 | 6 |",
            "|   |   |   | 40 | 4 | 4 |",
            "| 2 | 7 | 9 | 10 | 2 | 7 |",
            "| 2 | 7 | 9 | 20 | 2 | 5 |",
            "| 0 | 4 | 7 |    |   |   |",
            "| 1 | 5 | 8 |    |   |   |",
            "| 2 | 8 | 1 |    |   |   |",
            "+---+---+---+----+---+---+",
        ];
        assert_batches_sorted_eq!(expected, &batches);

with

        allow_duplicates! {
            assert_snapshot!(batches_to_sort_string(&batches), @r#"
            +---+---+---+----+---+---+
            | a | b | c | a  | b | c |
            +---+---+---+----+---+---+
            |   |   |   | 30 | 3 | 6 |
            |   |   |   | 40 | 4 | 4 |
            | 2 | 7 | 9 | 10 | 2 | 7 |
            | 2 | 7 | 9 | 20 | 2 | 5 |
            | 0 | 4 | 7 |    |   |   |
            | 1 | 5 | 8 |    |   |   |
            | 2 | 8 | 1 |    |   |   |
            +---+---+---+----+---+---+
                "#)
        }

Expected behavior

Both assertions should give the same result for the same expected output.

Additional context

I did not encounter this issue while migrating many other tests in /datafusion/physical-plan; this is the first time I have seen it.

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)
