
Test (path normalization) failures while verifying release candidate 9.0.0 RC1 #2719

Closed
alamb opened this issue Jun 10, 2022 · 2 comments · Fixed by #2917
Labels
bug Something isn't working

Comments

@alamb
Contributor

alamb commented Jun 10, 2022

Describe the bug
The verify-release-candidate script failed for me locally while verifying release candidate 9.0.0 RC1

To Reproduce
Run the release verification script with 9.0.0 RC1

./dev/release/verify-release-candidate.sh 9.0.0 1

It eventually fails with the following message:

failures:

---- sql::explain_analyze::csv_explain stdout ----
thread 'sql::explain_analyze::csv_explain' panicked at 'assertion failed: `(left == right)`
  left: `[["logical_plan", "Projection: #aggregate_test_100.c1\n  Filter: #aggregate_test_100.c2 > Int64(10)\n    TableScan: aggregate_test_100 projection=Some([c1, c2]), partial_filters=[#aggregate_test_100.c2 > Int64(10)]"], ["physical_plan", "ProjectionExec: expr=[c1@0 as c1]\n  CoalesceBatchesExec: target_batch_size=4096\n    FilterExec: CAST(c2@1 AS Int64) > 10\n      RepartitionExec: partitioning=RoundRobinBatch(NUM_CORES)\n        CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c2]\n"]]`,
 right: `[["logical_plan", "Projection: #aggregate_test_100.c1\n  Filter: #aggregate_test_100.c2 > Int64(10)\n    TableScan: aggregate_test_100 projection=Some([c1, c2]), partial_filters=[#aggregate_test_100.c2 > Int64(10)]"], ["physical_plan", "ProjectionExec: expr=[c1@0 as c1]\n  CoalesceBatchesExec: target_batch_size=4096\n    FilterExec: CAST(c2@1 AS Int64) > 10\n      RepartitionExec: partitioning=RoundRobinBatch(NUM_CORES)\n        CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c2]\n"]]`', datafusion/core/tests/sql/explain_analyze.rs:766:5

---- sql::explain_analyze::test_physical_plan_display_indent stdout ----
thread 'sql::explain_analyze::test_physical_plan_display_indent' panicked at 'assertion failed: `(left == right)`
  left: `["GlobalLimitExec: skip=None, fetch=10", "  SortExec: [the_min@2 DESC]", "    CoalescePartitionsExec", "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]", "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "          CoalesceBatchesExec: target_batch_size=4096", "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "                CoalesceBatchesExec: target_batch_size=4096", "                  FilterExec: c12@1 < CAST(10 AS Float64)", "                    RepartitionExec: partitioning=RoundRobinBatch(9000)", "                      CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]"]`,
 right: `["GlobalLimitExec: skip=None, fetch=10", "  SortExec: [the_min@2 DESC]", "    CoalescePartitionsExec", "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]", "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "          CoalesceBatchesExec: target_batch_size=4096", "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "                CoalesceBatchesExec: target_batch_size=4096", "                  FilterExec: c12@1 < CAST(10 AS Float64)", "                    RepartitionExec: partitioning=RoundRobinBatch(9000)", "                      CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]"]`: expected:
[
    "GlobalLimitExec: skip=None, fetch=10",
    "  SortExec: [the_min@2 DESC]",
    "    CoalescePartitionsExec",
    "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]",
    "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "          CoalesceBatchesExec: target_batch_size=4096",
    "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "                CoalesceBatchesExec: target_batch_size=4096",
    "                  FilterExec: c12@1 < CAST(10 AS Float64)",
    "                    RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                      CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]",
]
actual:

[
    "GlobalLimitExec: skip=None, fetch=10",
    "  SortExec: [the_min@2 DESC]",
    "    CoalescePartitionsExec",
    "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]",
    "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "          CoalesceBatchesExec: target_batch_size=4096",
    "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "                CoalesceBatchesExec: target_batch_size=4096",
    "                  FilterExec: c12@1 < CAST(10 AS Float64)",
    "                    RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                      CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]",
]
', datafusion/core/tests/sql/explain_analyze.rs:680:5

---- sql::explain_analyze::test_physical_plan_display_indent_multi_children stdout ----
thread 'sql::explain_analyze::test_physical_plan_display_indent_multi_children' panicked at 'assertion failed: `(left == right)`
  left: `["ProjectionExec: expr=[c1@0 as c1]", "  CoalesceBatchesExec: target_batch_size=4096", "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "          ProjectionExec: expr=[c1@0 as c1]", "            ProjectionExec: expr=[c1@0 as c1]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)", "          ProjectionExec: expr=[c2@0 as c2]", "            ProjectionExec: expr=[c1@0 as c2]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]"]`,
 right: `["ProjectionExec: expr=[c1@0 as c1]", "  CoalesceBatchesExec: target_batch_size=4096", "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "          ProjectionExec: expr=[c1@0 as c1]", "            ProjectionExec: expr=[c1@0 as c1]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)", "          ProjectionExec: expr=[c2@0 as c2]", "            ProjectionExec: expr=[c1@0 as c2]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]"]`: expected:
[
    "ProjectionExec: expr=[c1@0 as c1]",
    "  CoalesceBatchesExec: target_batch_size=4096",
    "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c1@0 as c1]",
    "            ProjectionExec: expr=[c1@0 as c1]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c2@0 as c2]",
    "            ProjectionExec: expr=[c1@0 as c2]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
]
actual:

[
    "ProjectionExec: expr=[c1@0 as c1]",
    "  CoalesceBatchesExec: target_batch_size=4096",
    "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c1@0 as c1]",
    "            ProjectionExec: expr=[c1@0 as c1]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c2@0 as c2]",
    "            ProjectionExec: expr=[c1@0 as c2]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
]
', datafusion/core/tests/sql/explain_analyze.rs:731:5


failures:
    sql::explain_analyze::csv_explain
    sql::explain_analyze::test_physical_plan_display_indent
    sql::explain_analyze::test_physical_plan_display_indent_multi_children

test result: FAILED. 362 passed; 3 failed; 2 ignored; 0 measured; 0 filtered out; finished in 3.11s

error: test failed, to rerun pass '-p datafusion --test sql_integration'
+ cleanup
+ '[' no = yes ']'
+ echo 'Failed to verify release candidate. See /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/arrow-9.0.0.XXXXX.KsfEL7Og for details.'
Failed to verify release candidate. See /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/arrow-9.0.0.XXXXX.KsfEL7Og for details.

Expected behavior
The verification should pass

Additional context
Mailing list thread: https://lists.apache.org/thread/7mg9kwlfyrxm5fx96w8q0c436by93567

@alamb alamb added the bug Something isn't working label Jun 10, 2022
@alamb
Contributor Author

alamb commented Jun 10, 2022

The difference appears to be in the normalized path, rather than the actual structure:

 CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], ...

vs

CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], ...

So I think this is a test bug rather than a code bug.
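A minimal sketch of how a plain substring replacement can leave exactly that `/private` residue on macOS, where `/tmp` is a symlink into `/private/tmp`. The `normalize` helper and the paths below are illustrative, not the actual DataFusion test code:

```rust
// Hypothetical sketch of the kind of normalization the failing tests do:
// substitute the raw ARROW_TEST_DATA value with a placeholder.
fn normalize(plan: &str, test_data: &str) -> String {
    plan.replace(test_data, "ARROW_TEST_DATA")
}

fn main() {
    // On macOS, /tmp is a symlink to /private/tmp, so a canonicalized file
    // path carries a "/private" prefix that the raw env-var value lacks.
    let env_value = "/tmp/arrow-testing/data"; // illustrative value
    let plan =
        "CsvExec: files=[/private/tmp/arrow-testing/data/csv/aggregate_test_100.csv]";
    // The replacement still matches mid-string, leaving "/private" behind:
    assert_eq!(
        normalize(plan, env_value),
        "CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv]"
    );
}
```

That stray `/privateARROW_TEST_DATA` is exactly the difference between the expected and actual plans in the failures above.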

@andygrove
Member

I ran into this a while back and changed my ARROW_TEST_DATA env var to remove the trailing slash, but we should really fix this. I will take a look in the next few days if nobody else picks this up.
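One possible shape of a fix, sketched under the assumption that the test normalizes via substring replacement: trim any trailing slash from the env-var value and canonicalize it where the path exists, so both the trailing-slash case and the macOS `/private` prefix are handled. The `normalize_robust` name is hypothetical, not the actual function in the codebase:

```rust
use std::fs;

// Hedged sketch: make the placeholder substitution robust to a trailing
// slash in ARROW_TEST_DATA and to macOS symlink canonicalization.
fn normalize_robust(plan: &str, test_data: &str) -> String {
    let trimmed = test_data.trim_end_matches('/');
    // Canonicalize when possible so "/tmp/..." becomes "/private/tmp/..."
    // on macOS and matches the canonicalized paths in the plan output;
    // fall back to the trimmed value if the path does not exist.
    let canonical = fs::canonicalize(trimmed)
        .map(|p| p.to_string_lossy().into_owned())
        .unwrap_or_else(|_| trimmed.to_string());
    plan.replace(&canonical, "ARROW_TEST_DATA")
        .replace(trimmed, "ARROW_TEST_DATA")
}

fn main() {
    // A trailing slash no longer eats the separator after the placeholder:
    assert_eq!(
        normalize_robust("files=[/no/such/dir/csv/x.csv]", "/no/such/dir/"),
        "files=[ARROW_TEST_DATA/csv/x.csv]"
    );
}
```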
