
Test (path normalization) failures while verifying release candidate 9.0.0 RC1 #2719

Closed
alamb opened this issue Jun 10, 2022 · 2 comments · Fixed by #2917
Labels
bug Something isn't working

Comments

@alamb
Contributor

alamb commented Jun 10, 2022

Describe the bug
The verify-release-candidate script failed for me locally while verifying release candidate 9.0.0 RC1

To Reproduce
Run the release verification script with 9.0.0 RC1

./dev/release/verify-release-candidate.sh 9.0.0 1

It eventually fails with the following message:

failures:

---- sql::explain_analyze::csv_explain stdout ----
thread 'sql::explain_analyze::csv_explain' panicked at 'assertion failed: `(left == right)`
  left: `[["logical_plan", "Projection: #aggregate_test_100.c1\n  Filter: #aggregate_test_100.c2 > Int64(10)\n    TableScan: aggregate_test_100 projection=Some([c1, c2]), partial_filters=[#aggregate_test_100.c2 > Int64(10)]"], ["physical_plan", "ProjectionExec: expr=[c1@0 as c1]\n  CoalesceBatchesExec: target_batch_size=4096\n    FilterExec: CAST(c2@1 AS Int64) > 10\n      RepartitionExec: partitioning=RoundRobinBatch(NUM_CORES)\n        CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c2]\n"]]`,
 right: `[["logical_plan", "Projection: #aggregate_test_100.c1\n  Filter: #aggregate_test_100.c2 > Int64(10)\n    TableScan: aggregate_test_100 projection=Some([c1, c2]), partial_filters=[#aggregate_test_100.c2 > Int64(10)]"], ["physical_plan", "ProjectionExec: expr=[c1@0 as c1]\n  CoalesceBatchesExec: target_batch_size=4096\n    FilterExec: CAST(c2@1 AS Int64) > 10\n      RepartitionExec: partitioning=RoundRobinBatch(NUM_CORES)\n        CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c2]\n"]]`', datafusion/core/tests/sql/explain_analyze.rs:766:5

---- sql::explain_analyze::test_physical_plan_display_indent stdout ----
thread 'sql::explain_analyze::test_physical_plan_display_indent' panicked at 'assertion failed: `(left == right)`
  left: `["GlobalLimitExec: skip=None, fetch=10", "  SortExec: [the_min@2 DESC]", "    CoalescePartitionsExec", "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]", "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "          CoalesceBatchesExec: target_batch_size=4096", "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "                CoalesceBatchesExec: target_batch_size=4096", "                  FilterExec: c12@1 < CAST(10 AS Float64)", "                    RepartitionExec: partitioning=RoundRobinBatch(9000)", "                      CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]"]`,
 right: `["GlobalLimitExec: skip=None, fetch=10", "  SortExec: [the_min@2 DESC]", "    CoalescePartitionsExec", "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]", "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "          CoalesceBatchesExec: target_batch_size=4096", "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]", "                CoalesceBatchesExec: target_batch_size=4096", "                  FilterExec: c12@1 < CAST(10 AS Float64)", "                    RepartitionExec: partitioning=RoundRobinBatch(9000)", "                      CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]"]`: expected:
[
    "GlobalLimitExec: skip=None, fetch=10",
    "  SortExec: [the_min@2 DESC]",
    "    CoalescePartitionsExec",
    "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]",
    "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "          CoalesceBatchesExec: target_batch_size=4096",
    "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "                CoalesceBatchesExec: target_batch_size=4096",
    "                  FilterExec: c12@1 < CAST(10 AS Float64)",
    "                    RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                      CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]",
]
actual:

[
    "GlobalLimitExec: skip=None, fetch=10",
    "  SortExec: [the_min@2 DESC]",
    "    CoalescePartitionsExec",
    "      ProjectionExec: expr=[c1@0 as c1, MAX(aggregate_test_100.c12)@1 as MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)@2 as the_min]",
    "        AggregateExec: mode=FinalPartitioned, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "          CoalesceBatchesExec: target_batch_size=4096",
    "            RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "              AggregateExec: mode=Partial, gby=[c1@0 as c1], aggr=[MAX(aggregate_test_100.c12), MIN(aggregate_test_100.c12)]",
    "                CoalesceBatchesExec: target_batch_size=4096",
    "                  FilterExec: c12@1 < CAST(10 AS Float64)",
    "                    RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                      CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1, c12]",
]
', datafusion/core/tests/sql/explain_analyze.rs:680:5

---- sql::explain_analyze::test_physical_plan_display_indent_multi_children stdout ----
thread 'sql::explain_analyze::test_physical_plan_display_indent_multi_children' panicked at 'assertion failed: `(left == right)`
  left: `["ProjectionExec: expr=[c1@0 as c1]", "  CoalesceBatchesExec: target_batch_size=4096", "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "          ProjectionExec: expr=[c1@0 as c1]", "            ProjectionExec: expr=[c1@0 as c1]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)", "          ProjectionExec: expr=[c2@0 as c2]", "            ProjectionExec: expr=[c1@0 as c2]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]"]`,
 right: `["ProjectionExec: expr=[c1@0 as c1]", "  CoalesceBatchesExec: target_batch_size=4096", "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)", "          ProjectionExec: expr=[c1@0 as c1]", "            ProjectionExec: expr=[c1@0 as c1]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]", "      CoalesceBatchesExec: target_batch_size=4096", "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)", "          ProjectionExec: expr=[c2@0 as c2]", "            ProjectionExec: expr=[c1@0 as c2]", "              RepartitionExec: partitioning=RoundRobinBatch(9000)", "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]"]`: expected:
[
    "ProjectionExec: expr=[c1@0 as c1]",
    "  CoalesceBatchesExec: target_batch_size=4096",
    "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c1@0 as c1]",
    "            ProjectionExec: expr=[c1@0 as c1]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c2@0 as c2]",
    "            ProjectionExec: expr=[c1@0 as c2]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
]
actual:

[
    "ProjectionExec: expr=[c1@0 as c1]",
    "  CoalesceBatchesExec: target_batch_size=4096",
    "    HashJoinExec: mode=Partitioned, join_type=Inner, on=[(Column { name: \"c1\", index: 0 }, Column { name: \"c2\", index: 0 })]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c1\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c1@0 as c1]",
    "            ProjectionExec: expr=[c1@0 as c1]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
    "      CoalesceBatchesExec: target_batch_size=4096",
    "        RepartitionExec: partitioning=Hash([Column { name: \"c2\", index: 0 }], 9000)",
    "          ProjectionExec: expr=[c2@0 as c2]",
    "            ProjectionExec: expr=[c1@0 as c2]",
    "              RepartitionExec: partitioning=RoundRobinBatch(9000)",
    "                CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], has_header=true, limit=None, projection=[c1]",
]
', datafusion/core/tests/sql/explain_analyze.rs:731:5


failures:
    sql::explain_analyze::csv_explain
    sql::explain_analyze::test_physical_plan_display_indent
    sql::explain_analyze::test_physical_plan_display_indent_multi_children

test result: FAILED. 362 passed; 3 failed; 2 ignored; 0 measured; 0 filtered out; finished in 3.11s

error: test failed, to rerun pass '-p datafusion --test sql_integration'
+ cleanup
+ '[' no = yes ']'
+ echo 'Failed to verify release candidate. See /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/arrow-9.0.0.XXXXX.KsfEL7Og for details.'
Failed to verify release candidate. See /var/folders/s3/h5hgj43j0bv83shtmz_t_w400000gn/T/arrow-9.0.0.XXXXX.KsfEL7Og for details.

Expected behavior
The verification should pass

Additional context
Mailing list thread: https://lists.apache.org/thread/7mg9kwlfyrxm5fx96w8q0c436by93567

@alamb alamb added the bug Something isn't working label Jun 10, 2022
@alamb
Contributor Author

alamb commented Jun 10, 2022

The difference appears to be in the normalized path, rather than the actual structure:

 CsvExec: files=[ARROW_TEST_DATA/csv/aggregate_test_100.csv], ...

vs

CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv], ...

So I think this is a test bug rather than a code bug.
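A minimal sketch of how a plain substring replacement can leave exactly that `/private` residue on macOS, where `/tmp` is a symlink into `/private/tmp`. The `normalize` helper and the paths below are illustrative, not the actual DataFusion test code:

```rust
// Hypothetical sketch of the kind of normalization the failing tests do:
// substitute the raw ARROW_TEST_DATA value with a placeholder.
fn normalize(plan: &str, test_data: &str) -> String {
    plan.replace(test_data, "ARROW_TEST_DATA")
}

fn main() {
    // On macOS, /tmp is a symlink to /private/tmp, so a canonicalized file
    // path carries a "/private" prefix that the raw env-var value lacks.
    let env_value = "/tmp/arrow-testing/data"; // illustrative value
    let plan =
        "CsvExec: files=[/private/tmp/arrow-testing/data/csv/aggregate_test_100.csv]";
    // The replacement still matches mid-string, leaving "/private" behind:
    assert_eq!(
        normalize(plan, env_value),
        "CsvExec: files=[/privateARROW_TEST_DATA/csv/aggregate_test_100.csv]"
    );
}
```

That stray `/privateARROW_TEST_DATA` is exactly the difference between the expected and actual plans in the failures above.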

@andygrove
Member

I ran into this a while back and changed my ARROW_TEST_DATA env var to remove the trailing slash, but we should really fix this. I will take a look in the next few days if nobody else picks this up.
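One possible shape of a fix, sketched under the assumption that the test normalizes via substring replacement: trim any trailing slash from the env-var value and canonicalize it where the path exists, so both the trailing-slash case and the macOS `/private` prefix are handled. The `normalize_robust` name is hypothetical, not the actual function in the codebase:

```rust
use std::fs;

// Hedged sketch: make the placeholder substitution robust to a trailing
// slash in ARROW_TEST_DATA and to macOS symlink canonicalization.
fn normalize_robust(plan: &str, test_data: &str) -> String {
    let trimmed = test_data.trim_end_matches('/');
    // Canonicalize when possible so "/tmp/..." becomes "/private/tmp/..."
    // on macOS and matches the canonicalized paths in the plan output;
    // fall back to the trimmed value if the path does not exist.
    let canonical = fs::canonicalize(trimmed)
        .map(|p| p.to_string_lossy().into_owned())
        .unwrap_or_else(|_| trimmed.to_string());
    plan.replace(&canonical, "ARROW_TEST_DATA")
        .replace(trimmed, "ARROW_TEST_DATA")
}

fn main() {
    // A trailing slash no longer eats the separator after the placeholder:
    assert_eq!(
        normalize_robust("files=[/no/such/dir/csv/x.csv]", "/no/such/dir/"),
        "files=[ARROW_TEST_DATA/csv/x.csv]"
    );
}
```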
