Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query] Use valid globals reference in MWZJ and TABK #14246

Merged
merged 1 commit into from
Feb 2, 2024

Conversation

danking
Copy link
Contributor

@danking danking commented Feb 2, 2024

CHANGELOG: Fix a bug, introduced in 0.2.114, in which Table.multi_way_zip_join and Table.aggregate_by_key could throw "NoSuchElementException: Ref with name __iruid_..." when one or more of the tables had a number of partitions substantially different from the desired number of output partitions.

Fixes #14245.

In both MultiWayZipJoin and TableAggregateByKey, we repartition the child but neglect to use the new globals Ref from the repartitioned child. As long as repartitionNoShuffle does not create a new TableStage with new globals, this is fine, but that is not, in general, true. It seems that recently, in lowered backends, when the repartition cost is deemed "high" we generate a fresh TableStage with a fresh globals ref.

CHANGELOG: Fix a bug, introduced in 0.2.114, in which `Table.multi_way_zip_join` and `Table.aggregate_by_key` could throw "NoSuchElementException: Ref with name __iruid_..." when one or more of the tables had a number of partitions substantially different from the desired number of output partitions.

Fixes hail-is#14245.

In both MultiWayZipJoin and TableAggregateByKey, we repartition the child but neglect to use the
new globals `Ref` from the repartitioned child. As long as `repartitionNoShuffle` does not create a
new TableStage with new globals, this is fine, but that is not, in general, true. It seems that
recently, in lowered backends, when the repartition cost is deemed "high" we generate a fresh
TableStage with a fresh globals ref.

repartitioned.mapPartition(Some(child.typ.key)) { partition =>
Let(
FastSeq("global" -> repartitioned.globals),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sneaky

@danking danking merged commit d4679eb into hail-is:main Feb 2, 2024
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants