Deterministic find_distributed_partition (non-set) #529

Open: matthiasdiener wants to merge 36 commits into main from deterministic-fdp-nonset


@matthiasdiener matthiasdiener commented Jul 25, 2024

abandon (almost) all sets ye who enter here

Closes #465.
Closes #498.

Things to test:

  • Is the orderedset change to DirectPredecessorsGetter necessary? Edit: in my tests it made no difference, but the code structure suggests that DirectPredecessorsGetter needs to be deterministic too. Changed from orderedsets to unique tuples in 076a76e.
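
For readers following along, the "unique tuples" idea can be sketched roughly as below. This is an illustration only, not the code from this PR: `unique_in_order` and the string stand-ins for array nodes are hypothetical names. The point is that iterating over a `set` follows hash order, which can differ between processes, whereas a tuple built in order of first appearance iterates the same way everywhere.

```python
from collections.abc import Hashable, Iterable, Iterator
from typing import TypeVar

T = TypeVar("T", bound=Hashable)


def unique_in_order(items: Iterable[T]) -> Iterator[T]:
    """Yield each item once, in order of first appearance."""
    seen: set[T] = set()  # used for membership tests only, never iterated
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# Hypothetical stand-ins for predecessor nodes, with duplicates.
preds = ["b", "a", "b", "c", "a"]

# The result order depends only on the input order, not on hash values.
unique_preds = tuple(unique_in_order(preds))
```

The set is still used internally for fast membership checks; only the iteration order of the result matters for determinism.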

Please squash

@matthiasdiener matthiasdiener self-assigned this Jul 25, 2024
@matthiasdiener matthiasdiener force-pushed the deterministic-fdp-nonset branch from f08a309 to 3364a4f Compare July 25, 2024 21:22
@matthiasdiener matthiasdiener force-pushed the deterministic-fdp-nonset branch from c134b87 to 817b255 Compare July 25, 2024 21:45
@matthiasdiener matthiasdiener force-pushed the deterministic-fdp-nonset branch from 26492ce to f3f3c7d Compare July 25, 2024 21:56
@matthiasdiener
Collaborator Author

This is ready for a first look @inducer.

@matthiasdiener matthiasdiener changed the title from "Deterministic find_distributed_partition v2" to "Deterministic find_distributed_partition (non-set)" Jul 26, 2024
Owner

@inducer inducer left a comment

Took a look, just a few minor nits. Looks good to go from my perspective.

@@ -826,7 +772,7 @@ def find_distributed_partition(
         raise comm_batches_or_exc

     comm_batches = cast(
-        Sequence[AbstractSet[CommunicationOpIdentifier]],
+        Sequence[list[CommunicationOpIdentifier]],
Owner

Suggested change
- Sequence[list[CommunicationOpIdentifier]],
+ Sequence[Collection[CommunicationOpIdentifier]],

maybe?

Collaborator Author

I removed the cast completely in 168ef53

     We only consider the predecessors of a nodes in a data-flow sense.
     """
-    def _get_preds_from_shape(self, shape: ShapeType) -> frozenset[Array]:
-        return frozenset({dim for dim in shape if isinstance(dim, Array)})
+    def _get_preds_from_shape(self, shape: ShapeType) -> abc_Set[Array]:
Owner

Suggested change
- def _get_preds_from_shape(self, shape: ShapeType) -> abc_Set[Array]:
+ def _get_preds_from_shape(self, shape: ShapeType) -> AbstractSet[Array]:

(from typing)? I'm not sure mypy understands the collections.abc types well.

Collaborator Author

I think this may have been fixed with 076a76e
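
As a side note on the `typing` versus `collections.abc` question: since Python 3.9 (PEP 585), `collections.abc.Set` is subscriptable and `typing.AbstractSet` is a deprecated alias for it, so a type checker should treat the two spellings the same way. A minimal sketch, with hypothetical function names that are not part of this PR:

```python
from collections.abc import Set as abc_Set
from typing import AbstractSet


def preds_abc(items: list[int]) -> abc_Set[int]:
    # Annotated with the collections.abc spelling.
    return frozenset(items)


def preds_typing(items: list[int]) -> AbstractSet[int]:
    # Annotated with the older typing spelling.
    return frozenset(items)


same = preds_abc([1, 2, 2]) == preds_typing([2, 1])
```

Both annotations describe a read-only set interface, so either one lets mypy flag calls such as `.add(...)` on the returned value.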

Comment on lines 858 to 859
output_arrays = tuple(unique(outputs._data.values()))
mso_arrays = tuple(unique(materialized_arrays + sent_arrays + output_arrays))
Owner

I think these might be fine left as lists (as returned from unique), typed as Sequence so that mypy flags attempted mutation.

Collaborator Author

I'm not sure I understand - could you please clarify? (unique doesn't return a list)
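
To make the two positions in this thread concrete, here is a small sketch. It assumes that `unique` yields its results lazily rather than returning a list; the `unique` below is a local stand-in written for this example, not the helper used in the PR. It shows why the result has to be materialized once, and how a `Sequence` annotation on a list lets a type checker reject mutation.

```python
from collections.abc import Iterable, Iterator, Sequence


def unique(seq: Iterable[str]) -> Iterator[str]:
    # Stand-in (assumption): lazy and order-preserving, like a generator.
    yield from dict.fromkeys(seq)


lazy = unique(["x", "y", "x"])
first_pass = list(lazy)
second_pass = list(lazy)  # the iterator is already exhausted here

# Materialize once. Sequence has no append or __setitem__, so mypy reports
# attempted mutation even though the runtime object is a plain list.
arrays: Sequence[str] = list(unique(["x", "y", "x"]))
# arrays.append("z")  # mypy: "Sequence[str]" has no attribute "append"
```

Wrapping the result in `tuple(...)`, as the diff does, enforces immutability at runtime as well; the `Sequence` annotation only enforces it at type-check time.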

@matthiasdiener matthiasdiener force-pushed the deterministic-fdp-nonset branch from 3c87726 to 076a76e Compare August 13, 2024 19:11
@matthiasdiener matthiasdiener force-pushed the deterministic-fdp-nonset branch from 191422d to 168ef53 Compare August 14, 2024 16:53
@matthiasdiener matthiasdiener marked this pull request as ready for review August 14, 2024 19:43
@matthiasdiener
Collaborator Author

matthiasdiener commented Aug 14, 2024

This is ready for another look @inducer. It seems to work fine with the main version of pytato, but when merging it with our production branch, the execution is not deterministic.

@matthiasdiener
Collaborator Author

> but when merging it with our production branch, the execution is not deterministic.

Never mind, this seems to have been a merge error.

@matthiasdiener matthiasdiener requested a review from inducer August 14, 2024 20:24
@matthiasdiener
Collaborator Author

matthiasdiener commented Sep 28, 2024

This is ready for review @inducer. The current version shows the same performance as the baseline, and I have not seen any determinism-related issues when using this PR.

@matthiasdiener matthiasdiener force-pushed the deterministic-fdp-nonset branch from cc1045a to e8b5806 Compare October 14, 2024 16:48
@matthiasdiener
Collaborator Author

As far as I can see, the mypy errors are unrelated to this PR.

pytato/distributed/partition.py (outdated, resolved)
@@ -327,37 +327,37 @@ class DirectPredecessorsGetter(Mapper[frozenset[ArrayOrNames], []]):

We only consider the predecessors of a nodes in a data-flow sense.
"""
def _get_preds_from_shape(self, shape: ShapeType) -> frozenset[ArrayOrNames]:
Owner

T = TypeVar("T")

class FakeOrderedFrozenSet(immutabledict[T, None]):
    pass

?

Collaborator Author

I wasn't able to get this to work with mypy, but what do you think of fbcbcef?

Owner

That works for me. I was under the impression that type aliases could not be generic, but apparently I'm wrong? I tried for a bit to back up my assumption, but I wasn't able to. (I also wasn't able to back up the opposite.) Definitive info would be most welcome! 🙂
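
On the question of generic type aliases: PEP 484 permits type variables in aliases, and mypy documents this as "generic type aliases". The sketch below is an illustration under assumptions, not the content of fbcbcef: `FakeOrderedSet` and `preds_from_shape` are hypothetical names, and plain strings stand in for array-valued shape entries.

```python
from collections.abc import Mapping
from typing import TypeVar

T = TypeVar("T")

# Hypothetical generic alias: a mapping whose keys act as an ordered set.
FakeOrderedSet = Mapping[T, None]


def preds_from_shape(shape: tuple[object, ...]) -> FakeOrderedSet[str]:
    # dict preserves insertion order, so iteration is deterministic.
    return {dim: None for dim in shape if isinstance(dim, str)}


result = preds_from_shape((3, "n", "m", "n"))
```

Subscripting the alias, as in `FakeOrderedSet[str]`, substitutes `str` for `T`, so the annotation is equivalent to `Mapping[str, None]`.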

Collaborator Author
