Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow unknown tasks to be stolen #5572

Merged
merged 4 commits into from
Dec 10, 2021
Merged

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Dec 8, 2021

This effectively reverts #5392 which enforced that unknown tasks couldn't be stolen. This was flagged as a regression since a test asserting this behaviour was introduced in very early stages of the stealing development #278. Back then we didn't have any intermediate runtime information about the task but this is no longer true and this behaviour can be revised.

For testability, I added WorkStealing.start and WorkStealing.stop since I believe for proper timing based tests we should have the possibility to disable the background PC. Otherwise it is very hard to assert specifically what's happening. If preferred, I can factor this out into a dedicated PR

@fjetter fjetter force-pushed the allow_unknown_tasks_stolen branch from 0ddda3e to 03e2111 Compare December 8, 2021 18:55
Copy link
Collaborator

@gjoseph92 gjoseph92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Having start and stop is nice!


steal = s.extensions["stealing"]
await steal.stop()
lock = await Semaphore(max_leases=1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why a Semaphore instead of a Lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had issues with the lock and have had some in the past. It somehow didn't lock. I may have forgotten to await something, idk. fwiw, I think we should throw lock away and replace it with the semaphore implementation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lock = lambda: Semaphore(max_leases=1) seems reasonable to me. Nice to have fewer codepaths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would do something like the following but that's a matter of taste :)

class Lock(Semaphore):
    def __init__(*args, **kwargs):
        super().__init__(*args, **kwargs, max_leases=1)

The semaphore also implements lease timeouts such that it handles dying workers gracefully by releasing the lock.

IIRC, the reason why I didn't do this from the beginning is that there are subtly differences in API causing the Lock API to break. For instance, you need to await a semaphore object before using it while you don't need to do this on a lock, etc. Also a lot of the tests are specific about the extension. However, we might probable just be able to delete all Lock specific tests since the semaphore should cover it anyhow.

@fjetter
Copy link
Member Author

fjetter commented Dec 9, 2021

There appears to be a genuine test failure with test_allow_tasks_stolen_before_first_completes Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Task stealing regression in 2021-11-0+ (preventing task load balancing)
2 participants