Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start uncythonization #6104

Merged
merged 21 commits into from
Apr 14, 2022
Merged

Start uncythonization #6104

merged 21 commits into from
Apr 14, 2022

Conversation

martindurant
Copy link
Member

@martindurant martindurant commented Apr 11, 2022

Notes:

  • most underscored attributes with property attributes have been replaced by simple non-underscored attributes, since they were non-uniformly accessed using both modes; a few need actions on set, so those have been maintained. This change may even improve performance slightly for fewer name lookups.

  • attributes of the Scheduler and underscored methods have not been changed

  • the class hierarchy has not been reverted yet, and I am not convinced we need to

  • we have lost some type information, which was mostly in the form of cython types; mypy complains a lot (but how did it ever pass??). Existing types are inconsistent in a number of places, since __init__s set a None value and this is immediately overwritten at the site of instance creation, e.g., .prefix in new_task().

  • Closes Cythonized Builds  community#228 , Extract SchedulerState from Scheduler #5839 , we had a community issue about this?

  • Tests added / passed

  • Passes pre-commit run --all-files

Copy link
Member

@mrocklin mrocklin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What remains to be done after this?

distributed/scheduler.py Show resolved Hide resolved
Comment on lines -1143 to -1145
@property
def duration(self) -> double:
return self._duration
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Losing these properties makes me very happy 😄

@github-actions
Copy link
Contributor

github-actions bot commented Apr 11, 2022

Unit Test Results

       16 files  ±  0         16 suites  ±0   7h 21m 27s ⏱️ - 17m 10s
  2 738 tests +  8    2 655 ✔️ +    8       81 💤 ±  0  2 ±0 
21 789 runs  +64  20 753 ✔️ +105  1 034 💤  - 41  2 ±0 

For more details on these failures, see this check.

Results for commit 850f0df. ± Comparison against base commit bd3f47e.

♻️ This comment has been updated with latest results.

@martindurant
Copy link
Member Author

@mrocklin , added update to the description. Currently all tests in test_scheduler.py pass bar one. Of course, it would be nice to push this through quickly, else it'll cause major merge conflicts.

@martindurant
Copy link
Member Author

(now all tests in test_scheduler pass; but there are more tests in general which I haven't even tried yet, and of course appeasing mypy)

@mrocklin
Copy link
Member

mrocklin commented Apr 12, 2022 via email

@jakirkham
Copy link
Member

Thanks for working on this Martin!

@jakirkham
Copy link
Member

Also would be good to revert PR ( #5831 )

@martindurant
Copy link
Member Author

Also would be good to revert PR ( #5831 )

bahh yeah, I hope there's no too much of that

@jakirkham
Copy link
Member

Shouldn't be, but please let me know if you run into issues.

Would be tempted to just try git revert 94ebd5711c31a709aa1bb42ba3b6d8dfbd3c63ad.

@jakirkham
Copy link
Member

@charlesbluca, could you please look at the Conda package changes here to make sure we haven't missed anything?

.github/workflows/conda.yml Outdated Show resolved Hide resolved
.github/workflows/conda.yml Outdated Show resolved Hide resolved
.github/workflows/conda.yml Outdated Show resolved Hide resolved
.github/workflows/conda.yml Outdated Show resolved Hide resolved
continuous_integration/recipes/distributed/meta.yaml Outdated Show resolved Hide resolved
@martindurant
Copy link
Member Author

Some green! :)

@martindurant
Copy link
Member Author

Would anyone here like to help me mypy this out? See the last bullet of the description.

@martindurant
Copy link
Member Author

OK, just types left. Who votes for liberal "ignore" to move forward versus those for sprinkling types around?

@jakirkham
Copy link
Member

Personally am ok with that approach 👍

@mrocklin
Copy link
Member

OK, just types left. Who votes for liberal "ignore" to move forward versus those for sprinkling types around?

Can you say more about what you mean about having "types left"? Is this mypy complaining to you, or are there still cython types lying around?

Also, just to clarify on scope, full decythonization (which I don't think you're on the hook for, to be clear) would also involve rejoining things like SchedulerState, decide_worker, and so on. There were a variety of code structure changes that were done to accomdate Cython which we would probably want to undo. (although again, you're not on the hook for this solo)

@martindurant
Copy link
Member Author

I meant mypy - the reason that the lint CI run shows red. I have fixed some locally, we'll see how far I get tonight. I'm not sure why they weren't picked up before.

to clarify on scope

Yes, agreed, all the changes I want to make in this PR are ready except mypy types. Actually, I thought it could do with some thinking about whether we want to remerge the classes or not - to be had in a separate discussion.

@martindurant
Copy link
Member Author

Down to 16 missing type annotations and no errors

@mrocklin
Copy link
Member

I'm fine with liberal use of #type: ignore if we think that it's hiding nitpicky rather than structural issues. We can think about unleashing @crusaderky on this part of the codebase after things get structurally cleaned up.

@martindurant martindurant marked this pull request as ready for review April 13, 2022 13:20
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Show resolved Hide resolved
distributed/scheduler.py Show resolved Hide resolved
distributed/scheduler.py Show resolved Hide resolved
distributed/scheduler.py Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
distributed/scheduler.py Outdated Show resolved Hide resolved
@martindurant
Copy link
Member Author

General comments

  • on class-level annotations: all of these that were deleted were cython types, and not sufficient for mypy to be able to validate agains. Hence, they are gone, but I suggest a separate follow-up PR could be used to add them, if someone has the spare capacity.
  • on att : str = None # type: ignore constructs; these are bad and should not be written this way. They are not str | None types in most cases, since instantiating code immediately sets the value after __init__ returns. This suggests we need a rewrite, but I'd rather not do that here.

@jakirkham
Copy link
Member

Right got the sense that the reason Coiled requested dropping Cythonization is that there is other work Coiled wants to pursue in the Scheduler. Assuming I've understood this correctly (which I may not, please feel free to correct me), it seems like we should decide on a handoff point. It seems like Martin has done a great job at dropping Cython already. So think we are very close to (if not already at) this handoff point.

martindurant and others added 4 commits April 13, 2022 14:57
Co-authored-by: crusaderky <crusaderky@gmail.com>
Co-authored-by: crusaderky <crusaderky@gmail.com>
Co-authored-by: crusaderky <crusaderky@gmail.com>
@crusaderky
Copy link
Collaborator

  • on class-level annotations: all of these that were deleted were cython types, and not sufficient for mypy to be able to validate agains. Hence, they are gone, but I suggest a separate follow-up PR could be used to add them, if someone has the spare capacity.

OK, I'll reinstate them in a later PR.

  • on att : str = None # type: ignore constructs; these are bad and should not be written this way. They are not str | None types in most cases, since instantiating code immediately sets the value after __init__ returns. This suggests we need a rewrite, but I'd rather not do that here.

I agree they should be dealt with at a later stage.

@mrocklin
Copy link
Member

Right got the sense that the reason Coiled requested dropping Cythonization is that there is other work Coiled wants to pursue in the Scheduler

Not quite. We've identified that the main bottleneck to use today isn't performance, but rather stability. We're not really doing anything new these days. We're mostly trying to keep the lights on (fight deadlocks, get CI working again, figure out how to keep memory under control, and so on). There was significant complexity accrued in the Cythonization effort without really providing a ton of value for a while, and so we're arguing to drop it. That's ok, not every experiment needs to succeed.

Mostly I want to avoid the impression that there is some secret work happening somewhere for which this is required. We're in full maintenance mode right now. Help on that maintenance is welcome, especially from folks who know a lot about what happened in this experiment. We can take all of of the maintenance though if other folks are too busy.

distributed/scheduler.py Outdated Show resolved Hide resolved
@jakirkham
Copy link
Member

Mostly I want to avoid the impression that there is some secret work happening somewhere for which this is required.

No worries. Did not have that impression. Florian had mentioned in one of our calls doing similar cleanup with the Scheduler as is being done with the Worker. So this is what I was thinking of here. Don't really know what that would entail so was sparse in details in my previous comment.

@martindurant
Copy link
Member Author

All good, @crusaderky ?

Copy link
Collaborator

@crusaderky crusaderky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready to merge as soon as CI is green

@crusaderky crusaderky merged commit 41ecbca into dask:main Apr 14, 2022
@crusaderky
Copy link
Collaborator

Thank you

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants