
bpo-29842: Make Executor.map less eager so it handles large/unbounded… #707

Closed
MojoVampire wants to merge 6 commits

Conversation

@MojoVampire (Contributor) commented Mar 18, 2017

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA).

Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:

  1. If you don't have an account on b.p.o, please create one
  2. Make sure your GitHub username is listed in "Your Details" at b.p.o
  3. If you have not already done so, please sign the PSF contributor agreement. The "bugs.python.org username" requested by the form is the "Login name" field under "Your Details".
  4. If you just signed the CLA, please wait at least one US business day and then check "Your Details" on bugs.python.org to see if your account has been marked as having signed the CLA (the delay is due to a person having to manually check your signed CLA).
  5. Reply here saying you have completed the above steps

Thanks again for your contribution; we look forward to looking at it!

@mention-bot

@MojoVampire, thanks for your PR! By analyzing the history of the files in this pull request, we identified @birkenfeld, @brianquinlan and @ezio-melotti to be potential reviewers.

@MojoVampire (Contributor, Author)

I've already submitted a contributor agreement pre-GitHub migration. I just updated my b.p.o. user profile (josh.r) to link to my GitHub account name. Is that sufficient, or do I need to submit a new contributor agreement based on my GitHub e-mail address?

@MojoVampire (Contributor, Author)

Hmm... Is the failure of continuous-integration/travis-ci/pr something real? Clicking Details just tells me it can't find a python/cpython repository at all...

@serhiy-storchaka added the type-feature (A feature request or enhancement) label Mar 19, 2017
Resolved review threads: Doc/library/concurrent.futures.rst; Lib/concurrent/futures/_base.py (4 threads)
@pkch commented May 17, 2017

You can also take a look at my implementation, which I uploaded to https://github.com/pkch/executors. It does something more like what I described on the issue tracker, the main benefit being that it's not blocking.

@methane requested a review from pitrou July 25, 2018 21:33
@methane (Member) left a comment

LGTM at code level.

Resolved review thread: Misc/NEWS (outdated)
@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests, along with any other requests in other reviews from core developers, that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Resolved review threads: Lib/concurrent/futures/_base.py; Lib/concurrent/futures/process.py
@leezu commented Nov 13, 2018

@MojoVampire could you share your plans about this PR? Do you plan to drive it forward?

@MojoVampire (Contributor, Author)

I have made the requested changes; please review again.

I did not add a Misc/NEWS entry since the file no longer exists (it's autogenerated from commit messages now, correct?).

@bedevere-bot

Thanks for making the requested changes!

@methane: please review the changes made to this pull request.

@tirkarthi (Member)

I did not add a Misc/NEWS entry since the file no longer exists (it's autogenerated from commit messages now, correct?).

NEWS entries can be generated using blurb or blurb-it

Please see: https://devguide.python.org/committing/?highlight=news#what-s-new-and-news-entries

@MojoVampire (Contributor, Author) commented May 6, 2019

I have made the requested changes; please review again.

Actually made the Misc/NEWS entry properly this time. Sorry for the confusion; I hadn't made a PR since the new Misc/NEWS regime began and didn't know about the blurb tool. Thanks for the assist, @tirkarthi.

@bedevere-bot

Thanks for making the requested changes!

@methane: please review the changes made to this pull request.

@pitrou (Member) left a comment

Some comments below. A significant issue is that this changes the behaviour of shutdown(wait=True) to not wait for completion of all pending futures. I don't think that's an acceptable change.

Resolved review thread: Lib/concurrent/futures/_base.py (outdated)

By default, a reasonable number of tasks are
queued beyond the number of workers; an explicit *prefetch* count may be
provided to specify how many extra tasks should be queued.
Member:

Using "chunks" here would be more precise than "tasks".

Contributor (Author):

The documentation for chunksize uses the phrasing "this method chops iterables into a number of chunks which it submits to the pool as separate tasks", and since not all executors even use chunks (ThreadPoolExecutor ignores the argument), I figured I'd stick with "tasks". It does leave us without a term for a single work item, though; the docs use "chunks" and "tasks" as synonyms.
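
For readers skimming the thread, a minimal sketch of the two notions side by side. The prefetch keyword is this PR's proposal and does not exist in released Python; square is just a stand-in workload:

    import itertools
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    def square(x):
        return x * x

    if __name__ == "__main__":
        # chunksize batches inputs: each chunk of 50 work items is sent
        # to a worker process as a single task. ThreadPoolExecutor
        # ignores chunksize entirely.
        with ProcessPoolExecutor() as ex:
            results = list(ex.map(square, range(1000), chunksize=50))

        # prefetch (this PR's proposed keyword, not released API) bounds
        # how many extra tasks are queued beyond the workers, so an
        # unbounded input becomes usable:
        with ThreadPoolExecutor(max_workers=4) as ex:
            for r in ex.map(square, itertools.count(), prefetch=8):
                if r >= 10_000:
                    break  # stop consuming; only a small window was queued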

-        self.assertCountEqual(finished, range(10))
+        # No guarantees on how many tasks dispatched,
+        # but at least one should have been dispatched
+        self.assertGreater(len(finished), 0)
Member:

I think this change breaks compatibility. The doc for shutdown says:

If wait is True then this method will not return until all the pending futures are done executing and the resources associated with the executor have been freed.

So all futures should have executed, instead of being cancelled.
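
To make the cited guarantee concrete, a small illustration that runs against the released API:

    import time
    from concurrent.futures import ThreadPoolExecutor

    ex = ThreadPoolExecutor(max_workers=1)
    futures = [ex.submit(time.sleep, 0.1) for _ in range(5)]

    # Documented behavior: with wait=True, shutdown() does not return
    # until every pending future has finished executing.
    ex.shutdown(wait=True)
    assert all(f.done() and not f.cancelled() for f in futures)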

Contributor (Author):

At the time I wrote it, it didn't conflict with the documentation precisely; the original documentation said that map was "Equivalent to map(func, *iterables) except func is executed asynchronously and several calls to func may be made concurrently.", but it didn't guarantee that any actual futures exist (map is implemented in terms of submit and futures, but nothing actually requires such a design).

That said, it looks like you updated the documentation to add "the iterables are collected immediately rather than lazily;", which, if considered a guarantee, rather than a warning, would make this a breaking change even ignoring the "cancel vs. wait" issue.

Do you have any suggestions? If strict adherence to your newly (as of late 2017) documented behavior is needed, I suppose I could change the default behavior from "reasonable prefetch" to "exhaustive prefetch", so that when prefetch isn't passed, every task is submitted up front; but it would be kind of annoying to lose the "good by default" behavior of limited prefetching.

The reason I cancelled rather than waiting on the result is that I was trying to follow the normal use pattern for map: since the results are yielded lazily, if the iterator goes away or is closed explicitly (or you explicitly shut down the executor), you're done. Having the outstanding futures complete when you're no longer able to see the results means you're either:

  1. Expecting the tasks to complete without exhausting the Executor.map iterator (which doesn't work with Py3's map at all, so the analogy to map should allow cancellation here; if you don't run the iterator out, you have no guarantee anything was done)
  2. Not planning to use any further results (in which case running any submitted-but-unscheduled futures means doing work no one can see the results of)
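
A sketch of the usage pattern being described, under the PR's proposed lazy semantics (prefetch is this PR's keyword, not released API; fetch is a stand-in workload):

    import itertools
    from concurrent.futures import ThreadPoolExecutor

    def fetch(n):
        return n  # stand-in for real per-item work

    with ThreadPoolExecutor(max_workers=4) as ex:
        # Under the proposed semantics, only max_workers + prefetch tasks
        # are ever outstanding. Breaking out of the loop abandons the
        # result iterator, and the unconsumed window is cancelled rather
        # than run to completion: nobody could observe those results anyway.
        for result in ex.map(fetch, itertools.count(), prefetch=8):
            if result > 100:
                break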

Member:

Actually, I think there are two problems to discuss:

  1. What happens when shutdown(wait=True) is called. Currently it waits for all outstanding tasks. I don't think we can change that (the explicit wait flag exists for a reason).
  2. Whether map() can be silently switched to a lazy mode of operation. There's a (perhaps minor) problem with that: currently, if one of the iterables raises an error, map() propagates the exception immediately. With your proposal, the exception may be raised later, inside the result iterator.

I think 2) might easily be worked around by introducing a separate method (lazy_map?).

It seems it would be good to discuss those questions on the mailing-list.
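
Problem 2 in concrete terms: the first half below is current, released behavior; the comment at the end describes what the lazy variant would do instead.

    from concurrent.futures import ThreadPoolExecutor

    def flaky_input():
        yield 1
        raise ValueError("bad input")

    with ThreadPoolExecutor() as ex:
        # Current behavior: map() consumes its iterables up front, so the
        # ValueError escapes from the map() call itself.
        try:
            results = ex.map(str, flaky_input())
        except ValueError:
            print("raised at call time (eager)")

        # Under a lazy map, ex.map(...) would return at once and the same
        # ValueError would surface only later, while iterating the results.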

@MojoVampire (Contributor, Author), May 6, 2019:

Yeah, the problem with using the "lazy_map" name is that it recreates the same annoying map/imap distinction from the Py2 era, and it would actually leave Executor.map (which is supposed to match map, a function that lazily consumes its inputs) less similar to map than Executor.lazy_map would be.

If it's necessary to gain acceptance, I could change the default behavior to use prefetch=sys.maxsize - self._max_workers. It would match the pre-existing behavior for just about anything that conceivably worked before (modulo the tiny differences in memory usage of deque vs. list for storing the futures) since:

  1. All tasks would be submitted fully up front, so shutdown(wait=True) would in fact wait on them (and no further calls to submit would occur in the generator, so no submission could happen post-shutdown, which would otherwise raise a RuntimeError and trigger the cancellation)
  2. It wouldn't be lazy for anything by default (it would either work eagerly or crash, in the same manner it currently behaves)

If you passed a reasonable prefetch, you wouldn't have these behaviors (and we should document that interaction), but at least existing code would continue to work identically.
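
A sketch of what that compatibility default could look like; effective_prefetch is a hypothetical helper with illustrative names, not code from this PR:

    import sys

    def effective_prefetch(prefetch, max_workers):
        # Hypothetical: with no explicit prefetch the window is
        # effectively unbounded, so every task is submitted up front and
        # shutdown(wait=True) keeps its current meaning; passing a finite
        # prefetch opts into laziness.
        if prefetch is None:
            return sys.maxsize - max_workers
        if prefetch < 0:
            raise ValueError("prefetch count may not be negative")
        return prefetch

    assert effective_prefetch(None, 4) == sys.maxsize - 4
    assert effective_prefetch(8, 4) == 8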

Member:

I don't have a strong opinion. I think discussing those alternatives on the ML, to gather more opinions and arguments, would be useful.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests, along with any other requests in other reviews from core developers, that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again". I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

…lds no reference to result at the moment it yields

Reduce line lengths to PEP8 limits
@flavianh

@MojoVampire I think you need to comment on your PR with "I have made the requested changes; please review again" for it to pass.

@github-actions

This PR is stale because it has been open for 30 days with no activity.

@github-actions bot added the stale (Stale PR or inactive for long period of time) label Feb 19, 2022
@rhettinger requested a review from applio May 10, 2022 21:56
@erlend-aasland (Contributor)

@MojoVampire, are you going to follow up this PR?

@erlend-aasland added the pending (The issue will be closed if no feedback is provided) label Jun 29, 2022
@github-actions bot removed the stale label Jun 30, 2022
@MojoVampire (Contributor, Author)

@erlend-aasland: I'd like to land this, but I never got a clear idea of what would constitute an acceptable final result. Executor.map is, frankly, useless for many of its intended purposes right now: if you reach for it to speed up processing of huge inputs, you end up prefetching the entire input and pre-scheduling all the tasks before you can process any of them.
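
The complaint, made concrete against the released API (warning: as written, this never reaches the loop body):

    import itertools
    from concurrent.futures import ThreadPoolExecutor

    # On released Pythons, map() submits one future per input element
    # before yielding any result, so this call never returns on an
    # unbounded input (and on a merely huge one it buffers every future
    # in memory first).
    with ThreadPoolExecutor(max_workers=4) as ex:
        for s in ex.map(str, itertools.count()):  # hangs in eager submission
            print(s)
            break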

@erlend-aasland (Contributor)

cc. @kumaraditya303, is this something you'd be interested in reviewing?

@kumaraditya303 (Contributor)

is this something you'd be interested in reviewing?

I'll take a look soon, but it seems there are two PRs for the same thing, and this one has conflicts, so maybe we should continue on #18566.

@kumaraditya303 (Contributor)

Closing this in favor of #18566, thanks for the PR!
