bpo-29842: Make Executor.map less eager so it handles large/unbounded input iterables appropriately #707

Conversation
Hello, and thanks for your contribution! I'm a bot set up to make sure that the project can legally accept your contribution by verifying you have signed the PSF contributor agreement (CLA). Unfortunately we couldn't find an account corresponding to your GitHub username on bugs.python.org (b.p.o) to verify you have signed the CLA. This is necessary for legal reasons before we can look at your contribution. Please follow these steps to help rectify the issue:
Thanks again for your contribution, and we look forward to reviewing it!
@MojoVampire, thanks for your PR! By analyzing the history of the files in this pull request, we identified @birkenfeld, @brianquinlan and @ezio-melotti to be potential reviewers.
I've already submitted a contributor agreement pre-GitHub migration. I just updated my b.p.o. user profile (josh.r) to link to my GitHub account name. Is that sufficient, or do I need to submit a new contributor agreement based on my GitHub e-mail address?
Hmm... Is the failure of continuous-integration/travis-ci/pr something real? Clicking Details just tells me it can't find a python/cpython repository at all...
You can also take a look at my implementation that I uploaded to https://github.com/pkch/executors. It does something more like what I described in the issue tracker, the main benefit being that it's not blocking. |
LGTM at code level.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again".
@MojoVampire could you share your plans about this PR? Do you plan to drive it forward?
I have made the requested changes; please review again. I did not add a Misc/NEWS entry since the file no longer exists (it's autogenerated from commit messages now, correct?).
Thanks for making the requested changes! @methane: please review the changes made to this pull request.
NEWS entries can be generated using blurb or blurb-it. Please see: https://devguide.python.org/committing/?highlight=news#what-s-new-and-news-entries
I have made the requested changes; please review again. Actually made the Misc/NEWS entry properly. Sorry for the confusion; I haven't made a PR since the new Misc/NEWS regime began and didn't know about the blurb tool. Thanks for the assist @tirkarthi
Thanks for making the requested changes! @methane: please review the changes made to this pull request.
Some comments below. A significant issue is that this changes the behaviour of shutdown(wait=True) to not wait for completion of all pending futures. I don't think that's an acceptable change.
By default, a reasonable number of tasks are queued beyond the number of workers; an explicit *prefetch* count may be provided to specify how many extra tasks should be queued.
Using "chunks" here would be more precise than "tasks".
The documentation for chunksize uses the phrasing "this method chops iterables into a number of chunks which it submits to the pool as separate tasks", and since not all executors even use chunks (ThreadPoolExecutor ignores the argument), I figured I'd stick with "tasks". It does kind of leave out a term to describe a single work item; the docs use chunks and tasks as synonyms, with no term for a single work item.
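For reference, the chunksize behaviour described above is easy to check: ThreadPoolExecutor accepts the argument purely for signature compatibility with ProcessPoolExecutor and ignores it, so the results are identical regardless of the value passed. A minimal sketch:

```python
from concurrent.futures import ThreadPoolExecutor

# ThreadPoolExecutor accepts chunksize for API compatibility with
# ProcessPoolExecutor, but the value has no effect for threads.
with ThreadPoolExecutor(max_workers=2) as ex:
    out = list(ex.map(pow, [2, 3, 4], [2, 2, 2], chunksize=100))

print(out)  # [4, 9, 16]
```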
self.assertCountEqual(finished, range(10))
# No guarantees on how many tasks dispatched,
# but at least one should have been dispatched
self.assertGreater(len(finished), 0)
I think this change breaks compatibility. The doc for shutdown says:
If wait is True then this method will not return until all the pending futures are done executing and the resources associated with the executor have been freed.
So all futures should have executed, instead of being cancelled.
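The documented behaviour is straightforward to confirm with the stdlib as it stands: futures that have been submitted run to completion before shutdown(wait=True) returns. A small check:

```python
import time
from concurrent.futures import ThreadPoolExecutor

done = []

def work(i):
    time.sleep(0.05)
    done.append(i)  # list.append is atomic, so this is thread-safe

ex = ThreadPoolExecutor(max_workers=2)
for i in range(6):
    ex.submit(work, i)
ex.shutdown(wait=True)  # blocks until all six tasks have executed

print(sorted(done))  # [0, 1, 2, 3, 4, 5]
```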
At the time I wrote it, it didn't conflict with the documentation precisely; the original documentation said that map was "Equivalent to map(func, *iterables) except func is executed asynchronously and several calls to func may be made concurrently.", but doesn't guarantee that any actual futures exist (it's implemented in terms of submit and futures, but doesn't actually require such a design).
That said, it looks like you updated the documentation to add "the iterables are collected immediately rather than lazily;", which, if considered a guarantee, rather than a warning, would make this a breaking change even ignoring the "cancel vs. wait" issue.
Do you have any suggestions? If strict adherence to your newly (as of late 2017) documented behavior is needed, I suppose I could change the default behavior from "reasonable prefetch" to "exhaustive prefetch", so when prefetch isn't passed, every task is submitted, but it would be kind of annoying to lose the "good by default" behavior of limited prefetching.
The reason I cancelled rather than waiting on the result is that I was trying to follow the normal use pattern for map; since the results are yielded lazily, if the iterator goes away or is closed explicitly (or you explicitly shut down the executor), you're done; having the outstanding futures complete when you're not able to see the results means you're either:
- Expecting the tasks to complete without running out the Executor.map (which doesn't work with Py3's map at all, so the analogy to map should allow it; if you don't run it out, you have no guarantees anything was done)
- Not planning to use any further results (in which case running any submitted but unscheduled futures means doing work no one can see the results of)
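The analogy with the built-in map can be seen directly: it consumes inputs one at a time, and discarding the iterator abandons whatever work remains, which is the usage pattern the cancel-on-shutdown behaviour was meant to mirror:

```python
calls = []

def double(x):
    calls.append(x)
    return x * 2

m = map(double, range(1, 11))
print(calls)           # [] — nothing has run yet; map is lazy
print(next(m), calls)  # 2 [1] — exactly one input consumed
del m                  # dropping the iterator abandons the rest
print(calls)           # still [1]; the other nine calls never happen
```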
Actually, I think there are two problems to discuss:
1. What happens when `shutdown(wait=True)` is called. Currently it waits for all outstanding tasks. I don't think we can change that (the explicit `wait` flag exists for a reason).
2. Whether `map()` can be silently switched to a lazy mode of operation. There's a (perhaps minor) problem with that. Currently, if one of the iterables raises an error, `map()` propagates the exception. With your proposal, the exception may be raised later, inside the result iterator.

I think 2) might easily be worked around by introducing a separate method (`lazy_map`?).
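The timing difference in the second point can be demonstrated against the current, eager map(): an exception raised by the input iterable escapes from the map() call itself, not from the result iterator. A sketch of the eager case:

```python
from concurrent.futures import ThreadPoolExecutor

def bad_inputs():
    yield 1
    raise ValueError("boom")

with ThreadPoolExecutor(max_workers=1) as ex:
    try:
        # The stdlib's eager map() consumes the whole input iterable
        # before returning, so the ValueError surfaces right here...
        results = ex.map(abs, bad_inputs())
        where = "inside the result iterator"
    except ValueError:
        where = "at the map() call"

# ...whereas a lazy map would defer it to next(results).
print(where)  # at the map() call
```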
It seems it would be good to discuss those questions on the mailing-list.
Yeah, the problem with using the "lazy_map" name is that it feels like recreating the same annoying distinctions between map and imap from the Py2 era, and it would actually have Executor.map (which is supposed to match map, which lazily consumes the input(s)) be less similar to map than Executor.lazy_map.
If it's necessary to gain acceptance, I could change the default behavior to use prefetch=sys.maxsize - self._max_workers. It would match the pre-existing behavior for just about anything that conceivably worked before (modulo the tiny differences in memory usage of deque vs. list for storing the futures) since:
- All tasks would be submitted fully up front, so shutdown(wait=True) would in fact wait on them (and no further calls to submit would occur in the generator, so submitting wouldn't occur post-shutdown, which would raise a RuntimeError and cause the cancellation)
- It wouldn't be lazy for anything by default (it would either work eagerly or crash, in the same manner it currently behaves)
If you passed a reasonable prefetch, you wouldn't have these behaviors (and we should document that interaction), but at least existing code would continue to work identically.
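The bounded-prefetch scheme under discussion can be sketched on top of the public submit() API. Everything here is illustrative rather than the PR's actual code: the lazy_map name, the prefetch default, and the use of the private _max_workers attribute are all assumptions made for the sketch.

```python
import collections
import itertools
from concurrent.futures import ThreadPoolExecutor

def lazy_map(executor, fn, *iterables, prefetch=2):
    # Keep at most max_workers + prefetch futures outstanding.
    # _max_workers is a private attribute; used here only for the sketch.
    argsiter = zip(*iterables)
    initial = executor._max_workers + prefetch
    fs = collections.deque(executor.submit(fn, *args)
                           for args in itertools.islice(argsiter, initial))

    def result_iterator():
        while fs:
            res = fs.popleft().result()  # block on the oldest future
            # Top the window back up with one more task, if any remain.
            for args in itertools.islice(argsiter, 1):
                fs.append(executor.submit(fn, *args))
            yield res

    return result_iterator()

with ThreadPoolExecutor(max_workers=2) as ex:
    out = list(lazy_map(ex, lambda x: x + 1, range(5), prefetch=1))

print(out)  # [1, 2, 3, 4, 5]
```

Because results are popped in submission order, output order matches the input, like the stdlib map().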
I don't have a strong opinion. I think discussing those alternatives on the ML, to gather more opinions and arguments, would be useful.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase "I have made the requested changes; please review again".
…lds no reference to result at the moment it yields
Reduce line lengths to PEP8 limits
@MojoVampire I think you need to comment your PR with the phrase "I have made the requested changes; please review again".
This PR is stale because it has been open for 30 days with no activity.
@MojoVampire, are you going to follow up this PR?
@erlend-aasland: I'd like to apply this, but I never got any idea of what would constitute an acceptable final result. Executor.map is, frankly, useless for many of its intended purposes right now; in an effort to improve performance on huge inputs, you end up prefetching the entire input and pre-scheduling all the tasks before you can process any of them.
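The eager pre-scheduling being complained about is easy to reproduce: asking the current map() for a single result still consumes the entire input iterable up front. A quick experiment:

```python
from concurrent.futures import ThreadPoolExecutor

consumed = []

def inputs():
    for i in range(1000):
        consumed.append(i)
        yield i

with ThreadPoolExecutor(max_workers=2) as ex:
    results = ex.map(str, inputs())
    first = next(results)  # ask for just the first result...

# ...yet every input was consumed and scheduled before it arrived.
print(first, len(consumed))  # 0 1000
```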
cc. @kumaraditya303, is this something you'd be interested in reviewing?
I'll take a look soon, but it seems there are two PRs for the same thing and this one has conflicts, so maybe we should continue on #18566.
Closing this in favor of #18566, thanks for the PR!
https://bugs.python.org/issue29842