RFC: Simplifying and generalising pmap #14843
Comments
how easy would it be to have |
Changing pmap to use a shared queue, as discussed here - #14736 (comment), should allow us to start using started workers right away. |
To expand on the above:
User code would be something like:
|
thanks amit. question: in your proposed interface, can a worker which comes online during the execution of a particular |
Yes. |
@samoconnor, the proposed changes look good. Look forward to the PRs. |
@amitmurthy, in the interests of avoiding one-big-PR, I plan to start by submitting a PR that adds just the following:
amap(f, c...; async_max=100) → collection
Implementation: amap(f, c...; kv...) = collect(imap(f, c...; kv...))
imap(f, c...; async_max=100) → iterator
Apply f to each element of c using at most 100 asynchronous tasks. For multiple collection arguments, apply f elementwise.
Implementation using StreamMapItr: imap(f, c...; async_max=nothing) = StreamMapItr(f, c...; async_max=async_max)
amap!(function, collection; async_max=100)
amap!(function, destination, collection...; async_max=100)
Implementation using AsyncMapItr:
function amap!(f, c...; async_max=nothing)
    destination = c[1]
    if length(c) > 1
        c = c[2:end]
    end
    for task in AsyncMapItr(f, destination, c..., async_max=async_max) end
    return destination
end
one benefit of a single large PR is that we could see all the proposed changes to the API at once. for example, i'm curious whether you plan to include an iterator-returning version of |
👍 I definitely like the separation into composable chunks. |
Please don't open a PR that tries to change too many things at once. Incremental changes are much smoother to get reviewed and merged. The end vision can be seen "at once" in a package or separate branch, but PRs should be broken into smaller gradual chunks whenever possible.
@bjarthur: If |
Some thoughts. Apologies for the delayed response. My understanding is that the suggested Putting up some thoughts for discussion:
This will also help optimize This version of |
@amitmurthy, like the idea of having I think the first step is still to add the underlying (I guess if |
@tanmaykm , @ViralBShah can you post test cases that cover the issues that @amitmurthy describes above? |
in regards to the request for clearer names in #15058 (comment), it's not obvious to me, from the name alone, that the proposed how about the following refactoring of the interface:
so in the end, we have this:
|
@bjarthur: I've taken your suggestion together with @StefanKarpinski's request for clearer names and revised #15058 (comment) to have just one function:
asyncmap(f, c...; ntasks=0) = StreamMapIterator(f, c...; ntasks=ntasks)
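For reference, the effect of a task limit can be observed with the asyncmap that ships in current Julia, which also takes an ntasks keyword. This is only a sketch: StreamMapIterator is the 2016 proposal and is not used here, and tracked, running and peak are illustrative names.

```julia
# Counters used only to observe how many calls are in flight at once.
running = Ref(0)
peak = Ref(0)

function tracked(x)
    running[] += 1
    peak[] = max(peak[], running[])
    sleep(0.01)          # yields to other tasks, standing in for I/O
    running[] -= 1
    return x + 1
end

# At most 4 tasks run `tracked` concurrently; results keep input order.
results = asyncmap(tracked, 1:20; ntasks=4)
```

The calls overlap while each one sleeps, but peak[] never exceeds the ntasks value.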
Reading the code for
The usual pattern in Julia seems to be that non-blocking implementations are hidden and public interfaces are blocking.
Perhaps the preferred approach should be to use In Base there are only 4 calls to A GitHub-wide search finds 359 matches for |
@amitmurthy: see PR #15073, WorkerPool and remote() |
I would rather not export I agree with your reasoning w.r.t. |
OK, done: samoconnor@e263ad6 |
Maybe if |
Could |
Yes, Amit suggested that in a previous comment.
asyncpmap(f, v) = flatten( pmap(chunk -> asyncmap(f, chunk), eachchunk(v)) )
This assumes a reasonably balanced workload, which seems fine for now.
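A rough version of that one-liner in current Julia might look like the following. It assumes Iterators.partition as a stand-in for eachchunk (which is not a Base function), and chunk_size is an illustrative parameter, not something from this thread.

```julia
using Distributed

# Hypothetical helper: split `v` into chunks, send each chunk to a worker
# with pmap, and run `f` over the chunk's elements with asyncmap there.
function asyncpmap(f, v; chunk_size=100)
    chunks = collect(Iterators.partition(v, chunk_size))
    per_chunk = pmap(chunk -> asyncmap(f, chunk), chunks)
    return collect(Iterators.flatten(per_chunk))
end

squares = asyncpmap(x -> x^2, 1:10; chunk_size=3)
```

With no workers added, pmap falls back to the master process, so the sketch also runs in a plain session.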
@samoconnor The observation that @amitmurthy was referring to in #14843 (comment) was abhijithch/RecSys.jl#22 (comment). A simple test would be:
julia> # $ julia -p 8
julia> f = (x)->nothing;
julia> @time @parallel for i in 1:10^4 f(i) end;
0.117559 seconds (180.03 k allocations: 7.235 MB)
julia> @time pmap(f, 1:10^4);
1.196159 seconds (2.87 M allocations: 77.740 MB, 1.11% gc time) |
I intend to open a third PR (in addition to #15058 and #15073) to implement The Question: Should the PR...
(Note: At this point I think Retry.jl is reasonably well field tested. It is used in many places in the |
@amitmurthy I have submitted #15409 with |
Amit's notes pasted from #15073:
|
Status
Next steps
New
asyncmap(f, c...) = collect(StreamMapIterator(f, c...))
function pmap(f, c...; err_retry=nothing, err_stop=nothing, pids=nothing)
    if err_retry != nothing
        depwarn("`err_retry` is deprecated, use `pmap(retry(f), c...)` or `asyncmap(remote(retry(f)), c...)`.", :pmap)
        if err_retry == true
            return asyncmap(retry(remote(f)), c...)
        end
    end
    if err_stop != nothing
        depwarn("`err_stop` is deprecated, use `pmap(@catch(f), c...)`.", :pmap)
        if err_stop == true
            return asyncmap(remote(@catch(f)), c...)
        end
    end
    if pids != nothing
        depwarn("`pids` is deprecated. It no longer has any effect.", :pmap)
    end
    return asyncmap(remote(f), c...)
end
Perhaps there should be a kw arg
remote(f, pool) = (args...)->remotecall_fetch(f, pool, args...)
pmap(f, c...; pool=default_worker_pool()) = asyncmap(remote(f, pool), c...)
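The pool keyword idea can be tried against current Julia, with one caveat: the Distributed standard library spells the wrapper remote(pool, f), with the pool first, not remote(f, pool) as proposed here. In this sketch pool_map is a hypothetical name chosen to avoid clashing with the real pmap.

```julia
using Distributed

# Hypothetical helper mirroring the `pool` keyword floated above.
# `remote(pool, f)` returns a function that runs `f` on a free worker
# taken from `pool`; asyncmap supplies the concurrency.
pool_map(f, c...; pool=default_worker_pool()) = asyncmap(remote(pool, f), c...)

lengths = pool_map(length, ["a", "bb", "ccc"])
sums = pool_map(+, 1:3, 4:6)   # multiple collections are applied elementwise
```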
JuliaLang#14843 Add default small delay to retry. 50ms delay on first retry. 250ms delay on 2nd retry. This at least gives other tasks a chance to run. If retry n is set higher, the delay increases to 1250ms, 6250ms ... max_delay caps the delay at 10s by default. This should handle network-timescale issues without creating undue load.
JuliaLang#14843 Add default small delay to retry. 50ms delay on first retry. 250ms delay on 2nd retry. This at least gives other tasks a chance to run. If retry n is set higher, the delay increases to 1250ms, 6250ms ... max_delay caps the delay at 10s by default. This should handle network-timescale issues without creating undue load. tweak test/error.jl per https://travis-ci.org/JuliaLang/julia/jobs/114700424#L1597
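The delay schedule in that commit message (50ms, 250ms, 1250ms, 6250ms, capped at 10s) is a geometric series with factor 5. A small sketch of the arithmetic follows; the helper name and keyword names are illustrative and this is not the code from the commit.

```julia
# Illustrative helper: delay in milliseconds before the n-th retry,
# growing by `factor` each time and capped at `max_ms`.
retry_delay_ms(n; first_ms=50, factor=5, max_ms=10_000) =
    min(first_ms * factor^(n - 1), max_ms)

schedule = [retry_delay_ms(n) for n in 1:5]
# The fifth value would be 31250 ms uncapped, so the 10 s cap applies.
```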
Refactored
function pgenerate(f, c)
    batches = batchsplit(c, min_batch_count = nworkers() * 3)
    return flatten(asyncgenerate(remote(b -> asyncmap(f, b)), batches))
end
pmap(f, c) = collect(pgenerate(f, c))
#15409 introduces 3 un-exported generate functions that might be generally useful:
Should these be exported for general use in a separate PR? They are quite useful for chaining things together:
# For one...
result_url = upload_results(crunch_numbers(download_data(data_url)))
# For many in parallel...
result_urls = asyncmap(upload_results,
                       pgenerate(crunch_numbers,
                                 asyncgenerate(download_data, url_list)))
I like where this is headed, but... any time we use naming to express combinations of behaviors, I feel like we're missing something. In this case, we have map vs. generate, sync vs. async, and local vs. distributed (which is somewhat unfortunately called parallel rather than distributed). It seems like an ideal case for something compositional rather than having some subset of the eight possible names. |
FWIW #15409 tries to make things compositional under the covers. My preference would be to make I'd also remove Then I'd make That would leave just: ... maybe too extreme? |
…g#15409 and JuliaLang#14843) Rename *MapIterator to *Generator
Closed by #15409 |
[ Update: #15409 implements most of this and is now merged ]
This issue covers some of the same ground as #12943 (stale), #14515 and #14736. My intention is to present an overall pmap refactoring plan here for comment and submit new PRs if it is seen as useful.

pmap Features

The current pmap implementation has the following features:
… @sync/@async to run mapped function in parallel and collect results.
… remotecall_wait calls.

These are all useful features. However, it seems to me that pmap currently tries to do too much and that many of these features would be useful in contexts other than pmap:
… remotecall (e.g. HTTP, readstring(::Cmd) or AWS Lambda).
… remotecall mechanism.
… pmap they should also be useful with ordinary map.

Proposed Separation of Features

asyncmap (previously amap) #15058
asyncmap adds @sync/@async to regular map. (feature "1.")

WorkerPool and remote #15073
A WorkerPool keeps track of which workers are busy. take!(default_worker_pool()) yields a worker that is not already busy doing remotecall_wait, waiting if needed (feature "2."). If there is a reliable way to identify "faulty" workers (feature "3.") then worker will not return faulty workers. (Partial implementation in comments of: #14736)
remote takes a function and returns a lambda that executes the function on a remote worker.
Using asyncmap and remote together...

@catch #15409
@catch takes a function and returns a lambda that catches any exceptions thrown by the function and returns the exception object.
pmap(f, v; err_stop=false) can be replaced with pmap(@catch(f), v) (feature "5.")

retry #15409
retry takes a function and returns a lambda that retries the function if an error occurs (feature "4.").
pmap(f, v; err_retry=true) can be replaced with pmap(retry(f), v), or for more granular error handling pmap(retry(f, e -> isa(e, NetworkTimeoutError)), v)
(@repeat and @retry are implemented in https://github.com/samoconnor/Retry.jl)

pmap Defaults

pmap currently has some unexpected (at least to me) defaults.
… map.
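The separation proposed above can be tried out in current Julia, where asyncmap, remote and retry exist in Base and the Distributed standard library. The sketch below makes several assumptions: catching is a hypothetical function standing in for the proposed @catch macro, flaky and seen are illustrative names, and retry is called with today's keyword form (delays=...) and not the positional predicate shown in this issue.

```julia
using Distributed

# Hypothetical stand-in for the proposed `@catch`: wrap `f` so that an
# exception is returned as a value instead of being thrown.
function catching(f)
    return function (args...)
        try
            return f(args...)
        catch err
            return err
        end
    end
end

# Illustrative function that fails the first time it sees each input.
seen = Set{Int}()
function flaky(x)
    if !(x in seen)
        push!(seen, x)
        error("transient failure for input $x")
    end
    return 2x
end

# asyncmap + remote: each call runs on a worker from the default pool
# (the master process is used when no workers have been added).
remote_squares = asyncmap(remote(x -> x^2), 1:5)

# retry composes with any mapping function, not only pmap.
retried = asyncmap(retry(flaky; delays=fill(0.01, 2)), 1:4)

# The catching wrapper is equally usable with ordinary map.
caught = map(catching(x -> x > 2 ? error("too big") : x), 1:4)
```

Each wrapper takes a function and returns a function, so they nest in whatever order the caller needs, for example asyncmap(retry(remote(f)), v).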