
map() and pmap() interfaces differ in return type and shape #4606

Closed
c42f opened this issue Oct 22, 2013 · 9 comments
Labels
help wanted · parallelism

Comments

@c42f
Member

c42f commented Oct 22, 2013

As discussed in the julia-users mailing list thread "parallelized comprehensions"
(https://groups.google.com/forum/#!topic/julia-users/DTyTqib3iOk),
map preserves the shape of the input and does type deduction, whereas pmap does not:

julia> map(x -> x.^2, 2*ones(2,2))
2x2 Array{Float64,2}:
 4.0  4.0
 4.0  4.0

julia> pmap(x -> x.^2, 2*ones(2,2))
4-element Array{Any,1}:
 4.0
 4.0
 4.0
 4.0

From a user's perspective it would be good if both functions had the same interface.
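In the meantime, a user-side workaround (a sketch only, not the eventual fix) is to flatten the input, call pmap, and reapply the original shape; the helper name `shaped_pmap` is hypothetical, and on newer Julia versions pmap lives in the Distributed standard library:

```julia
using Distributed  # pmap moved here in later Julia versions

# Hypothetical workaround: run pmap on the flattened input, then
# reshape the flat result back to the input's shape.
function shaped_pmap(f, A::AbstractArray)
    flat = pmap(f, vec(A))      # pmap flattens to a Vector
    reshape(flat, size(A))      # reapply the original shape
end

B = shaped_pmap(x -> x^2, 2 .* ones(2, 2))
# B has size (2, 2) with every entry 4.0
```

This restores the shape but not necessarily a concrete element type, since pmap may still produce a `Vector{Any}`.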

@juliangehring
Contributor

I second that it would be nice for pmap to preserve the output type.

@ivarne added the help wanted label Jan 24, 2015
@jakebolewski added the parallelism label Jun 2, 2015
@ngiann

ngiann commented Nov 4, 2015

This still seems to be an issue.
I am using Julia Version 0.4.0.

@rajathshashidhara

methods(map) reveals that map has the following definition:
map(f, A::AbstractArray{T<:Any,N<:Any}) at abstractarray.jl:1322

The map function has actually been specialized for various datatypes, and there is also a generic definition for any container that supports iteration:
map(f, iters...) at abstractarray.jl:1162

pmap, in contrast, has only one implementation for any iterable object:
pmap(f, lsts...) at multi.jl:1512

To fix this issue, a specialized implementation of pmap for AbstractArray has to be added, much like the map specialization.

I can patch this up. I need to know the right place to add such an implementation. Maybe in abstractarray.jl?
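Such a specialization might look roughly like the following (the name `pmap_for_arrays` and the use of `similar`/`copyto!` are illustrative assumptions, not the actual patch):

```julia
using Distributed

# Illustrative sketch: delegate to the existing iterable pmap, then
# rebuild an output array with the input's shape and an element type
# taken from the results.
function pmap_for_arrays(f, A::AbstractArray)
    flat = pmap(f, vec(A))
    out = similar(A, eltype(flat))  # same shape as A, element type of results
    copyto!(out, flat)
    return out
end

B = pmap_for_arrays(x -> x^2, 2 .* ones(2, 2))
```

`similar(A, T)` allocates an array with A's shape and indices, so the result matches map's behavior for shape and, when the results are homogeneous, for element type as well.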

@jebej
Contributor

jebej commented Nov 1, 2016

Was there progress on this? This is still an issue in 0.5.

@jiahao
Member

jiahao commented Nov 1, 2016

Cross-reference #14265 #14635

@amitmurthy
Contributor

We also need to leverage the different map implementations. I am wondering if the right way to do this is to implement it in two passes using regular map.

pmap (and asyncmap) would thus become something equivalent to this:

julia> d=map(x->remotecall(identity, default_worker_pool(), x), ones(2,2))
2×2 Array{Future,2}:
 Future(2,1,6,#NULL)  Future(4,1,8,#NULL)
 Future(5,1,7,#NULL)  Future(3,1,9,#NULL)

julia> d2=map(fetch, d)
2×2 Array{Float64,2}:
 1.0  1.0
 1.0  1.0

We will need a much lighter equivalent of Future for this.
The same applies to asyncmap, except that execution happens in different tasks rather than in different processes.
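The two-pass idea above can be sketched as an ordinary function (`remotecall` and `default_worker_pool` are from the Distributed standard library; the name `two_pass_pmap` is hypothetical):

```julia
using Distributed

# Two-pass sketch: pass 1 maps each element to a Future via remotecall,
# pass 2 maps fetch over the Futures. Both passes use regular map, so the
# input's shape and container type are preserved.
function two_pass_pmap(f, A)
    futures = map(x -> remotecall(f, default_worker_pool(), x), A)
    map(fetch, futures)
end

d2 = two_pass_pmap(x -> x^2, ones(2, 2))
# d2 is a 2×2 array of Float64, matching map's output shape
```

As noted above, this allocates a Future per element, which is why a lighter-weight handle would help.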

@kkmann

kkmann commented Nov 25, 2016

+1, I just bumped into this as well.

@amitmurthy
Contributor

amitmurthy commented Nov 26, 2016

If

a) the collection length is not very large,
b) the number of workers does not change during the pmap run, and
c) you do not need support for batching,

then this should return the same type and shape as map.

function my_pmap(f, c...)
    s = Base.Semaphore(nworkers())  # throttle to one in-flight call per worker
    acq = Base.acquire
    rel = Base.release
    f_rcf = remote(f)               # remotecall_fetch on a free worker

    # One task per element; the semaphore limits concurrency.
    async_f = (x...) -> begin
        acq(s)
        tf = () -> (acq(s); v = f_rcf(x...); rel(s); v)
        t = schedule(Task(tf))
        yield()
        rel(s)
        t
    end
    # map preserves shape; on 0.5, wait(t) returns the task's result
    # (newer Julia versions use fetch for this).
    map(x -> wait(x), map(async_f, c...))
end

Works for regular arrays, tuples and sparse arrays. Retains shape for bit arrays but not type.

Wrap remote(f) with a retry for retry-on-error support.
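A minimal sketch of that wrapping (Base.retry wraps a function with retry-on-exception semantics; the policy shown relies on its defaults, which is an assumption about what you want):

```julia
using Distributed

# Wrap the remote call so transient failures are retried a few times
# before the exception propagates.
f_rcf = retry(remote(x -> x^2))
f_rcf(3)  # returns 9 when the call succeeds
```

retry also accepts `delays` and `check` keyword arguments for tuning backoff and deciding which exceptions warrant a retry.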

The above code starts a new task for every element in the collection and collects each task's return value at the end, which makes it very inefficient in terms of memory usage.

@amitmurthy
Contributor

Closed by #19447

10 participants