Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Type stability and array layout with pmap? #14265

Closed
kshyatt opened this issue Dec 4, 2015 · 3 comments
Closed

Type stability and array layout with pmap? #14265

kshyatt opened this issue Dec 4, 2015 · 3 comments
Labels
parallelism Parallel or distributed computation types and dispatch Types, subtyping and method dispatch

Comments

@kshyatt
Copy link
Contributor

kshyatt commented Dec 4, 2015

julia> a = rand(2,2)
2x2 Array{Float64,2}:
 0.929993  0.610538
 0.860706  0.343106

julia> pmap(abs,a)
4-element Array{Any,1}:
 0.929993
 0.860706
 0.610538
 0.343106

julia> a = rand(2,2)
2x2 Array{Float64,2}:
 0.707792  0.529175
 0.140618  0.0539725

julia> abs(a)
2x2 Array{Float64,2}:
 0.707792  0.529175
 0.140618  0.0539725

I understand that it might be hard for pmap to keep track of the array layout, but the promotion to Any is kind of annoying.

@kshyatt kshyatt added parallelism Parallel or distributed computation types and dispatch Types, subtyping and method dispatch labels Dec 4, 2015
@gajomi
Copy link
Contributor

gajomi commented Jan 6, 2016

I found this interesting and so took a look at it. I wanted to compare/contrast the implementation of map and pmap to help decide what should be done. Maybe everything I will write was already obvious to everyone else... if so my apologies.

pmap seems to make a Dict{Int,Any} where the keys are the process ids and the values are the results of the map. It builds these up asynchronously in a smart way and in the final stages synchronously dumps the results into an array. Since the values are annotated as Any the array building step assumes the same, leaving us with the annoying result @kshyatt describes.

In contrast to pmap, map has a bunch of methods. The one that dispatches on arrays in your example seems to work by computing the first value of the result first = f(A[1]) and then calling dest = similar(A, typeof(first)) to set up the destination container. As far as I can tell pmap should be able to do something like this with little added complexity, which would help to keep the array shape. Where things might get a bit tricky is in finishing this process, where map_to! basically does all the work. There are some obvious things and so non-obvious things about why it was written in the way it was, so rather than guessing about the non-obvious parts I'll just include the function body

function map_to!{T,F}(f::F, offs, dest::AbstractArray{T}, A::AbstractArray)
    # map to dest array, checking the type of each result. if a result does not
    # match, widen the result type and re-dispatch.
    @inbounds for i = offs:length(A)
        el = f(A[i])
        S = typeof(el)
        if S === T || S <: T
            dest[i] = el::T
        else
            R = typejoin(T, S)
            new = similar(dest, R)
            copy!(new,1, dest,1, i-1)
            new[i] = el
            return map_to!(f, i+1, new, A)
        end
    end
    return dest
end

I am assuming that the continuous type widening approach was adopted for performance reasons against some "typical" cases? Anyway, since the function call part of mapping is done asynchronously in pmap the above wouldn't directly translate. However, if one substitutes el = f(A[i]) with a fetching of an already computed result in dictionary pmap builds up this type widening could be done as a final synchronous step in a revised pmap.

So the above is about how things are/could be implemented. However I am not sure to what extent the current implementations of these things reflects the intended semantics. Is there anything written about how map and map-like function are supposed to behave? If would be interested in making fixes of the sort brought up above if the intended behavior could be clarified for me.

@gajomi
Copy link
Contributor

gajomi commented Jan 11, 2016

I think I was being a bit gun shy in asking for so much clarification in the previous comment. The referenced PR rectifies the problem in the original issue.

However it would be kind of nice to see something about intended semantics for map in general (for example mapping of string enforces string returns types, which is maybe part of a larger pattern?).

@amitmurthy
Copy link
Contributor

Closed by #19447

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parallelism Parallel or distributed computation types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

No branches or pull requests

3 participants