
Add Parallel and Bi layer #662

Closed

Conversation

@swiesend commented Mar 6, 2019

@MikeInnes I've cleaned the parallel branch and separated the peephole LSTM into another branch for another pull request.

To enable a bidirectional model, Base.reverse() is defined for common Flux data structures.

@swiesend (Author) commented Mar 6, 2019

I messed up my original fork and therefore created a new one.

@swiesend (Author) commented Mar 6, 2019

The approach is still the same:

The user adds some layers and defines custom mappings and inverses for the input. After the mapping phase comes the reduce phase, which defaults to concatenating the outputs of the parallel layers:

mutable struct Parallel
    layers::Vector
    map::Vector{Function}
    inv::Vector{Function}
    reduce::Function
end

function (p::Parallel)(xs)
    # per branch: y = xs |> map |> layer |> inv
    apply(l) = p.inv[l](p.layers[l](p.map[l](xs)))

    Z = map(apply, eachindex(p.layers))

    p.reduce(Z)
end

For the full implementation see: src/layers/parallel.jl

The unit test test/layers/parallel.jl contains some example models to showcase the usage of the Parallel layer and the convenience layer Bi to create bidirectional sequential models.
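
As a rough usage sketch, assuming the default positional constructor of the struct above (field order: layers, map, inv, reduce) and two Dense branches over the same input; the test file is the authoritative reference:

using Flux

# Minimal sketch, not the exact models from the test file.
p = Parallel(
    [Dense(10, 5), Dense(10, 3)],   # two branches applied to the same input
    [identity, identity],           # map: per-branch input transforms
    [identity, identity],           # inv: per-branch output transforms
    Z -> vcat(Z...),                # reduce: concatenate the branch outputs
)

x = rand(Float32, 10)
y = p(x)                            # length 8: vcat of the 5- and 3-dim outputs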

@Tokazama (Contributor) commented Mar 6, 2019

Is there a reason that Distributed.pmap can't be used here?

@swiesend (Author) commented Mar 6, 2019

@Tokazama you mean something like this? I was just unaware of where Base.pmap went in Julia 1.0.

see my parallel-pmap branch:
src/layers/parallel.jl
test/layers/parallel.jl

I've added a distributed field to Parallel, but maybe it would be better to introduce a special type like ParallelDistributed to handle this case with multiple dispatch:

mutable struct Parallel
    layers::Vector
    map::Vector{Function}
    inv::Vector{Function}
    reduce::Function
    distributed::Bool
end

Well, technically it is possible, but I don't know whether this would hinder more than help support for Flux.gpu()? @MikeInnes @ViralBShah

Also, it would add Distributed as a dependency.
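
A minimal sketch of that multiple-dispatch alternative could look like the following (the ParallelDistributed name and its call overload are assumptions, not code from the linked branch):

using Distributed

# Hypothetical sketch: same fields as Parallel, but the forward pass
# uses pmap instead of map, running branches on worker processes.
mutable struct ParallelDistributed
    layers::Vector
    map::Vector{Function}
    inv::Vector{Function}
    reduce::Function
end

function (p::ParallelDistributed)(xs)
    apply(l) = p.inv[l](p.layers[l](p.map[l](xs)))
    Z = pmap(apply, eachindex(p.layers))   # distributed over workers
    p.reduce(Z)
end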

@swiesend (Author) commented Mar 6, 2019

If I try to add GPU support on the parallel-gpu branch:
src/layers/parallel.jl
test/layers/parallel.jl

with an explicit mapping loop that preallocates the vector of intermediate results:

    # explicit mapping - preallocated size
    first = apply(1)
    Z = Vector{typeof(first)}(UndefInitializer(), length(p.layers))
    for l in eachindex(p.layers)
        if l == 1
            Z[l] = first
        else
            Z[l] = apply(l)
        end
    end

then I run into the following error (without the explicit mapping I get the same error, but I thought it might help with the GPU inference):

models using a `Parallel` layer: Error During Test at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:40
  Got exception outside of a @test
  ReadOnlyMemoryError()
  Stacktrace:
   [1] gemv!(::Char, ::Float32, ::Array{Float32,2}, ::CuArray{Float32,1}, ::Float32, ::CuArray{Float32,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/blas.jl:577
   [2] gemv!(::CuArray{Float32,1}, ::Char, ::Array{Float32,2}, ::CuArray{Float32,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:360
   [3] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:64 [inlined]
   [4] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:46 [inlined]
   [5] _forward at /home/sebastian/develop/julia/Flux.jl/src/tracker/lib/array.jl:370 [inlined]
   [6] #track#1 at /home/sebastian/develop/julia/Flux.jl/src/tracker/Tracker.jl:51 [inlined]
   [7] track at /home/sebastian/develop/julia/Flux.jl/src/tracker/Tracker.jl:51 [inlined]
   [8] * at /home/sebastian/develop/julia/Flux.jl/src/tracker/lib/array.jl:362 [inlined]
   [9] Dense at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:82 [inlined]
   [10] (::Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}})(::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:122
   [11] (::getfield(Flux, Symbol("#apply#104")){Parallel,CuArray{Float32,1}})(::Int64) at /home/sebastian/develop/julia/Flux.jl/src/layers/parallel.jl:67
   [12] (::Parallel)(::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/parallel.jl:73
   [13] applychain(::Tuple{Parallel,Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}}}, ::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:31
   [14] (::Chain{Tuple{Parallel,Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}}}})(::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:33
   [15] top-level scope at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:46
   [16] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1156
   [17] top-level scope at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:40
   [18] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [19] top-level scope at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:10
   [20] include at ./boot.jl:326 [inlined]
   [21] include_relative(::Module, ::String) at ./loading.jl:1038
   [22] include(::Module, ::String) at ./sysimg.jl:29
   [23] include(::String) at ./client.jl:403
   [24] top-level scope at none:0
   [25] eval(::Module, ::Any) at ./boot.jl:328
   [26] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:85
   [27] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:117 [inlined]
   [28] (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})() at ./task.jl:259

@Tokazama (Contributor) commented Mar 6, 2019

To clarify, I'm in no position to say whether this should or shouldn't have pmap support. I'm not sure what the current state of distributed computing is in Flux beyond CUDA GPU support, or if there is any strong motivation for those who manage Flux to allow/support such features. I just thought it seemed like low-hanging fruit when implementing something like this.

@MikeInnes (Member)

There isn't much reason to use pmap here; even if it works with AD, it's going to be serialised in the backwards pass, at least until we move to Zygote.

I'm kind of unsure about this Parallel layer in general; it doesn't seem like a very obvious abstraction to me. If I need to read the code to understand what it's doing then I may as well just write that code myself.

I see the motivation is to have more convenient bidirectional RNNs. It'd be fine to just fold Parallel into that. But I think it'd be good to discuss what the code looks like currently without Bi, how it looks after, and explore the API design space a bit before committing to one thing.

@swiesend (Author) commented Apr 4, 2019

Sure, I am also not very convinced that this is very compatible with non-sequential layers like the Conv layer. Before adopting something like this, it should work fine and be composable with other things, too.

I had seen that @thebhatman tried to go another way by just using Flux.flip, but I don't know how it turned out or whether he ended up adopting this pattern.

@thebhatman (Contributor)

I modified flip by using reverse for multidimensional arrays so that it reverses only a specific dimension. I created the bidirectional layer using this.
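
Roughly, the described change might look like the following sketch (name and signature assumed, not the exact code from that branch):

# Hypothetical sketch: reverse a multidimensional array along one chosen
# dimension, e.g. the time dimension of a (features, batch, time) array.
flip(x::AbstractArray, dim::Integer) = reverse(x; dims = dim)

x = rand(Float32, 4, 3, 7)
flip(x, 3)   # reverses only the time dimension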

@swiesend (Author) commented Apr 4, 2019

@thebhatman can you link to your approach? I can't find it right now.

@Tokazama (Contributor) commented Apr 4, 2019

Is each element of layers supposed to have its own gradient on backprop? Because when I use weights = Tracker.data.(Flux.params(parallel_model)); it appears all elements in the vector share the same set of weights.

@swiesend (Author) commented Apr 4, 2019

@Tokazama I think it depends on what you put into the layers. Do you duplicate them with a deepcopy? But this may be better discussed as an issue or on Slack, don't you think? Or is there a flaw in the approach you want to point out?
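
For example, a quick check of the shared-weights question might look like this (a hypothetical reproduction, not code from the PR):

using Flux

# Reusing one layer object in several branches shares a single set of weights;
# constructing (or deepcopy-ing) separate layers gives independent parameters.
d = Dense(10, 5)
shared   = [d, d]                        # both entries reference the same weights
separate = [Dense(10, 5), deepcopy(d)]   # independent parameters

shared[1].W === shared[2].W      # true  (same underlying array)
separate[1].W === separate[2].W  # false (distinct arrays)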

@ToucheSir (Member)

Since we now have a Parallel layer and this is a pre-Zygote PR, I think we can safely close it. Implementing a bidirectional layer on top of the current Parallel shouldn't be too difficult.
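
For reference, a bidirectional wrapper on top of the current Parallel layer could be sketched roughly like this (post-Zygote Flux; the sequence format and the reduce choice are assumptions):

using Flux

# Hypothetical sketch: both branches see the same sequence (a vector of
# feature vectors); the backward branch reverses it, runs its RNN, and
# reverses the outputs back into the original time order.
forward  = LSTM(10, 5)
backward = LSTM(10, 5)

bi = Parallel(
    (f, b) -> map(vcat, f, b),                    # concatenate per time step
    xs -> map(forward, xs),                       # forward pass over the sequence
    xs -> reverse(map(backward, reverse(xs))),    # backward pass, re-aligned
)

xs = [rand(Float32, 10) for _ in 1:7]
ys = bi(xs)   # 7 outputs, each of length 10 (5 forward + 5 backward features)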

@DhairyaLGandhi (Member)

Thanks all
