
Add Parallel and Bi layer #662

Closed

Conversation

@swiesend commented Mar 6, 2019

@MikeInnes I've cleaned the parallel branch and separated the peephole LSTM into another branch for another pull request.

To enable a bidirectional model, Base.reverse() is defined for common Flux data structures.

@swiesend (Author) commented Mar 6, 2019

I messed up my original fork and therefore created a new one.

@swiesend (Author) commented Mar 6, 2019

The approach is still the same:

The user adds some layers and defines custom mappings and inverses for the input. After the mapping phase comes the reduce phase, which defaults to concatenating the outputs of the parallel layers:

mutable struct Parallel
    layers::Vector
    map::Vector{Function}
    inv::Vector{Function}
    reduce::Function
end

function (p::Parallel)(xs)
    # per branch: y = xs |> map |> layer |> inv
    apply(l) = p.inv[l](p.layers[l](p.map[l](xs)))

    Z = map(apply, eachindex(p.layers))

    p.reduce(Z)
end

For the full implementation see: src/layers/parallel.jl

The unit test test/layers/parallel.jl contains some example models to showcase the usage of the Parallel layer and the convenience layer Bi to create bidirectional sequential models.
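
As a rough usage sketch, assuming the default positional constructor of the struct above (field order: layers, map, inv, reduce) and two Dense branches over the same input; the test file is the authoritative reference:

using Flux

# Minimal sketch, not the exact models from the test file.
p = Parallel(
    [Dense(10, 5), Dense(10, 3)],   # two branches applied to the same input
    [identity, identity],           # map: per-branch input transforms
    [identity, identity],           # inv: per-branch output transforms
    Z -> vcat(Z...),                # reduce: concatenate the branch outputs
)

x = rand(Float32, 10)
y = p(x)                            # length 8: vcat of the 5- and 3-dim outputs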

@Tokazama (Contributor) commented Mar 6, 2019

Is there a reason that Distributed.pmap can't be used here?

@swiesend (Author) commented Mar 6, 2019

@Tokazama you mean something like this? I was just unaware of where Base.pmap went in Julia 1.0.

see my parallel-pmap branch:
src/layers/parallel.jl
test/layers/parallel.jl

I've added a distributed field to Parallel, but maybe it would be better to introduce a special type like ParallelDistributed to handle this case with multiple dispatch:

mutable struct Parallel
    layers::Vector
    map::Vector{Function}
    inv::Vector{Function}
    reduce::Function
    distributed::Bool
end

Well, technically it is possible, but I don't know whether this would hinder more than help support for Flux.gpu()? @MikeInnes @ViralBShah

Also, it would add Distributed as a dependency.
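
A minimal sketch of that multiple-dispatch alternative could look like the following (the ParallelDistributed name and its call overload are assumptions, not code from the linked branch):

using Distributed

# Hypothetical sketch: same fields as Parallel, but the forward pass
# uses pmap instead of map, running branches on worker processes.
mutable struct ParallelDistributed
    layers::Vector
    map::Vector{Function}
    inv::Vector{Function}
    reduce::Function
end

function (p::ParallelDistributed)(xs)
    apply(l) = p.inv[l](p.layers[l](p.map[l](xs)))
    Z = pmap(apply, eachindex(p.layers))   # distributed over workers
    p.reduce(Z)
end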

@swiesend (Author) commented Mar 6, 2019

If I try to add GPU support on the parallel-gpu branch:
src/layers/parallel.jl
test/layers/parallel.jl

with an explicit mapping loop that preallocates the vector of intermediate results:

    # explicit mapping - preallocated size
    first = apply(1)
    Z = Vector{typeof(first)}(UndefInitializer(), length(p.layers))
    for l in eachindex(p.layers)
        if l == 1
            Z[l] = first
        else
            Z[l] = apply(l)
        end
    end

then I run into the following error (without the explicit mapping I get the same error, but I thought it might help with the GPU inference):

models using a `Parallel` layer: Error During Test at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:40
  Got exception outside of a @test
  ReadOnlyMemoryError()
  Stacktrace:
   [1] gemv!(::Char, ::Float32, ::Array{Float32,2}, ::CuArray{Float32,1}, ::Float32, ::CuArray{Float32,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/blas.jl:577
   [2] gemv!(::CuArray{Float32,1}, ::Char, ::Array{Float32,2}, ::CuArray{Float32,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:360
   [3] mul! at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:64 [inlined]
   [4] * at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/matmul.jl:46 [inlined]
   [5] _forward at /home/sebastian/develop/julia/Flux.jl/src/tracker/lib/array.jl:370 [inlined]
   [6] #track#1 at /home/sebastian/develop/julia/Flux.jl/src/tracker/Tracker.jl:51 [inlined]
   [7] track at /home/sebastian/develop/julia/Flux.jl/src/tracker/Tracker.jl:51 [inlined]
   [8] * at /home/sebastian/develop/julia/Flux.jl/src/tracker/lib/array.jl:362 [inlined]
   [9] Dense at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:82 [inlined]
   [10] (::Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}})(::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:122
   [11] (::getfield(Flux, Symbol("#apply#104")){Parallel,CuArray{Float32,1}})(::Int64) at /home/sebastian/develop/julia/Flux.jl/src/layers/parallel.jl:67
   [12] (::Parallel)(::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/parallel.jl:73
   [13] applychain(::Tuple{Parallel,Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}}}, ::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:31
   [14] (::Chain{Tuple{Parallel,Dense{typeof(identity),TrackedArray{…,Array{Float32,2}},TrackedArray{…,Array{Float32,1}}}}})(::CuArray{Float32,1}) at /home/sebastian/develop/julia/Flux.jl/src/layers/basic.jl:33
   [15] top-level scope at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:46
   [16] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1156
   [17] top-level scope at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:40
   [18] top-level scope at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/Test/src/Test.jl:1083
   [19] top-level scope at /home/sebastian/develop/julia/Flux.jl/test/layers/parallel.jl:10
   [20] include at ./boot.jl:326 [inlined]
   [21] include_relative(::Module, ::String) at ./loading.jl:1038
   [22] include(::Module, ::String) at ./sysimg.jl:29
   [23] include(::String) at ./client.jl:403
   [24] top-level scope at none:0
   [25] eval(::Module, ::Any) at ./boot.jl:328
   [26] eval_user_input(::Any, ::REPL.REPLBackend) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:85
   [27] macro expansion at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:117 [inlined]
   [28] (::getfield(REPL, Symbol("##26#27")){REPL.REPLBackend})() at ./task.jl:259

@Tokazama (Contributor) commented Mar 6, 2019

To clarify, I'm in no position to say whether this should or shouldn't have pmap support. I'm not sure what the current state of distributed computing is in Flux beyond CUDA GPU support, or if there is any strong motivation for those who manage Flux to allow/support such features. I just thought it seemed like low-hanging fruit when implementing something like this.

@MikeInnes (Member)

There isn't much reason to use pmap here; even if it works with AD, it's going to be serialised in the backwards pass, at least until we move to Zygote.

I'm kind of unsure about this Parallel layer in general; it doesn't seem like a very obvious abstraction to me. If I need to read the code to understand what it's doing then I may as well just write that code myself.

I see the motivation is to have more convenient bidirectional RNNs. It'd be fine to just fold Parallel into that. But I think it'd be good to discuss what the code looks like currently without Bi, how it looks after, and explore the API design space a bit before committing to one thing.

@swiesend (Author) commented Apr 4, 2019

Sure, I am also not very convinced that this is very compatible with non-sequential layers like the Conv layer. Before adopting something like this, it should work fine and be composable with other things, too.

I had seen that @thebhatman tried to go another way by just using Flux.flip, but I don't know how it turned out or whether he ended up adopting this pattern.

@thebhatman (Contributor)

I modified flip by using reverse for multidimensional arrays so that it reverses only a specific dimension. I created the bidirectional layer using this.
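
Roughly, the described change might look like the following sketch (name and signature assumed, not the exact code from that branch):

# Hypothetical sketch: reverse a multidimensional array along one chosen
# dimension, e.g. the time dimension of a (features, batch, time) array.
flip(x::AbstractArray, dim::Integer) = reverse(x; dims = dim)

x = rand(Float32, 4, 3, 7)
flip(x, 3)   # reverses only the time dimension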

@swiesend (Author) commented Apr 4, 2019

@thebhatman can you link to your approach? I can't find it right now.

@Tokazama (Contributor) commented Apr 4, 2019

Is each element of layers supposed to have its own gradient on backprop? Because when I use weights = Tracker.data.(Flux.params(parallel_model)); it appears all elements in the vector share the same set of weights.

@swiesend (Author) commented Apr 4, 2019

@Tokazama I think it depends on what you put into the layers. Do you duplicate them with a deepcopy? But this may be better discussed as an issue or on Slack, don't you think? Or is there a flaw in the approach you want to point out?
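
For example, a quick check of the shared-weights question might look like this (a hypothetical reproduction, not code from the PR):

using Flux

# Reusing one layer object in several branches shares a single set of weights;
# constructing (or deepcopy-ing) separate layers gives independent parameters.
d = Dense(10, 5)
shared   = [d, d]                        # both entries reference the same weights
separate = [Dense(10, 5), deepcopy(d)]   # independent parameters

shared[1].W === shared[2].W      # true  (same underlying array)
separate[1].W === separate[2].W  # false (distinct arrays)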

@ToucheSir (Member)

Since we now have a Parallel layer and this is a pre-Zygote PR, I think we can safely close it. Implementing a bidirectional layer on top of the current Parallel shouldn't be too difficult.
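
For reference, a bidirectional wrapper on top of the current Parallel layer could be sketched roughly like this (post-Zygote Flux; the sequence format and the reduce choice are assumptions):

using Flux

# Hypothetical sketch: both branches see the same sequence (a vector of
# feature vectors); the backward branch reverses it, runs its RNN, and
# reverses the outputs back into the original time order.
forward  = LSTM(10, 5)
backward = LSTM(10, 5)

bi = Parallel(
    (f, b) -> map(vcat, f, b),                    # concatenate per time step
    xs -> map(forward, xs),                       # forward pass over the sequence
    xs -> reverse(map(backward, reverse(xs))),    # backward pass, re-aligned
)

xs = [rand(Float32, 10) for _ in 1:7]
ys = bi(xs)   # 7 outputs, each of length 10 (5 forward + 5 backward features)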

@DhairyaLGandhi (Member)

Thanks all
