Add Parallel layer #1462

Merged
merged 15 commits into from
Jan 14, 2021
1 change: 1 addition & 0 deletions NEWS.md
@@ -8,6 +8,7 @@
* Removed kwarg only constructors for [`convolutional layers`](https://github.com/FluxML/Flux.jl/pull/1379).
* Add [sparse initialization](https://github.com/FluxML/Flux.jl/pull/1454) as described in [Deep learning via Hessian-free optimization](https://dl.acm.org/doi/abs/10.5555/3104322.3104416).
* Moved GPU CI to use buildkite instead of GitLab
* New [`Parallel` layer](https://github.com/FluxML/Flux.jl/pull/1462) adds inception module-like building blocks.
* Other new features and bug fixes (see GitHub releases page)

## v0.11.2
133 changes: 133 additions & 0 deletions docs/src/models/advanced.md
@@ -70,3 +70,136 @@ by simply deleting it from `ps`:
ps = params(m)
delete!(ps, m[2].b)
```

## Custom multiple input or output layer

Sometimes a model needs to receive several separate inputs at once or produce several separate outputs at once. In other words, there are multiple paths within this high-level layer, each processing a different input or producing a different output. A simple example of this in the machine learning literature is the [inception module](https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf).

Naively, we could write a struct that stores the weights along each path and implements the joining/splitting in the forward pass function. But that would mean a new struct every time the operations along a path change. Instead, this guide will show you how to construct a high-level layer (like [`Chain`](@ref)) that is made of multiple sub-layers, one for each path.

### Multiple inputs: a custom `Join` layer

Our custom `Join` layer will accept multiple inputs at once, pass each input through a separate path, then combine the results. Note that this layer can already be constructed using [`Parallel`](@ref), but we will first walk through how to do this manually.

We start by defining a new struct, `Join`, that stores the different paths and a combine operation as its fields.
```julia
using Flux
using CUDA

# custom join layer
struct Join{T, F}
  combine::F
  paths::T
end

# allow Join(op, m1, m2, ...) as a constructor
Join(combine, paths...) = Join(combine, paths)
```
Notice that we parameterized the type of the `paths` field. This is necessary for fast Julia code: with a concrete type parameter, the compiler can specialize the forward pass. In general, `T` might be a `Tuple` or `Vector`, but we don't need to care exactly which it is. The same goes for the `combine` field.
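
To see what this buys us concretely, here is a small sketch (using the `Join` definition above with hypothetical layer sizes, not code from this PR):

```julia
# a hypothetical instance; the layer sizes are only for illustration
j = Join(vcat, Dense(1, 2), Dense(1, 3))

# `paths` is a concrete `Tuple` of layers, so the compiler can specialize the
# forward pass on it; an untyped field (e.g. `paths::Any`) would fall back to
# dynamic dispatch on every call
typeof(j.paths) <: Tuple   # true
j.combine === vcat         # true
```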

The next step is to use [`Flux.@functor`](@ref) to make our struct behave like a Flux layer. This is important so that calling `params` on a `Join` returns the underlying weight arrays on each path.
```julia
Flux.@functor Join
```
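
As a quick sanity check (again just a sketch, reusing the hypothetical `j` from above), `params` should now collect the weight arrays of every path:

```julia
ps = Flux.params(j)

length(ps)  # 4: a weight matrix and a bias vector for each Dense path
```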

Finally, we define the forward pass. For `Join`, this means applying each layer in `paths` to its corresponding input array, then splatting the results into `combine` to merge them.
```julia
(m::Join)(xs::Tuple) = m.combine(map((f, x) -> f(x), m.paths, xs)...)
(m::Join)(xs...) = m(xs)
```

Lastly, we can test our new layer. Thanks to the proper abstractions in Julia, our layer works on GPU arrays out of the box!
```julia
model = Chain(
  Join(vcat,
    Chain(
      Dense(1, 5),
      Dense(5, 1)
    ),
    Dense(1, 2),
    Dense(1, 1),
  ),
  Dense(4, 1)
) |> gpu

xs = map(gpu, (rand(1), rand(1), rand(1)))

model(xs)
# returns a single float vector with one value
```

#### Using `Parallel`

Flux already provides [`Parallel`](@ref), which offers the same functionality. In this case, `Join` is just syntactic sugar for `Parallel`.
```julia
Join(combine, paths) = Parallel(combine, paths)
Join(combine, paths...) = Join(combine, paths)

# use vararg/tuple version of Parallel forward pass
model = Chain(
  Join(vcat,
    Chain(
      Dense(1, 5),
      Dense(5, 1)
    ),
    Dense(1, 2),
    Dense(1, 1),
  ),
  Dense(4, 1)
) |> gpu

xs = map(gpu, (rand(1), rand(1), rand(1)))

model(xs)
# returns a single float vector with one value
```
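
If you prefer, you can also skip the `Join` alias entirely and construct the same model with `Parallel` directly (a minimal sketch of the model above):

```julia
model = Chain(
  Parallel(vcat,
    Chain(Dense(1, 5), Dense(5, 1)),
    Dense(1, 2),
    Dense(1, 1)
  ),
  Dense(4, 1)
) |> gpu

xs = map(gpu, (rand(1), rand(1), rand(1)))

model(xs)
# returns a single float vector with one value
```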

### Multiple outputs: a custom `Split` layer

Our custom `Split` layer will accept a single input, then pass that input through each of several separate paths to produce multiple outputs.

We start by following the same steps as the `Join` layer: define a struct, use [`Flux.@functor`](@ref), and define the forward pass.
```julia
using Flux
using CUDA

# custom split layer
struct Split{T}
  paths::T
end

Split(paths...) = Split(paths)

Flux.@functor Split

(m::Split)(x::AbstractArray) = tuple(map(f -> f(x), m.paths)...)
```

Now we can test to see that our `Split` does indeed produce multiple outputs.
```julia
model = Chain(
  Dense(10, 5),
  Split(
    Dense(5, 1),
    Dense(5, 3),
    Dense(5, 2)
  )
) |> gpu

model(gpu(rand(10)))
# returns a tuple with three float vectors
```
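
Since the output is an ordinary tuple, the individual heads can be unpacked directly (a small usage sketch, assuming the `model` just defined):

```julia
y1, y2, y3 = model(gpu(rand(10)))

size(y1), size(y2), size(y3)
# ((1,), (3,), (2,))
```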

A custom loss function for the multiple outputs may look like this:
```julia
using Statistics

# assuming model returns the output of a Split
# x is a single input
# ys is a tuple of outputs
function loss(x, ys, model)
  # rms over all the mse
  ŷs = model(x)
  return sqrt(mean(Flux.mse(y, ŷ) for (y, ŷ) in zip(ys, ŷs)))
end
```
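
A single training step with this loss might look like the following sketch; here `x`, `ys`, and `model` are assumed to be set up as described above, and `opt` can be any Flux optimiser (e.g. `ADAM()`):

```julia
opt = ADAM()
ps = Flux.params(model)

# compute gradients of the loss w.r.t. the model parameters and apply an update
gs = gradient(() -> loss(x, ys, model), ps)
Flux.Optimise.update!(opt, ps, gs)
```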
1 change: 1 addition & 0 deletions docs/src/models/layers.md
@@ -49,6 +49,7 @@ But in contrast to the layers described in the other sections are not readily gr
```@docs
Maxout
SkipConnection
Parallel
```

## Normalisation & Regularisation
10 changes: 6 additions & 4 deletions src/Flux.jl
@@ -11,10 +11,12 @@ using Zygote: Params, @adjoint, gradient, pullback, @nograd

export gradient

export Chain, Dense, Maxout, RNN, LSTM, GRU, SamePad, Conv, CrossCor, ConvTranspose,
AdaptiveMaxPool, AdaptiveMeanPool, GlobalMaxPool, GlobalMeanPool, MaxPool,
MeanPool, flatten, DepthwiseConv, Dropout, AlphaDropout, LayerNorm, BatchNorm,
InstanceNorm, GroupNorm, SkipConnection, params, fmap, cpu, gpu, f32, f64,
export Chain, Dense, Maxout, SkipConnection, Parallel, flatten,
RNN, LSTM, GRU,
SamePad, Conv, CrossCor, ConvTranspose, DepthwiseConv,
AdaptiveMaxPool, AdaptiveMeanPool, GlobalMaxPool, GlobalMeanPool, MaxPool, MeanPool,
Dropout, AlphaDropout, LayerNorm, BatchNorm, InstanceNorm, GroupNorm,
params, fmap, cpu, gpu, f32, f64,
testmode!, trainmode!

include("optimise/Optimise.jl")
48 changes: 48 additions & 0 deletions src/layers/basic.jl
@@ -253,3 +253,51 @@ end
function Base.show(io::IO, b::SkipConnection)
  print(io, "SkipConnection(", b.layers, ", ", b.connection, ")")
end

"""
    Parallel(connection, layers...)

Create a `Parallel` layer that passes an input array to each path in
`layers`, reducing the output with `connection`.

Called with one input `x`, this is equivalent to `reduce(connection, [l(x) for l in layers])`.
If called with multiple inputs, they are `zip`ped with the layers, thus `Parallel(+, f, g)(x, y) = f(x) + g(y)`.

# Examples
```jldoctest
julia> model = Chain(Dense(3, 5),
                     Parallel(vcat, Dense(5, 4), Chain(Dense(5, 7), Dense(7, 4))),
                     Dense(8, 17));

julia> size(model(rand(3)))
(17,)

julia> model = Parallel(+, Dense(10, 2), Dense(5, 2))
Parallel(+, Dense(10, 2), Dense(5, 2))

julia> size(model(rand(10), rand(5)))
(2,)
```
"""
struct Parallel{F, T}
  connection::F
  layers::T
end

Parallel(connection, layers...) = Parallel(connection, layers)

@functor Parallel

(m::Parallel)(x::AbstractArray) = mapreduce(f -> f(x), m.connection, m.layers)
(m::Parallel)(xs::Vararg{<:AbstractArray}) = mapreduce((f, x) -> f(x), m.connection, m.layers, xs)
(m::Parallel)(xs::Tuple) = m(xs...)

Base.getindex(m::Parallel, i::Integer) = m.layers[i]
Base.getindex(m::Parallel, i::AbstractVector) = Parallel(m.connection, m.layers[i]...)

function Base.show(io::IO, m::Parallel)
  print(io, "Parallel(", m.connection, ", ")
  join(io, m.layers, ", ")
  print(io, ")")
end
19 changes: 18 additions & 1 deletion test/layers/basic.jl
Original file line number Diff line number Diff line change
Expand Up @@ -106,4 +106,21 @@ import Flux: activations
@test size(SkipConnection(Dense(10,10), (a,b) -> cat(a, b, dims = 2))(input)) == (10,4)
end
end
end

@testset "Parallel" begin
@testset "zero sum" begin
input = randn(10, 10, 10, 10)
@test Parallel(+, x -> zeros(size(x)), identity)(input) == input
end

@testset "concat size" begin
input = randn(10, 2)
@test size(Parallel((a, b) -> cat(a, b; dims=2), Dense(10, 10), identity)(input)) == (10, 4)
end

@testset "vararg input" begin
inputs = randn(10), randn(5), randn(4)
@test size(Parallel(+, Dense(10, 2), Dense(5, 2), Dense(4, 2))(inputs)) == (2,)
end
end
end
3 changes: 3 additions & 0 deletions test/outputsize.jl
@@ -28,6 +28,9 @@

m = SkipConnection(Conv((3, 3), 3 => 16; pad = 1), (mx, x) -> cat(mx, x; dims = 3))
@test outputsize(m, (10, 10, 3, 1)) == (10, 10, 19, 1)

m = Parallel((mx, x) -> cat(mx, x; dims = 3), Conv((3, 3), 3 => 16; pad = 1), identity)
@test outputsize(m, (10, 10, 3, 1)) == (10, 10, 19, 1)
end

@testset "activations" begin