Add Orthogonal initialization feature. #1496

Merged · 39 commits · Feb 11, 2021

Commits
c277919 Add Orthogonal initialization feature. (SomTambe, Feb 3, 2021)
8bca693 Make necessary changes. (SomTambe, Feb 3, 2021)
95f94d4 Make nice changes. (SomTambe, Feb 3, 2021)
338c378 Update docstring. (SomTambe, Feb 3, 2021)
082a971 Make citation better. (SomTambe, Feb 3, 2021)
44965f0 Replace mapreduce thing. (SomTambe, Feb 3, 2021)
4a3d12b Minor docstring change. (SomTambe, Feb 3, 2021)
1feb19e Add tests. (SomTambe, Feb 3, 2021)
b2bc5cc Minor changes. (SomTambe, Feb 3, 2021)
7a84f42 Rectified silly mistakes. (SomTambe, Feb 3, 2021)
28d05df Modified docstring a bit. (SomTambe, Feb 3, 2021)
a8b15d1 Minor change. (SomTambe, Feb 3, 2021)
090fd7e Update src/utils.jl (SomTambe, Feb 3, 2021)
8af7659 Removed the unwanted example. (SomTambe, Feb 3, 2021)
2735e0c Merge branch 'master' of https://github.com/SomTambe/Flux.jl (SomTambe, Feb 3, 2021)
64d2e66 dims::Integer to give better error messages. (SomTambe, Feb 3, 2021)
a0191a5 Update NEWS.md (SomTambe, Feb 4, 2021)
d542d70 Rectified mistake. (SomTambe, Feb 4, 2021)
9221196 Changed orthogonal to orthogonal_init (SomTambe, Feb 4, 2021)
943baf2 Change the docs a bit to see if doctesting works. (SomTambe, Feb 4, 2021)
644bfef Minor docstring changes. (SomTambe, Feb 4, 2021)
da935bb Trying to make the doctests work (SomTambe, Feb 4, 2021)
4b04fdd slight change (SomTambe, Feb 5, 2021)
b28b8db Change for dims > 2. (SomTambe, Feb 6, 2021)
57b9af3 Add tests for dims > 2. (SomTambe, Feb 6, 2021)
f897c75 Merge branch 'master' into master (SomTambe, Feb 6, 2021)
418b316 Changed structure. Also changed the documentation a bit. (SomTambe, Feb 6, 2021)
5e801e2 Merge pull request #1 from FluxML/master (SomTambe, Feb 8, 2021)
21cdfc8 Make necessary changes. (SomTambe, Feb 8, 2021)
3e749da Add `orthogonal` to docs/src/utilities.md (SomTambe, Feb 8, 2021)
7a2b610 Update src/utils.jl (SomTambe, Feb 8, 2021)
23a9c5b Update src/utils.jl (SomTambe, Feb 8, 2021)
691ca35 Update src/utils.jl (SomTambe, Feb 8, 2021)
786eb8e Changed the docs a bit. (SomTambe, Feb 8, 2021)
54bf710 Update src/utils.jl (SomTambe, Feb 9, 2021)
a80bea9 Add `rng` which I had forgotten. (SomTambe, Feb 9, 2021)
c632cf2 Slight change (SomTambe, Feb 9, 2021)
2c9f4b8 modified tests a bit (SomTambe, Feb 9, 2021)
8f2e4ed Rectified mistake. (SomTambe, Feb 9, 2021)
1 change: 1 addition & 0 deletions NEWS.md
@@ -2,6 +2,7 @@

## v0.12.0

* Add [Orthogonal Matrix initialization](https://github.com/FluxML/Flux.jl/pull/1496) as described in [Exact solutions to the nonlinear dynamics of learning in deep linear neural networks](https://arxiv.org/abs/1312.6120).
* Added [Focal Loss function](https://github.com/FluxML/Flux.jl/pull/1489) to Losses module
* The Dense layer now supports inputs with [multiple batch dimensions](https://github.com/FluxML/Flux.jl/pull/1405).
* Dense and Conv layers no longer perform [implicit type conversion](https://github.com/FluxML/Flux.jl/pull/1394).
1 change: 1 addition & 0 deletions docs/src/utilities.md
@@ -36,6 +36,7 @@ Flux.glorot_uniform
Flux.glorot_normal
Flux.kaiming_uniform
Flux.kaiming_normal
Flux.orthogonal
Flux.sparse_init
```

66 changes: 66 additions & 0 deletions src/utils.jl
@@ -174,6 +174,72 @@ end
kaiming_normal(dims...; kwargs...) = kaiming_normal(Random.GLOBAL_RNG, dims...; kwargs...)
kaiming_normal(rng::AbstractRNG; init_kwargs...) = (dims...; kwargs...) -> kaiming_normal(rng, dims...; init_kwargs..., kwargs...)

"""
orthogonal([rng=GLOBAL_RNG], dims...; gain = 1)

Return an `Array` of size `dims` which is a (semi) orthogonal matrix, as described in [1].

The input must have at least 2 dimensions.
For `length(dims) > 2`, a `prod(dims[1:(end - 1)])` by `dims[end]` orthogonal matrix
is computed before reshaping it to the original dimensions.

# Examples
```jldoctest; setup = :(using LinearAlgebra)
julia> W = Flux.orthogonal(5, 7);

julia> summary(W)
"5×7 Array{Float32,2}"

julia> W * W' ≈ I(5)
true

julia> W2 = Flux.orthogonal(7, 5);

julia> W2 * W2' ≈ I(7)
false

julia> W2' * W2 ≈ I(5)
true

julia> W3 = Flux.orthogonal(3, 3, 2, 4);

julia> transpose(reshape(W3, :, 4)) * reshape(W3, :, 4) ≈ I(4)
true
```

# See also
* kaiming initialization using normal distribution: [`kaiming_normal`](@ref Flux.kaiming_normal)
* kaiming initialization using uniform distribution: [`kaiming_uniform`](@ref Flux.kaiming_uniform)
* glorot initialization using normal distribution: [`glorot_normal`](@ref Flux.glorot_normal)
* glorot initialization using uniform distribution: [`glorot_uniform`](@ref Flux.glorot_uniform)
* sparse initialization: [`sparse_init`](@ref Flux.sparse_init)

# References
[1] Saxe, McClelland, Ganguli. "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks", ICLR 2014, https://arxiv.org/abs/1312.6120

"""
function orthogonal(rng::AbstractRNG, rows::Integer, cols::Integer; gain = 1)
  mat = rows > cols ? randn(rng, Float32, rows, cols) : randn(rng, Float32, cols, rows)

  Q, R = LinearAlgebra.qr(mat)
  Q = Array(Q) * sign.(LinearAlgebra.Diagonal(R))
  if rows < cols
    Q = transpose(Q)
  end
Review comment on lines +226 to +228

Contributor:

Another small one-liner trick and feel free to take any of it, or just ignore it.

Suggested change:

-  if rows < cols
-    Q = transpose(Q)
-  end
+  Q = rows < cols ? transpose(Q) : Q

Member (Author):

I think I should keep my thing, looks more elegant 😄

Member (@mcabbott, Feb 8, 2021):

I think the reason this strikes several of us as weird is partly that it's not type-stable to re-use Q, not just for different things, but for different types depending on the values of rows, cols. This isn't performance-critical code, but that's where everyone's taste was honed.

Again, I would write

return rows > cols ? gain .* M : gain .* transpose(M)

where M is some name for the thing which isn't Q anymore, and the two branches match the branches which generate the random numbers above. They could both be written out on several lines, mat = if rows > cols; randn(... etc., but however they are written, I think they should put the then/else clauses in the same order.

Member:

Yeah, the mirrored if-else clauses are a bit confusing. Should change that if nothing else.


  return gain * Q
end
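
For reference, a minimal sketch of the type-stable variant mcabbott describes above, with the then/else branches of both ternaries in matching order. This is an illustration only, not the code merged in this PR, and the name `orthogonal_typestable` is hypothetical:

```julia
using LinearAlgebra, Random

# Sketch only (assumption): a type-stable rewrite along the lines suggested in
# the review thread. Both ternaries branch on `rows > cols` in the same order,
# and each branch returns a plain Matrix{Float32}.
function orthogonal_typestable(rng::AbstractRNG, rows::Integer, cols::Integer; gain = 1)
  mat = rows > cols ? randn(rng, Float32, rows, cols) : randn(rng, Float32, cols, rows)
  Q, R = qr(mat)
  M = Array(Q) * sign.(Diagonal(R))  # fix QR's sign ambiguity so the factorization is unique
  return rows > cols ? gain .* M : gain .* transpose(M)
end
```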

function orthogonal(rng::AbstractRNG, d1::Integer, ds::Integer...; kwargs...)
  dims = (d1, ds...)
  rows = prod(dims[1:end-1])
  cols = dims[end]
  return reshape(orthogonal(rng, rows, cols; kwargs...), dims)
end

orthogonal(dims::Integer...; kwargs...) = orthogonal(Random.GLOBAL_RNG, dims...; kwargs...)
orthogonal(rng::AbstractRNG; init_kwargs...) = (dims::Integer...; kwargs...) -> orthogonal(rng, dims...; init_kwargs..., kwargs...)
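
As a usage note: like the other initializers in this file, the two trailing methods let `orthogonal` be passed straight to a layer's `init` keyword, or curried with an `rng` first for reproducible weights. A small sketch, assuming the Flux v0.12 `Dense(in, out, σ; init = ...)` keyword (the variable names are illustrative):

```julia
using Flux, Random

# Sketch, assuming Dense's v0.12 `init` keyword; Dense(5, 7) stores a 7×5
# weight matrix, so `init` is called as init(7, 5).
layer = Dense(5, 7, tanh; init = Flux.orthogonal)

# Currying an RNG first (the one-argument method above) gives reproducible weights.
rng = MersenneTwister(0)
layer_repro = Dense(5, 7; init = Flux.orthogonal(rng))
```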

"""
sparse_init([rng=GLOBAL_RNG], dims...; sparsity, std = 0.01)

Expand Down
17 changes: 16 additions & 1 deletion test/utils.jl
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
using Flux
-using Flux: throttle, nfan, glorot_uniform, glorot_normal, kaiming_normal, kaiming_uniform, sparse_init, stack, unstack, Zeros
+using Flux: throttle, nfan, glorot_uniform, glorot_normal, kaiming_normal, kaiming_uniform, orthogonal, sparse_init, stack, unstack, Zeros
using StatsBase: var, std
using Random
using Test
@@ -96,6 +96,21 @@ end
end
end

@testset "orthogonal" begin
# A matrix of dim = (m,n) with m > n should produce a QR decomposition. In the other case, the transpose should be taken to compute the QR decomposition.
for (rows,cols) in [(5,3),(3,5)]
v = orthogonal(rows, cols)
rows < cols ? (@test v * v' ≈ I(rows)) : (@test v' * v ≈ I(cols))
end
for mat in [(3,4,5),(2,2,5)]
v = orthogonal(mat...)
cols = mat[end]
rows = div(prod(mat),cols)
v = reshape(v, (rows,cols))
rows < cols ? (@test v * v' ≈ I(rows)) : (@test v' * v ≈ I(cols))
end
end
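
A further check that could be added here, shown as a hedged sketch (it is not part of this PR): the `gain` keyword should simply scale the (semi-)orthogonal matrix.

```julia
# Hypothetical extra test (not in this PR): v = gain * Q with Q'Q = I for a
# tall matrix, so v' * v ≈ gain^2 * I.
v = orthogonal(5, 3; gain = 2)
@test v' * v ≈ 4 * I(3)
```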

@testset "sparse_init" begin
# sparse_init should yield an error for non 2-d dimensions
# sparse_init should yield no zero elements if sparsity < 0