diff --git a/NEWS.md b/NEWS.md
index 68d36fdc34..87333f8717 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -7,7 +7,7 @@ See also [github's page](https://github.com/FluxML/Flux.jl/releases) for a compl
   This also adds `show` methods for pretty printing.
 
 ## v0.14.12
-* New `SignDecay` optimiser, like `` WeightNorm` but for L1 norm.
+* New `SignDecay` optimiser, like `WeightDecay` but for L1 norm.
 
 ## v0.14.0 (July 2023)
 * Flux now requires julia v1.9 or later.
diff --git a/Project.toml b/Project.toml
index 5ec702c6b1..bc31cd5d3f 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,6 +1,6 @@
 name = "Flux"
 uuid = "587475ba-b771-5e3f-ad9e-33799f191a9c"
-version = "0.14.12"
+version = "0.14.13"
 
 [deps]
 Adapt = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
diff --git a/docs/src/models/advanced.md b/docs/src/models/advanced.md
index ab045d96be..fb36553788 100644
--- a/docs/src/models/advanced.md
+++ b/docs/src/models/advanced.md
@@ -142,7 +142,7 @@ Join(combine, paths...) = Join(combine, paths)
 ```
 
 Notice that we parameterized the type of the `paths` field. This is necessary for fast Julia code; in general, `T` might be a `Tuple` or `Vector`, but we don't need to pay attention to what it specifically is. The same goes for the `combine` field.
 
-The next step is to use [`Functors.@layer`](@ref) to make our struct behave like a Flux layer. This is important so that calling `params` on a `Join` returns the underlying weight arrays on each path.
+The next step is to use [`Flux.@layer`](@ref) to make our struct behave like a Flux layer. This is important so that calling `Flux.setup` on a `Join` maps over the underlying trainable arrays on each path.
 ```julia
 Flux.@layer Join
 ```
diff --git a/docs/src/models/basics.md b/docs/src/models/basics.md
index fb0f2d5488..cf83764349 100644
--- a/docs/src/models/basics.md
+++ b/docs/src/models/basics.md
@@ -255,7 +255,7 @@ m(5) # => 26
 
 ## Layer Helpers
 
-There is still one problem with this `Affine` layer, that Flux does not know to look inside it. This means that [`Flux.train!`](@ref) won't see its parameters, nor will [`gpu`](@ref) be able to move them to your GPU. These features are enabled by the [`@functor`](@ref Functors.@functor) macro:
+There is still one problem with this `Affine` layer, that Flux does not know to look inside it. This means that [`Flux.train!`](@ref) won't see its parameters, nor will [`gpu`](@ref) be able to move them to your GPU. These features are enabled by the [`@layer`](@ref Flux.@layer) macro:
 
 ```julia
 Flux.@layer Affine
 ```
diff --git a/docs/src/models/functors.md b/docs/src/models/functors.md
index ab0883c95e..861528cda9 100644
--- a/docs/src/models/functors.md
+++ b/docs/src/models/functors.md
@@ -2,7 +2,11 @@
 
 Flux models are deeply nested structures, and [Functors.jl](https://github.com/FluxML/Functors.jl) provides tools needed to explore such objects, apply functions to the parameters they contain, and re-build them.
 
-New layers should be annotated using the `Functors.@functor` macro. This will enable [`params`](@ref Flux.params) to see the parameters inside, and [`gpu`](@ref) to move them to the GPU.
+!!! compat "Flux ≤ 0.14"
+    All layers were previously defined with the `Functors.@functor` macro.
+    This still works, but it is recommended that you use the new [`Flux.@layer`](@ref Flux.@layer) macro instead.
+    Both allow [`Flux.setup`](@ref Flux.setup) to see the parameters inside, and [`gpu`](@ref) to move them to the GPU, but [`Flux.@layer`](@ref Flux.@layer) also overloads printing,
+    and offers a way to define `trainable` at the same time.
 
 `Functors.jl` has its own [notes on basic usage](https://fluxml.ai/Functors.jl/stable/#Basic-Usage-and-Implementation) for more details. Additionally, the [Advanced Model Building and Customisation](@ref man-advanced) page covers the use cases of `Functors` in greater details.
diff --git a/docs/src/models/layers.md b/docs/src/models/layers.md
index 31db9cd204..177a3eca94 100644
--- a/docs/src/models/layers.md
+++ b/docs/src/models/layers.md
@@ -12,7 +12,7 @@ The `Dense` exemplifies several features:
 
 * The bias vector is always initialised [`Flux.zeros32`](@ref). The keyword `bias=false` will turn this off, i.e. keeping the bias permanently zero.
 
-* It is annotated with [`@functor`](@ref Functors.@functor), which means that [`params`](@ref Flux.params) will see the contents, and [`gpu`](@ref Flux.gpu) will move their arrays to the GPU.
+* It is annotated with [`@layer`](@ref Flux.@layer), which means that [`Flux.setup`](@ref Flux.setup) will see the contents, and [`gpu`](@ref Flux.gpu) will move their arrays to the GPU.
 
 By contrast, `Chain` itself contains no parameters, but connects other layers together. The section on [dataflow layers](@ref man-dataflow-layers) introduces others like this.
diff --git a/docs/src/saving.md b/docs/src/saving.md
index 16f944ef08..37c0470704 100644
--- a/docs/src/saving.md
+++ b/docs/src/saving.md
@@ -16,12 +16,12 @@
 julia> struct MyModel
          net
        end
 
-julia> Flux.@functor MyModel
+julia> Flux.@layer MyModel
 
 julia> MyModel() = MyModel(Chain(Dense(10, 5, relu), Dense(5, 2)));
 
 julia> model = MyModel()
-MyModel(Chain(Dense(10 => 5, relu), Dense(5 => 2)))
+MyModel(Chain(Dense(10 => 5, relu), Dense(5 => 2))) # 67 parameters
 
 julia> model_state = Flux.state(model);
diff --git a/docs/src/training/optimisers.md b/docs/src/training/optimisers.md
index 25d817454e..6b8a80c25e 100644
--- a/docs/src/training/optimisers.md
+++ b/docs/src/training/optimisers.md
@@ -112,6 +112,7 @@ Similar to optimisers, Flux also defines some simple decays that can be used in
 ExpDecay
 InvDecay
 WeightDecay
+SignDecay
 ```
 
 ## Gradient Clipping
diff --git a/docs/src/training/training.md b/docs/src/training/training.md
index 6dd80897b5..0370c86a3d 100644
--- a/docs/src/training/training.md
+++ b/docs/src/training/training.md
@@ -384,6 +384,9 @@ Flux.thaw!(opt_state)
 The earlier "implicit" equivalent was to pass to `gradient` an object referencing only part of the model, such as `Flux.params(bimodel.layers.enc)`.
 
+While `adjust!` and `freeze!`/`thaw!` make temporary modifications to the optimiser state,
+permanently removing some fields of a new layer type from training is usually done
+when defining the layer, by calling for example [`@layer`](@ref Flux.@layer)` NewLayer trainable=(weight,)`.
 
 ## Implicit or Explicit?
diff --git a/src/Flux.jl b/src/Flux.jl
index 5675f7c10f..a8720b7905 100644
--- a/src/Flux.jl
+++ b/src/Flux.jl
@@ -34,11 +34,11 @@ export Chain, Dense, Embedding, Maxout, SkipConnection, Parallel, PairwiseFusion
 @compat(public, ( # mark unexported symbols as API, on Julia 1.11
   # modules
-  Losses,
+  Losses, Train,
   # layers
   Bilinear, Scale, dropout,
   # utils
-  outputsize, state,
+  outputsize, state, create_bias, @layer,
 ))
 
 include("optimise/Optimise.jl")
diff --git a/src/functor.jl b/src/functor.jl
index 34fe52db35..f09ac6ae93 100644
--- a/src/functor.jl
+++ b/src/functor.jl
@@ -286,7 +286,7 @@ _paramtype(::Type{T}, x::AbstractArray{<:Complex{<:AbstractFloat}}) where {T<:Ab
     f32(m)
 
 Converts the `eltype` of model's *floating point* parameters to `Float32` (which is Flux's default).
-Recurses into structs marked with [`@functor`](@ref).
+Recurses into structs marked with [`@layer`](@ref Flux.@layer).
 
 See also [`f64`](@ref) and [`f16`](@ref).
 """
@@ -296,7 +296,7 @@ f32(m) = _paramtype(Float32, m)
     f64(m)
 
 Converts the `eltype` of model's *floating point* parameters to `Float64`.
-Recurses into structs marked with [`@functor`](@ref).
+Recurses into structs marked with [`@layer`](@ref Flux.@layer).
 
 See also [`f32`](@ref) and [`f16`](@ref).
 """
@@ -306,7 +306,7 @@ f64(m) = _paramtype(Float64, m)
     f16(m)
 
 Converts the `eltype` of model's *floating point* parameters to `Float16`.
-Recurses into structs marked with [`@functor`](@ref).
+Recurses into structs marked with [`@layer`](@ref Flux.@layer).
 
 Support for `Float16` is limited on many CPUs. Julia may
 convert to `Float32` for each operation, which is slow.
@@ -330,7 +330,7 @@ Chain(
 """
 f16(m) = _paramtype(Float16, m)
 
-# Functors for certain Julia data structures
+# Functors for certain Julia data structures -- PIRACY, should move to Functors.jl
 @functor Cholesky
 trainable(c::Cholesky) = ()
 
diff --git a/src/layers/macro.jl b/src/layers/macro.jl
index 9e770add87..2fb6db0faf 100644
--- a/src/layers/macro.jl
+++ b/src/layers/macro.jl
@@ -7,12 +7,12 @@ This macro replaces most uses of `@functor`. Its basic purpose is the same:
 when you define a new layer, this tells Flux to explore inside it
 to see the parameters it trains, and also to move them to the GPU, change precision, etc.
+
+Like `@functor`, this assumes your struct has the default constructor, to enable re-building.
+If you define an inner constructor (i.e. a function within the `struct` block) things may break.
 
 The keyword `trainable` allows you to limit this exploration, instead of visiting all `fieldnames(T)`.
 Note that it is never necessary to tell Flux to ignore non-array objects such as functions or sizes.
-* If some fields look like parameters but should not be trained,
-  then `trainable` lets you specify which fields to include, while the rest are ignored.
 
 The macro also handles overloads of `show` for pretty printing.
 * By default, it adds methods to 3-arg `Base.show` to treat your layer much like `Dense` or `Conv`.
@@ -21,7 +21,7 @@ The macro also handles overloads of `show` for pretty printing.
 
 (You probably still want to define 2-arg `show(io::IO, x::Layer)`, the macro does not touch this.)
 
-Note that re-running the macro with different options may not overwrite all methods, you will need to restart.
+Note that re-running the macro with different options may not remove all methods, you will need to restart.
 
 # Example
 ```jldoctest
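
For context, here is a minimal sketch of the workflow that these documentation changes describe, end to end. It is not part of the patch above: the `Scaler` layer and its field names are made up for illustration, and it assumes Flux ≥ 0.14.13 so that `Flux.@layer` is available.

```julia
using Flux

# A hypothetical custom layer (not part of Flux): a plain struct with the
# default constructor, as `Flux.@layer` expects.
struct Scaler
    weight::Vector{Float32}
    bias::Vector{Float32}
end

Scaler(n::Integer) = Scaler(randn(Float32, n), zeros(Float32, n))

# Forward pass: elementwise scale and shift.
(m::Scaler)(x) = m.weight .* x .+ m.bias

# `Flux.@layer` replaces `Functors.@functor`: it lets `Flux.setup`, `gpu`, `f32`, etc.
# recurse into the struct and adds pretty printing. The `trainable` keyword
# permanently excludes `bias` from training, as the new paragraph in training.md describes.
Flux.@layer Scaler trainable=(weight,)

model = Scaler(4)

# Explicit-style training: the optimiser state only tracks the trainable field.
opt_state = Flux.setup(Adam(0.01), model)

x, y = rand(Float32, 4), rand(Float32, 4)
grads = Flux.gradient(m -> sum(abs2, m(x) .- y), model)
Flux.update!(opt_state, model, grads[1])
```

The `SignDecay` rule added to the optimiser docs above is used the same way as `WeightDecay`, combined with another optimiser to impose an L1 penalty.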