Make `loss(f,x,y) == loss(f(x), y)` #2090

mcabbott · 2022-10-18T16:35:41Z

If train! stops accepting implicit parameters, as in #2082, then its loss function needs to accept the model as an argument, rather than close over it.

This makes all the built-in ones do so, to avoid defining loss(m,x,y) = mse(m(x), y) etc. yourself every time.

(Defining loss(x,y) = mse(model(x), y) every time used to be the idiom for closing over the model, and IMO this is pretty confusing. It means "loss function" means two things. Cleaner to delete this entirely than to update it to a 3-arg version.)

PR Checklist

Tests are added
Entry in NEWS.md
Documentation, if applicable

darsnack · 2022-10-20T13:31:18Z

docs/src/models/losses.md

+All loss functions in Flux have a method which takes the model as the first argument, and calculates the prediction `ŷ = model(x)`.
+This is convenient for [`train!`](@ref Flux.train)`(loss, model, [(x,y), (x2,y2), ...], opt)`:

 ```julia
-loss(ŷ, y)                         # defaults to `mean`
-loss(ŷ, y, agg=sum)                # use `sum` for reduction
-loss(ŷ, y, agg=x->sum(x, dims=2))  # partial reduction
-loss(ŷ, y, agg=x->mean(w .* x))    # weighted mean
-loss(ŷ, y, agg=identity)           # no aggregation.
+loss(model, x, y) = loss(model(x), y)
+```


GH won't let me suggest on this easily, but right now, it almost reads like you need to define the 3-arg loss to work with train! (which is the exact opposite intent!). Something like

All loss functions in Flux have a method which takes the model as the first argument, and calculates the prediction `ŷ = model(x)`, and finally the loss `loss(ŷ, y)`. This is convenient for passing the loss function directly to [`train!`](@ref Flux.train)`(loss, model, [(x,y), (x2,y2), ...], opt)`. For a custom loss, you can replicate this as: ```julia myloss(model, x, y) = myloss(model(x), y) ```

Yes I wondered this too. In this doc section "loss" is an example of any built-in one.

I wonder if it should use say mse everywhere, and say "Flux has a method like this already defined:"?

Yeah, maybe it is clearer to start this section by saying something like "Using Flux.Losses.mse as an example, ...". Then say, for this specific point,

All loss functions in Flux have a method which takes the model as the first argument, and calculates the loss such that ```julia Flux.Losses.mse(model, x, y) == Flux.Losses.mse(model(x), y) ``` This is convenient for passing the loss function directly to [`train!`](@ref Flux.train)`(loss, model, [(x,y), (x2,y2), ...], opt)`.

Turns out I was half-done with changing this section locally to work through defining a new one, rather than listing properties of existing ones. See what you think? Agree that if it does discuss existing ones, it should be ==.

darsnack · 2022-10-20T13:32:27Z

A NEWS entry for this feature would be good too

mcabbott · 2022-10-20T14:08:07Z

src/losses/Losses.jl

+    """
+        $($loss)(model, x, y)
+
+    This method calculates `ŷ = model(x)`. Accepts the same keyword arguments.


Kept this docstring short. Not so sure whether or not it will show up in the doc listing, nor whether it should.

ToucheSir · 2022-10-20T14:26:01Z

Sorry, I have to say that I'm really not a fan of this signature because it excludes a bunch of models while adding one more thing to know for loss function authors. For example, what does mse(m, x, y) even mean if you're doing self-supervised learning and m is some siamese network?

Given that the existing train! API requires users to define their own zero-arg "loss" function already, could we not keep that constraint (bring your own function) and pass in (m, x, y)? This would be strictly less confusing than the status quo and we could rename the callback to "forward pass" or some such.

mcabbott · 2022-10-20T14:33:54Z

Yes I agree it's specialised to some uses.

It just seems slightly weird to force people to define a function which is the just adjusting the signature to work, not doing any work or making any choices. They are forced to do so now because, in addition, this function closes over the model. So it must be re-defined if you change the model.

I suppose it seems especially odd if the "official" documented way is that you must name this trivial function. And perhaps writing always something like this would be less odd:

train!(model, [(x1,y1), (x2,y2), ...], opt) do m,x,y
  mse(m(x), y)
end

However, that's still quite a bit of boilerplate to say "use mse". And I know some people find the do super-confusing at first.

ToucheSir · 2022-10-21T04:38:32Z

If it were just a matter of clarifying how the do syntax works, we could address this with a docs issue. But to the brevity point, ideally we'd be able to extract out some loss(f,x,y) = loss(f(x), y) helper so that individual loss functions don't have to be responsible for being model-aware? It would be one more verb/noun to learn, but it would save us confused users who ported over a loss(x, y) function from some other library and don't understand the resulting MethodError (I'm assuming that if they don't understand do, they'd have a hard time with this too). If this wrapper were a named type, there's even a chance to toss in optimization state and thus simplify #2082, but I haven't thought too hard about that yet.

mcabbott · 2022-10-21T16:13:41Z

confused users who ported over a loss(x, y) function from some other library and don't understand the resulting MethodError

Right now this is worse, loss(x, y) = norm(x - y) will result in zero gradients but no error.

For implicit-Flux, having methods like mse(m) = (x,y) -> mse(m(x), y) would allow train!(mse(model), params(model), data, opt) which is less obscure than what we have now. Or it could be spelled train!(applyloss(mse, model), params(model), data, opt) with one more verb.

For explicit-Flux, we could have train!(applyloss(mse), model, data, opt). Not a big fan of a verb exclusively to translate built-in loss to what built-in train! wants. though.

We could also just make train!(loss, model, [(x1,y1), (x2,y2)], opt) call loss(model(x1), y1). The rule is then loss(model(data[1][1]), data[1][2:end]...) instead of loss(model, data[1]...) in #2082? Not sure either.

ToucheSir · 2022-10-22T17:47:41Z

Right now this is worse, loss(x, y) = norm(x - y) will result in zero gradients but no error.

Yeah, that's a good argument for having the model(x) part more explicit. Is it too bad to ask users to write a 2-liner?

data = [(x1,y1), (x2,y2), ...]
train!((m, x, y) -> mse(m(x), y), model, data, opt)

Most users can directly copy-paste this, and those who have more complex forward passes can either define a separate function or ease into learning the do syntax. But if one just wants to add a regularization term?

train!((m, x, y) -> mse(m(x), y) + Optimisers.total(norm, m), model, data, opt)

mcabbott · 2022-11-29T17:41:44Z

OK, https://fluxml.ai/Flux.jl/previews/PR2114/training/training/ takes this view that we should just always make an anon. function. It emphasises gradient + update over train!, and for gradient you are always going to want that. And it explains the do block several times.

make loss(f,x,y) work

90a4335

mcabbott mentioned this pull request Oct 18, 2022

Add explicit train!, unify update!, and auto-translate the two Adams #2082

Merged

3 tasks

mcabbott added 2 commits October 18, 2022 13:03

rm module docstring

8917099

tweak

39055ae

mcabbott force-pushed the loss_f branch from a9a0c8a to 39055ae Compare October 18, 2022 17:09

darsnack approved these changes Oct 20, 2022

View reviewed changes

news, doc example

3980a1c

mcabbott commented Oct 20, 2022

View reviewed changes

mcabbott mentioned this pull request Dec 26, 2022

accept integer labels in (logit)crossentropy #2141

Open

3 tasks

mcabbott mentioned this pull request Feb 12, 2023

Propose accuracy functions #2181

Draft

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Make `loss(f,x,y) == loss(f(x), y)` #2090

Make `loss(f,x,y) == loss(f(x), y)` #2090

Uh oh!

mcabbott commented Oct 18, 2022 •

edited

Loading

Uh oh!

darsnack Oct 20, 2022 •

edited

Loading

Uh oh!

mcabbott Oct 20, 2022

Uh oh!

darsnack Oct 20, 2022 •

edited

Loading

Uh oh!

mcabbott Oct 20, 2022

Uh oh!

darsnack commented Oct 20, 2022

Uh oh!

mcabbott Oct 20, 2022

Uh oh!

ToucheSir commented Oct 20, 2022 •

edited

Loading

Uh oh!

mcabbott commented Oct 20, 2022 •

edited

Loading

Uh oh!

ToucheSir commented Oct 21, 2022

Uh oh!

mcabbott commented Oct 21, 2022

Uh oh!

ToucheSir commented Oct 22, 2022 •

edited

Loading

Uh oh!

mcabbott commented Nov 29, 2022

Uh oh!

Uh oh!

Uh oh!

Make loss(f,x,y) == loss(f(x), y) #2090

Are you sure you want to change the base?

Make loss(f,x,y) == loss(f(x), y) #2090

Uh oh!

Conversation

mcabbott commented Oct 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Checklist

Uh oh!

darsnack Oct 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mcabbott Oct 20, 2022

Choose a reason for hiding this comment

Uh oh!

darsnack Oct 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mcabbott Oct 20, 2022

Choose a reason for hiding this comment

Uh oh!

darsnack commented Oct 20, 2022

Uh oh!

mcabbott Oct 20, 2022

Choose a reason for hiding this comment

Uh oh!

ToucheSir commented Oct 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mcabbott commented Oct 20, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ToucheSir commented Oct 21, 2022

Uh oh!

mcabbott commented Oct 21, 2022

Uh oh!

ToucheSir commented Oct 22, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mcabbott commented Nov 29, 2022

Uh oh!

Uh oh!

Make `loss(f,x,y) == loss(f(x), y)` #2090

Make `loss(f,x,y) == loss(f(x), y)` #2090

mcabbott commented Oct 18, 2022 •

edited

Loading

darsnack Oct 20, 2022 •

edited

Loading

darsnack Oct 20, 2022 •

edited

Loading

ToucheSir commented Oct 20, 2022 •

edited

Loading

mcabbott commented Oct 20, 2022 •

edited

Loading

ToucheSir commented Oct 22, 2022 •

edited

Loading