
Add basic scheduling policies and a scheduler #1506

Open
wants to merge 5 commits into base: master

Conversation

darsnack
Copy link
Member

This change adds basic scheduling policies via ParameterSchedulers.jl. You can find a full list of the available schedules here. Of course, I am happy to transfer ownership of ParameterSchedulers.jl to FluxML if that's desired. Also happy to go over the ParameterSchedulers.jl design and make changes/adjust the names and API.

The ideal scheduling interface would require #1481. In lieu of that, this PR adds Scheduler(schedule, opt) as the API with the caveat that only the LR can be scheduled. Once #1481 is merged, then this can be extended to schedule any parameter in opt without needing a breaking API change. (If we were to allow any parameter at this stage, then we would need a breaking API change).

Any schedule is just a simple iterator (i.e. implements Base.iterate and Base.getindex). That's pretty much the full interface. A summary of how this works in Flux:

# average use-case
# schedule an optimizer and advance with optimizer
opt = Schedule.Scheduler(Schedule.Exp(λ = lr, γ = 0.5), Descent(lr))
Flux.train!(loss, ps, data, opt)

# more advanced case
# advance schedule every epoch
schedule = Schedule.Exp(λ = lr, γ = 0.5)
for (epoch, eta) in zip(1:nepochs, schedule)
  opt.eta = eta
  Flux.train!(loss, ps, data, opt)
end

# even more advanced case
schedule = Schedule.ScheduleIterator(Schedule.Exp(λ = lr, γ = 0.5))
for epoch in 1:nepochs
  for (i, (x, y)) in enumerate(data)
    if my_crazy_function(epoch, i) # doesn't evaluate true every iteration
      opt.eta = next!(schedule)
    end

    gs = Flux.gradient(() -> loss(x, y, m), ps)
    Optimise.update!(opt, ps, gs)
  end
end

A couple of notes:

  • I don't like the name ScheduleIterator, so please suggest something better if you can think of one
  • Things like opt.eta = schedule[t] or opt.eta = next!(schedule) can be wrapped into something like step!(opt, schedule) or set!(opt, schedule) (like @CarloLucibello suggested on Slack)
  • I think we should merge something like this since the deprecation path for Use Optimisers.jl #1481 is something we still need to discuss

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @dhairyagandhi96 (for API changes).

@darsnack
Copy link
Member Author

Note that this PR gives us all the scheduling policies available in PyTorch + more.

@darsnack
Copy link
Member Author

One question is how we handle the docs. I am guessing the same as NNlib functions?

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Feb 12, 2021

Can I request punting on this? I'm also not clear on what benefit we get from putting it into the repo as opposed to adding it as a separate package. Further, I'd be inclined to work towards a scheduler interface in Optimisers.jl instead.

We also have been speaking about moving chunks to their own independent repos.

Happy to link from the website

@darsnack
Copy link
Member Author

Sure, we can punt, but I feel like this is a core feature that our framework lacks. I'd rather just have the discussion.

Certainly, it may make more sense in Optimisers.jl. But I think that, like the current optimizers, this switch can happen quite seamlessly when Flux is ready to transition to Optimisers.jl. This PR is a total of 18 lines of code, and it adheres to the current optimizers interface (no additional interfaces). So moving it over is about as hard as moving Descent over.

Of course, if you are referring to the schedules themselves, this is just reexporting them. I think that similar to optimizers, this is something that we want. Alternatively, we can just have docs and point users to the package like we do for MLDatasets.jl.

@DhairyaLGandhi
Copy link
Member

We'd not really have much trouble defining our own with the current interface, and I'd certainly want to have schedulers, but we'll move over to Optimisers.jl soon-ish so we can start playing with it, and I'd rather work on schedulers there.

Linking from the website and docs seems like the way to go to me.

Thanks for understanding

@darsnack
Copy link
Member Author

darsnack commented Feb 12, 2021

There are two pieces here: the schedules and the scheduler. In the case of the former, we cannot define all kinds of schedules with our interface without going to great lengths to circumvent its limitations. apply! is really well-suited to multiplicative scheduling policies but not others. Moreover, it limits the schedules to only be used with optimizers. But as I have said before, there's no reason this limitation should exist, and there are other things besides the LR that people try to schedule.

If #1481 lands soon-ish, then I agree that this PR is better suited for Optimisers.jl. But I have my doubts about the mutability discussion being resolved soon. We don't have to merge this PR immediately by any means. I'm happy to wait and see if we can resolve #1481 soon-ish (and would be quite happy to be proven wrong 😉).

@CarloLucibello
Copy link
Member

I have serious concerns about the transition to Optimisers.jl; the whole design may need more discussion. If we want to do serious benchmarks and add a proper deprecation path, it's quite unlikely #1481 is going to land soon. I encourage people to complete the v0.12 milestone instead of rushing on that.

@DhairyaLGandhi
Copy link
Member

I don't think any remaining breakages in that PR are optimisers-related. Deprecations can be handled.

@CarloLucibello
Copy link
Member

#1481 is still missing FluxML/Optimisers.jl#9 (comment). Plus, there are important concerns in FluxML/Optimisers.jl#12

@CarloLucibello
Copy link
Member

I don't like what you call the average case: the fact that a scheduler is an optimizer is awkward and not needed. The scheduler and opt should be separate objects (although the scheduler can contain a reference to opt if needed). We can add a keyword arg to train! to pass the scheduler.

@DhairyaLGandhi
Copy link
Member

The struct seems very dirty as a deprecation path. Hadn't seen FluxML/Optimisers.jl#12 earlier.

@darsnack
Copy link
Member Author

darsnack commented Feb 13, 2021

the fact that a scheduler is an optimizer, is awkward and not needed

The average case views the scheduler as an optimizer composition (higher-order optimizer). This is in line with the view that optimizer rules are gradient transformations, and optimizer compositions are rule transformations. Maybe it makes more sense with my original name "scheduled optimizer."

But if we want to punt on this syntax, then I am fine with that. As you mention, it only matters for train! at this stage. Removing the average case syntax also makes the PR a lot simpler.

@darsnack
Copy link
Member Author

Other concerns include names for everything and keyword argument constructors. I think I need to remove a lot of Unicode to match Flux practices.

Also whether we want a set!(opt, lr)/set!(opt, schedule)-like function, or whether we are satisfied with opt.eta = lr/opt.eta = next!(schedule).

@DhairyaLGandhi
Copy link
Member

@darsnack are you planning on adding refs in the docs and removing the dependency or do we want to do it in a separate PR?

@darsnack
Copy link
Member Author

If we want to push out docs with a link reference, then I would suggest another PR. My intent isn't to close this one, as I still believe it should be reviewed, augmented to address issues, and merged. I was going to submit another PR with the doc reference, but I was planning on doing more than a link (i.e. I think there should be a small snippet in the docs as a usage example).

As for this PR, now that I have removed Scheduler per @CarloLucibello's request, there isn't really any interface issue with Optimisers.jl. The only code I had written was Scheduler, and that is now gone. This is just a simple re-export.

Like I explained before, scheduling policies and the scheduler (or mechanism to actually set the parameter in the optimizer) are separate. Scheduling policies should be thought of as equivalent to 0:0.1:10 in that they are nothing more than an iterator. The current schedules in Flux (ExpDecay and InvDecay) do not work this way. It's a clever trick but not a good scheduling design. It only works for schedules that can be expressed as iterative multiplies, and it is tightly coupled to the current optimizer rules (i.e. it assumes that the LR is expressed as a single multiply). It already cannot express common schedules in ML like cosine annealing or varying width step LRs without hacks to make it fit into the apply! interface. I feel fairly comfortable claiming that the current approach is inflexible, unscalable, and generally unintuitive.
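
To make the "schedules are just iterators" point concrete, here is a sketch of a cosine-annealing policy written as a plain iterator, independent of any optimizer. The names (`Cos`, field names) are illustrative, not the actual ParameterSchedulers.jl API; note that this policy cannot be expressed as iterative multiplies, which is exactly the kind of schedule the apply!-based ExpDecay/InvDecay design cannot accommodate.

```julia
# A hypothetical cosine annealing schedule as a plain iterator,
# decoupled from any optimizer (names are illustrative only).
struct Cos
    lambda0::Float64  # maximum value at the start of a cycle
    lambda1::Float64  # minimum value mid-cycle
    period::Int       # steps per cycle
end

# Index-style access, analogous to `(0:0.1:10)[t]`
(s::Cos)(t) = s.lambda1 + (s.lambda0 - s.lambda1) *
              (1 + cos(2pi * (t - 1) / s.period)) / 2

# Iterator-style access
Base.iterate(s::Cos, t = 1) = (s(t), t + 1)

s = Cos(0.1, 0.0, 10)
s(1)  # peak value at the start of a cycle
s(6)  # minimum value half-way through
```

Because the schedule is just a callable iterator, it composes with `zip`, `enumerate`, and manual indexing without touching optimizer internals.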

Now, I understand that the code in ParameterSchedulers.jl needs to be properly reviewed and discussed, which is why I am happy to take this slow. But I find it a little frustrating that every time I bring this up, it gets shut down. I've expressed I'm open to all forms of changes to the code in ParameterSchedulers.jl and transferring ownership. What I've tried to do is put in my time and effort to present usable code with examples and documentation to have a constructive conversation around the design. I think it's only fair that we allow that conversation to happen.

In terms of the dependency issue, ParameterSchedulers.jl should be a negligible addition. It has no sub-dependencies since it only extends interfaces in Base, and it is a pretty small code base in total. In fact, it could easily fit into a schedules.jl file in Flux.jl, but I don't want to do that for the same reasons that Optimisers.jl is being separated. The scheduling policies are generally useful in many domains where annealing is used, so I'd like the code to be re-used without having to depend on Flux.

@CarloLucibello
Copy link
Member

CarloLucibello commented Feb 15, 2021

Maybe the module should live within the Flux main module instead of being a submodule of Optimise.
Should it be named Schedulers?

I'm ok with adding a lean external dependency. As a start, Flux should document only the foundations of the interface. These are the possible interfaces I can think of:

interface 1

using Flux.Optimise
using Flux.Schedulers

opt = ADAM()
scheduler = Schedulers.Exp(...)

for epoch in 1:epochs
   ....
   opt.eta = Schedulers.next!(scheduler)
end

interface 2

opt = ADAM()
scheduler = Schedulers.Exp(...)

for epoch in 1:epochs
   ....
   opt.eta = scheduler[epoch]
end

interface 3

opt = ADAM()
scheduler = Schedulers.Exp(...)

for (epoch, eta) in zip(1:epochs, scheduler)
   ....
   opt.eta = eta
end

interface 4

opt = ADAM()
scheduler = Schedulers.Exp(opt, :eta, ...)

for (epoch, eta) in zip(1:epochs, scheduler)
   ....
  Schedulers.next!(scheduler)
end

interface 5

opt = ADAM()
scheduler = Schedulers.Exp(opt, :eta, ...)
train!(..., scheduler=scheduler)

interface 6

the scheduler is an optimizer, integrated into `update!`

Discussion

I would document no more than interfaces 1, 2, and 3 here (and actually interface 1 alone is already quite good).

Interface 1 is the most basic and, for me, also the most convenient and the one I'm most likely to use.
Interface 2 could be useful in some situations.
Interface 3 isn't particularly useful I think, but it doesn't hurt either.
All the others need more discussion.

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Feb 15, 2021

I'm uncomfortable with depending on a package, much less re-exporting it, if it's not vetted Flux code, since that becomes first-class API. It makes it very hard to change in the future. As you said, adding this as a dep doesn't benefit its usability, so that's giving me further pause.

I am aware of the limitations of the current api and share your frustrations and want to address them (with hopefully your learnings and guidance) consistently in Optimisers.jl.

In the meantime, to have users directed to the right place, I offered to have the refs in the right places. I appreciate and encourage discussions around this topic and don't think I've stopped that here? I do think your involvement has particularly helped streamline discussion in the right places too.

@darsnack
Copy link
Member Author

Yes, sorry if my comments came off as antagonistic; it's just that this topic has come up several times, and we only now seem to be discussing it fully (thus, my frustrations). But it seems like we are moving in similar directions here, so thanks for that.

I'm uncomfortable with depending on a package much less re-export it if it's not vetted Flux code

I agree with this; my impression was that something wasn't even being considered, but maybe that's my misunderstanding. I think the best approach will be to add the reference and documentation of how to use it. Let it exist in that form for some release cycles, then we can evaluate how well it worked in practice with Flux code.

I am aware of the limitations of the current api and share your frustrations and want to address them (with hopefully your learnings and guidance) consistently in Optimisers.jl.

Absolutely, but just to be clear, I don't think that the optimizers and scheduling policies should be too tightly coupled. It makes more sense to me to use an interface (like 4, 5, 6 above or something similar but different) to apply scheduling policies to any optimizer in a generic way. I also still think the policies themselves should be separate from Optimisers.jl for the reasons I stated above.


Discussion on interfaces

So interfaces 1, 2, and 3 are all already currently supported. For 1, it would be ScheduleIterator(Exp(...)) instead, but we can change the name/eliminate the verbosity with some syntactic sugar. The schedules are specifically designed not to have mutable state, so the wrapper is needed to track that state. But we could do something like Exp(..., stateful = true) as sugar for getting a mutable Exp. The nice part of keeping the schedules immutable is that it gives us the flexibility to support interfaces 1, 2, and 3 seamlessly. I think it's nicer for users to have several options, all with intuitive interfaces.
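
The stateful-wrapper idea can be sketched in a few lines. This is a hypothetical minimal version (the names `Stateful` and `next!` mirror the discussion above but the real implementation may differ); any immutable schedule that supports call syntax can be wrapped this way:

```julia
# Hypothetical minimal stateful wrapper around an immutable schedule.
# The wrapper owns the iteration counter; the schedule itself stays pure.
mutable struct Stateful{S}
    schedule::S
    t::Int
end
Stateful(schedule) = Stateful(schedule, 1)

# Advance the wrapper and return the next scheduled value.
function next!(s::Stateful)
    val = s.schedule(s.t)
    s.t += 1
    return val
end

# Usage with a plain function as an exponential-decay policy:
exp_sched(t) = 0.1 * 0.5^(t - 1)
s = Stateful(exp_sched)
next!(s)  # 0.1
next!(s)  # 0.05
```

Keeping the state in the wrapper rather than the schedule is what lets the same `Exp` object back interfaces 1, 2, and 3 simultaneously.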

Interfaces 4, 5, and 6 are all various versions of the Scheduler that I have considered. They can all be made to work, though I'd prefer to stay away from the symbol-based mechanism for specifying the parameter to mutate. I prefer closures instead, like Schedulers.Exp(opt, ...; update = (o, s) -> (o.eta = s)). This looks uglier, but it means I can write code to modify any parameter without needing to look up the struct definition. Keep in mind that the default is to schedule the LR, so most people will just be writing Schedulers.Exp(opt, ...) anyways.
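
A sketch of the closure-based variant might look like the following. Everything here is hypothetical (`Scheduled`, `step!`, `FakeOpt` are stand-ins invented for illustration); the point is that the update closure replaces a `:eta`-style symbol lookup:

```julia
# Hypothetical generic scheduler applying any schedule to any optimizer
# field via a closure, avoiding symbol-based field specification.
mutable struct Scheduled{S,O,F}
    schedule::S
    opt::O
    update::F   # closure: (opt, value) -> sets the scheduled field
    t::Int
end
# Default closure schedules the learning rate.
Scheduled(schedule, opt; update = (o, v) -> (o.eta = v)) =
    Scheduled(schedule, opt, update, 1)

# Apply the current scheduled value and advance the counter.
function step!(s::Scheduled)
    s.update(s.opt, s.schedule(s.t))
    s.t += 1
    return s.opt
end

# Usage sketch with a stand-in optimizer and a plain-function schedule:
mutable struct FakeOpt; eta::Float64; end
opt = FakeOpt(0.1)
sched = Scheduled(t -> 0.1 * 0.5^(t - 1), opt)
step!(sched); step!(sched)
opt.eta  # 0.05 after two steps
```

Scheduling a different field is then just a different closure, e.g. `update = (o, v) -> (o.rho = v)`, with no struct introspection needed.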

The cleanest version of 4, 5, and 6 requires Optimisers.jl. This would look like Schedulers.Exp(eta -> Descent(eta), ...). Since reinitializing the optimizer isn't costly in Optimisers.jl (due to explicit state), we're able to support this form. If we punt on the scheduler interface until #1481, we can get away with interface 1, 2, and 3 for now. This will allow us to vet the schedule policies while not committing to the interface with optimizers.

@DhairyaLGandhi
Copy link
Member

DhairyaLGandhi commented Feb 15, 2021

Let it exist in that form for some release cycles then we can evaluate how well it worked in practice with Flux code

I think adding the refs is a good idea. I'm still going to hold off on adding the dependency for the reasons I stated earlier.

I have long wanted a Callbacks.jl and Schedulers.jl too for completeness.

@darsnack mentioned this pull request Feb 17, 2021
bors bot added a commit that referenced this pull request Feb 19, 2021
1511: Add ParameterSchedulers.jl to docs r=darsnack a=darsnack

As per the discussion in #1506, this adds a section to the docs that briefly describes scheduling with ParameterSchedulers.jl. Only concerns before merging are the naming conventions in ParameterSchedulers.jl. If I could get some feedback on those, then I can submit a minor release before officially merging this into the Flux docs.

### PR Checklist

- [ ] ~~Tests are added~~
- [ ] ~~Entry in NEWS.md~~
- [x] Documentation, if applicable
- [ ] ~~Final review from `@dhairyagandhi96` (for API changes).~~


Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>
@darsnack
Copy link
Member Author

I have a version of ParameterSchedulers.jl without the type hierarchy at darsnack/rm-optim. A couple of noted changes:

  • there are no abstract types
  • the only interface requirements are (s::MySchedule)(t) and Base.iterate(s::MySchedule[, state])
  • no more Lambda(f) schedules (you just use f directly)
  • I tried looking again at using Base.Iterators to implement things like Sequence or Loop, but this only works for iteration (you can't do (s::Sequence)(t) without collecting or just defining the behavior of Base.Iterators over again for a specific t)
  • this composes well with RFC: basic sketch of scheduling Optimisers.jl#15
  • I renamed ScheduleIterator to Stateful (not exported since the name is generic)
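
Under the reduced two-function contract described above (call syntax `(s::MySchedule)(t)` plus `Base.iterate`), a user-defined schedule needs no abstract supertype at all. A hypothetical triangular (linear warmup/decay) policy as an example of that contract:

```julia
# A hypothetical triangular schedule implementing only the proposed
# interface: call syntax `(s::MySchedule)(t)` and `Base.iterate`.
struct Triangle
    lambda0::Float64  # base value
    lambda1::Float64  # peak value
    period::Int       # steps per full cycle
end

function (s::Triangle)(t)
    x = mod(t - 1, s.period)        # 0-based position within the cycle
    half = s.period / 2
    frac = x < half ? x / half : (s.period - x) / half
    return s.lambda0 + (s.lambda1 - s.lambda0) * frac
end

Base.iterate(s::Triangle, t = 1) = (s(t), t + 1)

s = Triangle(0.0, 1.0, 10)
s(1)  # base value at the start of a cycle
s(6)  # peak value half-way through
```

Since `Triangle` is just a callable with `iterate`, it works with all of interfaces 1, 2, and 3 and with `zip`/`enumerate` for free.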

All in all, assuming we punt on interfaces 4, 5, and 6, I would say this can be merged (needing doc updates and proper review first, of course). I don't mind if we don't; this is really just to say that it's here if we want it at some point.

@CarloLucibello
Copy link
Member

Shouldn't we wait for darsnack/rm-optim to be merged into master and a new ParameterSchedulers release tagged?

@darsnack
Copy link
Member Author

darsnack commented Feb 20, 2021

Yeah definitely. Will do that after cleaning up the docs.

bors bot added a commit that referenced this pull request Feb 23, 2021
1513: Update for latest ParameterSchedulers.jl release r=darsnack a=darsnack

Due to some finagling with #1506 and work on Optimisers.jl, I made some changes to the names/API of ParameterSchedulers.jl. This just updates the docs to reflect those names.

### PR Checklist

- [ ] ~~Tests are added~~
- [ ] ~~Entry in NEWS.md~~
- [x] Documentation, if applicable
- [ ] ~~Final review from `@dhairyagandhi96` (for API changes).~~


Co-authored-by: Kyle Daruwalla <daruwalla@wisc.edu>