Should we support Tracker.jl? #2000

torfjelde · 2023-06-03T10:25:50Z

In #1979 we removed support for Tracker.jl because:

It is quite a burden to maintain (in particular when adding new features).
AFAIK there are no cases where it's preferable over Zygote or ReverseDiff.

But then @devmotion wrote the following that made me think that we should at least discuss this!

I just noticed this change - this seems quite dangerous because things might start breaking silently without us noticing it while still stating officially in the package (compat entry, docs etc.) that Tracker is supported.

So I think in case this is the official policy (which IMO is a bit sad because Tracker is quite stable, much simpler than ReverseDiff or Zygote, and started to receive updates more frequently again) then IMO it should be removed from the package, docs, etc., ideally before removing tests for it.

Originally posted by @devmotion in #1979 (comment)

torfjelde · 2023-06-03T10:27:50Z

One thing I'd like to add is that we at least warn people that Tracker.jl isn't being tested against anymore:

Turing.jl/src/essential/ad.jl

Lines 14 to 16 in fb7da8d

    
           function _setadbackend(::Val{:tracker}) 
        
               @warn "Usage of Tracker.jl with Turing.jl is no longer being actively tested and maintained; please use at your own risk. See Zygote.jl or ReverseDiff.jl for fully supported reverse-mode backends." 
        
               ADBACKEND[] = :tracker

So users will be aware in the case they try to use it. Hence I think part of the above comment is addressed.

But whether we should support or not is still something to discuss.

devmotion · 2023-06-03T14:50:54Z

It is quite a burden to maintain (in particular when adding new features).

Is there a concrete example of a recent feature for which Tracker support was not added or very time-consuming to add?

I think we definitely have to distribute our limited resources in a meaningful and productive way, and spending excessive amounts of time on fixing an AD backend does not seem like a good investment if there are good alternatives. However, dropping support for an AD system that works reliably completely is a seemingly extreme change. Maybe one way around it would be to more clearly state how well different AD systems are supported, a bit like the Tier 1/Tier 2/etc. support of Julia itself for different architectures.

It is also a bit strange IMO to drop support for the default reverse-mode backend and stop testing it, without removing it from the dependencies of Turing. Currently, still ForwardDiff and Tracker are the only AD dependencies of Turing and the only AD backends that are available for every user of Turing. For all other AD systems one has to install and load the backend separately.

In any case, my strong opinion is that anything that is declared compatible should be tested - but I think it would be fine to have different levels of support and hence mark some tests for some AD backends as broken (as we have been doing, e.g., in DistributionsAD for a long time).

torfjelde · 2023-06-05T06:02:25Z

Is there a concrete example of a recent feature for which Tracker support was not added or very time-consuming to add?

One was the tests in #1979 breaking for Tracker.jl without any obvious reason: https://github.com/TuringLang/Turing.jl/actions/runs/4769935772/jobs/8480760022

Granted, I didn't spend too much time debugging the above because we had also been running into some tests breaking for Tracker.jl in a Bijectors.jl PR too (TuringLang/Bijectors.jl#253).

But the above is really just what "broke the camels back"; supporting Tracker.jl does take a significant higher effort from us due to its lack of support for ChainRules. Both Zygote and ReverseDiff now suppports it, which means that writing a single rrule is sufficient if we drop suppport for Tracker. If we don't, we also have to write rrules for this, which is IMO prone to error + requires effort. Hence, if Tracker.jl isn't actually used by anyone together with Turing.jl, then I think dropping it is reasonable.

This of course all hinges on "is it being used", which I obviously cannot say for sure. What I'm pretty confident about is that we have not recommend Tracker.jl in years; if we suggest reverse-mode, we always suggest ReverseDiff or Zygote. I also don't really remember anyone reporting any issues or asking any questions relating to Tracker.jl, which also indicates a lack of usage (I find it hard to believe that it just works so well that nobody has issues while simultaneaously never suggesting usage of it).

I think we definitely have to distribute our limited resources in a meaningful and productive way, and spending excessive amounts of time on fixing an AD backend does not seem like a good investment if there are good alternatives. However, dropping support for an AD system that works reliably completely is a seemingly extreme change. Maybe one way around it would be to more clearly state how well different AD systems are supported, a bit like the Tier 1/Tier 2/etc. support of Julia itself for different architectures.

I do like this idea, but it's a bit unclear to me how we should implement this wrt. test-suite.

For example, if we were to put Tracker.jl in Tier 2, do we continue testing it but allow the CI to fail for this particular suite? Otherwise the ranking doesn't do much for us maintainers :/

It is also a bit strange IMO to drop support for the default reverse-mode backend and stop testing it, without removing it from the dependencies of Turing. Currently, still ForwardDiff and Tracker are the only AD dependencies of Turing and the only AD backends that are available for every user of Turing. For all other AD systems one has to install and load the backend separately.

Aaah okay that's a mistake on my end. I completely forgot to check if Tracker.jl was a dep of Turing.jl; I thought that was only ForwardDiff.

So I wholeheartedly agree with the above!

In any case, my strong opinion is that anything that is declared compatible should be tested - but I think it would be fine to have different levels of support and hence mark some tests for some AD backends as broken (as we have been doing, e.g., in DistributionsAD for a long time).

I'm honestly fine with this, as long as we define which ones we're happy to allow "breaking" changes for, and which ones we aren't (your tier-based system would do the trick) 👍

devmotion · 2023-06-05T10:25:32Z

due to its lack of support for ChainRules. Both Zygote and ReverseDiff now suppports it

There is at least interest in adding something similar to what's done in ReverseDiff to Tracker as well: FluxML/Tracker.jl#132 (Generally, in my experience the macro in ReverseDiff is also a bit buggy, limited, and does not work completely reliable, so IMO it's definitely not a first-class support of ChainRules in ReverseDiff)

For example, if we were to put Tracker.jl in Tier 2, do we continue testing it but allow the CI to fail for this particular suite?

I thought about marking the failing tests as broken, opening an issue (if someone has time to fix it, PRs would always be welcome) but not prioritizing work on such fixes.

torfjelde · 2023-06-05T15:27:09Z

There is at least interest in adding something similar to what's done in ReverseDiff to Tracker as well: FluxML/Tracker.jl#132

This would IMO go a long way!

(Generally, in my experience the macro in ReverseDiff is also a bit buggy, limited, and does not work completely reliable, so IMO it's definitely not a first-class support of ChainRules in ReverseDiff)

Sure, but that will then be caught in tests (or at least it should). So if I can simply insert a single macro call to defer to chainrules, and then tests are passing (which generally have been my experience), then that's already lowering the developer overhead substantially.

I thought about marking the failing tests as broken, opening an issue (if someone has time to fix it, PRs would always be welcome) but not prioritizing work on such fixes.

Yeah, this is a strategy I can get behind 👍

ToucheSir · 2023-07-15T23:11:07Z

I noticed a backlink to this issue from the Tracker one, so perhaps some background will help you with your decision.

There is at least interest in adding something similar to what's done in ReverseDiff to Tracker as well: FluxML/Tracker.jl#132

As I recall, that issue was created after yet another painful session fighting Zygote bugs. Unfortunately there's nothing to back up the interest: most of the AD-related work we do is necessarily spent on writing/improving ChainRules and keeping Zygote above water.

That said, if this is something that you think you might have capacity for on your side, let us know. A ReverseDiff-like solution is arguably easier now with package extensions, though in an ideal world one would be able to auto-add rules like I mentioned in FluxML/Tracker.jl#132 (comment). As-is, I fear the only plan is to wait for a better AD which doesn't require us to (re)write too many custom rules.

torfjelde mentioned this issue Jun 6, 2023

LazyArrays.jl extension is broken TuringLang/DistributionsAD.jl#245

Merged

torfjelde mentioned this issue Jul 27, 2023

CompatHelper: bump compat for Bijectors to 0.13, (keep existing compat) #2018

Merged

gdalle mentioned this issue Apr 2, 2024

Can DifferentiationInterface be useful for Turing? #2187

Closed

yebai closed this as completed Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should we support Tracker.jl? #2000

Should we support Tracker.jl? #2000

torfjelde commented Jun 3, 2023 •

edited

Loading

torfjelde commented Jun 3, 2023

devmotion commented Jun 3, 2023

torfjelde commented Jun 5, 2023

devmotion commented Jun 5, 2023

torfjelde commented Jun 5, 2023

ToucheSir commented Jul 15, 2023

Should we support Tracker.jl? #2000

Should we support Tracker.jl? #2000

Comments

torfjelde commented Jun 3, 2023 • edited Loading

torfjelde commented Jun 3, 2023

devmotion commented Jun 3, 2023

torfjelde commented Jun 5, 2023

devmotion commented Jun 5, 2023

torfjelde commented Jun 5, 2023

ToucheSir commented Jul 15, 2023

torfjelde commented Jun 3, 2023 •

edited

Loading