Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure sparse backends and replace subtyping by traits #40

Merged
merged 18 commits into from
Apr 15, 2024

Conversation

gdalle
Copy link
Collaborator

@gdalle gdalle commented Apr 12, 2024

Motivation

This is a breaking PR (bumping version to v0.3.0) that restructures the handling of sparse backends, and cleans up the package in the process.

TLDR: Most (concrete) structs have not changed too much, but

  • their abstract subtypes are replaced by a mode trait
  • their sparse counterparts are replaced by the AutoSparse struct

Breaking API changes

  • Bump Julia compat lower bound to 1.10

Mode

  • Remove the abstract supertypes for the different modes (e.g. AbstractForwardMode and AbstractSparseForwardMode), and replace them with the mode trait. They were not documented but some packages used them anyway, like DifferentiationInterface.jl.
  • Make mode(ad::AbstractADType) return one of the following singletons: ForwardMode(), ReverseMode(), SymbolicMode() or the ambiguous ForwardOrReverseMode() (for cases like AutoEnzyme and AutoChainRules)
  • Assign the finite differences packages to ForwardMode

Concrete structs

@deprecate AutoSparseForwardDiff(; kwargs...) AutoSparse(AutoForwardDiff(; kwargs...))

Non-breaking API changes

Sparsity

  • Introduce an AutoSparse struct that wraps a dense_ad backend with a sparsity_detector and a coloring_algorithm (as discussed in Change handling of sparse backends #38)
  • Define getter functions for this new struct
  • Define abstract types and default (trivial) choices for
    • the sparsity detector (AbstractSparsityDetector, NoSparsityDetector)
    • the coloring algorithm (AbstractColoringAlgorithm, NoColoringAlgorithm)
  • Define utilities for getting Jacobian/Hessian sparsity patterns based on an AbstractSparsityDetector:
    • jacobian_sparsity
    • hessian_sparsity
  • Define utilities for getting column/row colorings based on an AbstractColoringAlgorithm:
    • column_coloring
    • row_coloring

Other changes

Dependencies

  • Add ChainRulesCore.jl and EnzymeCore.jl as weak dependencies to define more specific mode dispatches
    • based on ChainRulesCore.RuleConfig for ChainRulesCore.jl
    • based on EnzymeCore.Mode for EnzymeCore.jl

Package quality

  • Split source code into several files
  • Add code coverage to CI (see Add Codecov action #41)
  • Add more tests
  • Include Aqua and JET to tests for code quality
  • Add and improve docstrings
  • Structure API reference in the docs with sections
  • Clarify README

Checklist

  • Appropriate tests were added
  • Any code changes were done in a way that does not break public API
  • All documentation related to code changes were updated
  • The new code follows the
    contributor guidelines, in particular the SciML Style Guide and
    COLPRAC.
  • Any new documentation only uses public API

Questions

  • Should we document type parameters of the concrete structs? I feel like this limits us in the range of non-breaking changes we can do: for instance, turning AutoPolyesterForwardDiff{chunksize} into AutoPolyesterForwardDiff{chunksize,T} is breaking because someone might have used the former as a constructor. That is why I explicitly document keyword constructors and fields.
  • Is the weak dependency handling okay on Julia < 1.9? The alternative would be to avoid defining the mode trait altogether, which might be preferable
  • Should we pull in Diffractor.jl and Zygote.jl in the test suite to check on actual instances of the RuleConfig paradigm? I define very simple RuleConfigs in runtests.jl but it's less reliable than taking the real ones

@gdalle gdalle linked an issue Apr 12, 2024 that may be closed by this pull request
@gdalle gdalle mentioned this pull request Apr 12, 2024
@gdalle gdalle removed the request for review from ChrisRackauckas April 12, 2024 15:37
src/abstract.jl Outdated Show resolved Hide resolved
src/sparse.jl Outdated Show resolved Hide resolved
@ChrisRackauckas
Copy link
Member

Something not mentioned in the top there is that this will require that ArrayInterface.jl take on an ADTypes.jl dependency, which is fine given that this is lightweight and all, but it should be noted that its ColoringAlgorithm needs a @deprecate ColoringAlgorithm = ADTypes.AbstractColoringAlgorithm which is then used in downstream implementation of its functions. fast_matrix_colors and matrix_colors live there since they are part of the extended sparse matrix interface, so that array types can define their coloring pattern overloads (things like BandedMatrices) without taking a full dependency on the AD libraries, but then specialize whenever such structured matrices are hit with the coloring algorithm (ex. BandedMatrices has a nice analytical solution based on the band size).

@ChrisRackauckas
Copy link
Member

The symbols instead of abstract types thing is weird. That needs some explanation.

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 13, 2024

Thanks for the review! Indeed you identified two parts where I was unsure:

The symbols instead of abstract types thing is weird. That needs some explanation.

See my comment #40 (comment)

will require that ArrayInterface.jl take on an ADTypes.jl dependency

Either that or the coloring names are dropped from ArrayInterface altogether? See my comment #40 (comment)

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 13, 2024

its ColoringAlgorithm needs a @deprecate ColoringAlgorithm = ADTypes.AbstractColoringAlgorithm

I can rename AbstactColoringAlgorithm to ColoringAlgorithm to minimize the refactoring work

fast_matrix_colors and matrix_colors live there since they are part of the extended sparse matrix interface,

Not sure how to deal with those though. I can also rename column_coloring and row_coloring but these are different algorithms, and I don't see a similar toggle in matrix_colors?

@ChrisRackauckas
Copy link
Member

I can rename AbstactColoringAlgorithm to ColoringAlgorithm to minimize the refactoring work

It's at least contained to two repos, so it's not too big of a deal if it's deprecated properly.

Not sure how to deal with those though. I can also rename column_coloring and row_coloring but these are different algorithms, and I don't see a similar toggle in matrix_colors?

Yeah that's mostly by adjoint. But the bigger thing would be updating all of the downstream analytical solutions to have these functions as well.

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 13, 2024

But the bigger thing would be updating all of the downstream analytical solutions to have these functions as well.

If we keep the name matrix_colors and import ADTypes into ArrayInterface, there is nothing to change at all. But that would mean having only one coloring function, and if I want the row coloring I take the column coloring of the transpose?

@ChrisRackauckas
Copy link
Member

But that would mean having only one coloring function, and if I want the row coloring I take the column coloring of the transpose?

That's what we currently do, but if you're going to setup general sparse AD then it's not a good idea since you won't easily know if someone passes a coloring pattern if it should be A or A'. So I like the idea of two functions, but the work of updating all of the analytical solutions in the extensions to the two functions needs to be done as well.

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 13, 2024

I like the idea of two functions, but the work of updating all of the analytical solutions in the extensions to the two functions needs to be done as well.

I agree. Maybe ArrayInterface can keep defining matrix_colors (based on ADTypes.column_coloring), and that way as a transition we can just copy-paste the following lines in the downstream packages:

ADTypes.column_coloring(M) = ArrayInterface.matrix_colors(M)
ADTypes.row_coloring(M) = ArrayInterface.matrix_colors(M')

without changing the names in the actual function definitions for the time being

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 13, 2024

Thanks for the comments! I'll update the PR with a trait mechanism and ping you again for validation if that's okay?

@ChrisRackauckas
Copy link
Member

I agree. Maybe ArrayInterface can keep defining matrix_colors (based on ADTypes.column_coloring), and that way as a transition we can just copy-paste the following lines in the downstream packages:

Let's do the clean break. Such a clean break is the kind of thing that needs someone motivated in order to say yes. I don't see you stopping the DI push, so let's do it right. v0.3.0 here to signal the break, update the downstream in ArrayInterface and SparseDiffTools, and get sparse DI to do things the best way possible, and then integrate it into everything like NonlinearSolve.jl and DifferentialEquations.jl. You've got gusto and we've been sitting here waiting for something like DI to finally fix some of these remaining issues, so let's just pull the trigger and do it.

Copy link

codecov bot commented Apr 13, 2024

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

Thanks for integrating Codecov - We've got you covered ☂️

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 13, 2024

Not ready yet but close

@gdalle gdalle changed the title Restructure sparse backends Restructure sparse backends and replace subtyping by traits Apr 14, 2024
@gdalle
Copy link
Collaborator Author

gdalle commented Apr 14, 2024

@ChrisRackauckas I have implemented the trait mechanism and used the opportunity for other small breaking changes in the concrete structs. All of it is summarized in the first comment, which I have edited: #40 (comment)
What do you think?

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 15, 2024

For testing, while it's a hassle to add Diffractor and Zygote to the dependencies, I did perform the following check of my mode method definitions in the ChainRulesCore extension, so I'm reasonably confident:

julia> import Diffractor, Zygote

julia> using ADTypes  # on the PR branch

julia> ADTypes.mode(AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig()))
ADTypes.ReverseMode()

julia> ADTypes.mode(AutoChainRules(; ruleconfig=Diffractor.DiffractorRuleConfig()))
ADTypes.ForwardOrReverseMode()

julia> @which ADTypes.mode(AutoChainRules(; ruleconfig=Zygote.ZygoteRuleConfig()))
mode(::AutoChainRules{RC}) where RC<:(ChainRulesCore.RuleConfig{>:ChainRulesCore.HasReverseMode})
     @ ADTypesChainRulesCoreExt ~/.julia/packages/ADTypes/bOklB/ext/ADTypesChainRulesCoreExt.jl:16

julia> @which ADTypes.mode(AutoChainRules(; ruleconfig=Diffractor.DiffractorRuleConfig()))
mode(::AutoChainRules{RC}) where RC<:(ChainRulesCore.RuleConfig{>:Union{ChainRulesCore.HasForwardsMode, ChainRulesCore.HasReverseMode}})
     @ ADTypesChainRulesCoreExt ~/.julia/packages/ADTypes/bOklB/ext/ADTypesChainRulesCoreExt.jl:22

Of course there is no ChainRules-compatible backend with a forward-only RuleConfig, so we'll never hit the third method

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 15, 2024

I think we're there, the last few things to iron out are:

  • Should we really do the mode definition ourselves, or leave it to users?
  • If we define the mode, we need extensions for ChainRulesCore and EnzymeCore (fairly lightweight). Should we
    • bump Julia compat to 1.9 to make sure they remain extensions everywhere
    • accept that they are unconditional dependencies on <1.9
    • use Requires instead but it's useless on >=1.9 (I don't like this one cause at the moment the package is deps-free on >=1.9 and we should keep it that way)
  • Should we bump ADTypes version to 1.0?
  • Should we leave JET in the test suite, even though it errors on nightly? It does a great job of catching typos, and I made it so that the badge would still be green if only nightly fails

@ChrisRackauckas
Copy link
Member

If we define the mode, we need extensions for ChainRulesCore and EnzymeCore (fairly lightweight). Should we

We should define mode and at least bump to v1.9. I'd say just go to v1.10 knowing it's the next LTS and skip all of the other pain, there's other stuff to worry about in life.

Should we bump ADTypes version to 1.0?

Yes

Should we leave JET in the test suite, even though it errors on nightly? It does a great job of catching typos, and I made it so that the badge would still be green if only nightly fails

We just shouldn't run nightly, it's not for humans. We can set the new prerelease branch when it's ready, but since nightly isn't for packages (which is why the prerelease is being made) it's a waste of our time to be trying it.

gdalle and others added 4 commits April 15, 2024 13:16
@gdalle
Copy link
Collaborator Author

gdalle commented Apr 15, 2024

I'd say just go to v1.10 knowing it's the next LTS and skip all of the other pain, there's other stuff to worry about in life.

Done.

We just shouldn't run nightly, it's not for humans.

Removed nightly from CI

@gdalle
Copy link
Collaborator Author

gdalle commented Apr 15, 2024

Once I get an approving review I'll merge, but I'd like to do two things before we register v1:

  • ask for feedback, either on Slack or Discourse
  • test it with DifferentiationInterface to make sure we didn't forget something essential

@ChrisRackauckas
Copy link
Member

Yes no need to release right away, let's get downstream all set and make sure everyone is bought into this form.

Copy link
Member

@Vaibhavdixit02 Vaibhavdixit02 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Codecov action Change handling of sparse backends What is the right way to specify the Enzyme mode?
3 participants