Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V3: complete overhaul #372

Merged
merged 410 commits into from
Jul 26, 2024
Merged

V3: complete overhaul #372

merged 410 commits into from
Jul 26, 2024

Conversation

kahaaga
Copy link
Member

@kahaaga kahaaga commented Jul 5, 2024

V3: breaking and other changes

This PR contains a complete overhaul of the codebase. The git diff is meaningless. It is better to have a look at the documentation (docstrings and examples, which there are plenty of).
The docs are here. Little content has changed, but the syntax has changed across the board.

This will be the definitely last silly-big PR. There will be no more large redesigns unless there is an extremely good reason for it. Given that we can now seamlessly use everything from ComplexityMeasures.jl here, I doubt there will be anything new coming up in the foreseeable future.

Package structuring

The package is organised around three main themes:

  • Association measures
  • Independence testing
  • Network inference ("causal graphs" being a subset of the network inference part)

This is now reflected systematically across folder structure documentation structure and through the type system (hence a lot of moving files, and the huge diff).

Breaking changes

Many changes are breaking. The reason is simple: I don't have the extra time to spend rigorously making everything backwards-compatible. Finalising this PR already took many weeks, and it builds on endless experimentation and back-and-forth design changes over the past few years (in tandem with the work we did in ComplexityMeasures.jl). I simply don't have more time to spend on this nitpicking this any more, given that my industry jobs take up most of my time now. I'd much rather lock the design, finish V3, and move on with finishing all the features I've been working on and finalising the accompanying paper. Hence: deprecations for earlier versions are gone, much is breaking (but these changes should be well-covered in the changelog).

The main changes are

  • Three main contents define the package: association measures/estimators, independence testing, and network inference.

  • association is the function that computes associations between variables.

  • Estimators contains measure definitions (as we do in ComplexityMeasures.jl). For example association(KSG1(MIShannon()), x, y) computes Shannon mutual information.

  • Counts/probabilities for multivariate data.

  • A complete overhaul of the multivariate information API.

  • Two different ways of discretising input data: per row/point (CodifyPoints) and per variable/column (CodifyVariables). CodifyPoints takes Encodings as inputs, and CodifyVariables takes OutcomeSpaces as inputs. See the documentation for how these work.

  • Rename SurrogateTest to SurrogateAssociationTest, since TimeseriesSurrogates.jl now defines SurrogateTest.

  • Slim down the documentation by keeping much more (condensed) stuff as tables in docstrings

  • Convenience functions are gone.

  • Example systems are removed (these should go into the PredefinedDynamicalSystems.jl package, or some other package, at some point).

  • A LOT of examples. Every association measure, every estimator, every independence testing method, and every network inference method is documented with at least one runnable example that gets executed when the docs build.

  • Increased test coverage to >92%.

  • Made a breakout-PR (Future features from 3.0 #374 ) containing a lot of unfinished content that will at some point go into the package. But this has to be done in a more structured manner in separate PRs, so I've completely removed anything temporarily from the codebase in this PR.

Big decision: package renaming?

I seriously want to rename the package. It consists of three main things

  • Association measures
  • Independence testing
  • Network inference ("causal graphs" being a subset of the network inference part)

"CausalityTools" is a loaded term, which may give the user the impression that all or most functionality here is related too "causal inference". But causality has very specific meanings in specific contexts, and I don't want to dive into that controversy here. In preparation for the paper that will accompany v3.X, I want to keep everything as neutral as possible here.

We had a discussion in the Bergen group and we agreed that "Associations.jl" is a better name.
("AssociationMeasures.jl" could also work, to mirror ComplexityMeasures.jl, however, it limits the scope of the package - we'd rather use "associations" alone, since that also can contain things like independence testing and network inference under the same umbrella).

Yay/nay?

@kahaaga kahaaga marked this pull request as ready for review July 25, 2024 13:47
@kahaaga
Copy link
Member Author

kahaaga commented Jul 25, 2024

@Datseris This is as ready as it gets from my side now. A review would be appreciated!

PS: I completely understand that this will probably take some time to go through, given the useless git diff. Please use documentation, the examples and the docstrings to guide the review! There are many, many more examples now, and they are mostly all cross-referenced.

I essentially have no more time to spend on this in the foreseeable future, so I hope this is good enough to at least publish a new version that is up to speed with the latest version ComplexityMeasures.jl. At least it will be better than the current version, which lacks some compatibility.

@kahaaga kahaaga requested a review from Datseris July 25, 2024 13:51
@Datseris
Copy link
Member

hi, sorry for the next two months I am super duper booked finishing projects as well as finishing some papers I am currently writing. I can have a detailed look at this after mid September (please ping again if I forget at this time!). I think Associations.jl is a fine name. Having lots of breaking changes is also fine as this is a major new release. And, if you rename the packge, you in fact release a new package anyways as that's how renaming a package works in Julia.

@kahaaga
Copy link
Member Author

kahaaga commented Jul 26, 2024

@Datseris Thank you for the quick response.

I think moving quickly here is preferred, as opposed to waiting several months for a release, for two reasons:

  • Given that we've submitted a paper with a section referring to compatibility with this software, I think it is crucial that we as soon as possible release a version that compatible with ComplexityMeasures V3.
  • I also have some papers from my UiB-employment that I want to finally wrap up and submit. This includes the companion paper for this software. These use the V3 syntax in this PR.

I already spent a few weeks quality-checking this improving tests (~95% coverage) and expanding with more runnable examples (all measures/estimators, independence tests and causal graph algorithms have runnable examples that are cross-referenced in docstrings). Given your busy schedule (and mine): this is good enough for a new release as is.

If anything major shows up that requires breaking changes, we solve it either by deprecations or by releasing another new major version fairly quickly. Any other minor kinks can just be sorted with regular PRs.

I propose to merge this now, change the package name and release a new package Associations.jl.

Yay/nay?

@Datseris
Copy link
Member

Okay, yes, this is fine. I went through the docs now (very quickly) and it looks very good anyways. I can open issues after you've released Associations.jl v1 anyways.

I think this needs a central tutorial akin to ComplexityMeasures.jl, but I undersatnd this would take also time :D In the next semester I may be able to get PhD funding for someone working on Causal Graphs in cloudiness and if I get that they would also contribute and maintain this library. Fingers crossed!

@kahaaga
Copy link
Member Author

kahaaga commented Jul 26, 2024

Okay, yes, this is fine. I went through the docs now (very quickly) and it looks very good anyways. I can open issues after you've released Associations.jl v1 anyways.

Perfect.

I think this needs a central tutorial akin to ComplexityMeasures.jl, but I understand this would take also time :D

It does! We can just open an issue for it. For my part, it would be better to write this is a while, when I no longer have a fresh mental model of the package in my head. That'll make it easier to get into the package beginner mindset, and know what to focus on.

In the next semester I may be able to get PhD funding for someone working on Causal Graphs in cloudiness and if I get that they would also contribute and maintain this library. Fingers crossed!

Nice! Fingers crossed for grant money💰

@kahaaga kahaaga merged commit dff4a62 into main Jul 26, 2024
2 checks passed
@kahaaga kahaaga deleted the new_probs_complexitymeasures30 branch July 26, 2024 10:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants