Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add corchatterjee as another rank-based correlation #379

Closed
juliohm opened this issue Jul 27, 2024 · 4 comments
Closed

Add corchatterjee as another rank-based correlation #379

juliohm opened this issue Jul 27, 2024 · 4 comments

Comments

@juliohm
Copy link

juliohm commented Jul 27, 2024

So nice to see more associations in Julia! 🎉 Statistics/StatsBase is stuck in time, and it is good to see people working actively in JuliaDynamics 👍🏽

Here is a feature request that I am porting from StatsBase.jl: JuliaStats/StatsBase.jl#934

If you ever need to move the package to a "ML" or "stats" organization, JuliaML is one option. I already added @Datseris there as a member and would be happy to add more people. Feel free to ping me.

@Datseris
Copy link
Member

Thanks!

I think Associations.jl makes much more sense to be in JuliaDynsmics if one looks at the pool of methods and their relation to nonlinear and complex timeseries analysis.

@kahaaga
Copy link
Member

kahaaga commented Jul 28, 2024

Hi there @juliohm! Thanks for reaching out.

The package is fairly general, so it could fit into any sort of ML/stats/dynamics organisation. It's just labels, after all. The main focus should be for it to be within an organisation where it can expect a reasonable engagement by the community. For now, I echo @Datseris: I think the package remains a good fit for JuliaDynamics, since the package is so tightly coupled with the rest of the JuliaDynamics ecosystem.

Regarding the Chatterjee coefficient, this paper is also relevant. They mention that the coefficient actually appears in an earlier work by Dette (2013). When implemented, the type should probably be called ChatterjeeCorrelation (for recognition), but also have an alias DetteCorrelation (or, at least, we should mention the duplicate work in the docstring).

@kahaaga
Copy link
Member

kahaaga commented Jul 31, 2024

@juliohm In PR #383, I've implemented the Chatterjee coefficient to the best of my ability. The PR is as done as it gets from my side.

Since this method was essentially new to me when you opened this issue, I'm naturally no expert in the method. Perhaps you would be interested in doing a short review of the PR before I merge?

If you don't have the opportunity to do a review, no worries. I'll then revisit the PR in a few days with fresh eyes, and then merge it if I can't spot any glaring errors.

Some comments

  • I don't know how my implementation relates to the Xicor.jl package you mentioned in the other issue. From briefly glancing at the source code, I see that they offer to use rank functions from StatsBase.jl. Here, I've implemented the ranking strictly as described in Chatterjee (2021): ordinal ranking with uniform random breaking of ties.
  • I compare the results for the non-tie version to the XICOR R-package, which gives identical results for the test cases I've tried. I can't directly compare the results when using the formula that takes into consideration ties, because the XICOR R-package doesn't allow user-specified rngs for the randomisation.
  • The ChatterjeeCorrelation struct store a bunch of arrays that are re-used during computation. As a result, the code may be a bit messier than needed for a naive implementation. Despite the source code being a but messier than for a naive implementation, it is way more performant for large inputs and Monte Carlo type testing. The SurrogateAssociationTest example in the PR essentially does zero allocations after initialization due to this design choice.
  • There's also potential for future improvements here, e.g. implement analytical p-value computations. But the basics are implemented, so that can be done in a separate PR.

@juliohm
Copy link
Author

juliohm commented Jul 31, 2024

That is really amazing @kahaaga 💯 Thanks for the nice addition. Will certainly use it in the future when I find an opportunity. Please feel free to go ahead with the merge and release.

@kahaaga kahaaga closed this as completed Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants