Chatterjee Correlation Coefficient #770
@mborland : This might be one of the coolest and most distinctive features we have in the library; thanks for taking this on.
@mborland : Do you know how to quickly recover the rank of Y_{(i)}? That's one part of the paper I didn't quite understand.
If you look at
Ah, that's a good idea. Might want to have two:
which asserts
@NAThompson here is the performance data:
@mborland : Beautiful n log(n) complexity, just as expected. BTW, it looks like you accidentally committed a binary file.
@NAThompson This is ready for review. The only failure in the previous run was caused by a non-ASCII character in a comment, which has since been fixed.
This is the problem Chatterjee's coefficient solves.
Let X and Y be random variables, where Y is not constant, and let (X_i, Y_i) be i.i.d. samples from their joint distribution.
Rearrange these samples so that X_{(0)} < X_{(1)} < ... < X_{(n-1)}, and form the reordered pairs (X_{(i)}, Y_{(i)}).
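For reference, the statistic this setup leads to, as defined in the paper linked at the end (arXiv:1909.10140) for the case with no ties among the X_i or among the Y_i, can be written with the 0-based indices used here as follows. Here r_i is the rank of Y_{(i)}, i.e. the number of j with Y_{(j)} ≤ Y_{(i)}. Ties need separate handling in the paper (random tie-breaking in X and a modified formula for ties in Y), which this sketch does not cover.

```latex
\xi_n(X, Y) = 1 - \frac{3 \sum_{i=0}^{n-2} \lvert r_{i+1} - r_i \rvert}{n^2 - 1},
\qquad
r_i = \#\{\, j : Y_{(j)} \le Y_{(i)} \,\}.
```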
In the limit of an infinite amount of i.i.d. data, the statistic lies in [0, 1].
However, for finite samples the statistic may be negative.
In that limit, the value is zero if X and Y are independent, and unity if Y is a measurable function of X.
The complexity is O(n log n).
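A minimal C++ sketch of how the O(n log n) bound can be reached, assuming no ties in either variable. The function name `chatterjee_xi` is hypothetical and is not the library's API. It sorts an index permutation by x, then recovers each rank r_i of Y_{(i)} with a single sort of y plus a binary search (`std::upper_bound`), so both phases cost O(n log n). This is also one way to answer the earlier question about recovering the rank of Y_{(i)} quickly.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <vector>

// Hypothetical sketch of Chatterjee's xi_n, assuming no ties in x or y.
// The paper handles ties differently; that case is NOT covered here.
double chatterjee_xi(const std::vector<double>& x, const std::vector<double>& y)
{
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    assert(n >= 2);

    // Permutation that orders the samples by increasing x: O(n log n).
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    // Y_(i): the y values listed in the order of their x partners.
    std::vector<double> y_by_x(n);
    for (std::size_t i = 0; i < n; ++i) y_by_x[i] = y[order[i]];

    // r_i = #{ j : Y_(j) <= Y_(i) }: one sort of y, then a binary search
    // per element. upper_bound's offset is exactly that count.
    std::vector<double> y_sorted(y);
    std::sort(y_sorted.begin(), y_sorted.end());
    std::vector<long long> r(n);
    for (std::size_t i = 0; i < n; ++i)
    {
        auto it = std::upper_bound(y_sorted.begin(), y_sorted.end(), y_by_x[i]);
        r[i] = static_cast<long long>(it - y_sorted.begin());
    }

    // xi_n = 1 - 3 * sum_i |r_{i+1} - r_i| / (n^2 - 1).
    long long sum = 0;
    for (std::size_t i = 0; i + 1 < n; ++i) sum += std::llabs(r[i + 1] - r[i]);

    const double nd = static_cast<double>(n);
    return 1.0 - 3.0 * static_cast<double>(sum) / (nd * nd - 1.0);
}
```

With x = {3, 0, 2, 1} and y = {3, 2, 1, 4}, the ranks in x-order are 2, 4, 1, 3, so the sum of differences is 7 and ξ_n = 1 - 21/15 = -0.4, a concrete case of a negative value on finite data. For strictly increasing y the sketch returns 1 - 3/(n+1) rather than exactly 1, consistent with unity being reached only in the limit.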
Is there a way to get O(n log n) nicely rendered?
@mborland : I just put some trivial comments in; I think this is basically good to go. One last thing: is there another really clean unit test we could add? I wonder if figure 2 of the attached paper could be used "morally" to ensure that we haven't made any silly scaling errors, i.e., let Y = X and show ξ ≈ 0.970, let Y = X^2 and show ξ ≈ 0.941, and let Y = sin(X) and show ξ ≈ 0.885.
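One scaling check that needs no random data: with the paper's no-ties formula ξ_n = 1 - 3 Σ |r_{i+1} - r_i| / (n^2 - 1), if Y = f(X) for a strictly increasing f, the ranks r_i in x-order are simply 1, 2, ..., n, so every consecutive difference is 1. That yields a closed form to test against. Whether n = 100 is the setup behind the 0.970 quoted from figure 2 has not been checked here; Y = sin(X) also falls under this case only if X stays within [-π/2, π/2], and Y = X^2 on a symmetric range does not.

```latex
\xi_n = 1 - \frac{3(n-1)}{n^2 - 1} = 1 - \frac{3}{n+1},
\qquad
\xi_{100} = 1 - \frac{3}{101} \approx 0.9703.
```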
@mborland : Looks good to me; I sign off! @jzmaddock : Want to do a final sign off?
This is good to go now. The autodiff tests have been consistently hanging in the Drone CI run under USAN.
See: https://arxiv.org/pdf/1909.10140.pdf