Rank-based outcome space #421
Comments
What's the index...? I must have misunderstood something, because I can't imagine a scenario where the length of the indices is not the same as the length of the data.
This is free, as Julia stores objects by reference.
Sorry, I typed before thinking. What I meant is: If … Does that clarify it?
Right, yeah, this makes sense.
Describe the feature you'd like to have
While implementing the Chatterjee coefficient of correlation in Associations.jl, I needed to compute ranks of an input vector, according to some method. That made me think: why not have an encoding and an outcome space in ComplexityMeasures.jl that does the same? I imagine something like this:
The encoding is constructed from the input data. This allows a mapping that encodes each raw value `x_i` onto an index `idx_i` and decodes that index back onto `x_i`. If `length(x) == length(rank(x))`, then there are as many outcomes as there are values in `x` (so the outcome space is a bit useless for complexity quantification). However, in datasets with repetitions, we can have `length(rank(x)) << length(x)`. Then each rank index is an outcome, and we can meaningfully estimate counts/probabilities by counting how many input data points map to each rank index.

Cite scientific papers related to the feature/algorithm
I have no references atm. This may or may not have been done by someone before.
If possible, sketch out an implementation strategy
Relatively straightforward: follow the dev docs on implementing an encoding and an outcome space.
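The following is a minimal sketch of the proposed encoding and outcome counting. It is written in Python for illustration only; a real version would be Julia code following the ComplexityMeasures.jl dev docs. `RankEncoding` and `rank_probabilities` are hypothetical names. The sketch assumes dense ranking, meaning equal values share one index and there are no gaps, so the number of outcomes equals the number of unique values.

```python
from collections import Counter
from typing import Dict, List, Sequence


class RankEncoding:
    """Hypothetical rank encoding: maps each raw value onto a dense rank index.

    Equal values share one index, so the number of possible outcomes equals
    the number of unique values in the data used to construct the encoding.
    """

    def __init__(self, x: Sequence[float]):
        # Sorted unique values; the value at position k - 1 has rank index k.
        self.levels = sorted(set(x))
        self.index_of = {v: k for k, v in enumerate(self.levels, start=1)}

    def encode(self, xi: float) -> int:
        # Raw value x_i -> rank index idx_i (1-based, as in Julia).
        return self.index_of[xi]

    def decode(self, idx: int) -> float:
        # Rank index idx_i -> the raw value x_i it represents.
        return self.levels[idx - 1]

    def outcome_space(self) -> List[int]:
        # One outcome per rank index.
        return list(range(1, len(self.levels) + 1))


def rank_probabilities(x: Sequence[float]) -> Dict[int, float]:
    """Relative frequency of each rank index, estimated by counting."""
    enc = RankEncoding(x)
    counts = Counter(enc.encode(xi) for xi in x)
    n = len(x)
    return {k: counts[k] / n for k in enc.outcome_space()}
```

For `x = [3.0, 1.0, 3.0, 2.0, 3.0, 1.0]`, six data points map onto three rank indices, with probabilities 2/6, 1/6 and 3/6. Without repeated values, every index has count 1, which is the degenerate case described above.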
A few notes:
- … (`T <: RankType`). StatsBase.jl implements a few such ranking methods that we can use for inspiration. I also implemented a randomized tie-breaking variant for the Chatterjee coefficient that we would use.
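For the tie-handling note above, here is a Python sketch of several ranking schemes. It is modeled loosely on the schemes StatsBase.jl provides (`ordinalrank`, `competerank`, `denserank`, `tiedrank`) and adds a random tie-breaking option. The `rank` function and its `method` strings are hypothetical. The `"random"` branch is one plausible reading of "randomized tie-breaking", not necessarily what the Chatterjee implementation does.

```python
import random
from typing import List, Optional, Sequence


def tie_groups(x: Sequence[float]) -> List[List[int]]:
    """Positions of x grouped by equal value, groups in ascending value order."""
    # sorted() is stable, so tied positions keep their order of appearance.
    order = sorted(range(len(x)), key=lambda i: x[i])
    groups: List[List[int]] = []
    for i in order:
        if groups and x[groups[-1][0]] == x[i]:
            groups[-1].append(i)
        else:
            groups.append([i])
    return groups


def rank(x: Sequence[float], method: str = "ordinal",
         rng: Optional[random.Random] = None) -> List[float]:
    """1-based ranks of x; `method` controls how ties are resolved.

    "ordinal":     ties broken by order of appearance  (1, 2, 3, 4)
    "competition": ties share the lowest rank          (1, 2, 2, 4)
    "dense":       ties share a rank, no gaps          (1, 2, 2, 3)
    "average":     ties share the mean rank            (1, 2.5, 2.5, 4)
    "random":      ties broken uniformly at random
    """
    rng = rng or random.Random()
    ranks: List[float] = [0.0] * len(x)
    next_rank = 1  # first rank available to the next tie group
    for dense_rank, group in enumerate(tie_groups(x), start=1):
        size = len(group)
        if method == "ordinal":
            slots = list(range(next_rank, next_rank + size))
        elif method == "competition":
            slots = [next_rank] * size
        elif method == "dense":
            slots = [dense_rank] * size
        elif method == "average":
            slots = [next_rank + (size - 1) / 2] * size
        elif method == "random":
            slots = list(range(next_rank, next_rank + size))
            rng.shuffle(slots)  # permute the ranks within this tie group
        else:
            raise ValueError(f"unknown method: {method!r}")
        for i, r in zip(group, slots):
            ranks[i] = r
        next_rank += size
    return ranks
```

With `"ordinal"` or `"random"`, every data point gets a distinct rank, so there would be as many outcomes as data points. The tie-sharing schemes (`"competition"`, `"dense"`, `"average"`) produce as many distinct ranks as there are unique values, which is the property a rank-based outcome space would rely on.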