Rank vector values #16

jcquarto · 2021-11-12T03:40:55Z

Ranking vector values

I often need to rank the values in a DF column (i.e. a Vector), especially when they represent values in an ordered way, such as a time series.

new methods for Vectors:

df[:a].rank() : returns a Vector of the same length as df[:a] holding each element's rank (1 for first, 2 for second, etc.). Tied elements share the same rank, and the ranks of the elements below them are pushed down accordingly.

So if the original vector is [20,50,42,11], calling rank() on it returns [3,1,2,4], because 20 is the 3rd highest value, 50 is the highest, and so on. An example of a tie would be [42,50,42,20], which returns [2,1,2,4]: the two 42s use up both the 2nd and 3rd spots, so 20 gets rank 4 (this is a semi-standard way of dealing with ties when ranking).

Sometimes you want to rank in reverse order, so lower values are "better". In this case use df[:a].rank(ascending=false)
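To make the tie rule concrete, here is a minimal sketch on plain Ruby arrays. It is not the code from this PR: `competition_rank` and its `highest_first:` keyword are hypothetical names used only for illustration, and the sketch does not try to model how the PR's `ascending` argument is wired up.

```ruby
# Hypothetical helper for illustration, not the PR's implementation.
# Rank of a value = 1 + the number of values that beat it, so tied values
# share a rank and the rank(s) right after the tie are skipped.
def competition_rank(values, highest_first: true)
  values.map do |v|
    beaten_by = values.count { |other| highest_first ? other > v : other < v }
    beaten_by + 1
  end
end

p competition_rank([20, 50, 42, 11])                        # => [3, 1, 2, 4]
p competition_rank([42, 50, 42, 20])                        # => [2, 1, 2, 4]
p competition_rank([20, 50, 42, 11], highest_first: false)  # => [2, 4, 3, 1]
```

Counting the values that beat each element is quadratic in the vector length. That keeps the sketch short, but a sort-based version would probably be preferable for long columns.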

more methods:

I also often want to know: "given the last value in this time series, it is the best one in how many periods?" For example, "this week's sales of Widget X are the highest in 3 weeks!"

df[:a].best_in() : returns how many elements back from the last one you have to go to find an element ranked better than the last.

Returning to the original example: if the original vector is [20,50,42,11], calling best_in() on it returns 1, because the default ranking is ascending and the last value is the worst of the 4 elements. It's the best since... itself. But if the ranking is in descending order, then the last value is the best of all of them, so it returns 4: "the best in 4 periods". For this case use best_in(ascending=false)
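The backwards walk described here can be sketched on a plain Ruby array. This is an illustration, not the PR's code: `periods_as_best` and `higher_is_better:` are hypothetical names, and the treatment of ties (an earlier value equal to the last one does not end the streak) is an assumption.

```ruby
# Hypothetical helper for illustration, not the PR's implementation.
# Walk backwards from the last element and count elements (the last one
# included) until one that beats the last value is found.
# Assumption: an earlier value equal to the last one does not end the streak.
def periods_as_best(values, higher_is_better: true)
  return 0 if values.empty?

  last = values.last
  first_better = values.reverse.index do |v|
    higher_is_better ? v > last : v < last
  end
  # If nothing beats the last value, it is the best over the whole series.
  first_better || values.length
end

p periods_as_best([20, 50, 42, 11])                           # => 1
p periods_as_best([20, 50, 42, 11], higher_is_better: false)  # => 4
p periods_as_best([100, 50, 60, 70])                          # => 3
```

The last call corresponds to the "highest in 3 weeks" case: 70 beats 60 and 50, and the streak ends at 100.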

As a mirror image, there is a method
df[:a].worst_in() : returns how many elements back from the last one you have to go to find an element ranked worse than the last.

This can be useful for red flags such as "this month's sales numbers are the worst in 5 months!"
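A red flag like that could be computed along the following lines. Again this is only a sketch on plain arrays: `periods_as_worst`, `higher_is_better:` and the sample sales figures are made up for illustration and are not taken from the PR.

```ruby
# Hypothetical helper for illustration, not the PR's implementation.
# Count elements from the end (the last one included) until an element
# that is worse than the last value is found.
def periods_as_worst(values, higher_is_better: true)
  return 0 if values.empty?

  last = values.last
  first_worse = values.reverse.index do |v|
    higher_is_better ? v < last : v > last
  end
  # If nothing is worse than the last value, it is the worst of the series.
  first_worse || values.length
end

monthly_sales = [90, 40, 80, 70, 60, 50, 45] # made-up sample data
p periods_as_worst(monthly_sales)            # => 5
p periods_as_worst([20, 50, 42, 11])         # => 4
```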

Also updated: the README.md file, with (hopefully) better-written documentation than what I wrote above. There are also tests for all this.

comments on code

Since the Vector class converts nils to NaNs, it can be a bit problematic to deal with those when doing <=> sorting and comparisons. So for the rank() method, my code converts the incoming vector to an array for the purposes of ranking and then converts back to a Vector for output. However, both best_in and worst_in return integers.
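For context, the snippet below shows why NaN gets in the way of `<=>`, followed by one possible way to rank an array that contains NaNs. The handling is an assumption made for this sketch (NaN entries are left as NaN and ignored when ranking the rest); the PR does not state that it behaves this way, and `rank_skipping_nan` is a hypothetical name.

```ruby
# In Ruby, comparing NaN with <=> yields nil, which is what trips up sorting.
p(Float::NAN <=> 1.0) # => nil

# Hypothetical helper for illustration, not the PR's implementation.
# Assumption: NaN entries keep a NaN "rank" and are ignored when ranking
# the remaining values (highest value gets rank 1, ties share a rank).
def rank_skipping_nan(values)
  nan = ->(v) { v.respond_to?(:nan?) && v.nan? }
  comparable = values.reject(&nan)

  values.map do |v|
    next Float::NAN if nan.call(v)

    comparable.count { |other| other > v } + 1
  end
end

p rank_skipping_nan([20.0, Float::NAN, 42.0]) # => [2, NaN, 1]
```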

jcquarto added 4 commits November 11, 2021 20:52

first release of ranking method on Vectors

1fe8a43

tweak, need to reverse the array to make best_in easier

1eaf9ec

dont be clever, write clear code

f87362b

simplification to mirror Pandas for ascending

dccedc4
