Rank vector values #16

Open: wants to merge 4 commits into base: master
46 changes: 46 additions & 0 deletions README.md
@@ -239,6 +239,52 @@ Multiple groups
df.group([:a, :b]).count
```

## Ranking

`rank` returns a vector giving each element's rank relative to the other elements, in the original order. Tied values share the same rank, and the next rank is skipped accordingly (for example, 1, 2, 2, 4).

```ruby
df[:a].rank
# same as df[:a].rank(true)
```
By default (ascending), higher values receive better (numerically lower) ranks, so the largest value is ranked 1. Example: "number of widgets sold", where a higher number is "better".

To rank in descending order, where lower values receive better (numerically lower) ranks, pass `false`. Example: "number of widgets returned by customers", where a lower number is "better".

```ruby
df[:a].rank(false)
```
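To make the tie handling and the two directions concrete, here is a minimal plain-Ruby sketch of the same competition-style ranking on an ordinary array. The `competition_rank` helper is hypothetical (it is not part of Rover); it assumes an array of numbers where `nil` marks a missing value, which it leaves unranked, as `rank` does.

```ruby
# Hypothetical helper (not part of Rover): competition ranking of an Array.
# ascending = true  -> the largest value gets rank 1 (same convention as Vector#rank)
# ascending = false -> the smallest value gets rank 1
# Ties share a rank and the next rank is skipped, e.g. 1, 2, 2, 4.
def competition_rank(values, ascending = true)
  sorted = values.compact.sort
  sorted.reverse! if ascending

  # The first position of each value in the sorted list is its rank.
  rank_of = {}
  sorted.each_with_index { |v, i| rank_of[v] ||= i + 1 }

  # nil (missing) values stay unranked.
  values.map { |v| v.nil? ? nil : rank_of[v] }
end

p competition_rank([2, 1, 13, 1, 10])     # => [3, 4, 1, 4, 2]
p competition_rank([2, 1, 13, 10], false) # => [2, 1, 4, 3]
p competition_rank([2, nil, 1, nil, 13])  # => [2, nil, 3, nil, 1]
```

Recording only the first position of each value is what gives tied values the same rank and makes the following rank skip ahead.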

TO DO: consider allowing any comparable datatype (one that implements `<=>`). Currently `rank` raises an `ArgumentError` on vectors of strings, although in principle they could be ranked. Awaiting a use case.

## best_in, worst_in
`best_in` looks at the last element of the vector and counts backwards, including the last element itself, until it reaches an element with a strictly better rank. Ties do not stop the count.

```ruby
df[:a].best_in
# same as df[:a].best_in(true)
```

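Because the count includes the last value, it returns 1 when the value immediately before it is better ranked. If no earlier value is better, it returns the length of the vector (0 only for an empty vector).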

This answers the question "is this the best value in N periods?". For example, given 13 weeks of sales where the last week ranks 2, `best_in` returns how many weeks back you have to go to reach a better week (the one ranked 1).
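The lookback can be sketched on a plain array of ranks. `best_in_from_ranks` is a hypothetical name, and the sketch assumes the ranks are already computed integers (rank 1 is best, no missing values); like the method described above, it counts the last element itself. The 13 weekly ranks below are made-up illustration data.

```ruby
# Hypothetical helper (not part of Rover): mirrors the lookback described for best_in.
# Walks backwards from the last rank and counts elements (including the last one)
# until it meets an element with a strictly better (lower) rank.
def best_in_from_ranks(ranks)
  return 0 if ranks.empty?

  last = ranks.last
  ranks.reverse.take_while { |r| r >= last }.size
end

# Illustrative ranks for 13 weeks of sales: the last week is ranked 2
# and the best week (rank 1) lies 4 weeks earlier.
weekly_ranks = [9, 7, 12, 3, 10, 8, 13, 6, 1, 11, 5, 4, 2]
p best_in_from_ranks(weekly_ranks) # => 4
```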

As with `rank`, you can pass the ranking direction to `best_in`:

```ruby
df[:a].best_in(false)
```

To find out whether the last value is the worst in some number of periods, use `worst_in`:

```ruby
df[:a].worst_in
```

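Like `best_in`, `worst_in` accepts a direction argument. It is the complement of `best_in`: it counts backwards from the last value, including that value, until it reaches an element with a strictly worse rank.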
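The complementary lookback can be sketched the same way on a plain array of ranks. `worst_in_from_ranks` is a hypothetical name, and the sketch again assumes precomputed integer ranks with no missing values. The only change from the `best_in` lookback is the direction of the comparison that ends the count.

```ruby
# Hypothetical helper (not part of Rover): mirrors the lookback described for worst_in.
# Counts elements backwards from (and including) the last rank until it meets
# an element with a strictly worse (higher) rank.
def worst_in_from_ranks(ranks)
  return 0 if ranks.empty?

  last = ranks.last
  ranks.reverse.take_while { |r| r <= last }.size
end

# Under the default ordering, the values [1, 10, 3, 2] have ranks [4, 1, 2, 3].
# The last value (rank 3) is worse than the two values before it; the count
# stops at the first value, which ranks worse still (rank 4).
p worst_in_from_ranks([4, 1, 2, 3]) # => 3
```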

TO DO: allow the user to specify which value to start looking back from, rather than only the last one. Awaiting a use case.

## Visualization

Add [Vega](https://github.com/ankane/vega) to your application’s Gemfile:
46 changes: 46 additions & 0 deletions lib/rover/vector.rb
@@ -331,6 +331,52 @@ def to_html
end
end

# Returns a Vector with the rank of each value, in the same order as the original.
# ascending = true (default): the largest value gets rank 1.
# ascending = false: the smallest value gets rank 1.
# Ties share a rank and the following rank is skipped (e.g. 1, 2, 2, 4).
# NaN/nil values are not ranked and come back as nil.
def rank(ascending = true)
  raise ArgumentError, "All elements must be numeric" unless all? { |vi| vi.is_a?(Numeric) }
  ascending = true unless [true, false].include?(ascending)
  data = @data.to_a.map { |x| x != x ? nil : x } # NaN != NaN, so this turns NaN into nil

  sorted = data.compact.sort
  sorted.reverse! if ascending

  # the first position of each value in the sorted list is its rank
  # (a hash lookup avoids calling sorted.index for every element, which is O(n^2))
  rank_of = {}
  sorted.each_with_index { |v, i| rank_of[v] ||= i + 1 }

  Vector.new(data.map { |e| e.nil? ? nil : rank_of[e] })
end

# Based on the last value of the vector, returns how many consecutive values,
# counting back from and including the last one, are not ranked better than it.
# Useful for time-ordered data (e.g., "best value in 3 weeks!").
def best_in(ascending = true)
  ranks = rank(ascending).to_a.reverse
  last = ranks.first
  # stop at the first earlier value with a strictly better (lower) rank
  ranks.take_while { |r| !(last > r) }.size
end

# Complementary to best_in: returns how many consecutive values, counting back
# from and including the last one, are not ranked worse than it.
# Equivalent to calling best_in with the opposite direction.
# Useful for time-ordered data (e.g., "worst value in 3 weeks!").
def worst_in(ascending = true)
  ranks = rank(ascending).to_a.reverse
  last = ranks.first
  # stop at the first earlier value with a strictly worse (higher) rank
  ranks.take_while { |r| !(last < r) }.size
end

private

def cast_data(data, type: nil)
50 changes: 50 additions & 0 deletions test/data_frame_test.rb
@@ -201,6 +201,56 @@ def test_sum
assert_equal 9, df.sum("a")
end

def test_rank
  df = Rover::DataFrame.new({"a" => [2, 1, 13, 10]})
  assert_equal Rover::Vector.new([3, 4, 1, 2]), df["a"].rank
end

def test_rank_with_repeated
  df = Rover::DataFrame.new({"a" => [2, 1, 13, 1, 10]})
  assert_equal Rover::Vector.new([3, 4, 1, 4, 2]), df["a"].rank
end

def test_rank_with_nil
  df = Rover::DataFrame.new({"a" => [2, nil, 1, nil, 13]})
  assert_equal Rover::Vector.new([2, nil, 3, nil, 1]), df["a"].rank
end

def test_rank_with_nil_2
  df = Rover::DataFrame.new({"a" => [2, nil, 2, nil, 13]})
  assert_equal Rover::Vector.new([2, nil, 2, nil, 1]), df["a"].rank
end

def test_rank_descending
  df = Rover::DataFrame.new({"a" => [2, 1, 13, 10]})
  assert_equal Rover::Vector.new([2, 1, 4, 3]), df["a"].rank(false)
end

def test_rank_descending_with_nil
  df = Rover::DataFrame.new({"a" => [2, nil, 1, nil, 13]})
  assert_equal Rover::Vector.new([2, nil, 1, nil, 3]), df["a"].rank(false)
end

def test_best_in
  df = Rover::DataFrame.new({"a" => [1, 10, 3, 2]})
  assert_equal 1, df["a"].best_in
end

def test_best_in_descending
  df = Rover::DataFrame.new({"a" => [1, 13, 12, 7]})
  assert_equal 3, df["a"].best_in(false)
end

def test_worst_in
  df = Rover::DataFrame.new({"a" => [1, 10, 3, 2]})
  assert_equal 3, df["a"].worst_in
end

def test_worst_in_descending
  df = Rover::DataFrame.new({"a" => [15, 13, 12, 7]})
  assert_equal 1, df["a"].worst_in(false)
end

# TODO better test
def test_sample
df = Rover::DataFrame.new({"a" => [1, 2, 3], "b" => ["one", "two", "three"]})