Similarities between documents and query may be >1 #3

The README claims that similarities between documents and queries shouldn't be greater than 1. However:

``` python
import tfidf

table = tfidf.tfidf()
table.addDocument("foo", ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"])
table.addDocument("bar", ["alpha", "bravo", "charlie", "india", "juliet", "kilo"])
table.addDocument("baz", ["kilo", "lima", "mike", "november"])
print(table.similarities(["alpha", "bravo", "charlie", "india"]))
```

Yields `[['foo', 0.5625], ['bar', 1.0416666666666665], ['baz', 0.0]]`. Whoops!

This happens because the query isn't being normalized. The ranking of results should still be correct, but it'd be better to normalize the query so we can make guarantees about the output.
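To illustrate the kind of guarantee normalization can give, here is a minimal standalone sketch. It is not this library's implementation: `unit_vector` and `cosine_similarity` are hypothetical helpers, and the IDF weighting is left out for brevity. It assumes both the document and the query are scaled to unit Euclidean length, in which case their dot product cannot exceed 1 (up to floating-point rounding).

```python
import math
from collections import Counter


def unit_vector(words):
    """Hypothetical helper: term counts for `words`, scaled to unit L2 length."""
    counts = Counter(words)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def cosine_similarity(doc_words, query_words):
    """Dot product of two unit vectors; by Cauchy-Schwarz it is at most 1."""
    doc = unit_vector(doc_words)
    query = unit_vector(query_words)
    return sum(weight * doc.get(term, 0.0) for term, weight in query.items())


# Same documents and query as in the report above.
documents = {
    "foo": ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"],
    "bar": ["alpha", "bravo", "charlie", "india", "juliet", "kilo"],
    "baz": ["kilo", "lima", "mike", "november"],
}
query = ["alpha", "bravo", "charlie", "india"]

scores = {name: cosine_similarity(words, query) for name, words in documents.items()}
for name in ("foo", "bar", "baz"):
    print(name, scores[name])
```

With both sides normalized this way, the ranking for this example is unchanged (bar, then foo, then baz), and every score falls between 0 and 1.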