Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return value of numpy_ops.ngrams has incorrect length #512

Closed
m-alek opened this issue Jun 17, 2021 · 1 comment · Fixed by #514
Closed

Return value of numpy_ops.ngrams has incorrect length #512

m-alek opened this issue Jun 17, 2021 · 1 comment · Fixed by #514
Labels
bug Bugs and behaviour differing from documentation feat / ops Backends and maths

Comments

@m-alek
Copy link

m-alek commented Jun 17, 2021

I noticed that the Ops.ngrams returns the incorrect number of ngrams. An N-gram on a list of length K, should return something of length K - (N-1). However, it returns something of length K - N.

E.g. a 2-gram on a list of tokens: "this is text" should return [(this, is), (is, text)], which has length 2. However it returns something of length 1.

My hunch is that in L349 and L352, n should be (n-1), or something similar.

def ngrams(self, int n, const uint64_t[::1] keys):
keys_ = <uint64_t*>&keys[0]
length = max(0, keys.shape[0]-n)
cdef np.ndarray output_ = self.alloc((length,), dtype="uint64")
output = <uint64_t*>output_.data
for i in range(keys.shape[0]-n):
output[i] = hash64(&keys_[i], n*sizeof(keys_[0]), 0)
return output_

@adrianeboyd adrianeboyd added bug Bugs and behaviour differing from documentation feat / ops Backends and maths labels Jun 21, 2021
@adrianeboyd
Copy link
Contributor

Thanks for the report, that does look like a bug. Since there aren't any unit tests I have to say I'm a bit unsure if there might be some reason for this or details I've misunderstood, but it does look like it's always missing the final ngram. We'll look into it!

@adrianeboyd adrianeboyd linked a pull request Jun 24, 2021 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bugs and behaviour differing from documentation feat / ops Backends and maths
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants