I noticed that Ops.ngrams returns the wrong number of n-grams. Extracting N-grams from a list of length K should return K - (N-1) items. However, it returns K - N items.
E.g. 2-grams on the token list "this is text" should be [(this, is), (is, text)], which has length 2. However, it returns something of length 1.
My hunch is that in L349 and L352, n should be (n-1), or something similar.
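To make the expected count concrete, here is a minimal sketch in plain Python. It is not thinc's Cython implementation, and the `ngrams` helper below is a hypothetical stand-in that works on a list of strings rather than the arrays Ops.ngrams operates on. It only illustrates that a window of size N over K tokens yields K - (N-1) n-grams.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of `tokens` as tuples.

    Hypothetical reference helper for illustration only; it is not
    the thinc Ops.ngrams implementation.
    """
    # A window of size n can start at indices 0 .. len(tokens) - n inclusive,
    # which gives len(tokens) - n + 1 windows. If the list is shorter than n,
    # the range is empty and no n-grams are returned.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = ["this", "is", "text"]
bigrams = ngrams(tokens, 2)
print(bigrams)       # [('this', 'is'), ('is', 'text')]
print(len(bigrams))  # 2, i.e. K - (N-1) with K = 3 and N = 2
```

Using `range(len(tokens) - n)` as the loop bound instead would drop the final window and give K - N items, which matches the symptom described above. Whether that is the actual cause in numpy_ops.pyx is not confirmed here.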
Thanks for the report, that does look like a bug. Since there aren't any unit tests I have to say I'm a bit unsure if there might be some reason for this or details I've misunderstood, but it does look like it's always missing the final ngram. We'll look into it!
thinc/thinc/backends/numpy_ops.pyx
Lines 347 to 354 in 501552c