Enry changes the input byte slice #196
Copy the input byte slice to defend from src-d/enry#196 Signed-off-by: Vadim Markovtsev <vadim@sourced.tech>
$ go run repro.go < ParabolicPointer.cs
C#
before eb7239ae3fc314c45ac58bf03abbd19aff2a0243
after 37a2b661c1e43cb2ed34b51c924e424278233fb0
I haven't yet figured out which strategy is the culprit, but I'm working on narrowing it down.
The final decision is made by the DefaultClassifier, which seems to be the cause of this.
Specifically, the problem happens in the guts of the ...
The tokens should probably have been strings instead of slices, since sharing slices is never really safe when you're making changes. But for now I think we can make a reasonable fix for this by having the tokenizer operate on a copy.
I've started a fairly straightforward test+fix in #197.
enry.GetLanguage changes the contents of its second argument - the byte slice.
How to reproduce: reproduce.go
You should see
ParabolicPointer.cs: ParabolicPointer.zip
This is the root cause of src-d/hercules#178
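The attached reproduce.go is not included in this text. A check of the same kind, hashing the slice before and after the call, might look like the sketch below. Everything here is an assumption for illustration: `classify` is a hypothetical stand-in for the language-detection call that deliberately overwrites one byte, `checkMutation` is an invented helper, SHA-1 is chosen only because the hashes above are 40 hex characters long, and the content is inlined instead of read from stdin.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// classify is a hypothetical stand-in for a call such as enry.GetLanguage.
// It deliberately overwrites the first byte to simulate the reported bug.
func classify(filename string, content []byte) string {
	if len(content) > 0 {
		content[0] = '#'
	}
	return "C#"
}

// checkMutation calls fn and returns its result together with the hex SHA-1
// digests of content taken before and after the call.
func checkMutation(fn func(string, []byte) string, name string, content []byte) (lang, before, after string) {
	before = fmt.Sprintf("%x", sha1.Sum(content))
	lang = fn(name, content)
	after = fmt.Sprintf("%x", sha1.Sum(content))
	return lang, before, after
}

func main() {
	content := []byte("using System;")
	lang, before, after := checkMutation(classify, "ParabolicPointer.cs", content)
	fmt.Println(lang)
	fmt.Println("before", before)
	fmt.Println("after", after)
	fmt.Println("mutated:", before != after)
}
```

If the two digests differ, the callee has written into the caller's slice, which is the behavior reported here.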