fix to_hex function handling of non-ascii characters #672
Thanks. Yes, that seems wrong, and I guess also … One tricky thing is that fq adds a "binary" type to jq but has to be careful not to expose it to "normal jq", e.g. … This also reminded me that the …
I'll look into
Yes, I noticed that it might be a bit brittle. Another possible approach would be to add a first-class binary type to gojq (since you are maintaining your own fork anyway), but that might be a big undertaking.
Great, no hurry 👍
The gojq fork that fq uses extends it with a JQValue interface, so that you can add new types (with some limitations) or reimplement existing types (that is how decode values work). Early versions of fq were actually a bit more liberal about exposing binary and decode value types, but that made existing jq code, like the jq standard library, have bugs or behave in unintuitive ways. But yes, there might be better ways of doing things than what fq currently does, so please experiment if you feel like it.
The `to_hex` and `to_base64` functions expected `string` as their input type, which caused the value to be CastFn'ed to a string, which in turn resulted in the raw bytes being converted to `[]rune` (in makeDecodeValueOut). This conversion replaces invalid UTF-8 bytes with U+FFFD (the Unicode replacement character), and those replacement characters were then passed on to the hex/base64 encoders, producing incorrect output. This patch fixes it by expecting `any` as the input type, which allows the functions to read the raw bytes of the input data correctly.
I decided to add changes for … I haven't found any other functions that are affected by the same problem.
OK! One commit is fine and makes sense to me. I did a quick check and didn't find anything else either. Thanks for the good commit message and for adding tests.
The `to_hex` function expected `string` as its input type, which caused the value to be CastFn'ed to a string, which in turn resulted in the raw bytes being converted to `[]rune` (in makeDecodeValueOut). This conversion replaces invalid UTF-8 bytes with U+FFFD, which were then passed on to the hex encoder, resulting in incorrect output.
Example:
This patch fixes the problem by expecting `any` as the input type for the `to_hex` function, which allows it to read the raw bytes of the input data correctly.
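The idea behind accepting `any` can be sketched in plain Go: inspect the dynamic type of the input and hand byte data to the encoder untouched, rather than forcing everything through a string cast. This is a hypothetical illustration only. `toHexAny` is an invented name, and fq's real function registration, CastFn, and binary value types are not modeled here.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// toHexAny is a hypothetical stand-in for a to_hex whose input type is `any`.
// It dispatches on the dynamic type instead of casting to string first.
func toHexAny(v any) (string, error) {
	switch x := v.(type) {
	case []byte:
		// Binary input: encode the bytes exactly as they are.
		return hex.EncodeToString(x), nil
	case string:
		// Text input: a Go string is already a byte sequence, so no
		// []rune conversion (and no U+FFFD substitution) happens here.
		return hex.EncodeToString([]byte(x)), nil
	default:
		return "", fmt.Errorf("to_hex: unsupported input type %T", v)
	}
}

func main() {
	out, err := toHexAny([]byte{0x41, 0xFF, 0xFE, 0x42})
	fmt.Println(out, err) // 41fffe42 <nil>
}
```

Dispatching on the concrete type keeps the byte path lossless while still accepting ordinary strings; an unsupported type produces an error instead of a silently corrupted result.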