UTF8 data corruption read from stdin (erroneous substitution by U+FFFD) #2259

Open
ChCyrill opened this issue Jan 31, 2021 · 0 comments
Describe the bug
jq replaces some valid UTF-8 characters read from stdin with U+FFFD (both internally and in its output).

To Reproduce
The input on stdin contains no U+FFFD:
cat test_raw.txt | grep '�'
But the output does contain U+FFFD characters:
cat test_raw.txt | jq --raw-input --raw-output '.' | grep '�'

test_raw.txt

  • as far as I understand, the 'Ё' characters in the example file can be replaced with any multi-byte character and the issue will still be reproducible
  • reproducing the issue requires a string longer than jq's read buffer size (4096)
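For anyone without the attachment, an equivalent input could be generated along the lines of the two notes above. This is only a sketch, not the attached file: `make_repro_line` is a hypothetical helper name, and whether a given length actually triggers the problem depends on where the read boundary falls relative to the character boundaries.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of jq): fills `out` with n_chars copies of
 * 'Ё' (U+0401, encoded in UTF-8 as the two bytes 0xD0 0x81) followed by a
 * newline. Returns the number of bytes written, or 0 if `cap` is too small. */
size_t make_repro_line(char *out, size_t cap, size_t n_chars) {
  static const char yo[2] = { (char)0xD0, (char)0x81 };
  size_t need = n_chars * 2 + 1; /* two bytes per character plus '\n' */
  size_t i;

  if (cap < need)
    return 0;
  for (i = 0; i < n_chars; i++)
    memcpy(out + 2 * i, yo, 2);
  out[need - 1] = '\n';
  return need;
}
```

Writing the result for, say, 3000 characters (6001 bytes, so longer than 4096) to test_raw.txt with fwrite should give a single line in the spirit of the attached file.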

Expected behavior
jq keeps strings as they are.
In this case, it should produce the same result as the command below, which reads from a file instead of stdin:
jq --null-input --raw-output '$f | .' --rawfile f test_raw.txt | grep '�'

Environment:

  • Arch Linux
  • jq version 1.6

Additional notes

It looks like the same issue was already fixed for reading from files: e84d171

As far as I can tell, the incorrect replacement happens in src/util.c (https://github.com/stedolan/jq/blob/master/src/util.c):
value = jv_string_concat(value, jv_string_sized(state->buf, state->buf_valid_len));
state->buf[0] = '\0';
state->buf_valid_len = 0;

Probably, the jq_util_read_more function reads only part of a UTF-8 character into the buffer: when the 4096-byte buffer fills in the middle of a multi-byte character, the incomplete sequence is converted to a string on its own and ends up replaced with U+FFFD.
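If that guess is right, one possible remedy would be to convert only the part of the buffer that ends on a character boundary and carry the incomplete trailing bytes over to the next read. The following is a minimal sketch of the boundary check only, under that assumption; `utf8_seq_len` and `utf8_complete_prefix_len` are hypothetical names and not existing jq functions, and this is not necessarily how e84d171 fixed the file case.

```c
#include <stddef.h>

/* Expected sequence length for a UTF-8 lead byte,
 * or 0 if `b` is a continuation byte (10xxxxxx) or not a valid lead byte. */
static size_t utf8_seq_len(unsigned char b) {
  if (b < 0x80) return 1;
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 0;
}

/* Returns how many leading bytes of buf[0..len) can be converted now without
 * cutting a multi-byte character in half. Any bytes after that point are the
 * start of a character whose remaining bytes have not been read yet. */
size_t utf8_complete_prefix_len(const char *buf, size_t len) {
  size_t back;

  /* A UTF-8 sequence is at most 4 bytes, so the last lead byte (if any)
   * is within the final 4 bytes of the buffer. */
  for (back = 1; back <= 4 && back <= len; back++) {
    size_t need = utf8_seq_len((unsigned char)buf[len - back]);
    if (need == 0)
      continue; /* not a lead byte: keep scanning backwards */
    /* Last lead byte found: hold it back only if its sequence is cut off. */
    return need > back ? len - back : len;
  }
  /* No lead byte near the end: the data is empty or invalid anyway,
   * so leave it for the normal decoder to handle. */
  return len;
}
```

With something like this, the caller would pass only the first `utf8_complete_prefix_len(...)` bytes to jv_string_sized and move the held-back bytes to the start of the buffer before reading more. At end of input, whatever remains would still have to be flushed as-is.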
