Describe the bug
jq replaces some valid UTF-8 characters read from stdin with U+FFFD, the Unicode replacement character (both internally and in its output).
To Reproduce
There is no U+FFFD in the input:
cat test_raw.txt | grep '�'
But there are U+FFFD characters in the output:
cat test_raw.txt | jq --raw-input --raw-output '.' | grep '�'
As far as I understand, the 'Ё' characters in the example file (test_raw.txt) can be replaced with any multi-byte character and the issue will still be reproducible.
To reproduce the issue, the input must contain a string longer than jq's read buffer size (4096 bytes).
Expected behavior
jq keeps strings as they are.
In the case above, jq should produce the same result as the following command, which reads the file with --rawfile instead of from stdin:
jq --null-input --raw-output '$f | .' --rawfile f test_raw.txt | grep '�'
Environment:
Arch Linux
jq version 1.6
Additional notes
It looks like the same issue was already fixed for reading from files in commit e84d171.
Probably, the jq_util_input_read_more function reads only part of a multi-byte UTF-8 character into the buffer: the read is truncated at the 4096-byte buffer size, so the character is split between two reads.
I found that the wrong replacement happens here, in https://github.com/stedolan/jq/blob/master/src/util.c:
value = jv_string_concat(value, jv_string_sized(state->buf, state->buf_valid_len));
state->buf[0] = '\0';
state->buf_valid_len = 0;