UTF-8 invalid characters are not always ignored when dumping with error_handler_t::ignore #4552
Comments
Yes, there seems to be a logic error: `bytes = bytes_after_last_accept;` but …
Interesting, I'll take a look perhaps.
I had another look at the issue, and this needs some discussion:
What do you think?
I think adding a new enumerator would be best in this case, then. Fixing the semantics of `error_handler_t::ignore` would upset anyone who depends on its current behavior, and we can probably update the docs to make them more accurate.
Yes, I also think adding a new enumerator that preserves invalid UTF-8 makes sense.
Thanks for the input. Any preference on the name?
Sounds good!
Yeah, `keep` sounds good.
@gentooise @t-b @jordan-hoang Please take a look at #4555.
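The thread settles on a new enumerator rather than changing `ignore`. The standalone sketch below contrasts the two behaviors: `ignore` drops the bytes of an invalid sequence, and a `keep`-style mode copies them unchanged. The enum `invalid_utf8_mode`, the function `handle_invalid_utf8`, and its helpers are hypothetical names for illustration. This is not the library's implementation, and details such as how `replace` emits U+FFFD are simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical names for illustration only; they are not the library's API.
enum class invalid_utf8_mode { strict, replace, ignore, keep };

namespace {
// Simplified lead-byte check (no overlong or surrogate detection).
std::size_t lead_length(unsigned char b) {
    if (b < 0x80) return 1;                  // 0xxxxxxx (ASCII)
    if (b >= 0xC2 && b <= 0xDF) return 2;    // 110xxxxx
    if (b >= 0xE0 && b <= 0xEF) return 3;    // 1110xxxx
    if (b >= 0xF0 && b <= 0xF4) return 4;    // 11110xxx
    return 0;                                // cannot start a sequence
}

// Continuation (trailing) bytes have the bit pattern 10xxxxxx.
bool is_trailing(unsigned char b) { return (b & 0xC0) == 0x80; }
}  // namespace

// Walks the input sequence by sequence. Valid sequences are always copied;
// `mode` decides what happens to the bytes of an invalid one.
std::string handle_invalid_utf8(const std::string& in, invalid_utf8_mode mode) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = lead_length(static_cast<unsigned char>(in[i]));
        std::size_t j = 1;  // bytes of the current sequence consumed so far
        while (j < len && i + j < in.size() &&
               is_trailing(static_cast<unsigned char>(in[i + j]))) {
            ++j;
        }
        if (len != 0 && j == len) {
            out.append(in, i, len);  // complete sequence: copy unchanged
        } else {
            switch (mode) {
                case invalid_utf8_mode::strict:
                    throw std::invalid_argument("invalid UTF-8 byte");
                case invalid_utf8_mode::replace:
                    out += "\xEF\xBF\xBD";  // U+FFFD encoded as UTF-8
                    break;
                case invalid_utf8_mode::ignore:
                    break;  // drop the invalid bytes
                case invalid_utf8_mode::keep:
                    out.append(in, i, j);  // copy the invalid bytes unchanged
                    break;
            }
        }
        i += j;  // the offending byte, if any, is examined on the next pass
    }
    return out;
}
```

With the issue's input `"test\334\005"`, `ignore` yields `"test\005"` (the reported behavior) and `keep` yields `"test\334\005"` (the behavior the reporter expected). A separate mode leaves the output of existing `ignore` callers unchanged, which is the compatibility concern raised in the thread.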
Description
According to this: https://json.nlohmann.me/api/basic_json/dump/#parameters, when passing `error_handler_t::ignore` to the `dump()` function, invalid UTF-8 characters should be ignored and copied as-is into the final string. However, I'm debugging the following minimal code:
and the final `test_dump` string contains `test\005` (byte `\334` is gone). Is this expected? Am I missing something?
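The byte values explain what disappears. `\334` is octal for 0xDC (binary 11011100), a lead byte that announces a two-byte sequence. `\005` (0x05) is not a continuation byte (those have the form 10xxxxxx), so the pair is invalid UTF-8. Below is a simplified, standalone sketch, not the library's code, of a serializer that drops invalid sequences. It discards the lead byte and then re-examines the byte that broke the sequence, which is consistent with the reported `test\005` output. The helper names `utf8_sequence_length`, `is_continuation_byte`, and `drop_invalid_utf8` are hypothetical, and the validity check is deliberately simplified.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers (not part of nlohmann/json). The validity check is
// simplified: it does not reject overlong three/four-byte forms or surrogates.

// Expected length of a UTF-8 sequence given its first byte, or 0 if the byte
// cannot start a sequence (continuation byte, 0xC0/0xC1, or 0xF5..0xFF).
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;                   // 0xxxxxxx (ASCII)
    if (lead >= 0xC2 && lead <= 0xDF) return 2;  // 110xxxxx
    if (lead >= 0xE0 && lead <= 0xEF) return 3;  // 1110xxxx
    if (lead >= 0xF0 && lead <= 0xF4) return 4;  // 11110xxx
    return 0;
}

// Continuation bytes have the bit pattern 10xxxxxx.
bool is_continuation_byte(unsigned char b) { return (b & 0xC0) == 0x80; }

// Copies complete sequences and silently drops incomplete or invalid ones.
// Only the bytes already consumed (lead byte plus matching continuation
// bytes) are dropped; the byte that broke the sequence is examined again.
std::string drop_invalid_utf8(const std::string& in) {
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len =
            utf8_sequence_length(static_cast<unsigned char>(in[i]));
        if (len == 0) {  // stray byte: drop it
            ++i;
            continue;
        }
        std::size_t j = 1;  // bytes of this sequence seen so far
        while (j < len && i + j < in.size() &&
               is_continuation_byte(static_cast<unsigned char>(in[i + j]))) {
            ++j;
        }
        if (j == len) {
            out.append(in, i, len);  // complete sequence: copy unchanged
        }
        i += j;  // on failure, resume at the offending byte
    }
    return out;
}
```

For example, `drop_invalid_utf8("test\334\005")` yields `"test\005"`, while a valid two-byte sequence such as `"caf\xC3\xA9"` passes through unchanged.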
Reproduction steps
Just try to run/debug the following:
Expected vs. actual results
Actual:
`test_dump` contains `test\005`
Expected:
`test_dump` contains `test\334\005`
Minimal code example
Error messages
No response
Compiler and operating system
gcc (Alpine 12.2.1_git20220924-r10) 12.2.1 20220924
Library version
3.11.2
Validation
`develop` branch is used.