json: Output not guaranteed to be valid UTF-8 #6151
Related: #2273
Docs say
Even though it appears by default
IIRC Borg uses the resolver to reverse-resolve 127.0.0.1 or something like that, so hypothetically it should be ASCII. However, some other ways to determine the hostname involve parsing /etc/hosts or prepending /etc/hostname to the search domain from /etc/resolv.conf; the latter two don't cause hangs during network problems. So... yeah... just offering a b64 field for everything that could be not-strictly-text is a good idea. The b-fields would only work with a receiver that also supports Python-style surrogate escaping. Likely never used.
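A minimal sketch of why the b-fields only help a receiver that understands Python-style surrogate escaping, using only the Python standard library. The byte string and the `name` key are made-up examples; this is not Borg's actual output code.

```python
import json

# A made-up non-UTF-8 name, e.g. Latin-1 "café" as raw bytes.
raw = b"caf\xe9"

# The surrogateescape error handler maps each undecodable byte to a lone
# surrogate in U+DC80..U+DCFF (here 0xE9 -> U+DCE9).
text = raw.decode("utf-8", errors="surrogateescape")

# json.dumps (ensure_ascii=True by default) writes the lone surrogate as a \u escape.
doc = json.dumps({"name": text})
print(doc)  # prints {"name": "caf\udce9"}

# A strict receiver cannot turn that string back into UTF-8 bytes.
decoded = json.loads(doc)["name"]
try:
    decoded.encode("utf-8")
except UnicodeEncodeError as exc:
    print("strict UTF-8 encode fails:", exc.reason)

# Only a receiver that also applies surrogateescape recovers the original bytes.
recovered = decoded.encode("utf-8", errors="surrogateescape")
print(recovered == raw)
```

With `json.dumps(..., ensure_ascii=False)` the lone surrogate stays in the resulting string itself, and writing that string out as strict UTF-8 raises the same `UnicodeEncodeError`.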
@ThomasWaldmann I would suggest removing the b-fields from 2.0 because they can deliver invalid JSON. Potentially add base64 or something like that instead.
Actually, for 2.0, maybe it makes sense to just not allow binary values for things like comments, etc.
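If binary values were simply disallowed for fields like comments, input validation could look roughly like the following sketch. `validate_text_field` is a hypothetical helper, not part of Borg; it treats anything that is not strict UTF-8 text as an error.

```python
def validate_text_field(name, value):
    """Hypothetical check: accept only values that are valid UTF-8 text.

    bytes are decoded strictly; str values are rejected if they carry
    surrogate escapes (i.e. they came from undecodable bytes).
    """
    if isinstance(value, bytes):
        try:
            return value.decode("utf-8")  # strict: no error handler
        except UnicodeDecodeError as exc:
            raise ValueError(f"{name} is not valid UTF-8: {exc}") from None
    try:
        value.encode("utf-8")  # strict encode fails on lone surrogates
    except UnicodeEncodeError as exc:
        raise ValueError(f"{name} contains non-UTF-8 data: {exc.reason}") from None
    return value


print(validate_text_field("comment", "nightly backup"))
try:
    validate_text_field("comment", b"caf\xe9")
except ValueError as err:
    print("rejected:", err)
```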
#7197: `barchive`, `bcomment`, and `bpath` were all removed.
List of affected Item attributes:
List of affected Archive attributes:
Binary bytes:

- json_key = `<key>_b64`
- json_value = base64(value)

Text (potentially with surrogate escapes):

- json_key1 = `<key>`
- json_value1 = value_text (surrogate escapes replaced by `?`)
- json_key2 = `<key>_b64`
- json_value2 = base64(value_binary)

json_key2/json_value2 is only present if value_text required replacement of surrogate escapes (and thus does not represent the original value, but just an approximation). value_binary then gives the original bytes value (e.g. a non-UTF-8 byte sequence).
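The encoding scheme above can be sketched in Python as follows. `encode_field` is a hypothetical name, not Borg's implementation, and the sketch assumes text values were produced by decoding with the `surrogateescape` error handler.

```python
import base64


def encode_field(key, value):
    """Hypothetical helper: turn one attribute into JSON fields per the scheme.

    Assumes str values come from surrogateescape decoding, so any
    surrogates are in U+DC80..U+DCFF and map back to single bytes.
    """
    if isinstance(value, bytes):
        # Binary bytes: only <key>_b64 = base64(value) is emitted.
        return {f"{key}_b64": base64.b64encode(value).decode("ascii")}
    try:
        value.encode("utf-8")  # strict encode succeeds only for clean text
        return {key: value}
    except UnicodeEncodeError:
        # Text with surrogate escapes: <key> gets an approximation with "?"
        # in place of each escaped byte, <key>_b64 gets the original bytes.
        approx = value.encode("utf-8", errors="replace").decode("utf-8")
        original = value.encode("utf-8", errors="surrogateescape")
        return {
            key: approx,
            f"{key}_b64": base64.b64encode(original).decode("ascii"),
        }


print(encode_field("user", "alice"))
print(encode_field("path", "home/caf\udce9"))
print(encode_field("comment", b"\x00\xff"))
```

Emitting `<key>_b64` only when the approximation is lossy keeps the common case (plain UTF-8 text) unchanged.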
item: path, source, user, group

For non-Unicode stuff, borg 1.2 had `bpath`. Now we have:

- `path`: Unicode approximation (invalid stuff replaced by `?`)
- `path_b64`: base64(path_bytes), only if needed

`source` has the same issue as `path` and is now covered as well. `user` and `group` are usually Unicode or even pure ASCII, but we'd rather be cautious and cover them too.
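On the receiving side, a consumer that needs the exact path (for example to open the file) could prefer `path_b64` when present and fall back to `path` otherwise. This is a sketch assuming items are parsed JSON objects carrying the `path` and optional `path_b64` keys described above; `item_path_bytes` is a hypothetical helper.

```python
import base64
import json


def item_path_bytes(item):
    """Return the original path bytes of a parsed JSON item (hypothetical helper)."""
    if "path_b64" in item:
        # "path" is only an approximation here; the base64 field is exact.
        return base64.b64decode(item["path_b64"])
    # No "path_b64": "path" is exact text, so re-encoding it yields the original bytes.
    return item["path"].encode("utf-8")


lines = [
    json.dumps({"path": "home/readme.txt"}),
    json.dumps({
        "path": "home/caf?",
        "path_b64": base64.b64encode(b"home/caf\xe9").decode("ascii"),
    }),
]
for line in lines:
    print(item_path_bytes(json.loads(line)))  # b'home/readme.txt', then b'home/caf\xe9'
```

On POSIX systems, functions in the `os` module such as `os.stat` accept such bytes paths directly, while `path` remains fine for display.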
Considering there was no feedback, I hope everyone is happy with it. Will merge the PR soon.
The JSON standard requires strings to be valid UTF-8. There are a bunch of JSON API fields that can contain invalid strings:

- `b`-prefixed fields "intentionally" output invalid strings.
- `borg info --json` seems to output everything without removing invalid string parts (for example `name` or `command_line`).

Maybe it would be good if `borg info` and `borg list` were based on the same metadata source, so that the escaping etc. is not done in two places in the code.

My current suggestion for 1.2:
- … `b`-prefixed fields and do not add new ones to `borg info`. Instead, the `borg info --json` output gets a slightly incompatible change to remove surrogates.
- `barchive` is no longer contained in `borg list` output for archives by default. Not using the field but having it in the output could already break JSON parsers. It can be added with `--format {barchive}`.
- `base64_`-prefixed fields, probably for every string field. I'm not sure, but I wouldn't be surprised if fields like `hostname` can also be invalid UTF-8 in theory.

I'm also fine with dropping the `b`-prefixed fields. Not sure if that breaks anything.
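To check a given JSON output for the problem described in the issue, one could walk the parsed document and flag every string that cannot be strictly encoded as UTF-8. A minimal sketch; `find_non_utf8_strings` and the sample document are made up for illustration and only loosely mimic `borg info --json`.

```python
import json


def find_non_utf8_strings(obj, path="$"):
    """Yield (location, value) for every string value that is not valid UTF-8 text.

    In Python, such strings show up as lone surrogates (e.g. from a
    \\udcXX escape in the JSON text), which a strict UTF-8 encode rejects.
    Object keys are not checked here.
    """
    if isinstance(obj, str):
        try:
            obj.encode("utf-8")
        except UnicodeEncodeError:
            yield path, obj
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_non_utf8_strings(value, f"{path}.{key}")
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            yield from find_non_utf8_strings(value, f"{path}[{index}]")


# Made-up sample shaped loosely like `borg info --json` output.
sample = '{"archives": [{"name": "caf\\udce9", "command_line": ["borg", "create"]}]}'
for location, value in find_non_utf8_strings(json.loads(sample)):
    print(location, ascii(value))
```

`ascii()` is used when printing because writing a lone surrogate to a UTF-8 terminal could itself raise `UnicodeEncodeError`.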