JSON API: Encoding? #2273

enkore · 2017-03-08T16:14:59Z

Or we fix IO encoding to UTF-8 (irrespective of locale) in JSON mode, which probably makes more sense and is less error prone for downstream developers anyway.

I.e., --log-json → stderr text is always UTF-8, stdin text (so not "borg create -") is always UTF-8 (prompts, passwords), env vars are read as UTF-8 (ditto)

--json → stdout text is always UTF-8
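A minimal sketch of what "always UTF-8 on stdout" could look like in Python, independent of the locale. The helper names `utf8_text_stream` and `emit_json` are made up for illustration and are not borg functions; the idea is only to wrap the binary stream with an explicit encoding instead of relying on the one Python guessed.

```python
import io
import json


def utf8_text_stream(binary_stream):
    """Wrap a binary stream so that text written to it is always UTF-8."""
    return io.TextIOWrapper(
        binary_stream, encoding="utf-8", newline="\n", write_through=True
    )


def emit_json(obj, binary_stream):
    """Serialize obj as one line of JSON, encoded as UTF-8 (hypothetical helper)."""
    out = utf8_text_stream(binary_stream)
    json.dump(obj, out, ensure_ascii=False)
    out.write("\n")
    out.flush()
    out.detach()  # keep the underlying binary stream open for the caller
```

A caller would pass `sys.stdout.buffer` as `binary_stream`; the locale then no longer influences the bytes that reach the pipe or file.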

ThomasWaldmann · 2017-08-15T17:24:19Z

As this is a quite fundamental change, guess it should get some testing, in rc2.

Guess we won't have problems if the JSON is written to a file; not so sure about when it gets output on the console, or piped and processed badly by the other side.

enkore · 2017-08-16T19:37:05Z

> Guess we won't have problems if the json is written to a file

When you write text (strings) to stdout/stderr, then they are encoded to bytes using an encoding guessed by Python. That's independent of whether stdout is connected to a TTY/terminal or redirected to a file.
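To illustrate the point, here is a small sketch (the function name `encode_like_stream` is hypothetical) that imitates what a text stream does: the same string turns into different bytes depending only on the stream's encoding, not on where the bytes end up.

```python
import io


def encode_like_stream(text, encoding):
    """Return the bytes a text stream with the given encoding would write."""
    buf = io.BytesIO()
    writer = io.TextIOWrapper(buf, encoding=encoding)
    writer.write(text)
    writer.flush()
    data = buf.getvalue()
    writer.detach()  # do not close buf when the wrapper goes away
    return data
```

For example, "ü" becomes two bytes under UTF-8 and one byte under Latin-1, whether the destination is a terminal or a file.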

ThomasWaldmann · 2017-08-16T20:05:50Z

Yes, but you suggested to always use utf-8.

So how does e.g. a cygwin console or latin1/ascii console react when you output utf-8 on it?
Guess we could live with funny characters, but it shouldn't crash or hang.

Or when doing borg --json | othertool (and othertool guesses encoding), it might guess wrong when utf-8 is not the native system / fs encoding? If othertool is specialized on borg, it would use the right encoding, but if not, could it be told to use utf-8?
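On the consumer side, one option is to not guess at all: read the pipe as bytes and decode explicitly. A sketch, assuming the producer really emits UTF-8 (the function name `parse_utf8_json` is made up):

```python
import json


def parse_utf8_json(raw):
    """Decode raw bytes strictly as UTF-8, then parse them as JSON.

    Raises UnicodeDecodeError if the producer used another encoding,
    which is easier to diagnose than silently garbled text.
    """
    return json.loads(raw.decode("utf-8"))
```

A frontend would feed this the bytes it read from the child process's stdout, for instance those captured via `subprocess.run(..., stdout=subprocess.PIPE)`.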

enkore · 2017-08-18T14:09:18Z

Depends on #2925;

Note: borg list already uses UTF-8 regardless of system preference (via safe_encode), but only for listing archive contents.

> Yes, but you suggested to always use utf-8.

The alternative is to make step one of using --json: "Replicate the way Python guesses encodings [which changes over Python releases]." i.e. "Use Python.". That's not acceptable.

RonnyPfannschmidt · 2017-08-18T14:22:14Z

well - a completely ASCII-safe way could be to do unicode-escape, then all Unicode is escaped as \u...
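One way to get this effect with JSON specifically is the serializer's own escaping rather than Python's `unicode_escape` codec. A sketch using the standard `json` module, where `ensure_ascii=True` is already the default (the wrapper name `ascii_safe_json` is hypothetical):

```python
import json


def ascii_safe_json(obj):
    """Serialize obj so that the result contains only 7-bit ASCII characters."""
    return json.dumps(obj, ensure_ascii=True)
```

The output can be encoded with any ASCII-compatible codec, and a JSON parser restores the original characters from the `\u` escapes.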

knutov · 2017-08-25T00:48:48Z

just as an idea - why not skip support for latin1 and other non-unicode terminals now?

latin1 symbols inside utf8 will look the same on a latin1 terminal, I suppose. Other symbols will not be readable anyway; there is no good solution for this. And there is iconv for those who need 8-bit encodings and know what they are doing.

enkore · 2017-08-27T15:52:49Z

I prepared an initial, functionally incomplete patch that I was completely dissatisfied with. I've been working to fix this for good by changing most of these interactions (os.environ, input/yes, get_passphrase, ...) to use an iosys class that determines the encoding (from Python) and decodes stuff. But this is still incomplete and touches many of the more annoying parts of the code, so it may be reasonable to just go forward with rc2 and perhaps even 1.1.0 without having this resolved yet. On most (Linux/BSD) systems it will "mostly just work", because UTF-8 is a very widespread locale codeset and typically assumed. (OpenBSD has an especially good grip on things here for a Unix, because they only support UTF-8 and 7-bit ASCII.) In this case it may be best to add a short note in the docs to say that encoding will be finalized to UTF-8 later.

This will fall apart on Linux when no locale is configured (because Python will fall back to 7-bit ASCII), or when glibc thinks no locale is configured or considers the configuration invalid (e.g. partial or missing locale files). And of course with every locale that is not UTF-8.
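The failure mode with an ASCII fallback can be reproduced without touching the locale at all. A sketch (the helper name `try_encode` is hypothetical) showing that non-ASCII text simply cannot be encoded under 7-bit ASCII:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the encoding cannot represent text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None
```

With a strict error handler, a text stream that uses ASCII hits the same `UnicodeEncodeError` when a file name or archive name contains such a character.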

ThomasWaldmann · 2017-08-27T19:05:04Z

OK, so let's have some docs now and the fix later.


ThomasWaldmann · 2017-09-18T03:19:13Z

From https://docs.python.org/3.4/library/sys.html#sys.stdin / sys.stdout / sys.stderr:

The character encoding is platform-dependent.
Under Windows, if the stream is interactive (that is, if its isatty() method returns
True), the console codepage is used, otherwise the ANSI code page.
Under other platforms, the locale encoding is used (see locale.getpreferredencoding()).

Under all platforms though, you can override this value by setting the
PYTHONIOENCODING environment variable before starting Python.
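To see what Python picked on a given system, the values mentioned in the docs can be inspected directly. A small sketch (the function name `stdio_encodings` is made up):

```python
import locale
import sys


def stdio_encodings():
    """Report the locale's preferred encoding and the encodings of the std streams."""
    report = {"locale": locale.getpreferredencoding(False)}
    for name in ("stdin", "stdout", "stderr"):
        stream = getattr(sys, name)
        # A stream may be missing or replaced by an object without .encoding.
        report[name] = getattr(stream, "encoding", None)
    return report
```

Running this once with and once without `PYTHONIOENCODING` set shows whether the override took effect for the standard streams.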

More recent docs:

hexagonrecursion · 2021-12-07T10:18:01Z

> OK, so let's have some docs now and the fix for 1.1.1 or so.

The docs were added on Sep 9, 2017 in #3019 (document utf-8 locale requirement for json mode). It looks like you forgot to remove the "documentation" label.

ThomasWaldmann · 2021-12-07T10:42:49Z

Thanks for the hint, I removed the documentation label.

ThomasWaldmann · 2023-01-29T17:46:20Z

For stdin/stdout/stderr and JSON emitted on stdout (see frontends.rst), guess we could extend #3019 and just point there from the docs, so users invoking borg can adjust their environment variables if they do not use a locale with utf-8 encoding already:

https://docs.python.org/3.8/library/sys.html#sys.stdin reads:

Under all platforms, you can override the character encoding by setting the PYTHONIOENCODING environment variable before starting Python or by using the new -X utf8 command line option and PYTHONUTF8 environment variable. However, for the Windows console, this only applies when PYTHONLEGACYWINDOWSSTDIO is also set.

ThomasWaldmann · 2023-01-29T18:41:18Z

Hmm, did I miss something or can this "fix" just be recommending to use PYTHONIOENCODING=utf-8 if one expects (JSON) streams to be in utf-8 on a legacy OS installation that does not use a utf-8 locale already?

PYTHONUTF8 seems way too intrusive and influences how a lot of stuff works - this could break stuff that worked before.
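A quick way to check the effect of this recommendation is to start a child Python process with the variable set and ask it for its stdout encoding. This is only a sketch: the helper name `child_stdout_encoding` is hypothetical, and a real frontend would launch borg instead of the one-line script.

```python
import os
import subprocess
import sys


def child_stdout_encoding(extra_env):
    """Start a child Python with extra environment variables, return its stdout encoding."""
    env = dict(os.environ)
    env.update(extra_env)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("ascii").strip()
```

Calling `child_stdout_encoding({"PYTHONIOENCODING": "utf-8"})` should report UTF-8 even when the surrounding locale is not a UTF-8 one, without changing how the child treats file names or other locale-dependent behavior.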


enkore mentioned this issue Mar 9, 2017

Define a public API #654

Closed

enkore added the c: json api label Apr 1, 2017

enkore added this to the 1.1 - near future goals milestone Aug 6, 2017

ThomasWaldmann modified the milestones: 1.1.0rc2, 1.1 - near future goals Aug 15, 2017

enkore self-assigned this Aug 18, 2017

ThomasWaldmann modified the milestones: 1.1 - near future goals, 1.1.0rc2 Aug 27, 2017

ThomasWaldmann added the documentation label Sep 7, 2017

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Sep 7, 2017

document utf-8 locale requirement for json mode, fixes borgbackup#2273

11b8f49

enkore added the bug label Sep 7, 2017

enkore pushed a commit that referenced this issue Sep 8, 2017

document utf-8 locale requirement for json mode, #2273 (#3009)

133e847

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Sep 8, 2017

document utf-8 locale requirement for json mode, borgbackup#2273 (bor…

a990605

…gbackup#3009) (cherry picked from commit 133e847)

ThomasWaldmann added a commit that referenced this issue Sep 9, 2017

Merge pull request #3019 from ThomasWaldmann/json-utf8-locale-1.1

cd107c6

document utf-8 locale requirement for json mode, #2273 (#3009)

enkore modified the milestones: 1.1.x, 1.1.0 release Sep 9, 2017

enkore removed their assignment Oct 14, 2017

ThomasWaldmann modified the milestones: 1.1.1rc1, 1.1.2rc1 Oct 14, 2017

ThomasWaldmann removed this from the 1.1.2rc1 milestone Nov 4, 2017

ThomasWaldmann added this to the 1.1.x milestone Nov 4, 2017

ghost mentioned this issue Aug 26, 2021

backport #5969

Closed

ThomasWaldmann removed the documentation label Dec 7, 2021

ThomasWaldmann mentioned this issue Jan 22, 2022

json: Output not guaranteed to be valid UTF-8 #6151

Closed

ThomasWaldmann modified the milestones: 1.1.x, 1.2.x Feb 20, 2022

ThomasWaldmann modified the milestones: 1.2.x, 2.x Jun 26, 2022

ThomasWaldmann modified the milestones: 2.x, 2.0.0b5 Jan 29, 2023

borgbackup deleted a comment from enkore Jan 29, 2023

ThomasWaldmann self-assigned this Feb 1, 2023

ThomasWaldmann closed this as completed in 856d98c Feb 1, 2023

ThomasWaldmann added a commit that referenced this issue Feb 1, 2023

Merge pull request #7315 from ThomasWaldmann/pythonioencoding

f25f6a8

document another way to get UTF-8 encoding on stdin/stdout/stderr, fixes #2273

enkore mentioned this issue May 24, 2023

borg2: it's coming! #6602

Open
