Convert utf8->latin1 before decoding JSON-RPC payloads #353

zachallaun · 2023-09-01T18:26:57Z

This (hopefully) fixes #287.

I'm not completely sure why this is occurring, but non-ASCII characters were seemingly double-encoded, at least on my system (Ubuntu 22.04 via WSL, VS Code desktop on Windows). I've confirmed with `:io.getopts()` right before `IO.read/2` is called that the encoding is set to `:latin1`, but after `JsonRpc.decode/1` the text was double-encoded as UTF-8. That is, it was what you would expect from:

latin1_data
|> :unicode.characters_to_binary(:latin1, :utf8)
|> :unicode.characters_to_binary(:latin1, :utf8)

This would result in `Document.Line` text containing more bytes than it should, which causes text edits to fail, leaving behind random extra bytes.
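To make the byte counts concrete, here is a minimal sketch of the double-encoding and of undoing it with a utf8->latin1 conversion. It is not the code from this PR; the variable names and the single-character input are assumptions for illustration only.

```elixir
# "é" as a single latin1 byte (hypothetical input, for illustration only).
latin1_data = <<0xE9>>

# One latin1 -> utf8 conversion gives the correct 2-byte UTF-8 encoding.
utf8 = :unicode.characters_to_binary(latin1_data, :latin1, :utf8)

# A second latin1 -> utf8 conversion treats each UTF-8 byte as its own
# latin1 character, so the text grows to 4 bytes.
double_encoded = :unicode.characters_to_binary(utf8, :latin1, :utf8)

# Converting utf8 -> latin1 strips one layer of encoding and restores
# the original UTF-8 bytes.
repaired = :unicode.characters_to_binary(double_encoded, :utf8, :latin1)

IO.inspect(byte_size(utf8))           # 2
IO.inspect(byte_size(double_encoded)) # 4
IO.inspect(repaired == utf8)          # true
```

The two extra bytes per character are the kind of surplus that would throw off byte-based offsets in a line of text.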

I don't love this "fix" because it feels very much like a band-aid, not addressing whatever the root issue is, but I'd also rather things work in documents containing multi-byte characters.

scottming · 2023-09-02T00:50:51Z

I tried it and it seems to fix the Chinese encoding issue, and it fixes #171 as well.

I can get more people to test it.

apps/server/lib/lexical/server/transport/std_io.ex

zachallaun requested a review from scohen September 1, 2023 18:27

zachallaun force-pushed the za/issue-287 branch from 753c64f to 46172f4 Compare September 1, 2023 18:30

scottming approved these changes Sep 2, 2023

scohen reviewed Sep 3, 2023

Update comment explaining utf8->latin1

799d7e2

scohen merged commit 249f476 into main Sep 3, 2023

scohen deleted the za/issue-287 branch September 3, 2023 17:37
