Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert utf8->latin1 before decoding JSON-RPC payloads #353

Merged
merged 2 commits into from
Sep 3, 2023
Merged

Conversation

zachallaun
Copy link
Collaborator

This (hopefully) fixes #287.

I'm not completely sure why this is occurring, but non-ascii characters were seemingly double-encoded, at least on my system (Ubuntu 22.04 via WSL, VS Code desktop on Windows). I've confirmed with :io.getopts() right before IO.read/2 is called that the encoding is set to :latin1, but the result after JsonRpc.decode/1 was that text would be utf8 double-encoded -- that is, it was what you would expect from:

latin1_data
|> :unicode.characters_to_binary(:latin1, :utf8)
|> :unicode.characters_to_binary(:latin1, :utf8)

This would result in Document.Line text containing more bytes than it should, which causes text edits to fail, leading behind random extra bytes.

I don't love this "fix" because it feels very much like a band-aid, not addressing whatever the root issue is, but I'd also rather things work in documents containing multi-byte characters.

@zachallaun zachallaun requested a review from scohen September 1, 2023 18:27
This (hopefully) fixes #287.

I'm not completely sure why this is occurring, but non-ascii characters
were seemingly double-encoded, at least on my system (Ubuntu 22.04 via
WSL, VS Code desktop on Windows). I've confirmed with `:io.getopts()`
right before `IO.read/2` is called that the encoding is set to
`:latin1`, but the result after `JsonRpc.decode/1` was that text would
be utf8 double-encoded -- that is, it was what you would expect from:

    latin1_data
    |> :unicode.characters_to_binary(:latin1, :utf8)
    |> :unicode.characters_to_binary(:latin1, :utf8)

This would result in `Document.Line` text containing more bytes than it
should, which causes text edits to fail, leading behind random extra
bytes.

I don't love this "fix" because it feels very much like a band-aid, not
addressing whatever the root issue is, but I'd also rather things work
in documents containing multi-byte characters.
@scottming
Copy link
Collaborator

I tried it and it seems to fix the Chinese encoding issue and it fixes #171 as well:

I can get more people to test it.

@scohen scohen merged commit 249f476 into main Sep 3, 2023
@scohen scohen deleted the za/issue-287 branch September 3, 2023 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unicode diffing bug
3 participants