Convert utf8->latin1 before decoding JSON-RPC payloads
This (hopefully) fixes #287.

I'm not completely sure why this is occurring, but non-ASCII characters
were seemingly double-encoded, at least on my system (Ubuntu 22.04 via
WSL, VS Code desktop on Windows). I've confirmed with `:io.getopts()`
right before `IO.read/2` is called that the encoding is set to
`:latin1`, but the result after `JsonRpc.decode/1` was that text would
be utf8 double-encoded -- that is, it was what you would expect from:

    latin1_data
    |> :unicode.characters_to_binary(:latin1, :utf8)
    |> :unicode.characters_to_binary(:latin1, :utf8)
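To make the symptom concrete, here is a minimal Elixir sketch (not part of the commit) of one round of the latin1-to-UTF-8 conversion from the pipeline above, applied to the already UTF-8-encoded character `é`. It is followed by the reverse conversion that this commit applies in `read/2`. The variable names are illustrative only.

```elixir
# "é" is U+00E9; its correct UTF-8 encoding is two bytes.
original = "é"
<<0xC3, 0xA9>> = original

# Treating those UTF-8 bytes as latin1 makes each byte its own character.
# Re-encoding those characters as UTF-8 doubles the byte count.
mangled = :unicode.characters_to_binary(original, :latin1, :utf8)
<<0xC3, 0x83, 0xC2, 0xA9>> = mangled # renders as "Ã©"

# The reverse step: decode as UTF-8, then write each code point
# (all below 256 here) back out as a single latin1 byte.
restored = :unicode.characters_to_binary(mangled, :utf8, :latin1)

IO.inspect({byte_size(original), byte_size(mangled), restored == original})
# prints {2, 4, true}
```

The extra bytes in `mangled` are the same kind of surplus that ends up in `Document.Line` text.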

This would result in `Document.Line` text containing more bytes than it
should, which causes text edits to fail and leave behind random extra
bytes.

I don't love this "fix" because it feels very much like a band-aid, not
addressing whatever the root issue is, but I'd also rather things work
in documents containing multi-byte characters.
zachallaun committed Sep 1, 2023
1 parent d9dcae4 commit 753c64f
apps/server/lib/lexical/server/transport/std_io.ex (8 additions, 2 deletions)
@@ -114,8 +114,14 @@ defmodule Lexical.Server.Transport.StdIO do
 
   defp read(device, amount) do
     case IO.read(device, amount) do
-      data when is_binary(data) or is_list(data) -> {:ok, data}
-      other -> other
+      data when is_binary(data) or is_list(data) ->
+        # This is a bit "magical" and is likely a symptom of a bug elsewhere.
+        # See https://github.com/lexical-lsp/lexical/issues/287 for context.
+        data = :unicode.characters_to_binary(data, :utf8, :latin1)
+        {:ok, data}
+
+      other ->
+        other
     end
   end

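One caveat about the conversion used above: `:unicode.characters_to_binary/3` does not always return a binary. If the input contains a code point above 255, it returns `{:error, converted, rest}`. If the input ends partway through a UTF-8 sequence, it returns `{:incomplete, converted, rest}`. The committed clause wraps whatever comes back in `{:ok, data}`. The following is a hypothetical sketch of a more defensive variant, not Lexical code. The module name `LatinFix` and the `{:not_latin1, _}` and `{:incomplete, _}` error shapes are assumptions made for illustration.

```elixir
defmodule LatinFix do
  # Hypothetical helper (not part of Lexical): undo one round of
  # latin1 -> UTF-8 re-encoding, reporting failures instead of
  # passing :unicode's error tuples through as if they were data.
  def undo_double_encoding(data) do
    case :unicode.characters_to_binary(data, :utf8, :latin1) do
      bin when is_binary(bin) ->
        {:ok, bin}

      # A code point above 255 has no single-byte latin1 form, so the
      # input was not double-encoded in the way the fix assumes.
      {:error, _converted, rest} ->
        {:error, {:not_latin1, rest}}

      # The input ended in the middle of a UTF-8 sequence.
      {:incomplete, _converted, rest} ->
        {:error, {:incomplete, rest}}
    end
  end
end

IO.inspect(LatinFix.undo_double_encoding(<<0xC3, 0x83, 0xC2, 0xA9>>))
# prints {:ok, "é"}
IO.inspect(LatinFix.undo_double_encoding("€"))
```

The commit does not say whether an unconvertible payload should be rejected or passed through unchanged. The sketch simply makes that case visible to the caller.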