Convert utf8->latin1 before decoding JSON-RPC payloads

This (hopefully) fixes #287. I'm not completely sure why this is occurring, but non-ASCII characters were seemingly double-encoded, at least on my system (Ubuntu 22.04 via WSL, VS Code desktop on Windows).

I've confirmed with `:io.getopts()` right before `IO.read/2` is called that the encoding is set to `:latin1`, but after `JsonRpc.decode/1` the text was UTF-8 double-encoded. That is, it was what you would expect from:

    latin1_data
    |> :unicode.characters_to_binary(:latin1, :utf8)
    |> :unicode.characters_to_binary(:latin1, :utf8)

This would result in `Document.Line` text containing more bytes than it should, which causes text edits to fail, leaving behind random extra bytes.

I don't love this "fix" because it feels very much like a band-aid that doesn't address whatever the root issue is, but I'd also rather things work in documents containing multi-byte characters.
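To make the double-encoding concrete, here is a minimal sketch using the single character "é". The module and function names are hypothetical and exist only for illustration. Only `:unicode.characters_to_binary/3` comes from the change itself.

```elixir
defmodule DoubleEncodingDemo do
  # Hypothetical helper: reproduces the pipeline from the description,
  # converting latin1 -> utf8 twice in a row.
  def double_encode(latin1_data) do
    latin1_data
    |> :unicode.characters_to_binary(:latin1, :utf8)
    |> :unicode.characters_to_binary(:latin1, :utf8)
  end

  # The conversion this commit applies: decode as UTF-8 and emit latin1
  # bytes, which undoes exactly one of the two encoding steps.
  def repair(data) do
    :unicode.characters_to_binary(data, :utf8, :latin1)
  end
end

# "é" is the single byte 0xE9 in latin1 and the two bytes 0xC3 0xA9 in UTF-8.
double = DoubleEncodingDemo.double_encode(<<0xE9>>)
IO.inspect(double, binaries: :as_binaries)
IO.inspect(DoubleEncodingDemo.repair(double), binaries: :as_binaries)
```

The double-encoded form is four bytes where valid UTF-8 needs two, which matches the extra bytes seen in `Document.Line` text. After `repair/1` the result is the plain UTF-8 encoding of "é".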
lexical-lsp · Sep 1, 2023 · commit 753c64f · 1 parent d9dcae4
Showing 1 changed file with 8 additions and 2 deletions.
diff --git a/apps/server/lib/lexical/server/transport/std_io.ex b/apps/server/lib/lexical/server/transport/std_io.ex
@@ -114,8 +114,14 @@ defmodule Lexical.Server.Transport.StdIO do
 
   defp read(device, amount) do
     case IO.read(device, amount) do
-      data when is_binary(data) or is_list(data) -> {:ok, data}
-      other -> other
+      data when is_binary(data) or is_list(data) ->
+        # This is a bit "magical" and is likely a symptom of a bug elsewhere.
+        # See https://github.com/lexical-lsp/lexical/issues/287 for context.
+        data = :unicode.characters_to_binary(data, :utf8, :latin1)
+        {:ok, data}
+
+      other ->
+        other
     end
   end
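One thing worth noting about the change above: `:unicode.characters_to_binary/3` does not always return a binary. It returns an `{:error, ...}` tuple when a code point has no latin1 form (anything above 255) or the input is invalid UTF-8, and an `{:incomplete, ...}` tuple when the input ends mid-sequence. The patched `read/2` would wrap such a tuple in `{:ok, ...}`. Below is a hedged sketch of a more defensive variant. The module name, the function name, and the choice to fall back to the unconverted data are all assumptions, not something the commit does.

```elixir
defmodule ReadSketch do
  # Hypothetical helper, not part of the commit: applies the same
  # utf8 -> latin1 conversion but only trusts a binary result.
  def normalize(data) when is_binary(data) or is_list(data) do
    case :unicode.characters_to_binary(data, :utf8, :latin1) do
      converted when is_binary(converted) ->
        {:ok, converted}

      # {:error, _, _} or {:incomplete, _, _}: the data was not
      # double-encoded in the expected way, so pass it through untouched
      # (an assumed fallback; the right policy may differ).
      _error_or_incomplete ->
        {:ok, data}
    end
  end
end

IO.inspect(ReadSketch.normalize(<<0xC3, 0x83, 0xC2, 0xA9>>), binaries: :as_binaries)
IO.inspect(ReadSketch.normalize(<<0xE2, 0x82, 0xAC>>), binaries: :as_binaries)
```

The first call repairs a double-encoded "é". The second receives a correctly encoded "€" (U+20AC), which cannot be represented in latin1, and returns it unchanged rather than leaking an error tuple to the caller.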