`pygls` chooses utf-16 encoding when client prefers utf-32 (which would be faster for `pygls` as well) #445

nthykier · 2024-04-06T11:25:17Z

As an optimization, I feel that pygls should choose the utf-32 encoding if the editor prefers it over utf-16.

Looking at the code, the relevant logic appears to be in `_with_position_encodings`:

    def _with_position_encodings(self):
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf16

        general = self.client_capabilities.general
        if general is None:
            return self

        encodings = general.position_encodings
        if encodings is None:
            return self

        if types.PositionEncodingKind.Utf16 in encodings:
            return self

        if types.PositionEncodingKind.Utf32 in encodings:
            self.server_cap.position_encoding = types.PositionEncodingKind.Utf32
            return self

        if types.PositionEncodingKind.Utf8 in encodings:
            self.server_cap.position_encoding = types.PositionEncodingKind.Utf8
            return self

        logger.warning(f"Unknown `PositionEncoding`s: {encodings}")

        return self

The code here looks like it does encoding negotiation. However, in practice, unless the editor explicitly hides that it supports UTF-16 (which it is required to support), the outcome will always be UTF-16, even when both parties could have agreed on a better alternative. Notably, UTF-32 is advantageous for pygls, since it makes all position-related operations trivial.
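To illustrate why UTF-32 positions are cheap on the Python side, here is a minimal sketch. The helper names (`utf16_units`, `utf32_units`, `index_from_utf16`) are hypothetical and are not part of pygls. The sketch relies only on the fact that Python strings are indexed by code point, so a UTF-32 offset is already a string index, while a UTF-16 offset needs a scan for characters outside the Basic Multilingual Plane.

```python
def utf16_units(text: str) -> int:
    """Count the UTF-16 code units needed to encode `text`."""
    return len(text.encode("utf-16-le")) // 2


def utf32_units(text: str) -> int:
    """Count the UTF-32 code units, which is simply the number of code points."""
    return len(text)


def index_from_utf16(line: str, units: int) -> int:
    """Convert a UTF-16 character offset into a Python string index.

    Hypothetical helper for illustration: walks the line and counts two
    units for every code point above U+FFFF (a surrogate pair in UTF-16).
    """
    consumed = 0
    for index, char in enumerate(line):
        if consumed >= units:
            return index
        consumed += 2 if ord(char) > 0xFFFF else 1
    return len(line)


line = "a😀b"
print(utf16_units(line), utf32_units(line))
print(index_from_utf16(line, 3))
```

With UTF-32 negotiated, the conversion step disappears entirely: the offset sent by the client can be used directly as an index into the line.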

As an example, the LSP client eglot (from Emacs) advertises the following encoding order: `position_encodings=['utf-32', 'utf-8', 'utf-16']`. Yet the encoding chosen by pygls ends up being utf-16.
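A preference-respecting negotiation could look roughly like the sketch below. This is not the pygls implementation or the actual fix; `choose_position_encoding`, `SUPPORTED`, and `DEFAULT` are hypothetical names, and plain strings stand in for `types.PositionEncodingKind`. It assumes the client lists its encodings in decreasing order of preference and that UTF-16 is the fallback when nothing is advertised.

```python
from typing import Optional, Sequence

# Encodings the (hypothetical) server can handle.
SUPPORTED = ("utf-32", "utf-16", "utf-8")
# Fallback when the client advertises nothing usable.
DEFAULT = "utf-16"


def choose_position_encoding(client_encodings: Optional[Sequence[str]]) -> str:
    """Return the first client-preferred encoding that the server supports."""
    if not client_encodings:
        return DEFAULT
    for encoding in client_encodings:
        if encoding in SUPPORTED:
            return encoding
    return DEFAULT


# The eglot order from this issue would now yield utf-32.
print(choose_position_encoding(["utf-32", "utf-8", "utf-16"]))
```

Walking the client's list in order, rather than testing for UTF-16 first, is what lets the client's preference decide the outcome.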

Previously, `pygls` would always use `UTF-16` except when the client tried to hide the fact that it supports `UTF-16` (which the LSP spec requires clients to support in all cases). Now, `pygls` will choose the editor's preferred encoding. When it is `UTF-32`, `pygls` saves a bit of computation in most position codec operations (`X_to_client_units` + `client_num_units` are faster, `X_from_client_units` is about the same), which is great. When it is `UTF-16` or `UTF-8`, the computational load is about the same. Closes: openlawlibrary#445

nthykier added a commit to nthykier/pygls that referenced this issue Apr 6, 2024

Respect client's preferred encoding when possible

b97f2e5

Closes: openlawlibrary#445

nthykier mentioned this issue Apr 6, 2024

fix: Respect client's preferred encoding when possible nthykier/pygls#1

Closed

6 tasks

nthykier mentioned this issue Apr 6, 2024

Respect client's preferred encoding when possible #446

Merged

6 tasks

tombh closed this as completed in 0d51815 Apr 7, 2024
