Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pygls chooses utf-16 encoding when client prefers utf-32 (which would be faster for pygls as well) #445

Closed
nthykier opened this issue Apr 6, 2024 · 0 comments

Comments

@nthykier
Copy link
Contributor

nthykier commented Apr 6, 2024

As an optimization, I feel that pygls should choose the utf-32 encoding if the editor prefers it over utf-16.

Looking a bit at the code, it looks like _with_position_encodings:

    def _with_position_encodings(self):
        self.server_cap.position_encoding = types.PositionEncodingKind.Utf16

        general = self.client_capabilities.general
        if general is None:
            return self

        encodings = general.position_encodings
        if encodings is None:
            return self

        if types.PositionEncodingKind.Utf16 in encodings:
            return self

        if types.PositionEncodingKind.Utf32 in encodings:
            self.server_cap.position_encoding = types.PositionEncodingKind.Utf32
            return self

        if types.PositionEncodingKind.Utf8 in encodings:
            self.server_cap.position_encoding = types.PositionEncodingKind.Utf8
            return self

        logger.warning(f"Unknown `PositionEncoding`s: {encodings}")

        return self

The code here looks like it does encoding negotiation. However, in practice unless the editor explicitly attempts to hide that it supports UTF-16 (which it is required to support), then the outcome will always be UTF-16. Even both parties should have agreed on a better alternative for them. Notably, UTF-32 is advantageous for pygls, since it makes all the position code related operations trivial operations.

As an example, the LSP client eglot (from emacs) has the following encoding order: position_encodings=['utf-32', 'utf-8', 'utf-16']). Yet, the resulting encoding chosen by pygls ends up being utf-16.

nthykier added a commit to nthykier/pygls that referenced this issue Apr 6, 2024
nthykier added a commit to nthykier/pygls that referenced this issue Apr 6, 2024
Previously, `pygls` would always use `UTF-16` except when the client
tried to hide the fact that it supports `UTF-16` (which the LSP spec
requires it to do in all cases). Now, `pygls` will choose the editor's
preferred encoding.

When it is `UTF-32`, `pygls` saves a bit of computation in most
position codec related operations (`X_to_client_units` +
`client_num_units` are faster, `X_from_client_units` is about the
same), which is great. When it is `UTF-16` or `UTF-8`, the
computational load is about the same.

Closes: openlawlibrary#445
nthykier added a commit to nthykier/pygls that referenced this issue Apr 6, 2024
Previously, `pygls` would always use `UTF-16` except when the client
tried to hide the fact that it supports `UTF-16` (which the LSP spec
requires it to do in all cases). Now, `pygls` will choose the editor's
preferred encoding.

When it is `UTF-32`, `pygls` saves a bit of computation in most
position codec related operations (`X_to_client_units` +
`client_num_units` are faster, `X_from_client_units` is about the
same), which is great. When it is `UTF-16` or `UTF-8`, the
computational load is about the same.

Closes: openlawlibrary#445
nthykier added a commit to nthykier/pygls that referenced this issue Apr 6, 2024
Previously, `pygls` would always use `UTF-16` except when the client
tried to hide the fact that it supports `UTF-16` (which the LSP spec
requires it to do in all cases). Now, `pygls` will choose the editor's
preferred encoding.

When it is `UTF-32`, `pygls` saves a bit of computation in most
position codec related operations (`X_to_client_units` +
`client_num_units` are faster, `X_from_client_units` is about the
same), which is great. When it is `UTF-16` or `UTF-8`, the
computational load is about the same.

Closes: openlawlibrary#445
nthykier added a commit to nthykier/pygls that referenced this issue Apr 6, 2024
Previously, `pygls` would always use `UTF-16` except when the client
tried to hide the fact that it supports `UTF-16` (which the LSP spec
requires it to do in all cases). Now, `pygls` will choose the editor's
preferred encoding.

When it is `UTF-32`, `pygls` saves a bit of computation in most
position codec related operations (`X_to_client_units` +
`client_num_units` are faster, `X_from_client_units` is about the
same), which is great. When it is `UTF-16` or `UTF-8`, the
computational load is about the same.

Closes: openlawlibrary#445
@tombh tombh closed this as completed in 0d51815 Apr 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant