Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong hyperlinked element (definition, documentLink) when there are no Psi #98

Open
angelozerr opened this issue Jan 16, 2024 · 19 comments
Labels
bug Something isn't working definition

Comments

@angelozerr
Copy link
Contributor

angelozerr commented Jan 16, 2024

When a file is not parsed with a specific Psi parser (TEXT, texmate), the hyperlinked element takes the full content of the file.

Here is a demo with the ugly problem:

WrongHyperlinked

To manage LSP definition we implement GotoDeclarationHandler which doesn't provide the capability to customize the source range.

I have tried to support LSP definition with other means and one idea was to implement https://github.com/JetBrains/intellij-community/blob/6c42d0fad1db66faa4e5549935ca3eab91c5b2ca/platform/core-api/src/com/intellij/model/psi/ImplicitReferenceProvider.java#L32 where we can customize the source range.

The only problem is that this method is called every time (when you are typing when the file is loaded, etc). I discovered with #356 that we can track which action is executed.

One idea is:

@SCWells72
Copy link
Contributor

I was just about to comment on #644 that it's almost certainly due to the lack of a psi.referenceContributor as that EP helps to establish the relationship between a specific reference element -- or in this case, text range of the file that represents a reference element -- and the targets of that reference. I do think that adding a reference contributor will fix this, but it will also mean that you'll need to be VERY efficient with how those references are added. You can defer resolution until the PsiReference#resolve method is called, so it's really just a matter of adding lazy-resolved references to the PsiFile for text ranges that likely could realistically act as resolvable references.

@SCWells72
Copy link
Contributor

A bit more on this for whoever ends up fielding this in the future (potentially myself). Contrary to my previous comment here, it looks like the right fix for this would be to register a targetElementEvaluator that return pseudo-PSI elements for tokens that can be navigation tokens. The issue is fundamentally happening in CtrlMouseHandler#highlightAndHint() where the result contains the text range of the entire file because that was the only target element found at the computed editor offset. If you dig into result, you'll see that the more constrained text range of the word under the mouse pointer is present as result.targetPointer.targetPointer.myElement.referent, but the range that's being highlighted as hyperlinked is from result.ranges[0] which is the full file text range.

I've tried to see if there might be some way to adjust the highlighted range without having to introduce more discrete target elements, but unfortunately I haven't been able to find anything. A little experimentation in the past past with a targetElementEvaluator that returns fake PSI elements for words that represent potential navigation sources has shown promising for resolving this specific issue, but it also introduced a number of other issues due to the fake PSI tree.

I've considered the use of textDocument/documentSymbol and semantic token information to create and maintain a skeletal PSI tree. I think it may be worth investigating that idea further to see whether or not it might be viable, particularly from a performance standpoint.

@angelozerr
Copy link
Contributor Author

angelozerr commented Feb 15, 2025

A bit more on this for whoever ends up fielding this in the future (potentially myself). Contrary to my previous comment here, it looks like the right fix for this would be to register a targetElementEvaluator that return pseudo-PSI elements for tokens that can be navigation tokens. The issue is fundamentally happening in CtrlMouseHandler#highlightAndHint() where the result contains the text range of the entire file because that was the only target element found at the computed editor offset. If you dig into result, you'll see that the more constrained text range of the word under the mouse pointer is present as result.targetPointer.targetPointer.myElement.referent, but the range that's being highlighted as hyperlinked is from result.ranges[0] which is the full file text range.

I've tried to see if there might be some way to adjust the highlighted range without having to introduce more discrete target elements, but unfortunately I haven't been able to find anything. A little experimentation in the past past with a targetElementEvaluator that returns fake PSI elements for words that represent potential navigation sources has shown promising for resolving this specific issue, but it also introduced a number of other issues due to the fake PSI tree.

If I understand correctly your comment, using targetElementEvaluator could fix this issue, but it provide other problems because targetElementEvaluator is consumed on other context.

If it that, an idea is to implement targetElementEvaluator which returns word ONLY when Ctrl+Click is done. To do that you can add an action listener to know which action is executing like I did in #356

If we can do that it would be fantastic but we should improve again to use the LSP location source range.

I've considered the use of textDocument/documentSymbol and semantic token information to create and maintain a skeletal PSI tree. I think it may be worth investigating that idea further to see whether or not it might be viable, particularly from a performance standpoint.

@angelozerr
Copy link
Contributor Author

A bit more on this for whoever ends up fielding this in the future (potentially myself). Contrary to my previous comment here, it looks like the right fix for this would be to register a targetElementEvaluator that return pseudo-PSI elements for tokens that can be navigation tokens. The issue is fundamentally happening in CtrlMouseHandler#highlightAndHint() where the result contains the text range of the entire file because that was the only target element found at the computed editor offset. If you dig into result, you'll see that the more constrained text range of the word under the mouse pointer is present as result.targetPointer.targetPointer.myElement.referent, but the range that's being highlighted as hyperlinked is from result.ranges[0] which is the full file text range.

I've tried to see if there might be some way to adjust the highlighted range without having to introduce more discrete target elements, but unfortunately I haven't been able to find anything. A little experimentation in the past past with a targetElementEvaluator that returns fake PSI elements for words that represent potential navigation sources has shown promising for resolving this specific issue, but it also introduced a number of other issues due to the fake PSI tree.

I've considered the use of textDocument/documentSymbol and semantic token information to create and maintain a skeletal PSI tree. I think it may be worth investigating that idea further to see whether or not it might be viable, particularly from a performance standpoint.

I would like to avoid consuming this lsp request since the symbol response size can be big and for large files, this method should ve avoid to be consumed.

@SCWells72
Copy link
Contributor

SCWells72 commented Feb 16, 2025

Sorry, let me elaborate a bit more.

First, regarding:

If it that, an idea is to implement targetElementEvaluator which returns word ONLY when Ctrl+Click is done. To do that you can add an action listener to know which action is executing like I did in #356

it's not about Ctrl/Cmd+Click but rather Ctrl+Cmd+Mouseover, so it's not going to result in a concrete action firing, at least not an AnAction.

And regarding this:

I would like to avoid consuming this lsp request since the symbol response size can be big and for large files, this method should ve avoid to be consumed.

including textDocument/documentSymbol would be somewhat optional but ideally included if possible because it provides (shallow) structure to an otherwise flat token stream. Minimally reusing the last semantic token information for the file -- ideally caching it along with the PsiFile so that it's automatically evicted when the file is modified -- would yield a reasonable flat PSI element "tree" for the file. For example, for the following file:

/** Doc comment. */
export class Foo {
    // Line comment
    static bar() {
        console.log('Hello, world.');
    }
}

it would yield a PSI "tree" like:

/** Doc comment. */
export class <class:"Foo"> {
    // Line comment
    <method:"bar">() {
        <variable:"console">.<member:"log">('Hello, world.');
    }
}

With the information we have about the language grammar from the file type and/or client config -- specifically brace/bracket/paren pairs, string literal bounds characters, comment prefixes/suffixes, etc. -- we could go even further and yield the following confidently:

<blockComment:"/** Doc comment. */">
<token:"export"><psiWhitespace><token:"class"><psiWhitespace><class:"Foo"><psiWhitespace><lbrace:"{">
<psiWhitespace>
<lineComment:"// Line comment">
<method:"bar"><lbrace:"("><rbrace:")"><psiWhitespace><lbrace:"{">
<psiWhitespace>
<variable:"console"><dot:"."><member:"log"><lbrace:"("><stringLiteral:"'Hello, world.'"><rbrace:")"><terminator:";">
<psiWhitespace>
<rbrace:"}">
<psiWhitespace>
<rbrace:"}">

That's already extremely useful and should result in a targetElementEvaluator that works properly from the end user's perspective.

If we found that it was possible to use textDocument/documentSymbol efficiently -- and I have some thoughts on that but wouldn't know if they'd bear out until trying it on a secondary pass -- the PSI tree would become:

<blockComment:"/** Doc comment. */">
<declaration:"Foo">
    <token:"export"><psiWhitespace><token:"class"><psiWhitespace>
    <nameIdentifier:"Foo">
        <class:"Foo">
    </nameIdentifier:"Foo">
    <psiWhitespace><lbrace:"{">
    <psiWhitespace>
    <lineComment:"// Line comment">
    <declaration:"bar">
        <nameIdentifier:"bar">
            <method:"bar">
        </nameIdentifier:"bar">
        <lbrace:"("><rbrace:")"><psiWhitespace><lbrace:"{">
        <psiWhitespace>
        <variable:"console"><dot:"."><member:"log"><lbrace:"("><stringLiteral:"'Hello, world.'"><rbrace:")"><terminator:";">
        <psiWhitespace>
        <rbrace:"}">
    <declaration:"bar">
    <psiWhitespace>
    <rbrace:"}">
</declaration:"Foo">

I'll look into prototyping the first and second phases of this as explained above that only rely on the semantic tokens that we should already have at any given time. Hopefully that might happen this coming week, but I have two pending releases of my own plugin right now, one for Salesforce Spring '25 and one for SLDS Validator LSP integration. Once I get past both of those, I'll take a look at this.

@angelozerr
Copy link
Contributor Author

angelozerr commented Feb 16, 2025

Please keep in mind that documentSymbol can be really expensive and very too big for large file.

For instance vscode uses documentSymbol to fill outline. In vscode-xml we had to limit the symbols number on server side to avoid freezing vscode. The problem was not about the vscode outline but about the deserialzation of the json.

The compute of the documentSymbol can be slow according some language server and cannot return the full symbols that we could expect.

@SCWells72
Copy link
Contributor

Concrete implementation thoughts

It looks like the set of pseudo-PSI elements that can/should be derived from semantic tokens can be based on the combination of SemanticTokenTypes and SemanticTokenModifiers. Specifically any of the following SemanticTokenTypes should be considered either declarations or references/usages:

  • Namespace
  • Type
  • Class
  • Enum
  • Interface
  • Struct
  • TypeParameter
  • Parameter
  • Variable
  • Property
  • EnumMember
  • Event
  • Function
  • Method
  • Macro
  • Decorator

They are considered declarations if found with any of the following SemanticTokenModifiers:

  • Declaration
  • Definition

and are otherwise considered references/usages.

Additionally the Keyword and Modifier SemanticTokenTypes could be easily "merged" with concrete SemanticTokenModifiers such as Static, Abstract, Async, etc., for very specific PSI elements, but just starting with PSI elements for the following token types (as applicable based on the language server) would be useful:

  • Keyword
  • Modifier
  • Comment
  • String
  • Number
  • Regexp
  • Operator

While these PSI elements would be quite coarse-grained categorically, they could include the full original token type/modifier information for any consumers that might need them.

Examples

Here are a few concrete examples from specific language servers:

TypeScript

The following TypeScript file:

/** Doc comment. */
export class Foo {
    // Line comment
    static bar() {
        console.log('Hello, world.');
    }
}

results in the following semantic token types/modifiers:

/** Doc comment. */
export class <class:declaration>Foo</class:declaration> {
    // Line comment
    static <member:declaration.static>bar</member:declaration.static>() {
        <variable:defaultLibrary>console</variable:defaultLibrary>.<member:defaultLibrary>log</member:defaultLibrary>('Hello, world.');
    }
}

and would result in the following PSI elements (only including those directly derived from semantic tokens):

/** Doc comment. */
export class <declaration:"Foo> {
    // Line comment
    static <declaration:"bar">() {
        <reference:"console">.<reference:"log">('Hello, world.');
    }
}

Go

Here's another example using Go showing other token types/modifiers including first-class tokens for comments and string literals:

package main

import "fmt"

/** Block comment */
func main() {
	// Line comment
	fmt.Println("hello world")
}

which yields the following semantic tokens:

<keyword:>package</keyword:> <namespace:>main</namespace:>

<keyword:>import</keyword:> "<namespace:>fmt</namespace:>"

<comment:>/** Block comment */</comment:>
<keyword:>func</keyword:> <function:definition>main</function:definition>() {
	<comment:>// Line comment</comment:>
	<namespace:>fmt</namespace:>.<function:>Println</function:>(<string:>"hello world"</string:>)
}

and would therefore have the following PSI elements (again, including only those derived from semantic tokens here):

<keyword:"package"> <namespace:"main">

<keyword:"import"> "<namespace:"fmt">"

<comment:"/** Block comment */">
<keyword:"func"> <declaration:"main">() {
	<comment:"// Line comment">
	<reference:"fmt">.<reference:"Println">(<stringLiteral:"\"hello world\"">)
}

Java

And another example using Java via jdtls:

/*
 * File header comment.
 */

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Class comment.
 */
public class Hello {
    /**
     * Method; comment.
     * @param args command-line arguments
     */
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>(args);
    }
}

which yields the following semantic token types/modifiers:

/*
 * File header comment.
 */

import <namespace:importDeclaration>java</namespace:importDeclaration>.<namespace:importDeclaration>util</namespace:importDeclaration>.<class:public.generic.importDeclaration>ArrayList</class:public.generic.importDeclaration>;
import <namespace:importDeclaration>java</namespace:importDeclaration>.<namespace:importDeclaration>util</namespace:importDeclaration>.<class:public.generic.importDeclaration>LinkedHashSet</class:public.generic.importDeclaration>;
import <namespace:importDeclaration>java</namespace:importDeclaration>.<namespace:importDeclaration>util</namespace:importDeclaration>.<interface:public.generic.importDeclaration>List</interface:public.generic.importDeclaration>;
import <namespace:importDeclaration>java</namespace:importDeclaration>.<namespace:importDeclaration>util</namespace:importDeclaration>.<interface:public.generic.importDeclaration>Set</interface:public.generic.importDeclaration>;

/**
 * Class comment.
 */
<modifier:>public</modifier:> <modifier:>class</modifier:> <class:declaration.public>Hello</class:declaration.public> {
    /**
     * Method; comment.
     * <keyword:documentation>@param</keyword:documentation> <parameter:documentation>args</parameter:documentation> command-line arguments
     */
    <modifier:>public</modifier:> <modifier:>static</modifier:> void <method:static.declaration.public>main</method:static.declaration.public>(<class:readonly.public>String</class:readonly.public>[] <parameter:declaration>args</parameter:declaration>) {
        <interface:public.generic>List</interface:public.generic><<class:readonly.public.typeArgument>String</class:readonly.public.typeArgument>> <variable:declaration>strings</variable:declaration> = new ArrayList<>(<parameter:>args</parameter:>);
    }
}

and the following PSI elements:

/*
 * File header comment.
 */

import <reference:"java">.<reference:"util">.<reference:"ArrayList">;
import <reference:"java">.<reference:"util">.<reference:"LinkedHashSet">;
import <reference:"java">.<reference:"util">.<reference:"List">;
import <reference:"java">.<reference:"util">.<reference:"Set">;

/**
 * Class comment.
 */
<modifier:"public"> <modifier:"class"> <declaration:"Hello"> {
    /**
     * Method; comment.
     * <keyword:"@param"> <reference:"args"> command-line arguments
     */
    <modifier:"public"> <modifier:"static"> void <declaration:"main">(<class:"String">[] <declaration:"args">) {
        <reference:"List"><<reference:"String">> <declaration:"strings"> = new ArrayList<>(<reference:"args">);
    }
}

@SCWells72
Copy link
Contributor

SCWells72 commented Feb 16, 2025

Turns out that approach works pretty well! Here's a demo showing it in TypeScript, Java, and Go where I'm holding down Ctrl (on Windows) the whole time I'm mousing around these documents:

Image

This is only using semantic tokens to create a sparsely-populated PSI "tree" for the file. It doesn't yet solve the issue with Go To Declaration or Usages on a declaration showing usages, but I'm pretty sure that should be pretty simple to implement with these changes.

I have no idea yet what breaks due to these changes, but I'll try to circle back to this branch later in the coming week and see if I can polish it up for a PR.

@SCWells72
Copy link
Contributor

SCWells72 commented Feb 17, 2025

Okay, I was able to work on this quite a bit more today, and I have it pretty much working across the board. Here's a demo:

Image

As the demo hopefully conveys, the following all work properly now:

  • For LSP files against servers that support sufficiently descriptive semantic tokens, only declarations and references to declarations are highlighted on Ctrl/Cmd+Mouseover, and nothing else is ever highlighted.
  • For LSP files against servers that do not support semantic tokens such as the CSS language server in the demo, the entire file is no longer highlighted on Ctrl/Cmd+Mouseover.
  • The tooltip shown when hovering in this manner is now much more descriptive, showing the declaration type if known from the semantic tokens and their modifiers followed by the declaration name and, if hovering over a reference that resolves to a declaration in another file, the target file name is also shown. Also, the symbolic names are displayed as code in these tooltips.
  • Attempting to navigate on a reference takes you to the declaration. Attempting to navigate on a declaration shows the usages of that declaration.

Unfortunately this can only work for TextMate files and not plain text files associated with abstract file types. This is because the FileViewProviderFactory can only be registered for a language -- and plain text files do not have a non-null language including TEXT during file view provider resolution -- or a file type -- and plain text files associated with abstract file types do not have a single file type for which one could register. As a result, files with native file types or configured for TextMate Bundles will yield the best experience. Thankfully non-native file types are available as TextMate Bundles.

This branch is in pretty solid shape, but I do need to add some class/method comments describing the changes, and I need to write some tests to exercise it (and fix any existing tests that may be broken by these changes). I'll see if I can get a PR going over the next few days.

@DIGIX666
Copy link

When a file is not parsed with a specific Psi parser (TEXT, texmate), the hyperlinked element takes the full content of the file.

Here is a demo with the ugly problem:

WrongHyperlinked WrongHyperlinked

To manage LSP definition we implement GotoDeclarationHandler which doesn't provide the capability to customize the source range.

I have tried to support LSP definition with other means and one idea was to implement https://github.com/JetBrains/intellij-community/blob/6c42d0fad1db66faa4e5549935ca3eab91c5b2ca/platform/core-api/src/com/intellij/model/psi/ImplicitReferenceProvider.java#L32 where we can customize the source range.

The only problem is that this method is called every time (when you are typing when the file is loaded, etc). I discovered with #356 that we can track which action is executed.

One idea is:

Hey @angelozerr, Do you have try this idea ?

@angelozerr
Copy link
Contributor Author

When a file is not parsed with a specific Psi parser (TEXT, texmate), the hyperlinked element takes the full content of the file.

Here is a demo with the ugly problem:

WrongHyperlinked WrongHyperlinked

To manage LSP definition we implement GotoDeclarationHandler which doesn't provide the capability to customize the source range.

I have tried to support LSP definition with other means and one idea was to implement https://github.com/JetBrains/intellij-community/blob/6c42d0fad1db66faa4e5549935ca3eab91c5b2ca/platform/core-api/src/com/intellij/model/psi/ImplicitReferenceProvider.java#L32 where we can customize the source range.

The only problem is that this method is called every time (when you are typing when the file is loaded, etc). I discovered with #356 that we can track which action is executed.

One idea is:

Hey @angelozerr, Do you have try this idea ?

No I give up this idea to work on more important features. If you want to try it, yiu are welcome!

But please keep in sync with @SCWells72 who has some ideas to fix this issue

@SCWells72
Copy link
Contributor

Please see this comment for a small demo showing the fix for this issue and the one with Go To Declaration or Usages that I'll hopefully be running through the PR process later this week.

@DIGIX666
Copy link

Great!
I've just seen it. Could you tell me more about how you set it up, because I don't understand exactly? And is there an open PR? @SCWells72

@SCWells72
Copy link
Contributor

Great! I've just seen it. Could you tell me more about how you set it up, because I don't understand exactly? And is there an open PR? @SCWells72

There's not a PR yet, but hopefully I'll open one in the next day or two. The short version is that it adds a FileViewProviderFactory for LSP4IJ TextMate files (see my comment above about why this doesn't work for plain text files) and actually simulates a flat PSI tree based the language server's reported semantic tokens for the file including whether those elements represent declarations, references, comments, string and numeric literals, etc. That's the core of it, though there were several other EPs that had to be implemented to make things more whole since it's a simulated PSI tree for the file. I'll explain all of the changes when I open the PR.

@DIGIX666
Copy link

Great! I've just seen it. Could you tell me more about how you set it up, because I don't understand exactly? And is there an open PR? @SCWells72

There's not a PR yet, but hopefully I'll open one in the next day or two. The short version is that it adds a FileViewProviderFactory for LSP4IJ TextMate files (see my comment above about why this doesn't work for plain text files) and actually simulates a flat PSI tree based the language server's reported semantic tokens for the file including whether those elements represent declarations, references, comments, string and numeric literals, etc. That's the core of it, though there were several other EPs that had to be implemented to make things more whole since it's a simulated PSI tree for the file. I'll explain all of the changes when I open the PR.

Thank you for your explanations :)
I'll look for a quick alternative solution and check out your PR once it's open.

@angelozerr
Copy link
Contributor Author

Great! I've just seen it. Could you tell me more about how you set it up, because I don't understand exactly? And is there an open PR? @SCWells72

There's not a PR yet, but hopefully I'll open one in the next day or two. The short version is that it adds a FileViewProviderFactory for LSP4IJ TextMate files (see my comment above about why this doesn't work for plain text files) and actually simulates a flat PSI tree based the language server's reported semantic tokens for the file including whether those elements represent declarations, references, comments, string and numeric literals, etc. That's the core of it, though there were several other EPs that had to be implemented to make things more whole since it's a simulated PSI tree for the file. I'll explain all of the changes when I open the PR.

I repeat me but we need to be carefull with this solution. We must never have some freeze or some memory problem with large file. It is my fear.

@DIGIX666
Copy link

Great! I've just seen it. Could you tell me more about how you set it up, because I don't understand exactly? And is there an open PR? @SCWells72

There's not a PR yet, but hopefully I'll open one in the next day or two. The short version is that it adds a FileViewProviderFactory for LSP4IJ TextMate files (see my comment above about why this doesn't work for plain text files) and actually simulates a flat PSI tree based the language server's reported semantic tokens for the file including whether those elements represent declarations, references, comments, string and numeric literals, etc. That's the core of it, though there were several other EPs that had to be implemented to make things more whole since it's a simulated PSI tree for the file. I'll explain all of the changes when I open the PR.

I repeat me but we need to be carefull with this solution. We must never have some freeze or some memory problem with large file. It is my fear.

Yes, I've taken note of your concern, but as an alternative I'm thinking of creating a .bnf file to generate a parser and lexer for my psi

@SCWells72
Copy link
Contributor

I repeat me but we need to be carefull with this solution. We must never have some freeze or some memory problem with large file. It is my fear.

Hopefully when you see the actual changes, that concern will be put to rest. This all happens in response to the existing semantic tokens logic. No new LSP calls have been added for this. And if there are no semantic tokens -- either because they haven't been received yet or because the language server doesn't support semantic tokens -- my changes still fix the issue with the entire file being hyperlinked on Ctrl/Cmd+Mouse hover.

@SCWells72
Copy link
Contributor

PR for this is now open as #853.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working definition
Projects
None yet
Development

No branches or pull requests

3 participants