Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API for extensions to know the symbol kind of a range or position. #91555

Closed
alexr00 opened this issue Feb 26, 2020 · 18 comments
Closed

API for extensions to know the symbol kind of a range or position. #91555

alexr00 opened this issue Feb 26, 2020 · 18 comments
Assignees
Labels
api-proposal feature-request Request for new features or functionality tokenization Text tokenization

Comments

@alexr00
Copy link
Member

alexr00 commented Feb 26, 2020

Every person who has used the GitHub issue suggestions has asked if we can only show the suggest if position is a comment. The API would only need to work on the currently active editor.

microsoft/vscode-pull-request-github#1521

@aeschli
Copy link
Contributor

aeschli commented Mar 2, 2020

We always resisted to give out the standard token type (comment, string, regex...) of a given position.

IMO the CompletionContext of the he completion provider would be fitting place for the API. We could call it a 'hint' so that it can be undefined if haven't tokenized to the position yet.

@alexdima @jrieken What do you think?

@aeschli aeschli assigned jrieken and unassigned aeschli Mar 2, 2020
@alexdima
Copy link
Member

alexdima commented Mar 2, 2020

The original issue cannot be fixed by new API. Even if new API is added and the provider returns an empty array in this case, the fact that # is a trigger character will still lead to the suggestions popping up, and then closing down again... i.e. flicker

So IMHO the fix for the original issue is to not register # as a trigger character, at least not for python.

@alexr00
Copy link
Member Author

alexr00 commented Mar 2, 2020

New API would still help for detecting links (ex #1234 should link to issue 1234). I removed the link detection because it was very annoying to have links showing in colors and in code, when links like that should only show in comments.

@jrieken
Copy link
Member

jrieken commented May 13, 2020

The original issue cannot be fixed by new API. Even if new API is added and the provider returns an empty array in this case, the fact that # is a trigger character will still lead to the suggestions popping up, and then closing down again...

That's not correct. When suggestions are triggered automatically, e.g by a trigger character or by "just typing", then the UI only shows with results, e.g no loading-message, no no-results-message etc.

I do have mixed feelings about exposing this API because we can only expose the standard token types (comment, string, other) and maybe the current language but not what most users want - textmate tokens...

Anways, we can propose something but then I wouldn't hide it inside some context but make it a separate function inside vscode.languages, maybe

export enum TokenType {
  Comment,
  String,
  Other
}

export interface TokenInformation {
  type: TokenType;
  range: Range; 
  languageId: string;
}

export namespace languages {

  // add bail out path? return undefined, throw error, or starve caller when invoked at a bad time?
  export function getTokenInformationAtPosition(document: TextDocument, position: Position): Thenable<TokenInformation>;
}

@jrieken jrieken added this to the May 2020 milestone May 13, 2020
@jrieken jrieken added api-proposal editor-symbols definitions, declarations, references tokenization Text tokenization feature-request Request for new features or functionality and removed editor-symbols definitions, declarations, references labels May 13, 2020
@aeschli
Copy link
Contributor

aeschli commented May 13, 2020

We just introduced token types for semantic highlighting, including tokens to comments, strings, so another TokenType can be confusing.

The semantic token types were added with the goal to serve as a better abstraction than TextMate scopes. And the semantic token types come along with a mapping to TextMate scopes. So it's thinkable to use these new token types as the way to provide more information. We could even make use of the semantic information that we now get.

But I'm not sure that is really that helpful when computing code completions. You need more than just a token. Rather a context. Am I at a place where I complete an expression, a statement, a parameter, a regex, and so on....

So +1 for a simpler way to learn about the context you are in. I still think its the best to add it to the code completion context. It's also what we had in JDT.

@jrieken
Copy link
Member

jrieken commented May 14, 2020

I still think its the best to add it to the code completion context.

I don't understand how that would change the information that we expose? It would still be the same, right? It would be something like this.

export enum TokenType {
  Comment,
  String,
  Other
}

export interface TokenInformation {
  type: TokenType;
  range: Range; 
  languageId: string;
}

export interface CompletionContext {
  tokenInformation: TokenInformation
}

Or a more minimal version like this

export enum TokenType {
  Comment,
  String,
  Other
}

export interface CompletionContext {
  type: TokenType;
}

But overall it doesn't change the kind and extend of information that we are exposing.

@alexdima
Copy link
Member

@jrieken I like more the minimal proposal. I think the data can't be safely fetched via an async call because that would be a source of subtle bugs, like perhaps the file was changed in the meantime between the suggest provider's invocation and the time the async token type reply comes back... That's why we always try to have read APIs be synchronous... But we don't want to always push all the tokens in the file to the extension host...

@aeschli
Copy link
Contributor

aeschli commented May 14, 2020

I like the last one, but don't call it TokenType, maybe CompletionContextType.

I wonder if it's a good idea to make it an enum, maybe better an object:

interface CompletionContextInfo {
  inComment: boolean;
}

That way it can be extended without breakage, accepting that some of the properties are overlapping/excluding.

interface CompletionContextInfo {
  inComment: boolean;
  inDocComment: boolean;
  inString: boolean;
  atNewDeclaration: boolean;
  ... 
}

It' a way to give out context information in a synthesized way and avoids giving out TextMate scopes or tokens or ASTs.
We can have a mapping from TextMate scopes to context info, or even have the LS compute it.

Really just an idea, needs more thinking.

@jrieken
Copy link
Member

jrieken commented May 14, 2020

An object or an enum transports the same information and the enum has the advantage that it cannot express invalid states, like { inComment: true, inString: true, inDocComment: true, etc}

@jrieken
Copy link
Member

jrieken commented May 14, 2020

@jrieken I like more the minimal proposal.

Yeah, the minimal proposal has value but my argument is that folks want this information in various places, like for link detection etc. Then would mean

  • we eventually add this to more and more places
  • we live with a well-known hack that goes like this: register fake completion provider, invoke it via api-command, capture context, use it.

The consistency argument makes sense but I would actually spec this API to be inconsistent or false at times, e.g we don't to be forced to fully tokenise files before invoking a completion item provider.

@alexdima
Copy link
Member

I agree with you, but I don't know how to give out this information even if I wanted to.

  • IMHO an async read API has the potential to introduce very many bugs in extensions because most people will not check their state after an await. e.g.
//...
const lineText = document.lineAt(5).text;
const tokens = await vscode.getTokenAtPosition(document, 5, 10);
// is lineText really equal to document.lineAt(5).text now?

To make matters slightly worse, most of the times (under low CPU load or when testing patiently), things will be fast enough such that a user edit would never "sneak in" in with the await. So I think there will be very many difficult to reproduce bugs in extensions and these bugs will make their way to us and bother us forever :).

  • a sync read API is also very problematic because it is not smart to push MB and MB of tokens to the extension host from the renderer, especially since there might be nobody there to read them.

  • also, exposing TM scopes is by itself problematic, and while we are a bit locked into them due to themes which assume TM scopes, maybe there is a future where we can have a non-TM syntax highlighting story, at least when using our default themes.

In conclusion, I think it is prudent to expose only the standard token types and keeping things small to help this very important feature is IMHO the easiest way forward.

(btw I have also thought about perhaps creating an npm library that can tokenize a language and give out standard token types very quickly. I started some time ago and wrote one for TS over here -- https://github.com/microsoft/vscode/blob/master/src/vs/editor/common/modes/tokenization/typescript.ts -- but I got distracted and didn't finish that work )

@aeschli
Copy link
Contributor

aeschli commented May 14, 2020

An object or an enum transports the same information

No, the object can eventually express overlapping information:

interface CompletionContextInfo {
  atNewStatement: true
  atNewDeclaration: true;
}

meaning, the current location is good to complete a declaration (e.g. a new function), and it is also good to complete a new statement (a log statement).
A snippet would then new able to specify for which contexts it wants to show up.

{ inComment: true, inDocComment: true} is a similar, valid example. inDocComment could have been added at a later release, without breaking any users of inComment.
{ inComment: true, inString: true} might be invalid, but maybe there's a crazy language that has 'comment strings'.

@jrieken
Copy link
Member

jrieken commented May 14, 2020

In conclusion, I think it is prudent to expose only the standard token types

Just to clarify that I am 💯% for that - I don't think we should leak TM tokens anywhere

Talking about shortcuts, we could simply continue in the spirt of emmet which found it own very "pragmatic" workaround:

return commandService.executeCommand<void>('emmet.expandAbbreviation', EmmetEditorAction.getLanguage(modeService, editor, grammarContributions));

@jrieken
Copy link
Member

jrieken commented May 19, 2020

Despite ongoing discussion, I have implemented the getTokenInformationAtPosition proposal because that's fit for completions (position-based requests) and link detection. The alternatives, context or registration-time data, don't work for the link provider case and therefore I went with an explicit function. This has the problem of its data being likely out of sync. Overall, this needs more thinking and "problem-scoping"

@jrieken jrieken modified the milestones: May 2020, June 2020 May 28, 2020
@alexdima
Copy link
Member

alexdima commented Jun 4, 2020

How about if the link provider returns a link collection and says something like "only apply these links on comments" and then the UI side takes the incoming links and removes the ones which are not on comments?

And how does this really work? Do we really want to make 50 cross renderer-extension host requests in a file with 50 occurences of #123 ? And that on each keystroke since each keystroke invokes the link computer? IMHO that is a very bad solution.

@alexdima
Copy link
Member

alexdima commented Jun 4, 2020

This looks to be quite serious. I have created #99356

@jrieken
Copy link
Member

jrieken commented Jun 4, 2020

This looks to be quite serious.

I'd say it is serious if it freezes the UI but generally this is as severe as any API call that's being invoked many, many times, e.g in a loop. If we think the sample in #99356 is realistic then we can easily debounce calls the API-layer. However, as I have repeatedly said this is temporary solution to get us off the start line. The API is proposed and we have no plans to finalise it.

@jrieken
Copy link
Member

jrieken commented Jun 15, 2020

closing this as duplicate of #580. tho, no further action planned

@jrieken jrieken closed this as completed Jun 15, 2020
@github-actions github-actions bot locked and limited conversation to collaborators Jul 31, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-proposal feature-request Request for new features or functionality tokenization Text tokenization
Projects
None yet
Development

No branches or pull requests

5 participants
@jrieken @alexdima @aeschli @alexr00 and others