API for extensions to know the symbol kind of a range or position. #91555

alexr00 · 2020-02-26T10:20:49Z

Every person who has used the GitHub issue suggestions has asked if we can only show the suggest if position is a comment. The API would only need to work on the currently active editor.

microsoft/vscode-pull-request-github#1521

aeschli · 2020-03-02T13:41:35Z

We always resisted to give out the standard token type (comment, string, regex...) of a given position.

IMO the CompletionContext of the he completion provider would be fitting place for the API. We could call it a 'hint' so that it can be undefined if haven't tokenized to the position yet.

@alexdima @jrieken What do you think?

alexdima · 2020-03-02T14:08:16Z

The original issue cannot be fixed by new API. Even if new API is added and the provider returns an empty array in this case, the fact that # is a trigger character will still lead to the suggestions popping up, and then closing down again... i.e. flicker

So IMHO the fix for the original issue is to not register # as a trigger character, at least not for python.

alexr00 · 2020-03-02T14:51:56Z

New API would still help for detecting links (ex #1234 should link to issue 1234). I removed the link detection because it was very annoying to have links showing in colors and in code, when links like that should only show in comments.

jrieken · 2020-05-13T15:00:00Z

The original issue cannot be fixed by new API. Even if new API is added and the provider returns an empty array in this case, the fact that # is a trigger character will still lead to the suggestions popping up, and then closing down again...

That's not correct. When suggestions are triggered automatically, e.g by a trigger character or by "just typing", then the UI only shows with results, e.g no loading-message, no no-results-message etc.

I do have mixed feelings about exposing this API because we can only expose the standard token types (comment, string, other) and maybe the current language but not what most users want - textmate tokens...

Anways, we can propose something but then I wouldn't hide it inside some context but make it a separate function inside vscode.languages, maybe

export enum TokenType {
  Comment,
  String,
  Other
}

export interface TokenInformation {
  type: TokenType;
  range: Range; 
  languageId: string;
}

export namespace languages {

  // add bail out path? return undefined, throw error, or starve caller when invoked at a bad time?
  export function getTokenInformationAtPosition(document: TextDocument, position: Position): Thenable<TokenInformation>;
}

aeschli · 2020-05-13T16:34:18Z

We just introduced token types for semantic highlighting, including tokens to comments, strings, so another TokenType can be confusing.

The semantic token types were added with the goal to serve as a better abstraction than TextMate scopes. And the semantic token types come along with a mapping to TextMate scopes. So it's thinkable to use these new token types as the way to provide more information. We could even make use of the semantic information that we now get.

But I'm not sure that is really that helpful when computing code completions. You need more than just a token. Rather a context. Am I at a place where I complete an expression, a statement, a parameter, a regex, and so on....

So +1 for a simpler way to learn about the context you are in. I still think its the best to add it to the code completion context. It's also what we had in JDT.

jrieken · 2020-05-14T07:11:36Z

I still think its the best to add it to the code completion context.

I don't understand how that would change the information that we expose? It would still be the same, right? It would be something like this.

export enum TokenType {
  Comment,
  String,
  Other
}

export interface TokenInformation {
  type: TokenType;
  range: Range; 
  languageId: string;
}

export interface CompletionContext {
  tokenInformation: TokenInformation
}

Or a more minimal version like this

export enum TokenType {
  Comment,
  String,
  Other
}

export interface CompletionContext {
  type: TokenType;
}

But overall it doesn't change the kind and extend of information that we are exposing.

alexdima · 2020-05-14T07:30:27Z

@jrieken I like more the minimal proposal. I think the data can't be safely fetched via an async call because that would be a source of subtle bugs, like perhaps the file was changed in the meantime between the suggest provider's invocation and the time the async token type reply comes back... That's why we always try to have read APIs be synchronous... But we don't want to always push all the tokens in the file to the extension host...

aeschli · 2020-05-14T08:17:28Z

I like the last one, but don't call it TokenType, maybe CompletionContextType.

I wonder if it's a good idea to make it an enum, maybe better an object:

interface CompletionContextInfo {
  inComment: boolean;
}

That way it can be extended without breakage, accepting that some of the properties are overlapping/excluding.

interface CompletionContextInfo {
  inComment: boolean;
  inDocComment: boolean;
  inString: boolean;
  atNewDeclaration: boolean;
  ... 
}

It' a way to give out context information in a synthesized way and avoids giving out TextMate scopes or tokens or ASTs.
We can have a mapping from TextMate scopes to context info, or even have the LS compute it.

Really just an idea, needs more thinking.

jrieken · 2020-05-14T08:53:58Z

An object or an enum transports the same information and the enum has the advantage that it cannot express invalid states, like { inComment: true, inString: true, inDocComment: true, etc}

jrieken · 2020-05-14T08:57:39Z

@jrieken I like more the minimal proposal.

Yeah, the minimal proposal has value but my argument is that folks want this information in various places, like for link detection etc. Then would mean

we eventually add this to more and more places
we live with a well-known hack that goes like this: register fake completion provider, invoke it via api-command, capture context, use it.

The consistency argument makes sense but I would actually spec this API to be inconsistent or false at times, e.g we don't to be forced to fully tokenise files before invoking a completion item provider.

alexdima · 2020-05-14T09:44:09Z

I agree with you, but I don't know how to give out this information even if I wanted to.

IMHO an async read API has the potential to introduce very many bugs in extensions because most people will not check their state after an await. e.g.

//...
const lineText = document.lineAt(5).text;
const tokens = await vscode.getTokenAtPosition(document, 5, 10);
// is lineText really equal to document.lineAt(5).text now?

To make matters slightly worse, most of the times (under low CPU load or when testing patiently), things will be fast enough such that a user edit would never "sneak in" in with the await. So I think there will be very many difficult to reproduce bugs in extensions and these bugs will make their way to us and bother us forever :).

a sync read API is also very problematic because it is not smart to push MB and MB of tokens to the extension host from the renderer, especially since there might be nobody there to read them.
also, exposing TM scopes is by itself problematic, and while we are a bit locked into them due to themes which assume TM scopes, maybe there is a future where we can have a non-TM syntax highlighting story, at least when using our default themes.

In conclusion, I think it is prudent to expose only the standard token types and keeping things small to help this very important feature is IMHO the easiest way forward.

(btw I have also thought about perhaps creating an npm library that can tokenize a language and give out standard token types very quickly. I started some time ago and wrote one for TS over here -- https://github.com/microsoft/vscode/blob/master/src/vs/editor/common/modes/tokenization/typescript.ts -- but I got distracted and didn't finish that work )

aeschli · 2020-05-14T09:45:59Z

An object or an enum transports the same information

No, the object can eventually express overlapping information:

interface CompletionContextInfo {
  atNewStatement: true
  atNewDeclaration: true;
}

meaning, the current location is good to complete a declaration (e.g. a new function), and it is also good to complete a new statement (a log statement).
A snippet would then new able to specify for which contexts it wants to show up.

{ inComment: true, inDocComment: true} is a similar, valid example. inDocComment could have been added at a later release, without breaking any users of inComment.
{ inComment: true, inString: true} might be invalid, but maybe there's a crazy language that has 'comment strings'.

jrieken · 2020-05-14T10:48:35Z

In conclusion, I think it is prudent to expose only the standard token types

Just to clarify that I am 💯% for that - I don't think we should leak TM tokens anywhere

Talking about shortcuts, we could simply continue in the spirt of emmet which found it own very "pragmatic" workaround:

vscode/src/vs/workbench/contrib/emmet/browser/emmetActions.ts

Line 86 in 85d9833

    
           return commandService.executeCommand<void>('emmet.expandAbbreviation', EmmetEditorAction.getLanguage(modeService, editor, grammarContributions));

jrieken · 2020-05-19T14:30:06Z

Despite ongoing discussion, I have implemented the getTokenInformationAtPosition proposal because that's fit for completions (position-based requests) and link detection. The alternatives, context or registration-time data, don't work for the link provider case and therefore I went with an explicit function. This has the problem of its data being likely out of sync. Overall, this needs more thinking and "problem-scoping"

alexdima · 2020-06-04T10:31:05Z

How about if the link provider returns a link collection and says something like "only apply these links on comments" and then the UI side takes the incoming links and removes the ones which are not on comments?

And how does this really work? Do we really want to make 50 cross renderer-extension host requests in a file with 50 occurences of #123 ? And that on each keystroke since each keystroke invokes the link computer? IMHO that is a very bad solution.

alexdima · 2020-06-04T10:44:11Z

This looks to be quite serious. I have created #99356

jrieken · 2020-06-04T11:17:33Z

This looks to be quite serious.

I'd say it is serious if it freezes the UI but generally this is as severe as any API call that's being invoked many, many times, e.g in a loop. If we think the sample in #99356 is realistic then we can easily debounce calls the API-layer. However, as I have repeatedly said this is temporary solution to get us off the start line. The API is proposed and we have no plans to finalise it.

jrieken · 2020-06-15T14:31:11Z

closing this as duplicate of #580. tho, no further action planned

alexr00 assigned aeschli Feb 26, 2020

alexr00 mentioned this issue Feb 26, 2020

Only suggest issues if already in a comment? microsoft/vscode-pull-request-github#1521

Closed

aeschli assigned jrieken and unassigned aeschli Mar 2, 2020

jrieken added this to the May 2020 milestone May 13, 2020

jrieken added api-proposal editor-symbols definitions, declarations, references tokenization Text tokenization feature-request Request for new features or functionality and removed editor-symbols definitions, declarations, references labels May 13, 2020

kieferrm mentioned this issue May 18, 2020

Iteration Plan for May 2020 #97432

Closed

61 tasks

jrieken added a commit that referenced this issue May 19, 2020

first cut of token_at_position api, #91555

7ab052a

jrieken modified the milestones: May 2020, June 2020 May 28, 2020

alexdima mentioned this issue Jun 4, 2020

getTokenInformationAtPosition API causes too much traffic and blocks the renderer process #99356

Open

jrieken removed this from the June 2020 milestone Jun 15, 2020

jrieken closed this as completed Jun 15, 2020

github-actions bot locked and limited conversation to collaborators Jul 31, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API for extensions to know the symbol kind of a range or position. #91555

API for extensions to know the symbol kind of a range or position. #91555

alexr00 commented Feb 26, 2020

aeschli commented Mar 2, 2020 •

edited

Loading

alexdima commented Mar 2, 2020

alexr00 commented Mar 2, 2020

jrieken commented May 13, 2020

aeschli commented May 13, 2020 •

edited

Loading

jrieken commented May 14, 2020

alexdima commented May 14, 2020

aeschli commented May 14, 2020

jrieken commented May 14, 2020

jrieken commented May 14, 2020

alexdima commented May 14, 2020

aeschli commented May 14, 2020

jrieken commented May 14, 2020 •

edited

Loading

jrieken commented May 19, 2020

alexdima commented Jun 4, 2020

alexdima commented Jun 4, 2020

jrieken commented Jun 4, 2020 •

edited

Loading

jrieken commented Jun 15, 2020

API for extensions to know the symbol kind of a range or position. #91555

API for extensions to know the symbol kind of a range or position. #91555

Comments

alexr00 commented Feb 26, 2020

aeschli commented Mar 2, 2020 • edited Loading

alexdima commented Mar 2, 2020

alexr00 commented Mar 2, 2020

jrieken commented May 13, 2020

aeschli commented May 13, 2020 • edited Loading

jrieken commented May 14, 2020

alexdima commented May 14, 2020

aeschli commented May 14, 2020

jrieken commented May 14, 2020

jrieken commented May 14, 2020

alexdima commented May 14, 2020

aeschli commented May 14, 2020

jrieken commented May 14, 2020 • edited Loading

jrieken commented May 19, 2020

alexdima commented Jun 4, 2020

alexdima commented Jun 4, 2020

jrieken commented Jun 4, 2020 • edited Loading

jrieken commented Jun 15, 2020

aeschli commented Mar 2, 2020 •

edited

Loading

aeschli commented May 13, 2020 •

edited

Loading

jrieken commented May 14, 2020 •

edited

Loading

jrieken commented Jun 4, 2020 •

edited

Loading