Expose a proper tokenization/colorization API to extensions #1967
I too support being able to replace the tokenizer implementation. For example, for a language I was implementing with first-class tooling in mind, I had to create an AST and its associated lexer/parser (flex/bison-based, incidentally). It's a shame that I can't reuse that lexer for syntax highlighting as well. Returning the token/position and saving the lexer state doesn't seem that difficult.
This part doesn't really make sense to me though.
I think they wouldn't be considered as tokenizers from VSCode's point of view. Expanding on @dajoh's example, you could have a tmLanguage tokenizer classifying the simple tokens quickly and have Roslyn parse and return detailed token information from another process. In this case VSCode might only need a way to change a token's style at any time.
@alexandrudima fyi
👍 This is a great request.
related to #580
I noticed that the TypeScript-tmLanguage project is not actively maintained. Can I assume that TypeScript will be one of the first beneficiaries of this set of APIs?
Just wondering if there's any update on this?
Sorry, nothing to report yet...
I don't suppose it's likely this will happen any time soon?
I hope this feature gets higher priority. It would make VS Code the best code editor for me.
Has there been any update on this?
Would love to see this added to allow syntax highlighting for PowerShell at the same level the ISE has! It would let me never touch the ISE again; I currently have to use it to sanity-check certain items that aren't highlighting as expected in Code.
I am very interested in a tokenization/highlighting API. I am developing a lexer generator with multiple output targets and would like to support vscode as a first-class output. I put serious effort into trying to generate regex definitions, but because the generated lexer is really a state machine it got very messy.

The key requirement for me is that the API allow the lexer to maintain state across lines of source code, e.g. a stack of contexts. This is necessary to support grammars that are not strictly regular, such as nested string interpolation syntaxes (for example, Swift lets you nest multiple levels of string interpolation, so the lexer needs a stack to switch between code and string lexing modes; multiline string literals require that this state persist across lines).

Another thing that would be great to see is documentation on Unicode correctness. For example, I assume the API would operate on JavaScript UCS-2 strings, so code points outside the BMP would be represented as surrogate pairs. Are these counted as 1 or 2 columns by the highlighting engine? This is also important for problem matchers. This stuff gets hard (e.g. deciding how wide a character will render is involved), so I wouldn't expect it to be perfect at first, but it's worth keeping these challenges in mind during the design phase.
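To illustrate the state-across-lines requirement, here is a minimal sketch in plain TypeScript. All names (`LexState`, `tokenizeLine`, the token kinds) are invented for illustration and are not part of any real VS Code API; the example uses nestable block comments as a stand-in for any construct whose lexing mode must survive line breaks:

```typescript
// Sketch: a per-line tokenizer whose state (here, block-comment nesting
// depth) is returned at the end of each line and fed into the next one.
// Names are illustrative, not part of any VS Code API.

interface LexState { commentDepth: number }

interface Token { start: number; end: number; kind: "comment" | "code" }

function tokenizeLine(
  line: string,
  state: LexState
): { tokens: Token[]; endState: LexState } {
  const tokens: Token[] = [];
  let depth = state.commentDepth;
  let i = 0;
  let runStart = 0;
  const flush = (end: number, kind: "comment" | "code") => {
    if (end > runStart) tokens.push({ start: runStart, end, kind });
    runStart = end;
  };
  while (i < line.length) {
    if (line.startsWith("/*", i)) {
      if (depth === 0) flush(i, "code"); // close the preceding code run
      depth++;
      i += 2;
    } else if (depth > 0 && line.startsWith("*/", i)) {
      depth--;
      i += 2;
      if (depth === 0) flush(i, "comment"); // comment fully closed
    } else {
      i++;
    }
  }
  flush(line.length, depth > 0 ? "comment" : "code");
  return { tokens, endState: { commentDepth: depth } };
}
```

Feeding the `endState` of one line into the next is exactly what lets a comment (or a string-interpolation stack) opened on one line continue onto the following lines.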
The Microsoft C++ extension is also very interested in this. At the very least, we would like a way to colorize sections of code to mark them as inactive based on #ifdef/#else/#endif/etc. sections. It's something that Visual Studio can do, but unfortunately we can't do this with TextMate grammars, since the tokens need to be evaluated by the compiler, not by regular expressions.
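The inactive-region idea can be sketched independently of any editor API. The following is an illustrative TypeScript toy, not the C/C++ extension's actual logic: given the set of defined macros, it computes line ranges a client could ask the editor to dim, handling only flat (non-nested) `#ifdef`/`#else`/`#endif`:

```typescript
// Toy sketch (not the real C/C++ extension implementation): compute
// [startLine, endLine] ranges that are preprocessed away and could be
// rendered as "inactive". Only simple, non-nested directives are handled.

function inactiveRegions(
  lines: string[],
  defined: Set<string>
): Array<[number, number]> {
  const regions: Array<[number, number]> = [];
  let active = true;    // is the current branch compiled in?
  let runStart = -1;    // first line of the current inactive run, or -1
  const close = (endLine: number) => {
    if (runStart >= 0) {
      regions.push([runStart, endLine]);
      runStart = -1;
    }
  };
  lines.forEach((line, n) => {
    const m = line.match(/^\s*#\s*(ifdef|else|endif)\s*(\w*)/);
    if (!m) {
      if (!active && runStart < 0) runStart = n;
      return;
    }
    if (m[1] === "ifdef") {
      active = defined.has(m[2]);
    } else if (m[1] === "else") {
      close(n - 1);
      active = !active;
    } else { // endif
      close(n - 1);
      active = true;
    }
  });
  return regions;
}
```

The point of the sketch is the output shape: the hard part (evaluating which branch is live) belongs to the compiler, and all the editor would need is an API accepting ranges plus a style to apply.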
Actually a dupe of #585
Rich context-sensitive syntax colorization is very hard to do (if not impossible) with tmLanguage syntax definitions. The functionality for specifying custom colorizers seems to be there, but not exposed to extensions (ITokenizationSupport).
One way to expose colorization would be to just let the extension provide an ITokenizationSupport implementation, and have that completely override the tmLanguage syntax definition (if any).
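What such an override could look like, sketched with invented interface names (the real internal ITokenizationSupport shape may differ; this is only meant to show the contract an extension would implement):

```typescript
// Hypothetical shapes only: these interfaces are invented to illustrate the
// proposal and are not copied from VS Code's internal ITokenizationSupport.

interface TokenizerState {
  clone(): TokenizerState;
  equals(other: TokenizerState): boolean;
}

interface LineTokens {
  tokens: Array<{ startIndex: number; scope: string }>;
  endState: TokenizerState;
}

interface TokenizationSupport {
  getInitialState(): TokenizerState;
  tokenizeLine(line: string, state: TokenizerState): LineTokens;
}

// A trivial stateless state object for demonstration.
class TrivialState implements TokenizerState {
  clone(): TokenizerState { return this; }
  equals(other: TokenizerState): boolean { return other === this; }
}

// A toy implementation: word runs become "identifier", everything else
// becomes "delimiter". A real extension would plug in its own lexer here,
// fully replacing any tmLanguage definition for the language.
class WordTokenizer implements TokenizationSupport {
  getInitialState(): TokenizerState { return new TrivialState(); }
  tokenizeLine(line: string, state: TokenizerState): LineTokens {
    const tokens: Array<{ startIndex: number; scope: string }> = [];
    let i = 0;
    while (i < line.length) {
      const start = i;
      const isWord = /\w/.test(line[i]);
      while (i < line.length && /\w/.test(line[i]) === isWord) i++;
      tokens.push({ startIndex: start, scope: isWord ? "identifier" : "delimiter" });
    }
    return { tokens, endState: state };
  }
}
```

Because `tokenizeLine` takes and returns a state object, the same contract covers both trivial tokenizers like this one and stateful lexers that need to carry context across lines.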
Another way is to let multiple tokenizers work in parallel, each classifying different tokens of the program. For example: a tmLanguage-based tokenizer classifies easy tokens such as keywords, strings, and literals, while a custom tokenizer classifies tokens that generally need context information, such as identifiers (think type names). The reason for wanting this is that classifying tokens such as identifiers is generally much slower than classifying keywords; separating the tokenizers allows instant colorization of the easy-to-classify tokens, while harder tokens like identifiers are classified in the background and colorized when ready. It might make sense to allow only one tokenizer per language, but multiple (potentially async) token classifiers.
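The fast-tokenizer-plus-async-classifier split described above can be sketched as follows. All names are invented, and `slowClassify` is just a stand-in for a compiler service (such as Roslyn in the earlier example) answering out of process:

```typescript
// Sketch of "one fast tokenizer, many async classifiers" (invented names).
// The fast pass returns immediately; slower semantic classification arrives
// later and refines specific tokens.

type Classification = { start: number; length: number; kind: string };

// Fast, synchronous pass: a regex scan for a few keywords, analogous to
// what a tmLanguage grammar can do cheaply.
function fastTokenize(line: string): Classification[] {
  const out: Classification[] = [];
  const re = /\b(class|return|if)\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    out.push({ start: m.index, length: m[0].length, kind: "keyword" });
  }
  return out;
}

// Slow, asynchronous pass: a stand-in for a real semantic engine that
// knows which identifiers are type names.
async function slowClassify(
  line: string,
  knownTypes: Set<string>
): Promise<Classification[]> {
  await new Promise((r) => setTimeout(r, 0)); // simulate out-of-process latency
  const out: Classification[] = [];
  const re = /\b[A-Za-z_]\w*\b/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    if (knownTypes.has(m[0])) {
      out.push({ start: m.index, length: m[0].length, kind: "type" });
    }
  }
  return out;
}
```

An editor following this design would paint the `fastTokenize` result on the next frame and then re-paint the affected ranges whenever a `slowClassify` promise resolves, which is exactly why the two passes should be separate providers rather than one tokenizer.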