Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SyntaxSetBuilder: Allow to list added syntaxes via a syntaxes() method #340

Merged

Conversation

Enselic
Copy link
Collaborator

@Enselic Enselic commented Jul 4, 2021

This is analogous to the existing public SyntaxSet::syntaxes() method.

Background information

I am doing some prototyping around ways to improve the startup time of bat (see sharkdp/bat#951). And the way to improve startup time is to only load the syntaxes needed for the given input file(s). Instead of having one giant SyntaxSet, we need several smaller ones.

The added method enables clients such as bat to analyze dependencies between SyntaxDefinitions and build up many smaller, self-contained SyntaxSets.

An alternative to this PR could be to move out SyntaxSetBuilder::add_from_folder() to some place where the logic can live and be used independently, i.e. without taking a detour through a SyntaxSetBuilder. That however seems to like a significantly higher hanging fruit at this point.

This is analogous to the existing public `SyntaxSet::syntaxes()` method.
@trishume trishume merged commit 6bcec62 into trishume:master Jul 4, 2021
@trishume
Copy link
Owner

trishume commented Jul 4, 2021

This seems fine with me. You may have already thought about this, but syntaxes have dependencies on each other due to embedding, and while sometimes syntaxes will only depend on one or two others, there's other cases like the Ruby or Markdown syntax where it can reference tons of different languages based on the embedded snippet name. If you don't share the serialized definition this could bloat the bat binary a lot. Sharing them will require linking them together after loading the different syntaxes (which might be pretty fast and not a problem).

As a more ambitious project I'd also accept a sufficiently good contribution of an implementation of lazy-loading syntaxes on the first usage of a rule with a cross-syntax reference. This would also allow faster startup for languages like Markdown which reference a lot of languages, although given the number of syntaxes that bat includes Markdown will be faster even with the basic version.

@Enselic Enselic deleted the add-syntax-set-builder-syntaxes-method branch July 5, 2021 03:29
@Enselic
Copy link
Collaborator Author

Enselic commented Aug 1, 2021

@trishume There is absolutely no rush or urgency, but it would be great if you could make a new release of syntect where this API is available, when you get time.

Maybe I can help you update CHANGELOG.md for a new release? Just let me know.

Also thank you for the pointers with regards to dependencies between syntaxes. In case you are curious, the bat code to analyze dependencies, and the output of it, can be found in this PR: sharkdp/bat#1762 (which is also the PR that depends on this added API).

@trishume
Copy link
Owner

trishume commented Aug 1, 2021

No problem, thanks for reminding me. This was released in v4.6.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants