Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share sublime syntaxes #919

Open
Keats opened this issue Apr 14, 2020 · 9 comments
Open

Share sublime syntaxes #919

Keats opened this issue Apr 14, 2020 · 9 comments
Labels
feature-request New feature or request

Comments

@Keats
Copy link

Keats commented Apr 14, 2020

Bat and Zola both use syntect for highlighting syntaxes and have their own folder for additional syntaxes on top of the official Sublime repo. It would be a good idea to merge so there is one known source of syntaxes that all of syntect users can contribute to.

The idea would be:

  • have a third party repo containing the syntax files
  • keep the patches that bat has since they solve issues
  • pretty much any .sublime-syntax should be accepted as long as it's not a duplicate of an existing one (Zola highlights .srt for example, which is not strictly a programming language but can be useful in various contexts)
  • build step is up to the consumer: bat seems to have a built in method for it while Zola has a small Rust script (https://github.com/getzola/zola/blob/master/components/config/examples/generate_sublime.rs) that prints a blurb pasted in the docs.

A bonus point for syntect is to have a Rust CI setup on that repo that tests every syntax for both rust-onig and fancy-regex, which can be used to report potential bugs/issues.

@Keats Keats added the feature-request New feature or request label Apr 14, 2020
@keith-hall
Copy link
Collaborator

I've seen that quite often, tmLanguage syntax grammars are manually converted to sublime-syntax format for use in syntect. Have you considered adding support for tmLanguage files directly in syntect, it seems it could save some hassle? They essentially support a subset of sublime-syntax functionality, so the existing data structures can be used, it would just be a case of parsing the plist/xml format...

@sharkdp
Copy link
Owner

sharkdp commented Apr 19, 2020

@Keats Thank you for your suggestion. As initially discussed in trishume/syntect#287, that sounds like a good idea in principle.

Before we go forward with this, I'd first like to think about the possible downsides as well. I'm only interested in following this route, if it really leads to less maintenance work overall. There are also a few questions/problems that would need to be sorted out.

Pros:

  • Synergy effect: all projects immediately benefit from contributions to the shared repo.
  • Maintenance work for including new syntaxes can be shared (PR reviews, bug reports, patches, etc.)
  • Part of the tooling and testing could be shared. CI would have to be set up just once.

Cons (some of them are specific to bat):

  • Dealing with submodules (of submodules) is not a great experience and adds friction in the contribution/maintenance process. This is kind of an obvious point, but a really big downside in my point of view. For now, contributors to bat can easily open a new PR on the bat repo to get a syntax included. With this change, we need two PRs on two separate repos to get a change in. Similarly, if bug reports against a syntax are reported, they will be reported in the downstream project repositories (yes, ideally users would directly report them upstream, but from experience I know that they will not). All of these issues have to be transported "upstream" to the syntaxes repository, duplicating the amount of issues that need to be managed. ChangeLogs and commit messages are similarly affected by this and often need to be duplicated in two places.
  • (Downstream) testing and syntax files are disconnected: In bat, we currently have a few tests in place that ensure certain things about the syntax set that is shipped with bat. We are planning to extend these kinds of tests in the future. If we move the syntaxes to another repository, there is a risk of catching bugs "too late" (only after changes have been merged upstream at the point where the submodule is updated).
  • I currently don't see this as a problem, but there might very well be situations where different downstream projects have conflicing views about the "design" of the syntax repository.

Potential issues:

  • "pretty much any .sublime-syntax should be accepted" => this is not how we approach this in bat. I'm quite hesitant to add new syntaxes due to more runtime overhead (see bat is extremely slow (>100x slower than cat) for 4 char file #925 (comment)). We currently have the following "policy" in place: syntax request issue template. How would we deal with something like this? We could probably have a list of "prioritized" categories with (1) syntaxes in included in sublimehq/Packages, (2) other popular syntaxes, by some measure (3) less popular syntaxes (4) exotic syntaxes. Downstream projects could then decide on which categories to include?
  • "pretty much any .sublime-syntax should be accepted as long as it's not a duplicate of an existing one" => in bat, we currently have a few syntaxes that override base syntaxes in Packages. At the moment we have custom JavaScript and Haskell syntaxes, I believe.

In summary, I would currently actually tend towards keeping things as they are. Maintaining bat is a large amount of work already and I see a certain chance that this change would increase the amount of maintenance work. If someone can convince me otherwise, I'm happy to consider implementing this.

@sharkdp
Copy link
Owner

sharkdp commented Apr 19, 2020

Related discussion on the syntect repo: trishume/syntect#168

@pksunkara
Copy link

Pasting my comment from a different issue:

I think the whole community would benefit if you can extract out the syntax highlighting data and maintain it separately (maybe a git submodule?). I haven't see anyone else in the rust ecosystem do it as good as you and they all have issues like this (it works in bat). Let's try to reduce duplication work and keep the efforts in one place.

I understand the concerns about git submodule. I am not knowledgable about syntect, but I would like to enquire about the feasibility of doing the syntax data as a crate? That way, we can keep it in this repo (using cargo workspaces).

@epage
Copy link
Contributor

epage commented Oct 7, 2022

Keats originally opened this up about having a central dumping ground for syntaxes which had some downsides (bloat, worse development experience for bat). We've seen though that there are cases where tools are fine "just doing whatever bat does" (broot, delta). I'm yet another person who wants to copy bat in my own tool, git-dive, a potential replacement for git-blame (so maybe #1536 won't be needed if this works out?).

delta is taking the approach of depending on bat. That requires pulling in a lot of stuff that I won't be using. The alternative I wanted to propose was one or multiple of

  • A syntax and a theme asset crate
  • A HighlightingAssets crate(s) (unsure if syntax and theme should be split out into separate crates)
    • Digging into this code, it feels there is a lot of this is generally useful

These crates would reside in the bat repo and be part of a bat workspace. I know managing multiple crates adds extra overhead. cargo-release has been a big help for me and I would be willing to take input for what, if anything, would be needed for bat to use it. Of course, there are also several other tools in that space.

There is still changelog management (I'm hopeful [cargo-changelog])https://crates.io/crates/cargo-changelog) will help) and selecting versions. For versions, I use conventional commit style so I'm less likely to miss breaking changes when choosing my version but I would even be fine if these crates bumped their major version on every release.

@sharkdp
Copy link
Owner

sharkdp commented Nov 4, 2022

@epage Thank you for your input. I'm willing to consider this now that I have some experience with cargo workspaces from another project.

The alternative I wanted to propose was one or multiple of

* A syntax and a theme asset crate

* A `HighlightingAssets ` crate(s) (unsure if syntax and theme should be split out into separate crates)
  
  * Digging into this code, it feels there is a lot of this is generally useful

I'm not sure I understand what the difference is? What would the first "syntax and theme asset crate" be? A very low-level crate that basically just includes the binary assets and one or two functions to get access to them (in a &[u8] sense)?

While that would allow even more flexibility, I'd still lean towards keeping the number of crates minimal. If we would have a proper cargo workspace though, we could also cleanly separate the bat crate into a library and the CLI.

@epage
Copy link
Contributor

epage commented Nov 4, 2022

Yes, the crates would be

  • Low level access to just the themes and syntaxes (decompresses if needed so people can integrate it with syntect)
  • The current asset loader (from disk or in-memory from above crate(s))

The reason I was thinking multiple crates

  • Access to the in-memory syntaxes and themes is more universal than the strategy for loading them from disk but the disk loader is still likely useful enough
  • (lower priority) SSGs might want the syntax support but not care about themes

@CosmicHorrorDev
Copy link
Contributor

I'm seemingly in the same position as @epage where I just want to do whatever bat does in terms of providing extra syntaxes (and to a lesser extent themes) without having to pull in all of bat as a lib

I'd be up for putting in the work to either provide it as a separate crate(s) in a workspace here, or just providing it as a separate crate on crates.io if it's deemed too much effort to maintain here

@CosmicHorrorDev
Copy link
Contributor

I went ahead and threw together a quick crate that roughly has the core functionality that I would want. All it currently provides:

  • Ways to load the embedded set of syntaxes, themes, and acknowledgements (with feature flags for each one to avoid pulling in unneeded embedded assets)
  • The LazyThemeSet implementation instead of syntect's ThemeSet to avoid having to deserialize and decompress the entire set just to use a single theme

Some misc internal changes include switching the create.sh shell file that bat uses to a cargo-xtask task along with changing the license detection logic from matching on snippets to askalono (The crate that powers cargo-deny's license detection logic)

This doesn't add the asset loader logic that bat currently uses, but that should be a pretty simple addition (I just don't need it for my purposes currently)


Re:

I'm only interested in following this route, if it really leads to less maintenance work overall.

It would be pretty easy to switch things over and integrate some variation of my crate into bat, but I'm pretty doubtful that it will reduce the maintenance work. I'd be happy to just provide the crate separately and quietly keep things up to date with bat's assets in the background

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request New feature or request
Projects
None yet
Development

No branches or pull requests

6 participants