Draft: Move inline tokens to external scanner and split the grammar into two pieces #55

Closed
treeman wants to merge 46 commits

Owner

@treeman treeman commented Sep 9, 2024

This is a major rewrite containing two large changes and many smaller improvements.

The biggest change is that it splits the parser in two (like tree-sitter-markdown), so two parsers must be used to parse a Djot file. This is a big breaking change: many capture groups have also changed, so all query files need to be updated.

The other big change is to let the external scanner collect a stack of inline elements, which solves a number of issues from the Djot spec (such as parsing *not strong *strong* properly).
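
To illustrate the stack idea, here is a minimal sketch in C. It is not the scanner code from this PR: `find_strong` and `MAX_OPEN` are hypothetical names, and the rules are simplified to the assumption that a `*` can open only when it is not followed by whitespace and can close only when it is not preceded by whitespace. A closer is resolved against the innermost opener on the stack, so an earlier unmatched `*` stays literal.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_OPEN 32 /* arbitrary limit chosen for this sketch */

/* Hypothetical helper, not taken from tree-sitter-djot.
 * Finds the first strong span in `text` and stores the indices of its
 * opening and closing `*` in *start and *end. Returns false if no span
 * is found. */
static bool find_strong(const char *text, size_t *start, size_t *end) {
    size_t open_stack[MAX_OPEN]; /* positions of still-unmatched openers */
    size_t depth = 0;
    size_t len = strlen(text);

    for (size_t i = 0; i < len; i++) {
        if (text[i] != '*') {
            continue;
        }
        /* Start and end of input count as whitespace here. */
        bool space_before = i == 0 || isspace((unsigned char)text[i - 1]);
        bool space_after = i + 1 == len || isspace((unsigned char)text[i + 1]);

        /* A closer pairs with the innermost (most recent) opener. */
        if (!space_before && depth > 0) {
            *start = open_stack[depth - 1];
            *end = i;
            return true;
        }
        /* Otherwise remember it as a potential opener. */
        if (!space_after) {
            assert(depth < MAX_OPEN);
            open_stack[depth++] = i;
        }
    }
    return false;
}
```

For the input `*not strong *strong*` this reports the span from index 12 to index 19, leaving the first `*` at index 0 unmatched. A real scanner would also have to handle other delimiter kinds, nesting, and empty spans, which this sketch ignores.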

In the process I also fixed a bunch of other issues.

Closes #41, #42, #43, #44, #45, #46, #49, #50, #52, #53, #54

clason commented Oct 22, 2024

@treeman Can you explain what happened here? It looks like the master branch now has the (WIP?) split parser, which broke nvim-treesitter, but this PR was not merged.

clason commented Oct 22, 2024

And while I'm here: tree-sitter build complains about

Warning: Found non-static non-tree-sitter functions in the external scannner
  `_init`
  `_set_delayed_token`
Consider making these functions static, they can cause conflicts when another tree-sitter project uses the same function name
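
A minimal sketch of the kind of change the warning asks for, in C. The `Scanner` struct, its fields, and the helper signatures are assumptions made for illustration (the helper names are modelled on the ones in the warning, without the leading underscore); only the `tree_sitter_<language>_external_scanner_*` naming of the entry points follows the tree-sitter convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scanner state; the real struct likely differs. */
typedef struct {
    uint16_t delayed_token;
    uint8_t delayed_token_width;
} Scanner;

/* `static` gives the helper internal linkage, so it cannot collide with
 * a same-named function in another grammar linked into the same binary. */
static void set_delayed_token(Scanner *s, uint16_t token, uint8_t width) {
    assert(s != NULL);
    s->delayed_token = token;
    s->delayed_token_width = width;
}

/* Also static: only this file needs to call it. */
static void init(Scanner *s) {
    set_delayed_token(s, 0, 0);
}

/* The entry points that tree-sitter looks up by name keep external
 * linkage; their language-specific prefix already makes them unique. */
void *tree_sitter_djot_external_scanner_create(void) {
    Scanner *s = malloc(sizeof(Scanner));
    if (s != NULL) {
        init(s);
    }
    return s;
}

void tree_sitter_djot_external_scanner_destroy(void *payload) {
    free(payload);
}
```

The remaining entry points (scan, serialize, deserialize) are omitted here; they would stay non-static in the same way.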

Owner Author

treeman commented Oct 22, 2024

> @treeman Can you explain what happened here? It looks like the master branch now has the (WIP?) split parser, which broke nvim-treesitter, but this PR was not merged.

Oh crap... I must've accidentally pushed the split branch into master somehow. That's what I get for not making master protected. I've reverted master now.

Anyway, this branch should be stable and I've been using it for a few months without issues. Top-level package.json and similar files are missing (markdown has them) and I'm not sure how to test those properly...

But we should be able to make nvim-treesitter use the split parser approach (we need to update the grammars though). This should fix a bunch of bugs and hopefully be faster in some cases.

clason commented Oct 22, 2024

> Oh crap... I must've accidentally pushed the split branch into master somehow. That's what I get for not making master protected. I've reverted master now.

Thank you!

> Anyway, this branch should be stable and I've been using it for a few months without issues. Top-level package.json and similar files are missing (markdown has them) and I'm not sure how to test those properly...

Just do tree-sitter generate (or tree-sitter init --update) with the latest CLI 0.24.3 and you'll get all those (and more)... I don't think you need to test them, but you could look at the workflows used in https://github.com/tree-sitter-grammars/template.

> But we should be able to make nvim-treesitter use the split parser approach (we need to update the grammars though). This should fix a bunch of bugs and hopefully be faster in some cases.

Sure! Someone needs to make a PR to update the parsers and -- more importantly -- the queries, though. (And that someone has to be you, I'm afraid ;))

Owner Author

treeman commented Jan 25, 2025

The split parser is basically done.

However, after playing around with the bindings, I realized that using a split parser gives a fairly negative user experience:

  • Users need to install two grammars instead of just one
  • The parse tree is quite ugly with lots of extra inline nodes
  • It's annoying to have multiple grammars if you want to do something custom using the parser (such as tree-sitter supported transformations)

So I'll try to backport the fixes and features to the single grammar version. I don't know if I can do that in a good way but I want to try it before committing to the more awkward solution of multiple grammars.

treeman added a commit that referenced this pull request Jan 28, 2025
Complete rewrite of the inline parser together with
many changes for the block parsing.
Should make the parser follow the spec a lot closer
in particular with inline precedence rules.

#55
Owner Author

treeman commented Jan 28, 2025

Closed in favor of #56

@treeman treeman closed this Jan 28, 2025
Development

Successfully merging this pull request may close these issues.

Verbatim inside indented list isn't recognized