Add a Tree-sitter Wasm binding and a Wasm-based web playground #321
Merged
maxbrunsfeld force-pushed the wasm branch 3 times, most recently from af43295 to 78dd4f4 (April 25, 2019 05:22)
maxbrunsfeld force-pushed the wasm branch 4 times, most recently from e00bdae to bbcba64 (April 26, 2019 15:59)
maxbrunsfeld force-pushed the wasm branch 2 times, most recently from 0045f18 to 15aa5ac (April 26, 2019 21:20)
maxbrunsfeld added a commit that referenced this pull request (May 1, 2019): Add a Tree-sitter Wasm binding and a Wasm-based web playground
OK, I've addressed the code size problem with the core library by whitelisting the functions that Emscripten should export. Here are the new sizes:
uncompressed:
gzipped:
I found myself wondering
In case others end up here wondering the same, here it is:
Does the Wasm binary include Tree-sitter Highlight?
Closes #252
Features
This PR adds the following:
A Wasm library
Tree-sitter can now be used from JavaScript on the web!
You can run the new script `script/build-wasm` to create two files, `tree-sitter.js` and `tree-sitter.wasm`, in the `target/release` directory. This JavaScript library defines a global `TreeSitter` object with `Parser`, `Tree`, and `Node` classes similar to the classes in node-tree-sitter. Library initialization and parser loading are now asynchronous in the web environment.
A CLI command for compiling parsers to Wasm
If you have `docker` installed, you can now compile a given Tree-sitter parser (e.g. `tree-sitter-python`) using the new `tree-sitter build-wasm` subcommand, passing it the path to the parser's directory. You can omit the path if you're already in the `tree-sitter-python` directory. This will output a single file called `tree-sitter-python.wasm` in the current directory.
A Web Playground
When this is merged, there will be a new Playground page on the docs site tree-sitter.github.io that allows you to try a number of different parsers interactively from the web browser.
Caveats
Code Size
The current implementation uses Emscripten's side modules feature to allow each parser to be compiled and loaded separately.
Unfortunately, in order to successfully load parsers that use C++ in their external scanners, the core library needs to contain any standard library functions that these scanners might use. Currently, I'm following this advice and just including all of libc++ in the core module.
The result is that the core library is quite large 😕
tree-sitter.js
tree-sitter.wasm
The individual parser binaries aren't too bad, but are still larger than your average JavaScript library:
tree-sitter-c.wasm
tree-sitter-javascript.wasm
tree-sitter-rust.wasm
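To reproduce uncompressed and gzipped size figures for build artifacts like the ones listed above, a small Node.js script is one option. This is only a sketch: the helper names `measureSizes` and `reportSizes` are hypothetical, the paths assume the `target/release` output directory mentioned earlier, and gzip at Node's default compression level may not match what a given web server produces.

```javascript
const fs = require("fs");
const zlib = require("zlib");

// Hypothetical helper: returns the raw and gzip-compressed byte counts of a buffer.
function measureSizes(buffer) {
  return {
    uncompressed: buffer.length,
    gzipped: zlib.gzipSync(buffer).length,
  };
}

// Hypothetical helper: measures every path that exists and skips missing files.
function reportSizes(paths) {
  const report = {};
  for (const p of paths) {
    if (fs.existsSync(p)) {
      report[p] = measureSizes(fs.readFileSync(p));
    }
  }
  return report;
}

// Assumed artifact locations; adjust to wherever the build placed the files.
console.log(
  reportSizes([
    "target/release/tree-sitter.js",
    "target/release/tree-sitter.wasm",
  ])
);
```

Run from the repository root after a build, this prints one entry per artifact found; parser binaries such as `tree-sitter-c.wasm` can be added to the list in the same way.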
Memory Management
Currently, in order to free the memory that a `Parser` or a `Tree` owns on the Wasm heap, you need to call `.delete()` on it.
It seems like this is a common problem for Wasm-backed JS libraries, and a solution is already being implemented: the ECMAScript WeakRefs API, which is in stage 2 of the proposal process. This API will allow us to get rid of the `.delete` methods.
Tasks
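Regarding the manual `.delete()` calls described under Memory Management: until something like WeakRefs is available, one way for calling code to avoid leaking Wasm heap memory is to wrap each use in `try`/`finally`. The sketch below is an illustration only. `withDeletable` is a hypothetical helper that is not part of this binding, and `FakeTree` is a stand-in for a real `Parser` or `Tree`; the only thing assumed about the real objects is the `.delete()` method mentioned above.

```javascript
// Hypothetical helper (not part of the Tree-sitter binding): runs `fn` with a
// Wasm-backed object and always frees it afterwards by calling `.delete()`,
// even if `fn` throws. It assumes `fn` is synchronous.
function withDeletable(resource, fn) {
  try {
    return fn(resource);
  } finally {
    resource.delete();
  }
}

// Stand-in for illustration; a real Parser or Tree would own Wasm heap memory.
class FakeTree {
  constructor() {
    this.deleted = false;
  }
  rootType() {
    return "program";
  }
  delete() {
    this.deleted = true;
  }
}

const tree = new FakeTree();
const rootType = withDeletable(tree, (t) => t.rootType());
console.log(rootType, tree.deleted);
```

Any value needed after the callback returns has to be copied out of the object first, since the object is no longer usable once it has been deleted.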
Next Steps
There are probably ways to avoid some of this bloat, especially in the core library. For comparison, the native-compiled Tree-sitter library is only around 490K uncompressed. Maybe we can include only a subset of libc++ in the core library. Alternatively, maybe we can somehow include the necessary pieces of libc++ into each parser library.
Emscripten and Wasm both seem to be changing pretty quickly, so I didn't put too much effort into optimizing the builds yet. I think I'll wait and do that in a separate pass.