Add a Tree-sitter Wasm binding and a Wasm-based web playground #321

Merged: 7 commits into master from wasm, Apr 28, 2019

@maxbrunsfeld maxbrunsfeld commented Apr 24, 2019

Closes #252

Features

This PR adds the following:

A Wasm library

Tree-sitter can now be used from JavaScript on the web!

You can run the new script script/build-wasm to create two files, tree-sitter.js and tree-sitter.wasm, in the target/release directory. This JavaScript library defines a global TreeSitter object with Parser, Tree, and Node classes similar to the classes in node-tree-sitter. Library initialization and parser loading are now asynchronous in the web environment:

await TreeSitter.init();
const JSLanguage = await TreeSitter.Language.load("/path/to/tree-sitter-javascript.wasm");
const parser = new TreeSitter.Parser();
parser.setLanguage(JSLanguage);
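
As a rough sketch of the next step, the function below wraps the setup above and parses one string. It assumes the web binding exposes `parse`, `rootNode`, and `toString` the way node-tree-sitter does. The name `parseOnce` and the injected `TreeSitter` parameter are illustrative and not part of the library.

```javascript
// Sketch only. `TreeSitter` is passed in so the function does not care how
// the global was loaded. `parse` and `rootNode` are assumed to mirror
// node-tree-sitter; check them against the binding you actually built.
async function parseOnce(TreeSitter, languageUrl, source) {
  await TreeSitter.init();
  const language = await TreeSitter.Language.load(languageUrl);

  const parser = new TreeSitter.Parser();
  let tree = null;
  try {
    parser.setLanguage(language);
    tree = parser.parse(source);
    // Return a plain string so no Wasm-backed object escapes this function.
    return tree.rootNode.toString();
  } finally {
    // Both objects own memory on the Wasm heap and must be released by hand
    // (see Memory Management in this PR).
    if (tree) tree.delete();
    parser.delete();
  }
}
```

In a page that has loaded tree-sitter.js, a call might look like `parseOnce(TreeSitter, "/path/to/tree-sitter-javascript.wasm", "let x = 1;")`, which resolves to the S-expression of the syntax tree if the assumptions above hold.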

A CLI command for compiling parsers to Wasm

If you have Docker installed, you can now compile a given Tree-sitter parser (e.g. tree-sitter-python) using the new tree-sitter build-wasm subcommand:

tree-sitter build-wasm path/to/tree-sitter-python

You can omit the path if you're already in the tree-sitter-python directory. This will output a single file called tree-sitter-python.wasm in the current directory.

A Web Playground

When this is merged, there will be a new Playground page on the docs site tree-sitter.github.io that allows you to try a number of different parsers interactively from the web browser.

Caveats

Code Size

The current implementation uses Emscripten's side modules feature so that each parser can be compiled and loaded separately.

Unfortunately, in order to successfully load parsers that use C++ in their external scanners, the core library needs to contain any standard library functions that these scanners might use. Currently, I'm following this advice and just including all of libc++ in the core module.

The result is that the core library is quite large 😕

file              size (minified)  size (gzipped)
tree-sitter.js    793K             155K
tree-sitter.wasm  813K             342K

The individual parser binaries aren't too bad, but are still larger than your average JavaScript library:

file                         size (uncompressed)  size (gzipped)
tree-sitter-c.wasm           694K                 53K
tree-sitter-javascript.wasm  1.0M                 83K
tree-sitter-rust.wasm        2.2M                 154K

Memory Management

Currently, in order to free the memory that a Parser or a Tree owns on the Wasm heap, you need to call .delete() on it.

It seems like this is a common problem for Wasm-backed JS libraries, and a solution is already being implemented: the ECMAScript WeakRefs API, which is in stage 2 of the proposal process. This API will allow us to get rid of the .delete methods.
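
For illustration, here is a minimal sketch of how automatic cleanup could be layered on top of explicit deletion. The WeakRefs proposal later shipped in ECMAScript 2021 as `WeakRef` and `FinalizationRegistry`, and the sketch uses the latter. The names `makeManaged`, `Managed`, and `freeOnHeap` are hypothetical and do not describe how the binding is implemented.

```javascript
// `freeOnHeap` stands in for whatever releases the Wasm-side memory.
// It is a hypothetical callback, not part of the binding described here.
function makeManaged(freeOnHeap) {
  // The registry calls back some time after a registered wrapper has been
  // garbage collected. The spec gives no guarantee about when, or whether,
  // that happens, so this is a safety net rather than a replacement.
  const registry = new FinalizationRegistry((address) => freeOnHeap(address));

  return class Managed {
    constructor(address) {
      this.address = address;
      // The wrapper itself serves as the unregister token.
      registry.register(this, address, this);
    }

    // Explicit release stays available and is safe to call twice.
    delete() {
      if (this.address === null) return;
      registry.unregister(this);
      freeOnHeap(this.address);
      this.address = null;
    }
  };
}
```

Because finalizers run at an unspecified time, callers that care about peak memory would probably still want to call `.delete()` themselves.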

Tasks

  • Get parsers loading and working
  • Implement remaining APIs
    • Tree.edit
    • TreeCursor
  • Build the wasm library on CI
  • Add unit tests for the wasm library, based on the ones in node-tree-sitter
  • Make the playground page nicer

Next Steps

There are probably ways to avoid some of this bloat, especially in the core library. For comparison, the native-compiled Tree-sitter library is only around 490K uncompressed. Maybe we can include only a subset of libc++ in the core library. Alternatively, maybe we can somehow include the necessary pieces of libc++ into each parser library.

Emscripten and Wasm both seem to be changing pretty quickly, so I didn't put too much effort into optimizing the builds yet. I think I'll wait and do that in a separate pass.

maxbrunsfeld force-pushed the wasm branch 3 times, most recently from af43295 to 78dd4f4 (April 25, 2019 05:22)
maxbrunsfeld force-pushed the wasm branch 4 times, most recently from e00bdae to bbcba64 (April 26, 2019 15:59)
maxbrunsfeld force-pushed the wasm branch 2 times, most recently from 0045f18 to 15aa5ac (April 26, 2019 21:20)
maxbrunsfeld merged commit e388311 into master on Apr 28, 2019
maxbrunsfeld deleted the wasm branch (April 28, 2019 00:50)
maxbrunsfeld added a commit that referenced this pull request May 1, 2019
Add a Tree-sitter Wasm binding and a Wasm-based web playground
@maxbrunsfeld

Ok, I've addressed the code size problem with the core library by whitelisting the functions that Emscripten should export. Here are the new sizes:

uncompressed:

55K  tree-sitter.js
252K tree-sitter.wasm

gzipped:

15K  tree-sitter.js.gz
83K  tree-sitter.wasm.gz
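
To put the two sets of numbers side by side, the short script below computes the percentage saved for each file. The figures are copied from this thread. Note that the earlier tree-sitter.js size was reported as minified while the new one is reported as uncompressed, so that row is only an approximate comparison.

```javascript
// Sizes in KB, copied from the tables in this thread.
const before = { js: 793, wasm: 813, jsGz: 155, wasmGz: 342 };
const after = { js: 55, wasm: 252, jsGz: 15, wasmGz: 83 };

// Percentage saved, rounded to one decimal place.
function reduction(oldSize, newSize) {
  return Math.round(((oldSize - newSize) / oldSize) * 1000) / 10;
}

for (const key of Object.keys(before)) {
  console.log(key, reduction(before[key], after[key]) + "%");
}
```

Taken together, the gzipped core library goes from 497K (155K + 342K) to 98K (15K + 83K), a saving of roughly 80%.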

curran commented Aug 2, 2019

I found myself wondering:

Is the WASM build available as an NPM package by any chance?

In case others end up here wondering the same, here it is:

https://www.npmjs.com/package/web-tree-sitter

pckhoi commented Aug 11, 2020

Does the Wasm binary include Tree-sitter Highlight?
