Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Convert module to use NAPI instead of Nan #52

Closed
wants to merge 1 commit into from
Closed

Conversation

maxbrunsfeld
Copy link
Contributor

@maxbrunsfeld maxbrunsfeld commented Nov 6, 2019

Summary

This PR reworks all of the native code to use Node's new ABI-stable addon API (napi), instead of using NAN, a C++ helper library that wraps the V8 and Node APIs.

Benefit

ABI-stability means that this module will no longer need to be recompiled when upgrading to a new version of Node.js or electron. Even better, this is the first step toward individual language modules like tree-sitter-javascript having the same ABI stability.

In addition, Napi is also just a much cleaner and more consistent API to use than Nan + V8.

Limitations

Unfortunately, we can't achieve true ABI-stability all in one fell swoop. This module has two binary interfaces that currently depend on Nan / V8 directly:

  1. Retrieving the TSLanguage * pointers out of Language objects from language modules. Currently, the Node.js binding code that the tree-sitter CLI generates for individual languages are reliant on nan and the V8 ABI 😞 .
  2. Parsing native TextBuffer objects provided by superstring - For performance/concurrency reasons, this module can interact with TextBuffer objects directly through native code, without calling back into JavaScript. This interface also relies on nan/V8 😭 .

The module won't truly have a stable ABI until these ☝️ things are fixed. I will have to update the Tree-sitter CLI to generate binding files that use napi. I probably won't attempt to convert superstring to NAPI, but I can update the TextBufferSnapshotWrapper class, which is the one shared between modules, to expose its data pointer in a way that doesn't depend on Nan.

@rien
Copy link

rien commented Dec 3, 2019

How usable is this PR? I'm still suffering from #53 and I was wondering if this could maybe solve the issue.

@maxbrunsfeld
Copy link
Contributor Author

I don't think #53 is due to bugs in the native code; I think it's because some JavaScript code is running multiple times, even though it is designed to only be run once.

That said, this PR should be usable. All the tests pass, I just haven't verified it yet in any real application, like Atom.

@drom
Copy link

drom commented Oct 14, 2020

@maxbrunsfeld do you plan to merge this PR?
It would super useful to switch to NAPI

@gpetrov
Copy link
Contributor

gpetrov commented Oct 15, 2020

I'm looking forward to the NAPI integration as well, so that we can use the powerful tree-sitter in multi threads and web workers.

@cellog
Copy link

cellog commented Feb 15, 2021

@maxbrunsfeld I spent some time experimenting with how to upgrade the languages to use N-API. Turns out to be smaller than I expected. Here's the Javascript update:

bindings.cc:

#include "tree_sitter/parser.h"
#include <napi.h>

using namespace Napi;

extern "C" TSLanguage * tree_sitter_javascript();

void NoOpFinalizer(Napi::Env _env, TSLanguage *_ignore) {
  /* do nothing, no need to free */
}

class TreeSitterJavascriptBinding : public Napi::Addon<TreeSitterJavascriptBinding> {
  public:
    TreeSitterJavascriptBinding(Napi::Env env, Napi::Object exports) {
      DefineAddon(exports, {
        InstanceValue("name", Napi::String::New(env, "javascript"), napi_enumerable),
        InstanceValue("__parser", Napi::External<TSLanguage>::New(env, tree_sitter_javascript(), NoOpFinalizer), napi_default)  
      });
    }
};

NODE_API_ADDON(TreeSitterJavascriptBinding)

binding.gyp:

{
  "targets": [
    {
      "target_name": "tree_sitter_javascript_binding",
      "include_dirs": [
        "<!@(node -p \"require('node-addon-api').include\")",
        "src"
      ],
      "sources": [
        "src/parser.c",
        "src/scanner.c",
        "src/binding.cc"
      ],
      "cflags_c": [
        "-std=c99",
      ],
      "defines": [ "NAPI_DISABLE_CPP_EXCEPTIONS" ]
    }
  ]
}

index.js:

module.exports = require("bindings")("tree_sitter_javascript_binding");

try {
  module.exports.nodeTypeInfo = require("./src/node-types.json");
} catch (_) {}

This creates a non-enumerable property __parser which is the TSLanguage *. Updating the node-tree-sitter will involve retrieving the __parser property off of the language passed in, and verifying it's there, then extracting its value with the Data() method. The rest of the changes are pretty much the same. I'm going to try running the N-API conversion.js script on master and update to use my fixed up Javascript language. if it works, I'll see if I can figure out how to mod the tree-sitter-cli to spit out this changed setup. Of course, this is a breaking change, but as long as everything is bumped a major version, should be good to go.

I made this little test script:

const Javascript = require(".");

console.log(Javascript.name);
console.log(Javascript.__parser);

output:
Screen Shot 2021-02-15 at 5 56 04 PM

@cellog
Copy link

cellog commented Feb 15, 2021

last note for now. This is what I think the UnwrapLanguage function becomes with this change:

const TSLanguage *UnwrapLanguage(Napi::Env env, const Napi::Value &value) {
  if (value.IsObject()) {
    Napi::Object arg = value.As<Napi::Object>();
    Napi::Value name = arg.Get("name");
    if (!env.IsExceptionPending()) {
      if (name.IsString()) {
        Napi::Value parser = arg.Get("__parser");
        if (!env.IsExceptionPending()) {
          if (parser.IsExternal()) {
            Napi::External<TSLanguage> parserInfo = parser.As<Napi::External<TSLanguage> >();
            const TSLanguage *language = (const TSLanguage *) parserInfo.Data();
            if (language) {
              uint16_t version = ts_language_version(language);
              if (
                version < TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION ||
                version > TREE_SITTER_LANGUAGE_VERSION
              ) {
                std::string message =
                  "Incompatible language version. Compatible range: " +
                  std::to_string(TREE_SITTER_MIN_COMPATIBLE_LANGUAGE_VERSION) + " - " +
                  std::to_string(TREE_SITTER_LANGUAGE_VERSION) + ". Got: " +
                  std::to_string(ts_language_version(language));
                Napi::RangeError::New(env, message.c_str()).ThrowAsJavaScriptException();
                return nullptr;
              }
              return language;
            }
          }
        }
      }
    }
  }
  Napi::TypeError::New(env, "Invalid language object").ThrowAsJavaScriptException();

  return nullptr;
}

@talbergs
Copy link

talbergs commented Mar 11, 2021

Duplicate of #81 ?

@ahlinc
Copy link
Contributor

ahlinc commented Mar 11, 2021

@talbergs no, it's not duplicate, the #81 merge just fixed SegmentationFault for current Nan implementation.
Napi is coming soon, it requires more work than this old PR.

@gpetrov
Copy link
Contributor

gpetrov commented Nov 21, 2021

@maxbrunsfeld @cellog
With all those Node and Electron versions a good NAPI integration is getting more and more essential.

Also please consider using the Node::Buffer instead of ArrayBuffer due to the many depreciations, as advised in electron/electron#29893 (comment)

I tried to patch things up in #95 but still a full conversion and NAPI support with node buffer would be much better!

@maxbrunsfeld
Copy link
Contributor Author

Hi, would anyone be willing to take on the task of merging master into this branch? I think the time has come to land this PR.

For context - when I first authored this PR, NAPI was a bit newer, and I was worried about the risk of switching to it given that Atom relies on this node module. But at this point in time, the Node.js ecosystem has changed a lot, and it's more important than ever to switch over to NAPI. Also, Atom's pace of development has slowed down significantly, to the point where I don't think it's worth holding this up on that application's behalf.

As far as I know, the biggest chunk of work to be done is updating the new Query class (which was added after I created this PR) to use NAPI. There are various other conflicts as well.

If someone who uses this module could take ownership of this effort, that would be very helpful, and I would definitely merge it if somebody can get the conflicts resolved and tests passing again. I can give advice that pertains to the Tree-sitter APIs and C++ questions, but it has been quite a while since I've used Node.js for anything, so I have less ability to advise about Node-specific questions.

@verhovsky
Copy link
Collaborator

@MichaelBelousov
Copy link

I don't use it a lot currently but I would be happy to try figure out where I left things and determine if I can make sure tests pass and then try to merge.

@MichaelBelousov
Copy link

I can't build master on my ubuntu box even after downgrading node to 14.x and npm to 7.x, I get the same thing as #72 (comment)
May need to fix that first.

@verhovsky
Copy link
Collaborator

verhovsky commented Mar 9, 2023

@MichaelBelousov I just made node-tree-sitter build on Node 14 and Ubuntu for #127, try my master branch.

https://github.com/verhovsky/node-tree-sitter/tree/master

It depends on the tree-sitter repository and in my fork I updated it to the latest master commit.

@MichaelBelousov
Copy link

MichaelBelousov commented Mar 9, 2023

I tried #127 and it didn't work... but I just took a look and it's because I didn't init submodules when cloning. Just did, so it worked now.

@verhovsky
Copy link
Collaborator

I did the same thing today.

@cellog
Copy link

cellog commented Mar 9, 2023

I'm really grateful you're putting energy into this - it would shave minutes off of the i18n code parser we use for extracting translations if we could move to node instead of wasm!

@MichaelBelousov
Copy link

after some minor tweaks, tests pass but there's a segfault at exit during when trying to napi_delete_reference a null reference. I will dig into that when I have time in the next week. If someone wants to dig into that sooner themselves, let me know. Feel free to ping for a progress report if it's taking a while.

@MichaelBelousov
Copy link

It's the module_exports static ObjectReference in src/node.cc, because it's static, the destructor gets called after napi has invalidated all references, so calling its destructor is invalid, not sure why I didn't notice that in the debugger yesterday. I just needed to add a SuppressDestruct call. I could also move module_exports to the env's instance data which is safer but I think that can be done in this PR after this merges.

I have pruned my personal changes and rebased to try to make the history a bit cleaner. I have created a draft PR #129 while I fix conflicts

@maxbrunsfeld
Copy link
Contributor Author

Nice work @MichaelBelousov. Thanks for the update.

@MichaelBelousov
Copy link

conflicts are fixed. Continuing discussion on the PR

@MichaelBelousov
Copy link

just a note, currently this PR uses a lot of static Napi::ObjectReference's In order for this to work in a multi-threaded settings (e.g. node's worker_threads), Napi::Env.SetInstanceData should be used instead, with some added class like BindingsEnv to hold all the things that are currently static in that way.

@undefined-moe
Copy link

The package currently doesn't work in worker_threads, is this pull request going to fix this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.