Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

node-api ThreadsafeFunction is a leaky abstraction #38775

Open
indutny opened this issue May 22, 2021 · 14 comments
Open

node-api ThreadsafeFunction is a leaky abstraction #38775

indutny opened this issue May 22, 2021 · 14 comments
Labels
node-api Issues and PRs related to the Node-API.

Comments

@indutny
Copy link
Member

indutny commented May 22, 2021

The premise of the API is that it should make it easy to call into a loop thread from any other thread(s). However, the semantics of napi_closing and finalize_cb mean that the API user need to wrap calls to call_threadsafe_function and release_threadsafe_function into a mutex, because they are inherently not threadsafe. This technically is documented, but very not obvious given the name of the API object.

I suggest that we change the API making it so that:

  • call_threadsafe_function never returns napi_closing and is safe to call up until internal thread_count drops to 0
  • release_threadsafe_function is safe to call up until internal thread_count drops to 0 irrespective of whether finalize_cb was called or not

Specifically, it sounds like for this to work we'll have to have an inner and outer classes for the ThreadsafeFunction. The inner would hold references to V8 objects and would be deleted on the loop/V8 thread during finalize_cb. The outer would be still accessible to the threads up until they all call the release method.

This is a breaking API change so I'd appreciate if we could have a discussion around that.

cc @nodejs/node-api @nodejs/addon-api @addaleax

See neon-bindings/neon#744 for particular woes on the API user side.

@addaleax
Copy link
Member

Since I was pinged here – my suggestion would be to leave the current feature alone, because it already is far too complex, and instead try to come up with a set of smaller APIs that give users the ability to put together what they need (in particular, a built-in wrapper for uv_async_t).

@indutny
Copy link
Member Author

indutny commented May 22, 2021

I agree. We should probably deprecate current API once we get a replacement to suggest that it is flawed and shouldn't be used.

@jasnell
Copy link
Member

jasnell commented May 23, 2021

my suggestion would be to leave the current feature alone, because it already is far too complex

+1 definitely agree

@Brooooooklyn
Copy link

+1

ThreadSafeFunction implementation is complex too in napi-rs, see napi-rs/napi-rs#522 for example

@indutny-signal
Copy link

@Brooooooklyn hah, this is exactly what I need to deal with in Neon. Thanks for sharing it!

@mhdawson
Copy link
Member

I agree with the "new/smaller" apis and then possibly deprecation.

@kjvalencik
Copy link

kjvalencik commented Jun 1, 2021

I'd be happy to participate in designing a new API. For Neon, I spent a lot of time thinking about this API and trying to simplify it for users. It has a lot of options, but I don't think all of them are strictly necessary (can be recreated by combining other features).

You can also add me to the list of people who missed this subtle need for synchronization in the shutdown path. 😁

Let me know if I can help! Cheers!

@legendecas
Copy link
Member

I'm sorry but I didn't get the tricks between the current API and the proposed API shapes here. The current napi_threadsafe_function could be mirrored to uv_async_t identically in API semantics (while the thread_count is always set to 1).

The problem raised in neon-bindings/neon#744 is the environment is tearing down before the threadsafe function makes their works done. However, when the threadsafe function is correctly referenced (the uv handle is refed), the program should not tear down the environment.

  1. call_threadsafe_function never returns napi_closing and is safe to call up until internal thread_count drops to 0
  2. release_threadsafe_function is safe to call up until internal thread_count drops to 0 irrespective of whether finalize_cb was called or not.

Please note that when the environment is tearing down, all uv handles should already be closed or the process will abort (https://github.com/nodejs/node/blob/master/src/node_worker.cc#L223).

In that case, I believe the current behavior is correct as it is notifying the user code the environment is tearing down and the user code should take care of resource clean-ups. In this sense, what does the API semantic mean when Node.js core leaves the threadsafe function internal representatives to outlives the creation environment?

@kjvalencik
Copy link

@legalcodes The problem raised in neon-bindings/neon#744 is specifically about releasing a threadsafe function and not performing work with it.

My expectation, without respect to implementation, is that a threadsafe function is always safe to release if the user has not released it yet, even if the environment has been torn down. However, currently, it may have already been destroyed and the process will crash.

This is unfortunate because it requires synchronization primitives, a finalize callback, and careful checks outside of the threadsafe function in order to ensure its used properly.

@legendecas legendecas added the node-api Issues and PRs related to the Node-API. label Jun 3, 2021
@legendecas
Copy link
Member

legendecas commented Jun 4, 2021

As discussed in the node-api team meeting today, we need to reproduce the problem in node-addon-api so that we can have a clear vision of the issue.

@Brooooooklyn
Copy link

Brooooooklyn commented Jun 4, 2021

The problem in napi-rs/napi-rs#518 is napi-rs tried to release thread safe function in Drop which was already unref by user, and the thread panic. So I have to maintain the ref count in ThreadSafeFunction struct myself.

@kjvalencik
Copy link

@legendecas I can put together a minimal reproduction in Rust, would that be sufficient? If not, I can also write one in C, but it might take me a bit longer.

@mhdawson
Copy link
Member

@kjvalencik are you still able to provide a reproduce. Having it be in C would be best as the Node-api team is not that familiar with rust and it would also take one more factor out of the equation.

@mhdawson
Copy link
Member

In the Node-api team meeting @NickNaso mentioned he will try to create the C test case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
node-api Issues and PRs related to the Node-API.
Projects
None yet
Development

No branches or pull requests

8 participants