Shared ThreadsafeFunction in EventQueue RFC #42

Closed

Conversation


@indutny indutny commented May 19, 2021


@kjvalencik kjvalencik left a comment


@indutny Thanks for providing this. I think this is a really interesting idea.

One complexity concern I have is how dropping a referenced EventQueue would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to 0, the shared queue needs to unreference itself.

However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.
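To make the concern concrete, here is a minimal, standalone Rust sketch of the drop path described above. The types, field names, and reference-counting scheme are hypothetical stand-ins for illustration, not Neon's actual internals.

```rust
use std::sync::{
    atomic::{AtomicUsize, Ordering},
    Arc,
};

// Hypothetical stand-ins, not Neon's real types.
struct SharedQueueState {
    // How many referenced handles are currently alive.
    ref_count: AtomicUsize,
}

struct EventQueueHandle {
    state: Arc<SharedQueueState>,
}

impl Drop for EventQueueHandle {
    fn drop(&mut self) {
        // Dropping the last referenced handle must unreference the shared
        // ThreadsafeFunction, but that can only happen on the event loop,
        // so an extra "please unref yourself" event has to be scheduled here.
        if self.state.ref_count.fetch_sub(1, Ordering::AcqRel) == 1 {
            // queue.send(|cx| tsfn.unref(cx)) -- the extra event described above
        }
    }
}

fn main() {
    let state = Arc::new(SharedQueueState {
        ref_count: AtomicUsize::new(1),
    });
    drop(EventQueueHandle { state });
}
```

That per-drop send is the cost being weighed here; later in the thread an optimization is mentioned that skips it when an EventQueue owns its ThreadsafeFunction outright.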

> The example above wouldn't print "every second" once per second. The prints
> will be delayed by the closures scheduled on `another_queue`.
>
> While the example above reproduces the issue - it is quite pathological and

@kjvalencik (Member)

I think this use case is valid. The example is a bit extreme for the sake of simplicity to demonstrate the point, but I do think it could be an issue in real apps.

For example, consider a Node app that's part of a data processing pipeline. It would be expected to saturate resources and that might include a stream of events on the event loop. It may be beneficial to also be able to cancel the pipeline at any time.

If command events are served on a separate channel from processing events, it allows the app to push large volumes of data without delaying commands. Alternatively, the app would need to manage back-pressure.
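As a plain-Rust analogy (not the Neon API), here is a hypothetical sketch of the two-channel idea: a flood of data events cannot delay a cancel command if commands travel on their own channel, and a bounded data channel provides the back-pressure mentioned as the alternative.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

enum Command {
    Cancel,
}

fn main() {
    // Bounded data channel (back-pressure) plus a separate command channel.
    let (data_tx, data_rx) = mpsc::sync_channel::<Vec<u8>>(1024);
    let (cmd_tx, cmd_rx) = mpsc::channel::<Command>();

    // Producer: tries to saturate the data channel.
    thread::spawn(move || {
        while data_tx.send(vec![0u8; 64 * 1024]).is_ok() {}
    });

    // Controller: issues a cancel shortly after start.
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        let _ = cmd_tx.send(Command::Cancel);
    });

    // Consumer: drains data but checks the command channel between chunks,
    // so the cancel is observed promptly regardless of how much data is queued.
    loop {
        if let Ok(Command::Cancel) = cmd_rx.try_recv() {
            println!("pipeline cancelled");
            break;
        }
        let _ = data_rx.recv_timeout(Duration::from_millis(10));
    }
}
```

Mapping back to Neon, the idea would be to route data callbacks and command callbacks through two different EventQueues so a command closure is not stuck behind a long backlog.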

@indutny-signal May 19, 2021

The data processing pipeline would likely be IO-bound, so it wouldn't trigger the 1000-call loop in the n-api optimization. However, starvation can indeed be triggered when one producer posts far too many events that take a long time to process, while another producer posts events that are supposed to interrupt that stream.

Was this preemption ever documented for the Neon APIs? I'm looking through the comments in EventQueue.rs, and the only thing they say is that the function will be executed on the JS thread, without saying anything about preemption between different queues.

That being said, the example is indeed valid.

> store instances of `EventQueue` in it themselves, shifting the burden of
> performance optimization from Neon's hands to users.
>
> Another alternative would be to make `EventQueue::new()` use new

@kjvalencik (Member)

I think this is the better solution.

  • cx.queue(): Handle to global queue
  • EventQueue::new: Creates a new queue
  • (Maybe) EventQueue::handle<'a, C: Context<'a>>(&self, cx: &mut C). Create an independent handle to a queue. cx.queue() could use this internally.

I think this also extends better to bounded queues. We might have EventQueue::bounded(1000), and I think the asymmetry between EventQueue::new and EventQueue::bounded could be confusing. It's clearer if they both create new queues.
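A rough, standalone sketch of the API shape described above. `Context` and `EventQueue` here are stand-ins, not Neon's real types, and the signatures are guesses meant only to show the split between "handle to the shared queue" and "brand-new queue".

```rust
// Stand-in types for illustration; not Neon's actual definitions.
struct Context;
struct EventQueue;

impl Context {
    /// Returns a handle to the process-wide shared queue. Internally this
    /// could delegate to EventQueue::handle on a queue stored in InstanceData.
    fn queue(&mut self) -> EventQueue {
        EventQueue
    }
}

impl EventQueue {
    /// Always creates a brand-new queue (its own ThreadsafeFunction).
    fn new(_cx: &mut Context) -> Self {
        EventQueue
    }

    /// Hypothetical bounded variant; also a brand-new queue, which keeps it
    /// symmetric with `new`.
    fn bounded(_cx: &mut Context, _capacity: usize) -> Self {
        EventQueue
    }

    /// Hypothetical independent handle to an existing queue.
    fn handle(&self, _cx: &mut Context) -> Self {
        EventQueue
    }
}

fn main() {
    let mut cx = Context;
    let shared = cx.queue();                      // handle to the shared queue
    let _another_handle = shared.handle(&mut cx); // independent handle to it
    let _fresh = EventQueue::new(&mut cx);        // brand-new queue
    let _bounded = EventQueue::bounded(&mut cx, 1000);
}
```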


Do you want me to change the RFC to add this? It already matches the PR that I've opened for the change.

@indutny (Author)

I'll probably just do it. 😁

> improvements for Neon users.
>
> These improvements are perhaps most noticeable when a Neon addon is used from
> Electron, because scheduling new UV ticks in Electron is very costly.

@kjvalencik (Member)

I don't think I've come across ticks being more expensive in Electron than in normal Node. Do you happen to have a reference?


Here's a comment with an example app that uses C++ node-addon-api to trigger it: electron/electron#28957 (comment)

The high-level overview is that Electron polls the UV loop in another thread and interrupts the browser's event loop to schedule UV callbacks. Sadly, these interruptions don't happen immediately because of the interactivity that the browser process has to maintain. The delay is a fraction of a millisecond, but you can see how easily it adds up to 2-3 seconds depending on how many closures you want to invoke with EventQueue.


> Calling `queue.send()` several times in a row using different `cx.queue()` would
> be significantly faster:

@kjvalencik (Member)

Do you happen to have any benchmarks showing the performance benefit? I think that would help add some weight to this RFC. I don't have any intuition about the size of the difference: 1us, 10us, 1ms?

@indutny (Author)

Added a comment. Running app/preload.js in this repo takes 128ms with the unoptimized threadsafe function vs. 10ms with the optimized one.

> With the recent [node.js changes][0] `ThreadsafeFunction` will loop over pending
> calls, invoking up to 1000 of them in the same tick instead of running each new
> call on a separate UV loop tick. This is an implementation detail that is
> intentionally not covered by `napi_create_threadsafe_function` documentation to

@kjvalencik (Member)

Nit: I think the word "intentionally" should be removed. There's no discussion on the implementation PR about whether leaving the behavior undocumented was intentional.

> intentionally not covered by `napi_create_threadsafe_function` documentation to
> allow for potential performance optimizations in the future.
>
> Neon doesn't currently leverage it without the proposed changes because a new

@kjvalencik (Member)

Neon does leverage the change, but it requires the user to continue to re-use the same queue. This is one of the reasons queues are Sync.

I think it's important to note that this is an API change with the intent to guide users towards the choice that is optimal in most situations.
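For reference, a sketch of the reuse pattern being described, assuming the Neon 0.9-era `EventQueue` API (exact signatures may differ): the queue is created once and moved to the worker thread, and every subsequent call goes through that same instance so the n-api batching can kick in.

```rust
use neon::prelude::*;

fn start_worker(mut cx: FunctionContext) -> JsResult<JsUndefined> {
    // One queue for the lifetime of the worker, not one per message.
    let queue = cx.queue();

    std::thread::spawn(move || {
        for i in 0..1000u32 {
            // Re-using the same queue lets many of these closures be drained
            // in a single UV tick (per the n-api change discussed above).
            queue.send(move |mut cx| {
                let _msg = cx.string(format!("event {}", i));
                Ok(())
            });
        }
    });

    Ok(cx.undefined())
}
```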

@indutny (Author)

> @indutny Thanks for providing this. I think this is a really interesting idea.
>
> One complexity concern I have is how dropping a referenced EventQueue would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to 0, the shared queue needs to unreference itself.
>
> However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.

This is indeed the case, but I don't believe it will cause any performance losses because of how fast threadsafe functions are now. I've added an extra optimization to the Neon PR to avoid this extra work when EventQueue owns the ThreadsafeFunction and will drop it.

> creates a new `EventQueue` every time as requested. So instead of doing that it
> is proposed that we use the existing `lifetime::InstanceData` internal API layer to
> hold a wrapper around `ThreadsafeFunction` tailored to execute Rust closures
> specifically, and to use an `Arc` reference to a `RwLock` over this wrapper in

@kjvalencik (Member)

Is a RwLock necessary? The only non-threadsafe operations on EventQueue are unref and reference. Both of these require a Context, which already enforces single-threadedness.

@indutny (Author)

The .reference()/.unref() methods of ThreadsafeFunction take &mut self, so ThreadsafeTrampoline needs to be mutable in order to call them. Is there a better way of doing it?
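A minimal standalone sketch of the mutability question above. `ThreadsafeTrampoline` and its methods here are stand-ins mirroring the `&mut self` receivers mentioned, not the actual code from the PR; the point is only that a shared `Arc` alone cannot hand out `&mut`, whereas `Arc<RwLock<...>>` can.

```rust
use std::sync::{Arc, RwLock};

// Stand-in for the wrapper discussed above.
struct ThreadsafeTrampoline {
    ref_count: u32, // imagine this wraps the napi reference/unref state
}

impl ThreadsafeTrampoline {
    // Mirrors reference()/unref() taking `&mut self`, which is why shared
    // ownership alone (Arc<T>) is not enough to call them.
    fn reference(&mut self) {
        self.ref_count += 1;
    }

    fn unref(&mut self) {
        self.ref_count -= 1;
    }
}

fn main() {
    let shared = Arc::new(RwLock::new(ThreadsafeTrampoline { ref_count: 0 }));

    // A write lock yields the `&mut ThreadsafeTrampoline` these methods need.
    shared.write().unwrap().reference();
    shared.write().unwrap().unref();
}
```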

@indutny commented May 19, 2021

> @indutny Thanks for providing this. I think this is a really interesting idea.
>
> One complexity concern I have is how dropping a referenced EventQueue would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to 0, the shared queue needs to unreference itself.
>
> However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.

This is indeed the case, but there's little to no performance penalty for doing it this way with the shared ThreadsafeFunction, because it is stupid fast in Node now 😂 I've added an optimization to my PR to avoid this extra work for EventQueue::new(), because it owns the ThreadsafeFunction and will drop it automatically.

Thank you so much for a thorough review. I believe I've addressed your feedback. PTAL.

@kjvalencik (Member)

Closing because the contents have been merged into #32

@kjvalencik kjvalencik closed this Sep 3, 2021