Shared ThreadsafeFunction in EventQueue RFC #42
Conversation
@indutny Thanks for providing this. I think this is a really interesting idea.
One complexity concern I have is how dropping a referenced `EventQueue` would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to `0`, the shared queue needs to unreference itself.
However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.
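To make the cost concrete, here is a minimal std-only sketch of that pattern, with an `mpsc` channel standing in for the event loop and all names hypothetical (this is not Neon code): the last handle to drop cannot unref in place and has to post an event instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};

// Stand-in for the shared queue; `unref_tx` delivers events to the
// loop thread, since unreferencing may only happen there.
struct SharedQueue {
    refs: AtomicUsize,
    unref_tx: mpsc::Sender<&'static str>,
}

// A referenced handle onto the shared queue.
struct QueueHandle(Arc<SharedQueue>);

impl QueueHandle {
    fn new(shared: &Arc<SharedQueue>) -> Self {
        shared.refs.fetch_add(1, Ordering::SeqCst);
        QueueHandle(Arc::clone(shared))
    }
}

impl Drop for QueueHandle {
    fn drop(&mut self) {
        // Count dropped to 0: we cannot unref from this thread, so we
        // post an event to the loop to do it. This round trip is the
        // cost if handles are created and dropped frequently.
        if self.0.refs.fetch_sub(1, Ordering::SeqCst) == 1 {
            let _ = self.0.unref_tx.send("unref the shared queue");
        }
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let shared = Arc::new(SharedQueue {
        refs: AtomicUsize::new(0),
        unref_tx: tx,
    });
    {
        let _handle = QueueHandle::new(&shared);
    } // last handle dropped here; an unref event is scheduled
    println!("loop received: {}", rx.recv().unwrap());
}
```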
text/0030-shared-event-queue.md (outdated):

> The example above wouldn't print "every second" once per second. The prints
> will be delayed by the closures scheduled on `another_queue`.
>
> While the example above reproduces the issue - it is quite pathological and
I think this use case is valid. The example is a bit extreme for the sake of simplicity to demonstrate the point, but I do think it could be an issue in real apps.
For example, consider a Node app that's part of a data processing pipeline. It would be expected to saturate resources and that might include a stream of events on the event loop. It may be beneficial to also be able to cancel the pipeline at any time.
If command events are served on a separate channel from processing events, it allows the app to push large volumes of data without delaying commands. Alternatively, the app would need to manage back-pressure.
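As a rough illustration of that two-channel idea, here is a std-only sketch with `mpsc` channels standing in for separate event queues (not Neon code): commands are checked before each data item, so a flood of data cannot starve cancellation.

```rust
use std::sync::mpsc;
use std::thread;

enum Command {
    Cancel,
}

fn main() {
    let (data_tx, data_rx) = mpsc::channel::<u64>();
    let (cmd_tx, cmd_rx) = mpsc::channel::<Command>();

    // Producer saturates the data channel, as a pipeline would.
    thread::spawn(move || {
        for i in 0.. {
            if data_tx.send(i).is_err() {
                break;
            }
        }
    });

    // Another producer issues a cancel command at some point.
    thread::spawn(move || {
        let _ = cmd_tx.send(Command::Cancel);
    });

    // Consumer: the command channel is polled before each data item,
    // so commands are never stuck behind the data backlog.
    let mut processed = 0u64;
    while let Ok(_item) = data_rx.recv() {
        if let Ok(Command::Cancel) = cmd_rx.try_recv() {
            println!("cancelled after {} items", processed);
            break;
        }
        processed += 1;
    }
}
```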
The data processing pipeline would likely be I/O-bound, so it wouldn't trigger the 1000-call loop in the N-API optimization. But the starvation can indeed be triggered by one producer posting far too many events that take a long time to process, while another producer posts events that should interrupt that stream.
Was the preemption ever documented for the Neon APIs? I'm looking through the comments in `EventQueue.rs`, and the only thing they say is that the function will be executed on the JS thread, without saying anything about preemption between different queues.
That being said, the example is indeed valid.
text/0030-shared-event-queue.md (outdated):

> store instances of `EventQueue` in it themselves, shifting the burden of
> performance optimization from Neon's hands to users.
>
> Another alternative would be to make `EventQueue::new()` use new
I think this is the better solution.

- `cx.queue()`: handle to the global queue
- `EventQueue::new`: creates a new queue
- (Maybe) `EventQueue::handle<'a, C: Context<'a>>(&self, cx: &mut C)`: creates an independent handle to a queue; `cx.queue()` could use this internally

I think this also extends better to bounded queues. We might have `EventQueue::bounded(1000)`, and I think the asymmetry between `EventQueue::new` and `EventQueue::bounded` could be confusing. It's clearer if they both create new queues (see the sketch below).
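A mock skeleton of how that surface could look; `handle` and `bounded` are only the proposals from this comment, not existing Neon APIs, and the `Context` parameter is elided for brevity:

```rust
// Hypothetical shapes only, to show the proposed symmetry.
struct EventQueue;

struct EventQueueHandle;

impl EventQueue {
    /// Creates a brand-new queue (its own ThreadsafeFunction).
    fn new() -> Self {
        EventQueue
    }

    /// Proposed: a bounded queue, symmetric with `new` because both
    /// create fresh queues.
    fn bounded(_capacity: usize) -> Self {
        EventQueue
    }

    /// Proposed: an independent handle onto an existing queue; the
    /// real signature would also take a `Context`, and `cx.queue()`
    /// could be implemented on top of this.
    fn handle(&self) -> EventQueueHandle {
        EventQueueHandle
    }
}

fn main() {
    let fresh = EventQueue::new();
    let bounded = EventQueue::bounded(1000);
    let _handle = fresh.handle();
    let _ = bounded;
}
```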
Do you want me to change the RFC to add this? It already matches the PR that I've opened for the change.
I'll probably just do it. 😁
text/0030-shared-event-queue.md (outdated):

> improvements for Neon users.
>
> These improvements are perhaps most noticeable when a Neon addon is used from
> Electron, because scheduling new UV ticks in Electron is very costly.
I don't think I've come across ticks being more expensive in Electron than normal Node. Do you happen to have a reference?
Here's a comment with an example app that uses C++ node-addon-api to trigger it: electron/electron#28957 (comment)
The high-level overview is that Electron polls the UV loop in another thread and interrupts the browser's event loop to schedule UV callbacks. Sadly, these interruptions don't happen immediately because of the interactivity that the browser process has to maintain. The delay is a fraction of a millisecond, but you can see how easily it can add up to 2-3 seconds depending on how many closures you want to invoke with `EventQueue`.
text/0030-shared-event-queue.md (outdated):

> Calling `queue.send()` several times in a row using different `cx.queue()` would
> be significantly faster:
Do you happen to have any benchmarks showing the performance benefit? I think that would help add some weight to this RFC. I don't have any intuition for the difference: 1us, 10us, 1ms?
Added a comment. Running `app/preload.js` in this repo with the unoptimized threadsafe function takes 128ms, vs 10ms with the optimized one.
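For context, this is roughly the shape of such a measurement, sketched with a plain std channel standing in for the shared queue (the 128ms vs 10ms numbers come from `app/preload.js`, not from this sketch):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Instant;

fn main() {
    // The receiver thread plays the role of the JS thread draining
    // scheduled closures from one shared queue.
    let (tx, rx) = mpsc::channel::<Box<dyn FnOnce() + Send>>();
    let consumer = thread::spawn(move || {
        while let Ok(job) = rx.recv() {
            job();
        }
    });

    const N: usize = 10_000;
    let (done_tx, done_rx) = mpsc::channel();
    let start = Instant::now();
    for i in 0..N {
        let done_tx = done_tx.clone();
        tx.send(Box::new(move || {
            // Signal once the final closure has actually run.
            if i == N - 1 {
                let _ = done_tx.send(());
            }
        }))
        .unwrap();
    }
    done_rx.recv().unwrap();
    println!("drained {} closures in {:?}", N, start.elapsed());

    drop(tx);
    consumer.join().unwrap();
}
```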
text/0030-shared-event-queue.md (outdated):

> With the recent [node.js changes][0] `ThreadsafeFunction` will loop over pending
> calls, invoking up to 1000 of them in the same tick instead of running each new
> call on a separate UV loop tick. This is an implementation detail that is
> intentionally not covered by `napi_create_threadsafe_function` documentation to
Nit: I think the word "intentionally" should be removed. There's no discussion on the implementation PR about whether not documenting the behavior was intentional.
text/0030-shared-event-queue.md (outdated):

> intentionally not covered by `napi_create_threadsafe_function` documentation to
> allow for potential performance optimizations in the future.
>
> Neon doesn't currently leverage it without the proposed changes because a new
Neon does leverage the change, but it requires the user to continue to re-use the same queue. This is one of the reasons queues are `Sync`.

I think it's important to note that this is an API change with the intent to guide users towards the choice that is optimal in most situations.
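A sketch of the reuse pattern this implies, with a `Sync` stand-in type in place of the real `EventQueue` (std-only, not Neon code): the queue is created once and shared by every producer thread instead of being recreated per call.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

// Stand-in for an EventQueue: it is Sync, so one instance can be
// stored once and shared, keeping every send on the same underlying
// queue (and, in Neon, the same ThreadsafeFunction).
struct Queue(Mutex<mpsc::Sender<u32>>);

impl Queue {
    fn send(&self, value: u32) {
        let _ = self.0.lock().unwrap().send(value);
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let queue = Arc::new(Queue(Mutex::new(tx)));

    let workers: Vec<_> = (0..4)
        .map(|i| {
            let queue = Arc::clone(&queue);
            thread::spawn(move || queue.send(i))
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    drop(queue); // last sender gone; the receiver below can finish

    let received: Vec<u32> = rx.iter().collect();
    println!("{} events went through one shared queue", received.len());
}
```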
> @indutny Thanks for providing this. I think this is a really interesting idea.
>
> One complexity concern I have is how dropping a referenced `EventQueue` would work. With a single shared queue, the queue needs to track how many times it has been referenced. When the count drops to `0`, the shared queue needs to unreference itself. However, unreferencing can only be performed on the event loop. The only solution I can think of is sending an event to unreference itself. This could end up being costly if referenced queues are frequently created and dropped.
This is indeed the case, but I don't believe that it will cause any performance losses because of how fast the threadsafe functions are now. I've added an extra optimization to the Neon PR to avoid this extra work when `EventQueue` owns the `ThreadsafeFunction` and will drop it.
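The shape of that optimization, as a hypothetical sketch (names invented here, not the actual code from the PR): the drop path only schedules an unref event when the queue does not own its function.

```rust
// Stand-in for the underlying threadsafe function.
struct Tsfn;

struct EventQueue {
    tsfn: Tsfn,
    // True when this queue owns its Tsfn outright (the non-shared case).
    owned: bool,
}

impl Drop for EventQueue {
    fn drop(&mut self) {
        if self.owned {
            // Owned: the function is torn down together with the
            // queue, so no "unref yourself" event is needed.
        } else {
            schedule_unref_event(&self.tsfn);
        }
    }
}

fn schedule_unref_event(_tsfn: &Tsfn) {
    // Would post a closure to the loop thread to perform the unref there.
}

fn main() {
    let _queue = EventQueue { tsfn: Tsfn, owned: true };
    // Dropping `_queue` takes the cheap path: no event is scheduled.
}
```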
text/0030-shared-event-queue.md (outdated):

> creates a new `EventQueue` every time as requested. So instead of doing that it
> is proposed that we use existing `lifetime::InstanceData` internal API layer to
> hold a wrapper around `ThreadsafeFunction` tailored to execute Rust closures
> specifically, and to use an `Arc` reference to a `RwLock` over this wrapper in
Is a `RwLock` necessary? The only non-threadsafe operations on `EventQueue` are `unref` and `reference`. Both of these require a `Context`, which already enforces single-threadedness.
The `.reference()`/`.unref()` methods of `ThreadsafeFunction` take `&mut self`, so `ThreadsafeTrampoline` needs to be mutable in order to call them. Is there any better way of doing it?
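To illustrate the constraint, a minimal std sketch (the `Trampoline` type here is a stand-in, not the actual `ThreadsafeTrampoline`): shared read access suffices for calls, while `reference`/`unref` need exclusive access, which is what the `RwLock` provides.

```rust
use std::sync::{Arc, RwLock};

// Stand-in: `call` only needs `&self`, but `reference`/`unref` need
// `&mut self`, which is what forces interior mutability on the
// shared wrapper.
struct Trampoline {
    ref_count: u32,
}

impl Trampoline {
    fn call(&self) {
        // Would invoke the underlying threadsafe function.
    }
    fn reference(&mut self) {
        self.ref_count += 1;
    }
    fn unref(&mut self) {
        self.ref_count -= 1;
    }
}

fn main() {
    let shared = Arc::new(RwLock::new(Trampoline { ref_count: 1 }));

    // Hot path: concurrent callers only take the read lock.
    shared.read().unwrap().call();

    // Rare path: (un)referencing takes the write lock.
    shared.write().unwrap().reference();
    shared.write().unwrap().unref();
}
```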
This is indeed the case, but there's little to no performance penalty for doing it this way with the shared `ThreadsafeFunction`, because it is stupid fast in Node now 😂 I've added an optimization to my PR to avoid this extra work for the case when `EventQueue` owns the `ThreadsafeFunction`.

Thank you so much for a thorough review. I believe I've addressed your feedback. PTAL.
Closing because the contents have been merged into #32
See: neon-bindings/neon#739