Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

process: use blocking threadpool for child stdio I/O #4824

Merged
merged 4 commits into from
Jul 23, 2022

Conversation

ipetkov
Copy link
Member

@ipetkov ipetkov commented Jul 11, 2022

We can't mix async pipes and child process stdio on Windows. This switches our Windows process implementation to use the blocking threadpool instead of using an async named pipe as we did in the past.

Refs #4802

@ipetkov ipetkov added A-tokio Area: The main tokio crate M-process Module: tokio/process labels Jul 11, 2022
@github-actions github-actions bot added the R-loom Run loom tests on this PR label Jul 11, 2022
tokio/Cargo.toml Outdated Show resolved Hide resolved
tokio/src/lib.rs Outdated Show resolved Hide resolved
@ipetkov ipetkov force-pushed the ivan/windows-process-stdio branch from 4a11f75 to 74a78e6 Compare July 20, 2022 22:50
@ipetkov ipetkov marked this pull request as ready for review July 20, 2022 22:51
tokio/src/io/poll_evented.rs Show resolved Hide resolved
tokio/src/process/windows.rs Outdated Show resolved Hide resolved
tokio/src/process/mod.rs Outdated Show resolved Hide resolved
Comment on lines +189 to +195
#[derive(Debug)]
pub(crate) struct ChildStdio {
// Used for accessing the raw handle, even if the io version is busy
raw: Arc<StdFile>,
// For doing I/O operations asynchronously
io: Blocking<ArcFile>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you could drop the Arc stuff if you just stored the raw handle next to the io field.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::process::Stdio wants ownership of the handle (and will close it on drop) which might cause issues if the handle is still being written to in the background. Using the Arc allows us to potentially avoid a syscall to duplicate the handle if it doesn't have any pending I/O on it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could implement a try_unwrap on the Blocking type to get the same behavior without the Arc. But I don't mind either way, so I will approve it and let you decide if you want to implement that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider adding try_unwrap to Blocking, the tricky bit is that it moves the handle to and from the blocking threadpool, so if there is a pending I/O operation we cannot access anything. The conversion from ChildStd* to Stdio are synchronous, so we cannot even internally .await things. Hence the Arc approach allows us to at least clone the handle if needed

I'm going to leave it as is for now since we can always revisit it later as needed. Thanks for the review!

@ipetkov ipetkov merged commit 1578575 into master Jul 23, 2022
@ipetkov ipetkov deleted the ivan/windows-process-stdio branch July 23, 2022 17:40
crapStone pushed a commit to Calciumdibromid/CaBr2 that referenced this pull request Sep 11, 2022
This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dependencies | minor | `1.20.1` -> `1.21.0` |
| [tokio](https://tokio.rs) ([source](https://github.com/tokio-rs/tokio)) | dev-dependencies | minor | `1.20.1` -> `1.21.0` |

---

### Release Notes

<details>
<summary>tokio-rs/tokio</summary>

### [`v1.21.0`](https://github.com/tokio-rs/tokio/releases/tag/tokio-1.21.0)

[Compare Source](tokio-rs/tokio@tokio-1.20.1...tokio-1.21.0)

##### 1.21.0 (September 2, 2022)

This release is the first release of Tokio to intentionally support WASM. The `sync,macros,io-util,rt,time` features are stabilized on WASM. Additionally the wasm32-wasi target is given unstable support for the `net` feature.

##### Added

-   net: add `device` and `bind_device` methods to TCP/UDP sockets ([#&#8203;4882])
-   net: add `tos` and `set_tos` methods to TCP and UDP sockets ([#&#8203;4877])
-   net: add security flags to named pipe `ServerOptions` ([#&#8203;4845])
-   signal: add more windows signal handlers ([#&#8203;4924])
-   sync: add `mpsc::Sender::max_capacity` method ([#&#8203;4904])
-   sync: implement Weak version of `mpsc::Sender` ([#&#8203;4595])
-   task: add `LocalSet::enter` ([#&#8203;4765])
-   task: stabilize `JoinSet` and `AbortHandle` ([#&#8203;4920])
-   tokio: add `track_caller` to public APIs ([#&#8203;4805], [#&#8203;4848], [#&#8203;4852])
-   wasm: initial support for `wasm32-wasi` target ([#&#8203;4716])

##### Fixed

-   miri: improve miri compatibility by avoiding temporary references in `linked_list::Link` impls ([#&#8203;4841])
-   signal: don't register write interest on signal pipe ([#&#8203;4898])
-   sync: add `#[must_use]` to lock guards ([#&#8203;4886])
-   sync: fix hang when calling `recv` on closed and reopened broadcast channel ([#&#8203;4867])
-   task: propagate attributes on task-locals ([#&#8203;4837])

##### Changed

-   fs: change panic to error in `File::start_seek` ([#&#8203;4897])
-   io: reduce syscalls in `poll_read` ([#&#8203;4840])
-   process: use blocking threadpool for child stdio I/O ([#&#8203;4824])
-   signal: make `SignalKind` methods const ([#&#8203;4956])

##### Internal changes

-   rt: extract `basic_scheduler::Config` ([#&#8203;4935])
-   rt: move I/O driver into `runtime` module ([#&#8203;4942])
-   rt: rename internal scheduler types ([#&#8203;4945])

##### Documented

-   chore: fix typos and grammar ([#&#8203;4858], [#&#8203;4894], [#&#8203;4928])
-   io: fix typo in `AsyncSeekExt::rewind` docs ([#&#8203;4893])
-   net: add documentation to `try_read()` for zero-length buffers ([#&#8203;4937])
-   runtime: remove incorrect panic section for `Builder::worker_threads` ([#&#8203;4849])
-   sync: doc of `watch::Sender::send` improved ([#&#8203;4959])
-   task: add cancel safety docs to `JoinHandle` ([#&#8203;4901])
-   task: expand on cancellation of `spawn_blocking` ([#&#8203;4811])
-   time: clarify that the first tick of `Interval::tick` happens immediately ([#&#8203;4951])

##### Unstable

-   rt: add unstable option to disable the LIFO slot ([#&#8203;4936])
-   task: fix incorrect signature in `Builder::spawn_on` ([#&#8203;4953])
-   task: make `task::Builder::spawn*` methods fallible ([#&#8203;4823])

[#&#8203;4595]: tokio-rs/tokio#4595

[#&#8203;4716]: tokio-rs/tokio#4716

[#&#8203;4765]: tokio-rs/tokio#4765

[#&#8203;4805]: tokio-rs/tokio#4805

[#&#8203;4811]: tokio-rs/tokio#4811

[#&#8203;4823]: tokio-rs/tokio#4823

[#&#8203;4824]: tokio-rs/tokio#4824

[#&#8203;4837]: tokio-rs/tokio#4837

[#&#8203;4840]: tokio-rs/tokio#4840

[#&#8203;4841]: tokio-rs/tokio#4841

[#&#8203;4845]: tokio-rs/tokio#4845

[#&#8203;4848]: tokio-rs/tokio#4848

[#&#8203;4849]: tokio-rs/tokio#4849

[#&#8203;4852]: tokio-rs/tokio#4852

[#&#8203;4858]: tokio-rs/tokio#4858

[#&#8203;4867]: tokio-rs/tokio#4867

[#&#8203;4877]: tokio-rs/tokio#4877

[#&#8203;4882]: tokio-rs/tokio#4882

[#&#8203;4886]: tokio-rs/tokio#4886

[#&#8203;4893]: tokio-rs/tokio#4893

[#&#8203;4894]: tokio-rs/tokio#4894

[#&#8203;4897]: tokio-rs/tokio#4897

[#&#8203;4898]: tokio-rs/tokio#4898

[#&#8203;4901]: tokio-rs/tokio#4901

[#&#8203;4904]: tokio-rs/tokio#4904

[#&#8203;4920]: tokio-rs/tokio#4920

[#&#8203;4924]: tokio-rs/tokio#4924

[#&#8203;4928]: tokio-rs/tokio#4928

[#&#8203;4935]: tokio-rs/tokio#4935

[#&#8203;4936]: tokio-rs/tokio#4936

[#&#8203;4937]: tokio-rs/tokio#4937

[#&#8203;4942]: tokio-rs/tokio#4942

[#&#8203;4945]: tokio-rs/tokio#4945

[#&#8203;4951]: tokio-rs/tokio#4951

[#&#8203;4953]: tokio-rs/tokio#4953

[#&#8203;4956]: tokio-rs/tokio#4956

[#&#8203;4959]: tokio-rs/tokio#4959

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about these updates again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, click this checkbox.

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzMi4xODcuMCIsInVwZGF0ZWRJblZlciI6IjMyLjE4Ny4wIn0=-->

Co-authored-by: cabr2-bot <cabr2.help@gmail.com>
Reviewed-on: https://codeberg.org/Calciumdibromid/CaBr2/pulls/1532
Reviewed-by: crapStone <crapstone@noreply.codeberg.org>
Co-authored-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
Co-committed-by: Calciumdibromid Bot <cabr2_bot@noreply.codeberg.org>
nathanwhit added a commit to denoland/deno that referenced this pull request Oct 22, 2024
Fixes #26179.

The original error reported in that issue is fixed on canary, but in
local testing on my windows machine, `next build` would just hang
forever.

After some digging, what happens is that at some point in next build,
readFile promises (from `fs/promises` ) just never resolve, and so next
hangs.

It turns out the issue is saturating tokio's blocking task thread pool.
We previously limited the number of blocking threads to 32, and at some
point those threads are all in use and there's no thread available for
the file reads.

What's taking up all of those threads? The answer turns out to be
`tokio::process`. On windows, child process stdio uses the blocking
threadpool: tokio-rs/tokio#4824. When you poll
the child's stdio on windows, it spawns a blocking task per poll, and
calls `std::io::Read::read` in the blocking context. That call can block
until data is available.
Putting it all together, what happens is that Next.js spawns `2 * the
number of CPU cores` deno child subprocesses to do work. We implement
`child_process` with `tokio::process`. When the child processes' stdio
get polled, blocking tasks get spawned, and those blocking tasks might
block until data is available. So if you have 16 cores (as I do), there
are going to be potentially >32 blocking task threadpool threads taken
just by the child processes. That leaves no room for other tasks to make
progress

---

To fix this, for now, increase the size of the blocking threadpool on
windows. 4 * the number of CPU cores should be enough to leave room for
other tasks to make progress.

Longer term, this can be fixed more properly when we handroll our own
subprocess code (needed for detached processes and additional pipes on
windows).
bartlomieju pushed a commit to denoland/deno that referenced this pull request Oct 25, 2024
Fixes #26179.

The original error reported in that issue is fixed on canary, but in
local testing on my windows machine, `next build` would just hang
forever.

After some digging, what happens is that at some point in next build,
readFile promises (from `fs/promises` ) just never resolve, and so next
hangs.

It turns out the issue is saturating tokio's blocking task thread pool.
We previously limited the number of blocking threads to 32, and at some
point those threads are all in use and there's no thread available for
the file reads.

What's taking up all of those threads? The answer turns out to be
`tokio::process`. On windows, child process stdio uses the blocking
threadpool: tokio-rs/tokio#4824. When you poll
the child's stdio on windows, it spawns a blocking task per poll, and
calls `std::io::Read::read` in the blocking context. That call can block
until data is available.
Putting it all together, what happens is that Next.js spawns `2 * the
number of CPU cores` deno child subprocesses to do work. We implement
`child_process` with `tokio::process`. When the child processes' stdio
get polled, blocking tasks get spawned, and those blocking tasks might
block until data is available. So if you have 16 cores (as I do), there
are going to be potentially >32 blocking task threadpool threads taken
just by the child processes. That leaves no room for other tasks to make
progress

---

To fix this, for now, increase the size of the blocking threadpool on
windows. 4 * the number of CPU cores should be enough to leave room for
other tasks to make progress.

Longer term, this can be fixed more properly when we handroll our own
subprocess code (needed for detached processes and additional pipes on
windows).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-tokio Area: The main tokio crate M-process Module: tokio/process R-loom Run loom tests on this PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants