Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(rpc): remove threads #7418

Merged
merged 1 commit into from
Jun 6, 2023
Merged

fix(rpc): remove threads #7418

merged 1 commit into from
Jun 6, 2023

Conversation

rgrinberg
Copy link
Member

replace them with evented io based on Unix.select

Signed-off-by: Rudi Grinberg me@rgrinberg.com

@rgrinberg rgrinberg force-pushed the ps/rr/fix_rpc___remove_threads branch 4 times, most recently from 9ae747d to bf1b254 Compare March 27, 2023 03:05
@rgrinberg rgrinberg requested a review from nojb March 27, 2023 03:23
@rgrinberg rgrinberg force-pushed the ps/rr/fix_rpc___remove_threads branch 3 times, most recently from 51abe86 to 46f9124 Compare March 27, 2023 17:51
@rgrinberg rgrinberg linked an issue Mar 28, 2023 that may be closed by this pull request
@rgrinberg
Copy link
Member Author

ping @nojb

Copy link
Collaborator

@nojb nojb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will try to review this, but I have some general questions:

  • What is the motivation for this change? In other words, what is the main expected benefit of the changes in this PR with respect to the existing code?
  • The doc mentions that pipes are used on Unix. They are not used on Windows? Is something else used instead?

@rgrinberg
Copy link
Member Author

There's a couple of benefits:

  1. We stop leaking memory on every connected client because we no longer spawn threads we don't cleanup.
  2. We no longer spawn 3 threads for every connected clients. Threads are heavy and we don't want to have too many running at once. The background threads are better saved for offloading work that can be sped up or that does background file IO.

The doc mentions that pipes are used on Unix. They are not used on Windows? Is something else used instead?

Where does it mention that? I think we switched to sockets everywhere to simplify the implementation.

@nojb
Copy link
Collaborator

nojb commented Apr 6, 2023

@rgrinberg
Copy link
Member Author

Oh yeah that's a single pipe I use to be able to interrupt the select. I'm using non-blocking pipes only on Unix - that's what I meant.

@rgrinberg
Copy link
Member Author

Some additional notes:

  1. the async reading of csexp is mostly ripped off from my previous implementation in ocamllsp. it should be mostly bug free.
  2. We should be careful to make sure we never close an fd that is being selected on.
  3. We should make sure there no fd reuse bugs. E.g. two threads should never race to close a single fd as the the thread to lose this race might close some unrelated fd.

src/async_io/async_io.mli Outdated Show resolved Hide resolved
src/async_io/async_io.mli Outdated Show resolved Hide resolved
src/async_io/async_io.mli Outdated Show resolved Hide resolved
src/async_io/async_io.mli Outdated Show resolved Hide resolved
src/async_io/async_io.ml Outdated Show resolved Hide resolved
@rgrinberg rgrinberg requested a review from anmonteiro April 22, 2023 00:23
@rgrinberg rgrinberg force-pushed the ps/rr/fix_rpc___remove_threads branch from 583dd8f to d567a19 Compare April 22, 2023 03:09
src/async_io/async_io.mli Outdated Show resolved Hide resolved
src/async_io/async_io.mli Outdated Show resolved Hide resolved
@rgrinberg rgrinberg force-pushed the ps/rr/fix_rpc___remove_threads branch from d567a19 to 902838a Compare April 25, 2023 17:18
@rgrinberg
Copy link
Member Author

rgrinberg commented May 29, 2023

A few remarks after glancing at the code. Probably not useful, but just in case:

  • I'm not sure why wait should fail if the queue has been shutdown. Seems like it should just return an empty list of events? Also, how does the thread that calls wait in a loop know how to terminate? Shouldn't it check some shutdown flag?

  • fsenv->thread is being initialized in the main thread, while wait is being called in another thread. Why does shutdown wait for the main thread to the terminate? Shouldn't it be waiting for the wait thread?

  • I would personally prefer if Fswatch_win.shutdown was idempotent. This is consistent with how we define such functions elsewhere in dune (when possible).

@nojb
Copy link
Collaborator

nojb commented May 29, 2023

A few remarks after glancing at the code.

Thanks, all good points. Not sure about the second point (I have forgotten some of the details); I'll take a closer look and report back. And I'll prepare a PR with the fixes.

@rgrinberg rgrinberg force-pushed the ps/rr/fix_rpc___remove_threads branch from 498b37e to 4e571b7 Compare June 2, 2023 13:47
@nojb
Copy link
Collaborator

nojb commented Jun 5, 2023

Argh, while investigating the Windows issue I realized that the bad behaviour is already present in main (and 3.8.0, but not 3.7.1), so in principle it should not be related to this PR. Do you see any PR merged between 3.7.1 and 3.8.0 that may have changed the behaviour with respect to threading, signals or the like?

replace them with evented io based on Unix.select

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>

<!-- ps-id: 2d55627f-c73c-41a0-8e41-e57482f0edbe -->
@rgrinberg rgrinberg force-pushed the ps/rr/fix_rpc___remove_threads branch from 4e571b7 to 2647cd1 Compare June 5, 2023 20:26
@rgrinberg
Copy link
Member Author

You can try bisecting. I can't think of anything that is Windows specific.

By the way, I discovered some issues with this PR and I've updated it to fix them. Nothing is windows specific, but perhaps windows fails in some special way. So perhaps it's worth trying this PR again.

@nojb
Copy link
Collaborator

nojb commented Jun 5, 2023

You can try bisecting. I can't think of anything that is Windows specific.

The culprit seems to be #6996 cc @Alizter

@rgrinberg
Copy link
Member Author

Hmm, that's quite weird. I would assume tui does nothing if it's not enabled.

@Alizter
Copy link
Collaborator

Alizter commented Jun 5, 2023

There were a few changes for tui (in a different PR) to improve how signals were handled. We'll have to see what went wrong. I have a windows vm so I can also take a closer look.

@nojb
Copy link
Collaborator

nojb commented Jun 6, 2023

There were a few changes for tui (in a different PR) to improve how signals were handled. We'll have to see what went wrong. I have a windows vm so I can also take a closer look.

It seems to be related to the replacement of

Dune_threaded_console.progress ()

by

Dune_console.Backend.progress

see
https://github.com/ocaml/dune/pull/6996/files#diff-3b5679e0e3859b1dc8498f8d8b27a51fafc2265b72bf2fa4593d944ee950b634L37

@rgrinberg
Copy link
Member Author

It seems to be related to the replacement of

For a quick fix, how about we go back to the unthreaded console for windows. Should be quite easy.

@nojb
Copy link
Collaborator

nojb commented Jun 6, 2023

It seems to be related to the replacement of

For a quick fix, how about we go back to the unthreaded console for windows. Should be quite easy.

I don't understand: the current code uses Dune_console.Backend.progress; the old code used Dune_threaded_console.progress. Does it mean that we currently used the "unthreaded" console and before we used to use the "threaded" one?

@rgrinberg
Copy link
Member Author

Does it mean that we currently used the "unthreaded" console and before we used to use the "threaded" one?

I think I confused myself with all these UI switches we've had :)

Looking at the current state, I see that our progress reporting does not rely on an additional thread. I don't know why, but my intention was that it would.

From the diff you've pointed out, switching from threaded to unthreaded introduced an issue. I have no idea why, but it's worth it to toggle it back to threaded to see if you can get the bug to go away.

@nojb
Copy link
Collaborator

nojb commented Jun 6, 2023

From the diff you've pointed out, switching from threaded to unthreaded introduced an issue. I have no idea why, but it's worth it to toggle it back to threaded to see if you can get the bug to go away.

I tried, but then the progress line completely disappeared, and the system did not appear very responsive... Not sure if it is an issue on Windows only or if it affects Linux as well; I'll try to investigate more later.

@rgrinberg
Copy link
Member Author

In any case, I think we're convinced this PR doesn't introduce this problem. Since it also fixes a serious issue with RPC, I'm merging.

@rgrinberg rgrinberg merged commit 12a0268 into main Jun 6, 2023
@rgrinberg rgrinberg deleted the ps/rr/fix_rpc___remove_threads branch June 6, 2023 13:22
@rgrinberg rgrinberg mentioned this pull request Jun 6, 2023
7 tasks
@rgrinberg rgrinberg added this to the 3.8.2 milestone Jun 6, 2023
@emillon emillon removed this from the 3.8.2 milestone Jun 9, 2023
emillon pushed a commit to emillon/dune that referenced this pull request Jun 13, 2023
replace them with evented io based on Unix.select

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
emillon added a commit to emillon/dune that referenced this pull request Jun 15, 2023
replace them with evented io based on Unix.select

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
emillon added a commit to emillon/dune that referenced this pull request Jun 16, 2023
replace them with evented io based on Unix.select

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit that referenced this pull request Jun 16, 2023
replace them with evented io based on Unix.select

Signed-off-by: Rudi Grinberg <me@rgrinberg.com>
Signed-off-by: Etienne Millon <me@emillon.org>
emillon added a commit to emillon/opam-repository that referenced this pull request Jun 16, 2023
CHANGES:

- Switch back to threaded console for all systems; fix unresponsive console on
  Windows (ocaml/dune#7906, @nojb)

- Respect `-p` / `--only-packages` for `melange.emit` artifacts (ocaml/dune#7849, @anmonteiro)

- Fix scanning of Coq installed files (@ejgallego, reported by
  @palmskog, ocaml/dune#7895 , fixes ocaml/dune#7893)

- Fix RPC buffer corruption issues due to multi threading. This issue was only
  reproducible with large RPC payloads (ocaml/dune#7418)

- Fix printing errors from excerpts whenever character offsets span multiple
  lines (ocaml/dune#7950, fixes ocaml/dune#7905, @rgrinberg)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Dune RPC - Switch from multithreading to select.
6 participants