Unknown issue between smol 0.1.12 vs 0.1.13 #177
Comments
I poked around a little bit and figured that the tests in Apollo are using [...].

The new scheduler is a bit smarter in that it doesn't run all executor threads at full speed if there isn't a massive amount of work to do. So e.g. if there are only two tasks to run, there's no point in running all executor threads at once - most likely one thread can run both tasks. So now, if the first task blocks (and it seems to block on the reqwest call), the whole system can get stalled, because the assumption is that running any task is effectively instantaneous.

I could "fix" this issue by making the scheduler a bit more aggressive, always assuming that any executor thread could get blocked on any task. However, that's not a real fix - it's merely a band-aid for an issue that is not in smol. The real solution would be to: [...]
I find it amusing how that PR got so much pushback and negative publicity when apparently all async code blocks in some places, and it is the most effective solution to the problem :) cc @dignifiedquire - I think this is now the third instance of the same kind of issue...
How about making the scheduler configurable?
And giving an option to turn on diagnostics and automatic block detection (maybe as a feature flag). I probably wouldn't want automatic block detection on by default.
Thanks so much for looking into this!
@stjepang I'm not sure I follow - is your hypothesis that this is being run on a single thread, which stalls the system such that we hit the timeout? If that's the case, removing the timeout restrictions and giving it enough time should eventually let it pass. That doesn't seem to be the case.
Okay, I think we should make two changes right now to alleviate this issue: [...]

After that, we should use [...]

@Enrico2 There are two tasks: one is running the MockServer and one is blocking on the reqwest call issued by [...]. Since there are only two tasks (and not hundreds of tasks), the executor is running only one thread and all other threads are sleeping. The executor doesn't bother spinning up more than one thread because the assumption is that running those two tasks should be trivial work for just one thread.

That single running thread then polls the future that blocks on reqwest, and we wait until the timeout. The other task, with the MockServer, never gets the chance to run because the single executor thread is blocked until reqwest times out.

What I propose we do right now is to pessimistically assume that any future might block for a long time. So before we poll the future that blocks on reqwest, we should wake up at least one other executor thread to run the MockServer in case the first thread gets blocked on reqwest.
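To make the scenario above concrete, here is a minimal, schematic sketch of the shape of the problem - not the Apollo test itself - written against the smol 0.1-era API (`smol::run`, `Task::spawn`); the exact API names are assumed here. Whether the second task actually gets starved depends on the scheduler, which is exactly what changed between 0.1.12 and 0.1.13:

```rust
use std::time::Duration;

fn main() {
    // Assumed dependency: smol = "=0.1.13" (API names from the 0.1.x line).
    smol::run(async {
        // Task A: stands in for the MockServer task; it just needs to be
        // polled by some executor thread to make progress.
        let server = smol::Task::spawn(async {
            println!("server task got polled");
        });

        // Task B (this future): stands in for the test body. Instead of
        // awaiting, it blocks its executor thread synchronously, the way a
        // blocking HTTP call would.
        std::thread::sleep(Duration::from_secs(2));

        // Under a scheduler that keeps only one executor thread awake when
        // there is little work, Task A may not have been polled by this point.
        server.await;
    });
}
```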
Ah! Got it. Thanks so much for the details!
That sounds good as long as it is configurable. It could be expected behavior for some.
I faced this issue too. I used [...]
@KalitaAlexey Just to double-check I understood - the [...]?
@stjepang, yes. It's in [...]
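As one possible shape for this kind of workaround (not necessarily the one used above), a blocking call can be moved off the executor entirely. The sketch below assumes a reasonably recent async-std in which `task::spawn_blocking` is available:

```rust
use std::time::Duration;

fn main() {
    async_std::task::block_on(async {
        // Move the blocking work onto async-std's dedicated blocking-thread
        // pool so no executor thread is held while it runs.
        let result = async_std::task::spawn_blocking(|| {
            // Stand-in for a blocking call such as a synchronous HTTP request.
            std::thread::sleep(Duration::from_secs(1));
            42
        })
        .await;

        assert_eq!(result, 42);
    });
}
```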
This is now fixed in smol 0.3 |
Hello,

We've recently discovered a bizarre situation involving `smol` and a minor version update that broke stuff. The interesting bit is that we can consistently reproduce a test error if we have `smol = "=0.1.13"`, but if we use `smol = "=0.1.12"`, tests pass.

The details: it's a bit convoluted. We don't have a direct dependency on `smol`; we get it via `async-std`. The core issue we've been seeing is that when setting up a `MockServer` (from `wiremock`) and trying to call it from an async unit test (using `#[async_std::test]`), it hangs and we get a timeout.

The bizarre thing was that everything was fine, and then a couple of days later, without any code changes on our end, tests started failing. It seems that Cargo brought in a newer `smol` (until recently, we didn't check in our `Cargo.lock`).

If you'd like to reproduce locally (mac or linux, not sure about windows): [...]
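For context, a test of the general shape described above might look roughly like the sketch below; this is not the actual failing test, and the route, matcher, and use of reqwest's blocking client are assumptions:

```rust
// Assumed dev-dependencies: async-std (with the "attributes" feature),
// wiremock, and reqwest (with the "blocking" feature enabled).
use wiremock::matchers::{method, path};
use wiremock::{Mock, MockServer, ResponseTemplate};

#[async_std::test]
async fn mock_server_responds() {
    // The MockServer runs as a background task on the shared executor.
    let mock_server = MockServer::start().await;
    Mock::given(method("GET"))
        .and(path("/hello"))
        .respond_with(ResponseTemplate::new(200))
        .mount(&mock_server)
        .await;

    // Call the server. If the executor keeps only one thread awake and that
    // thread ends up blocked here, the MockServer task never gets polled and
    // the test hangs until the timeout - the behavior described in this issue.
    let response = reqwest::blocking::get(format!("{}/hello", mock_server.uri()))
        .expect("request failed");
    assert_eq!(response.status().as_u16(), 200);
}
```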
This fails up to 0.1.18 as well.
I realize this is a few versions back; if you have any insight here that could be helpful, we'd appreciate it. Thanks!