CPU heavy request prevents Rocket from accepting other connections #2239
My best guess is that both the route and main tasks are getting assigned to the same thread, and as such block each other. Although Rocket spawns a task to handle each request, Tokio is free to simply schedule it on the same thread, and due to synchronization and context switching overhead, Tokio will do this whenever it makes sense to do so. It's also possible that Tokio polls the task at least once during spawning, which would trigger this behavior. I'm also not particularly familiar with Tokio's internals, but my understanding is that each thread has a local pool of tasks to run, and long running tasks, especially if they are waiting for IO, are placed into a shared pool to enable other threads to poll them when they can make progress. If both the main server task and your handler are in the local pool, you need to yield for Tokio to even be able to transfer one of the tasks to another thread, but even then there is no guarantee that Tokio will actually transfer one of them. Overall, there are a variety of things that could be going wrong here.
This shouldn't happen as we use the multithreaded scheduler. Even if it does schedule two tasks on one worker, a free worker should steal the task, assuming a free worker exists.
That would be okay, but a free worker should still come by, notice a stalled task, steal it, and execute it.
This is correct. Tokio will internally use an I/O pool for certain tasks, and you can add to that pool via spawn_blocking. None of those tasks implicitly block any other task.
This is what work stealing is for. No task should need to yield. As soon as a new task is scheduled, workers are notified, and if they're free they'll steal. My guess about what's happening here is that you were contending on the same lock in the original code across multiple tasks, so blocking in one task caused blocking in all tasks. When you switched to using spawn_blocking, perhaps your locking became more coarse grained. If you have a copy of the source of both versions, I'd love to take a look.
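For reference, a minimal sketch of the `spawn_blocking` pattern discussed above; the `/compute` route and its workload are invented for illustration and are not from the issue:

```rust
use rocket::{get, launch, routes, Build, Rocket};
use rocket::tokio::task::spawn_blocking;

// Hypothetical CPU-heavy, fully synchronous computation.
fn expensive_computation() -> u64 {
    (0..100_000_000u64).sum()
}

#[get("/compute")]
async fn compute() -> String {
    // Run the blocking work on Tokio's dedicated blocking pool so the async
    // worker threads stay free to accept and serve other requests.
    let result = spawn_blocking(expensive_computation)
        .await
        .expect("blocking task panicked");
    result.to_string()
}

#[launch]
fn rocket() -> Rocket<Build> {
    rocket::build().mount("/", routes![compute])
}
```

With this in place, a long `/compute` request occupies a blocking-pool thread while the async workers keep accepting connections.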
To confirm, I wrote the following:

```rust
use rocket::*;
#[get("/block")]
fn block() {
let mutex = std::sync::Mutex::new(0u32);
std::mem::forget(mutex.lock().expect("unlocked"));
std::mem::forget(mutex.lock().expect("deadlock"));
}
#[get("/")]
fn index() { }
#[launch]
fn rocket() -> Rocket<Build> {
rocket::build().mount("/", routes![block, index])
}
```

Here's a session running this server:

```sh
> curl http://127.0.0.1:8000/block
# ... never returns ...
# in a new shell
> curl http://127.0.0.1:8000/
> # returned immediately
```

@oersted Were you trying to observe behavior through a web browser? If so, see #2228.
Thank you @SergioBenitez, yes I was using a web browser. Although parallel requests work fine now using […]

The locking has indeed changed somewhat and is held for a smaller section of the computation; however, I'm using […]

I did consider providing a reproducible code sample in my first post, but it is rather tricky to untangle so that it's generic and brief, and I don't leak any confidential info. It works fine with […]

I was mainly looking to correct my mental model of how the concurrency of Rocket works based on this anomaly. In that respect, I'm more or less satisfied. If no one is able to come up with another constructive hypothesis, feel free to close this issue.
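As an aside on the locking point above, the usual fine-grained pattern is to drop a synchronous guard before any await point; a minimal sketch with invented names, not the original code:

```rust
use std::sync::RwLock;
use tokio::task::yield_now;

// Illustrative shared state; not taken from the original project.
static CACHE: RwLock<Vec<u64>> = RwLock::new(Vec::new());

async fn summarize() -> usize {
    // Scope the guard so it is dropped before the next await point. A
    // std::sync::RwLock guard held across an .await would leave any task
    // waiting on the lock blocking its worker thread in the meantime (and
    // the guard is !Send, so a work-stealing runtime rejects such a future).
    let len = {
        let guard = CACHE.read().expect("lock poisoned");
        guard.len()
    };

    // Async work happens only after the guard has been released.
    yield_now().await;
    len
}

#[tokio::main]
async fn main() {
    CACHE.write().expect("lock poisoned").extend([1, 2, 3]);
    println!("entries: {}", summarize().await);
}
```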
FTR I am seeing a similar issue with rc2. We have a web server that acts as a cargo registry. When a new crate is published it calls […]

Using […]

You can find the pstack of the deadlocked application here. It is waiting for the server to respond to […]
This was my attempt at a simple reproduction, which does not show the behaviour:

```rust
use rocket::{get, routes};
#[get("/hello")]
async fn hello() -> String {
let stdout = std::process::Command::new("curl")
.args(["http://localhost:8000/world"])
.output()
.expect("failed to execute process")
.stdout;
String::from_utf8(stdout).unwrap()
}
#[get("/world")]
fn world() -> &'static str {
"Hello, world!"
}
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let rocket = rocket::build().mount("/", routes![hello, world]);
let _ = rocket.launch().await?;
Ok(())
}
```
@samanpa You don't need any sort of locking to block. Executing a […]

But you can avoid all of this by not blocking while the command is executing. Instead of using the […]
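A sketch of what a non-blocking variant of the reproduction could look like, assuming the suggestion above is Tokio's async process API; `tokio::process::Command` requires tokio's `process` feature, and this is an illustration rather than the commenter's exact code:

```rust
use rocket::{get, routes};
// Assumes tokio is a direct dependency with its "process" feature enabled.
use tokio::process::Command;

#[get("/hello")]
async fn hello() -> String {
    // The async Command awaits the child instead of blocking the worker
    // thread, so other requests (e.g. /world) can still be served meanwhile.
    let output = Command::new("curl")
        .args(["http://localhost:8000/world"])
        .output()
        .await
        .expect("failed to execute process");
    String::from_utf8(output.stdout).unwrap()
}

#[get("/world")]
fn world() -> &'static str {
    "Hello, world!"
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let rocket = rocket::build().mount("/", routes![hello, world]);
    let _ = rocket.launch().await?;
    Ok(())
}
```

Here `.output().await` suspends the `hello` task while the child runs, so the same worker can pick up the `/world` request in the meantime.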
True, although this happens even when there are 4 workers. I would expect another worker to handle the new incoming request. Using spawn_blocking helps, but I want to be able to understand what is going on. E.g., the example above works with […]
This trace shows that the thread is waiting for the command to exit, which is to be expected. An […]

This is surprising behavior; are you certain that there are workers available? Or are they potentially also stuck on the same issue?

How does it "help" in this case? The program advances, always? Or sometimes? Have you tried using the […]

Are you saying the example exactly as written works, or the example modified to use […]?
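One way to rule out the worker-count question is to pin it explicitly; a sketch using Rocket's `Config` (the value 4 mirrors the report above, and the `/` route is a placeholder):

```rust
use rocket::{get, routes, Config};

#[get("/")]
fn index() -> &'static str {
    "ok"
}

#[rocket::main]
async fn main() -> Result<(), rocket::Error> {
    // Request 4 async workers explicitly; Rocket reports the effective
    // configuration, including the worker count, in its launch output.
    let config = Config {
        workers: 4,
        ..Config::default()
    };
    let _ = rocket::custom(config)
        .mount("/", routes![index])
        .launch()
        .await?;
    Ok(())
}
```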
It looks like this […]
I have an endpoint that can execute a long CPU heavy computation that does sync I/O and uses some sync locks (`RwLock`). I understand I should be using `spawn_blocking` for this, and indeed that works fine.

But I'd like to understand why not using `spawn_blocking` has such a dramatic effect in this case, to the point that no new HTTP connections can be accepted while the sync computation is running. The computation would of course hog one of the Tokio core threads, but I would expect the other threads to work normally. On a machine with 40 cores, why would this sync computation stall the whole server?