Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: return better HTTP errors when not ensured #649

Merged
merged 4 commits into from
Apr 10, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 30 additions & 32 deletions bin/propolis-server/src/lib/server.rs
Original file line number Diff line number Diff line change
Expand Up @@ -296,12 +296,7 @@ impl DropshotEndpointContext {
self.services.vm.lock().await,
VmControllerState::as_controller,
)
.map_err(|_| {
HttpError::for_not_found(
None,
"Server not initialized (no instance)".to_string(),
)
})
.map_err(|_| not_created_error())
}
}

Expand Down Expand Up @@ -576,19 +571,21 @@ async fn instance_ensure_common(
{
let existing_properties = existing.properties();
if existing_properties.id != properties.id {
return Err(HttpError::for_status(
Some(format!(
return Err(HttpError::for_client_error(
Some(api::ErrorCode::AlreadyInitialized.to_string()),
http::status::StatusCode::CONFLICT,
format!(
"Server already initialized with ID {}",
existing_properties.id
)),
http::status::StatusCode::CONFLICT,
),
));
}

if *existing_properties != properties {
return Err(HttpError::for_status(
Some("Cannot update running server".to_string()),
return Err(HttpError::for_client_error(
Some(api::ErrorCode::AlreadyRunning.to_string()),
http::status::StatusCode::CONFLICT,
"Cannot update running server".to_string(),
));
}

Expand Down Expand Up @@ -658,10 +655,11 @@ async fn instance_ensure_common(
vm_hdl.await.unwrap()
}
.map_err(|e| {
HttpError::for_internal_error(format!(
"failed to create instance: {}",
e
))
HttpError::for_client_error(
Some(api::ErrorCode::CreateFailed.to_string()),
http::status::StatusCode::INTERNAL_SERVER_ERROR,
format!("failed to create instance: {e}"),
)
})?;

if let Some(ramfb) = vm.framebuffer() {
Expand Down Expand Up @@ -785,10 +783,7 @@ async fn instance_get_common(
) -> Result<(api::Instance, VersionedInstanceSpec), HttpError> {
let ctx = rqctx.context();
match &*ctx.services.vm.lock().await {
VmControllerState::NotCreated => Err(HttpError::for_not_found(
None,
"Server not initialized (no instance)".to_string(),
)),
VmControllerState::NotCreated => Err(not_created_error()),
VmControllerState::Created(vm) => {
Ok((
api::Instance {
Expand Down Expand Up @@ -862,10 +857,7 @@ async fn instance_state_monitor(
let vm_state = ctx.services.vm.lock().await;
match &*vm_state {
VmControllerState::NotCreated => {
return Err(HttpError::for_not_found(
None,
"Server not initialized (no instance)".to_string(),
));
return Err(not_created_error());
}
VmControllerState::Created(vm) => vm.state_watcher().clone(),
VmControllerState::Destroyed { state_watcher, .. } => {
Expand All @@ -885,12 +877,13 @@ async fn instance_state_monitor(
// means it will never reach the number the client wants it to reach.
// Inform the client of this condition so it doesn't wait forever.
state_watcher.changed().await.map_err(|_| {
HttpError::for_status(
Some(format!(
HttpError::for_client_error(
Some(api::ErrorCode::NoInstance.to_string()),
http::status::StatusCode::GONE,
format!(
"No instance present; will never reach generation {}",
gen
)),
http::status::StatusCode::GONE,
),
)
})?;
}
Expand Down Expand Up @@ -1046,10 +1039,7 @@ async fn instance_migrate_status(
let migration_id = path_params.into_inner().migration_id;
let ctx = rqctx.context();
match &*ctx.services.vm.lock().await {
VmControllerState::NotCreated => Err(HttpError::for_not_found(
None,
"Server not initialized (no instance)".to_string(),
)),
VmControllerState::NotCreated => Err(not_created_error()),
VmControllerState::Created(vm) => {
vm.migrate_status(migration_id).map_err(Into::into).map(|state| {
HttpResponseOk(api::InstanceMigrateStatusResponse {
Expand Down Expand Up @@ -1206,6 +1196,14 @@ pub fn api() -> ApiDescription<Arc<DropshotEndpointContext>> {
api
}

fn not_created_error() -> HttpError {
HttpError::for_client_error(
Some(api::ErrorCode::NoInstance.to_string()),
http::StatusCode::FAILED_DEPENDENCY,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Soapbox rant, not necessarily a problem with the change: I get what we're going for here, but I've had bad luck in the past with using slightly off-meaning status codes in this way--they have a tendency, IME, to get bubbled up to a user somehow, and then the user reads how the status was originally described in whatever document or header defined it, and then folks get confused about what's really happening. In this case, RFC 4918 defines 424 Failed Dependency to mean that

The 424 (Failed Dependency) status code means that the method could
not be performed on the resource because the requested action
depended on another action and that action failed. For example, if a
command in a PROPPATCH method fails, then, at minimum, the rest of
the commands will also fail with 424 (Failed Dependency).

That's definitely close, but in the scenario that led us here (Propolis panic) no earlier requested actions actually failed--the problem is that the server reset.

Anyway, this is a soapbox rant; I don't think 424's semantics are far enough off from what we intend for this to be a blocker. But if we are leaning toward using descriptive strings in the error_code field (at least for internal control plane communication) I might be inclined to lean on those and stick to the HTTP error codes in RFC 9110.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely close, but in the scenario that led us here (Propolis panic) no earlier requested actions actually failed--the problem is that the server reset.

My rationale was that in the context of this particular propolis-server process, we depended on an earlier action (InstanceEnsure) which has not been performed at all. So, I guess indicating that it failed is not quite correct, but, hmm...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this offline and decided to stick with 424 Failed Dependency for now, to distinguish this case from the 404 Not Found you get from requesting a nonexistent route and the 410 Gone you get from trying to query a VM's state after it has already shut down and been destroyed.

"Server not initialized (no instance)".to_string(),
)
}

#[cfg(test)]
mod tests {
#[test]
Expand Down
46 changes: 45 additions & 1 deletion crates/propolis-api-types/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

//! Definitions for types exposed by the propolis-server API

use std::net::SocketAddr;
use std::{fmt, net::SocketAddr};

use schemars::JsonSchema;
use serde::{Deserialize, Serialize};
Expand Down Expand Up @@ -395,3 +395,47 @@ pub struct SnapshotRequestPathParams {
pub struct VCRRequestPathParams {
pub id: Uuid,
}

/// Error codes used to populate the `error_code` field of Dropshot API responses.
#[derive(
Clone, Copy, Debug, Deserialize, PartialEq, Eq, Serialize, JsonSchema,
)]
pub enum ErrorCode {
/// This `propolis-server` process has not received an `InstanceEnsure`
/// request yet.
NoInstance,
/// This `propolis-server` process has already received an `InstanceEnsure`
/// request with a different ID.
AlreadyInitialized,
/// Cannot update a running server.
AlreadyRunning,
/// Instance creation failed
CreateFailed,
}

impl fmt::Display for ErrorCode {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
fmt::Debug::fmt(self, f)
}
}

impl std::str::FromStr for ErrorCode {
type Err = &'static str;
fn from_str(s: &str) -> Result<Self, Self::Err> {
match s.trim() {
s if s.eq_ignore_ascii_case("NoInstance") => Ok(Self::NoInstance),
s if s.eq_ignore_ascii_case("AlreadyInitialized") => {
Ok(ErrorCode::AlreadyInitialized)
}
s if s.eq_ignore_ascii_case("AlreadyRunning") => {
Ok(ErrorCode::AlreadyRunning)
}
s if s.eq_ignore_ascii_case("CreateFailed") => {
Ok(ErrorCode::CreateFailed)
}
_ => Err("unknown error code, expected one of: \
'NoInstance', 'AlreadyInitialized', 'AlreadyRunning', \
'CreateFailed'"),
}
}
}
6 changes: 3 additions & 3 deletions phd-tests/tests/src/server_state_machine.rs
Original file line number Diff line number Diff line change
Expand Up @@ -37,15 +37,15 @@ async fn instance_stop_causes_destroy_test(ctx: &Framework) {

assert!(matches!(
vm.run().await.unwrap_err().status().unwrap(),
http::status::StatusCode::NOT_FOUND
http::status::StatusCode::FAILED_DEPENDENCY
));
assert!(matches!(
vm.stop().await.unwrap_err().status().unwrap(),
http::status::StatusCode::NOT_FOUND
http::status::StatusCode::FAILED_DEPENDENCY
));
assert!(matches!(
vm.reset().await.unwrap_err().status().unwrap(),
http::status::StatusCode::NOT_FOUND
http::status::StatusCode::FAILED_DEPENDENCY
));
}

Expand Down
Loading