Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error message when there's no default IP pool #4880

Merged
merged 1 commit into from
Jan 26, 2024

Conversation

david-crespo
Copy link
Contributor

Closes #4864

This is a bad error message to get when the problem is that there is no default IP pool configured for your current silo:

not found: ip-pool with id "Default pool for current silo"

"Default pool for current silo" is not an id, so why would we call it one? This is better:

not found: default IP pool for current silo

This PR is just making that possible.

@@ -144,11 +144,11 @@ pub enum LookupType {
ByName(String),
/// a specific id was requested
ById(Uuid),
/// a session token was requested
BySessionToken(String),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not used. What's weird is it was introduced in #814, but it is also not used there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for removing this. I'm not sure how much sense it made in this context to begin with.

/// a specific id was requested with some composite type
/// (caller summarizes it)
ByCompositeId(String),
/// object selected by criteria that would be confusing to call an ID
ByOther(String),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A catchall name like this is surely a code smell, but I can't think of a better name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's okay but I'm just not 100% about it. I know it may be somewhat redundant, but I do kind of like ByQuery as it stays general but also guides the consumer into giving it something query like.

LookupType::ByCompositeId(label) => {
format!("{} with id \"{}\"", t, label)
}
LookupType::ByOther(msg) => msg,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the point of all this — I need a variant that doesn't stick all that extra structure in the message.

Comment on lines +137 to +138
let lookup_type =
LookupType::ByOther("default IP pool for current silo".to_string());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than stripping the ID part of the string (which is helpful because it'll not longer look like an error) I don't know that this provides that much more actionable context. Something like this might be more helpful in the logs:

LookupType::ByQuery(format!("ip_pool where ip_pool_resource { resource_id: {}, is_default: true }", authz_silo_id));

It doesn't have to be exactly that. I know this doesn't communicate in plain english the error (as you've done above), but it does communicate some specifics of what tables to look at if something unexpected is happening.

cc @internet-diglett do you have thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered options more like that, giving the stuff we select by, but to a certain extent I think it’s revealing too much about how the concept of default pool is implemented. It feels weird for an API to be talking about a join table by name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about combining the two? As with other errors in the system, there's both an internal and external message here. Internally we want enough context to know what to dig into if something's gone wrong. Externally we want a general message for the users to know what's failed at a high level.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, interesting idea. Mostly only the 500-level errors have internal messages, except InvalidValue is a 400-level one used for things like disk size being too big and it does have that. It would be pretty invasive to change ObjectNotFound to have that.

#[derive(Clone, Debug, Deserialize, thiserror::Error, PartialEq, Serialize)]
pub enum Error {
/// An object needed as part of this operation was not found.
#[error("Object (of type {lookup_type:?}) not found: {type_name}")]
ObjectNotFound { type_name: ResourceType, lookup_type: LookupType },
/// An object already exists with the specified name or identifier.
#[error("Object (of type {type_name:?}) already exists: {object_name}")]
ObjectAlreadyExists { type_name: ResourceType, object_name: String },
/// The request was well-formed, but the operation cannot be completed given
/// the current state of the system.
#[error("Invalid Request: {}", .message.display_internal())]
InvalidRequest { message: MessagePair },
/// Authentication credentials were required but either missing or invalid.
/// The HTTP status code is called "Unauthorized", but it's more accurate to
/// call it "Unauthenticated".
#[error("Missing or invalid credentials")]
Unauthenticated { internal_message: String },
/// The specified input field is not valid.
#[error("Invalid Value: {label}, {}", .message.display_internal())]
InvalidValue { label: String, message: MessagePair },
/// The request is not authorized to perform the requested operation.
#[error("Forbidden")]
Forbidden,
/// The system encountered an unhandled operational error.
#[error("Internal Error: {internal_message}")]
InternalError { internal_message: String },
/// The system (or part of it) is unavailable.
#[error("Service Unavailable: {internal_message}")]
ServiceUnavailable { internal_message: String },
/// There is insufficient capacity to perform the requested operation.
///
/// This variant is translated to 507 Insufficient Storage, and it carries
/// both an external and an internal message. The external message is
/// intended for operator consumption and is intended to not leak any
/// implementation details.
#[error("Insufficient Capacity: {}", .message.display_internal())]
InsufficientCapacity { message: MessagePair },
#[error("Type version mismatch! {internal_message}")]
TypeVersionMismatch { internal_message: String },
#[error("Conflict: {}", .message.display_internal())]
Conflict { message: MessagePair },
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing ObjectNotFound might be the right way to do it, but that's not what I specifically had in mind.

image

In the logs above, there's the external event message as you'd expect but also a serialized version of the actual error object. Even if the "query" part of this never showed up in an error, we'd still see it in the logs (we, at least under the context at which this was initially seen).

Now, it could be that we really only see errors serialized out like that in the logs under very special circumstances (maybe only saga events?). Were that the case, it may not be worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If the LookupType itself gets logged, in theory I could put internal/external on that and we'd see it that way, without affecting the way the error message is written out for the API response.

It looks like we do see the full object in the logs because this happens during the instance create saga — this is from log output after running the test edited in this PR.

$ ntlog | rg 'default IP pool'
{"msg":"recording saga event","v":0,"name":"test_instance_ephemeral_ip_no_default_pool_error","level":20,"time":"2024-01-24T16:45:33.208653-06:00","hostname":"Davids-M1-MBP.local","pid":98315,"component":"SecStore","component":"nexus","component":"ServerContext","name":"391cc54f-028b-4b57-befe-0d3a4239ec4e","event_type":"Failed(ActionFailed { source_error: Object {\"ObjectNotFound\": Object {\"lookup_type\": Object {\"ByOther\": String(\"default IP pool for current silo\")}, \"type_name\": String(\"ip-pool\")}} })","node_id":"35","saga_id":"522d6f2d-b616-43a3-ae88-6c85f2e8596b"}
{"msg":"request completed","v":0,"name":"test_instance_ephemeral_ip_no_default_pool_error","level":30,"time":"2024-01-24T16:45:33.943636-06:00","hostname":"Davids-M1-MBP.local","pid":98315,"uri":"/v1/instances?project=springfield-squidport","method":"POST","req_id":"1d8fbacd-e242-495f-a033-1a0d7edfe88b","remote_addr":"127.0.0.1:55531","local_addr":"127.0.0.1:55523","component":"dropshot_external","name":"391cc54f-028b-4b57-befe-0d3a4239ec4e","error_message_external":"not found: default IP pool for current silo","error_message_internal":"not found: default IP pool for current silo","latency_us":1388504,"response_code":"404"}

Will think about this.

@david-crespo
Copy link
Contributor Author

Merging this as a net improvement. Will think about making it more better later.

@david-crespo david-crespo merged commit a3a9844 into main Jan 26, 2024
20 checks passed
@david-crespo david-crespo deleted the default-pool-not-found-error branch January 26, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More detail in default ip-pool lookup failures
2 participants