-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Better error message when there's no default IP pool #4880
Conversation
@@ -144,11 +144,11 @@ pub enum LookupType { | |||
ByName(String), | |||
/// a specific id was requested | |||
ById(Uuid), | |||
/// a session token was requested | |||
BySessionToken(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not used. What's weird is it was introduced in #814, but it is also not used there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for removing this. I'm not sure how much sense it made in this context to begin with.
/// a specific id was requested with some composite type | ||
/// (caller summarizes it) | ||
ByCompositeId(String), | ||
/// object selected by criteria that would be confusing to call an ID | ||
ByOther(String), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A catchall name like this is surely a code smell, but I can't think of a better name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay but I'm just not 100% about it. I know it may be somewhat redundant, but I do kind of like ByQuery
as it stays general but also guides the consumer into giving it something query like.
LookupType::ByCompositeId(label) => { | ||
format!("{} with id \"{}\"", t, label) | ||
} | ||
LookupType::ByOther(msg) => msg, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the point of all this — I need a variant that doesn't stick all that extra structure in the message.
let lookup_type = | ||
LookupType::ByOther("default IP pool for current silo".to_string()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than stripping the ID part of the string (which is helpful because it'll not longer look like an error) I don't know that this provides that much more actionable context. Something like this might be more helpful in the logs:
LookupType::ByQuery(format!("ip_pool where ip_pool_resource { resource_id: {}, is_default: true }", authz_silo_id));
It doesn't have to be exactly that. I know this doesn't communicate in plain english the error (as you've done above), but it does communicate some specifics of what tables to look at if something unexpected is happening.
cc @internet-diglett do you have thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered options more like that, giving the stuff we select by, but to a certain extent I think it’s revealing too much about how the concept of default pool is implemented. It feels weird for an API to be talking about a join table by name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about combining the two? As with other errors in the system, there's both an internal and external message here. Internally we want enough context to know what to dig into if something's gone wrong. Externally we want a general message for the users to know what's failed at a high level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, interesting idea. Mostly only the 500-level errors have internal messages, except InvalidValue
is a 400-level one used for things like disk size being too big and it does have that. It would be pretty invasive to change ObjectNotFound
to have that.
omicron/common/src/api/external/error.rs
Lines 28 to 73 in c392c76
#[derive(Clone, Debug, Deserialize, thiserror::Error, PartialEq, Serialize)] | |
pub enum Error { | |
/// An object needed as part of this operation was not found. | |
#[error("Object (of type {lookup_type:?}) not found: {type_name}")] | |
ObjectNotFound { type_name: ResourceType, lookup_type: LookupType }, | |
/// An object already exists with the specified name or identifier. | |
#[error("Object (of type {type_name:?}) already exists: {object_name}")] | |
ObjectAlreadyExists { type_name: ResourceType, object_name: String }, | |
/// The request was well-formed, but the operation cannot be completed given | |
/// the current state of the system. | |
#[error("Invalid Request: {}", .message.display_internal())] | |
InvalidRequest { message: MessagePair }, | |
/// Authentication credentials were required but either missing or invalid. | |
/// The HTTP status code is called "Unauthorized", but it's more accurate to | |
/// call it "Unauthenticated". | |
#[error("Missing or invalid credentials")] | |
Unauthenticated { internal_message: String }, | |
/// The specified input field is not valid. | |
#[error("Invalid Value: {label}, {}", .message.display_internal())] | |
InvalidValue { label: String, message: MessagePair }, | |
/// The request is not authorized to perform the requested operation. | |
#[error("Forbidden")] | |
Forbidden, | |
/// The system encountered an unhandled operational error. | |
#[error("Internal Error: {internal_message}")] | |
InternalError { internal_message: String }, | |
/// The system (or part of it) is unavailable. | |
#[error("Service Unavailable: {internal_message}")] | |
ServiceUnavailable { internal_message: String }, | |
/// There is insufficient capacity to perform the requested operation. | |
/// | |
/// This variant is translated to 507 Insufficient Storage, and it carries | |
/// both an external and an internal message. The external message is | |
/// intended for operator consumption and is intended to not leak any | |
/// implementation details. | |
#[error("Insufficient Capacity: {}", .message.display_internal())] | |
InsufficientCapacity { message: MessagePair }, | |
#[error("Type version mismatch! {internal_message}")] | |
TypeVersionMismatch { internal_message: String }, | |
#[error("Conflict: {}", .message.display_internal())] | |
Conflict { message: MessagePair }, | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing ObjectNotFound
might be the right way to do it, but that's not what I specifically had in mind.
In the logs above, there's the external event message as you'd expect but also a serialized version of the actual error object. Even if the "query" part of this never showed up in an error, we'd still see it in the logs (we, at least under the context at which this was initially seen).
Now, it could be that we really only see errors serialized out like that in the logs under very special circumstances (maybe only saga events?). Were that the case, it may not be worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If the LookupType
itself gets logged, in theory I could put internal/external on that and we'd see it that way, without affecting the way the error message is written out for the API response.
It looks like we do see the full object in the logs because this happens during the instance create saga — this is from log output after running the test edited in this PR.
$ ntlog | rg 'default IP pool'
{"msg":"recording saga event","v":0,"name":"test_instance_ephemeral_ip_no_default_pool_error","level":20,"time":"2024-01-24T16:45:33.208653-06:00","hostname":"Davids-M1-MBP.local","pid":98315,"component":"SecStore","component":"nexus","component":"ServerContext","name":"391cc54f-028b-4b57-befe-0d3a4239ec4e","event_type":"Failed(ActionFailed { source_error: Object {\"ObjectNotFound\": Object {\"lookup_type\": Object {\"ByOther\": String(\"default IP pool for current silo\")}, \"type_name\": String(\"ip-pool\")}} })","node_id":"35","saga_id":"522d6f2d-b616-43a3-ae88-6c85f2e8596b"}
{"msg":"request completed","v":0,"name":"test_instance_ephemeral_ip_no_default_pool_error","level":30,"time":"2024-01-24T16:45:33.943636-06:00","hostname":"Davids-M1-MBP.local","pid":98315,"uri":"/v1/instances?project=springfield-squidport","method":"POST","req_id":"1d8fbacd-e242-495f-a033-1a0d7edfe88b","remote_addr":"127.0.0.1:55531","local_addr":"127.0.0.1:55523","component":"dropshot_external","name":"391cc54f-028b-4b57-befe-0d3a4239ec4e","error_message_external":"not found: default IP pool for current silo","error_message_internal":"not found: default IP pool for current silo","latency_us":1388504,"response_code":"404"}
Will think about this.
Merging this as a net improvement. Will think about making it more better later. |
Closes #4864
This is a bad error message to get when the problem is that there is no default IP pool configured for your current silo:
"Default pool for current silo" is not an id, so why would we call it one? This is better:
This PR is just making that possible.