Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ManagedClusterView should preserve original error when retrieving resources #604

Open
BenamarMk opened this issue May 8, 2023 · 3 comments

Comments

@BenamarMk
Copy link

Problem:
Currently, the ManagedClusterView implementation does not preserve the original error when attempting to retrieve resources from a ManagedCluster. Instead, it converts all errors to a generic error and embeds the error text in the condition message (see here), which can be frustrating and difficult to parse.

For example, when a resource is not found or when there is a connectivity issue, there is nothing that distinguishes between the two cases other than the embedded error message within the .status.condition[0].Message

To address this, we recommend improving the ManagedClusterView implementation to preserve the last error when retrieving resources. Specifically, we suggest adding a new field to the .status object that holds the last error encountered during resource retrieval. Another possibility is to add another condition that holds the error.

@zhiweiyin318
Copy link
Contributor

zhiweiyin318 commented May 25, 2023

we have different reasons for this condition when failed to get resource:
"ResourceNameInvalid"
"ResourceTypeInvalid"
"ResourceGVKInvalid"
"GetResourceFailed"

not sure to add a not found reason can help you?
do you still need the error if the process is recovered from failure?

@BenamarMk
Copy link
Author

BenamarMk commented Jun 6, 2023

For the reason GetResourceFailed, there is no distinction between a Not Found error and a Timeout error for example. Both errors end up in the message string.
The Not Found error tells us that the resource does not exist in the managed cluster. Here is an example of not found error

GetResourceFailed failed to get resource with err: volumereplicationgroups.ramendr.openshift.io "busybox-drpc" not found

And here is an example of an inaccessible cluster

GetResourceFailed failed to get resource with err: Get "https://172.30.0.1:443/apis/ramendr.openshift.io/v1alpha1/namespaces/busybox-workloads-3/volumereplicationgroups/busybox-drpc": dial tcp 172.30.0.1:443: i/o timeout

The line that builds the error message -> see here

@zhiweiyin318
Copy link
Contributor

@BenamarMk how about use the statusReason of error to replace GetResourceFailed ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants