-
Notifications
You must be signed in to change notification settings - Fork 332
Introducing: polaris.readiness.ignore-offending-properties #2472
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@dimas-b @adutra @eric-maynard could you maybe take a look at this PR? thanks! |
| .filter( | ||
| error -> | ||
| config.ignoreOffendingProperties().stream() | ||
| .noneMatch(prop -> prop.equalsIgnoreCase(error.offendingProperty()))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach LGTM in general. However, WDYT about adding a new "ID" property to Error, e.g. error.getId().
The ID could be a static constant for cases where there could only ever be one Error instance per "check" code (e.g. checkUserPrincipalMetricTag) or it could be a deterministic (hash) function of the "type" plus some parameters (e.g. checkInsecureStorageSettings could produce IDs like storage-17af38 and storage-46fq98).
The idea is that admin users should suppress specific error instances, but not "ranges" of errors. This way, if an admin user suppresses one particular check cases, new checks will still be visible when Polaris adds them. The value of error.offendingProperty() may still be too broad in some cases.
The "hash" part being deterministic will allow admin users to propagate the same configuration to all their deployment environments. At the same time, it is not easy to guess, which will force the admin user to review what exactly needs to be suppressed. Also, if the meaning of the error changes, we can change the ID, and it will force the admin users to reassess the implications (and re-suppress).
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that admin users should suppress specific error instances
What would this flow look like, though? Is this to support cases like I want to allow setting config X, but not Y, and I want to allow setting config Z to A or B but not to C.? I fear we are at risk of overengineering this a bit. As it is, only admins have access to these configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from my POV, this is not so much about config A=X or A=Y, but more about "Polaris detected something dangerous about X". Now, if the admin user suppresses this warning, I do not want the suppression to automatically hide future warnings about "dangerous Y".
It may be related to some specific config, but may be not. I can imagine running as the root OS user falls under the same category of auto-detectable issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filtering based on error ids would also fit well with the naming change that you propose @eric-maynard then we can call the ignore method: ignoreSelectedIssues since we will now have a way of filtering by issue. Maybe the parameter hash is a bit too much. For me, I would be ok deactivating a check altogether if I know that I have a dangerous config there. It depends on how cautious we want to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be ok without the parameter hash for a start. However, having a concise but non-predictable error ID is important I think. That is to say, a user suppressing a particular error must first observe the error. It should not be easy to suppress something "proactively" :) At the same time the error ID should not be dependent on the runtime env. (i.e. be the same in all k8s pods, for example). WDYT?
| * production readiness. | ||
| */ | ||
| @WithDefault("{}") | ||
| Set<String> ignoreOffendingProperties(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the asymmetry between this and ignoreSevereIssues? IIUC this is basically a subset of severe (?) issues that the admin wants to configure the readiness check to ignore. Maybe ignoreSelectIssues?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because technically the check is based on the offending property and not the issue type. ignoreIssuesForSelectedOffendingProperties ? maybe too much?
|
This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days. |
Fixes: #2471
This PR introduces a
polaris.readiness.ignore-offending-propertiesconfig that accepts a map of properties for which readiness checks are suppressed.It can be used, for example, as follows:
The check performed is case-insesitive.