@fivetran-kostaszoumpatianos
Contributor

Fixes: #2471

This PR introduces a polaris.readiness.ignore-offending-properties config that accepts a list of property names for which readiness checks are suppressed.

It can be used, for example, as follows:

polaris.readiness.ignore-offending-properties=\
  polaris.metrics.user-principal-tag.enable-in-api-metrics,\
  polaris.features."ALLOW_INSECURE_STORAGE_TYPES",\
  polaris.features."SUPPORTED_CATALOG_STORAGE_TYPES"

The check performed is case-insensitive.
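As a rough illustration of the behaviour described above, here is a minimal, self-contained sketch of case-insensitive suppression by offending property. The `ReadinessError` class, the `filterIgnored` helper, and the `polaris.some.other-property` name are hypothetical stand-ins invented for this example; only `offendingProperty()` and the `equalsIgnoreCase` comparison come from the PR itself.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class IgnoreOffendingPropertiesSketch {

  /** Hypothetical stand-in for a readiness error; only offendingProperty() appears in the PR. */
  static final class ReadinessError {
    private final String message;
    private final String offendingProperty;

    ReadinessError(String message, String offendingProperty) {
      this.message = message;
      this.offendingProperty = offendingProperty;
    }

    String message() { return message; }

    String offendingProperty() { return offendingProperty; }
  }

  /** Drops every error whose offending property matches an ignored name, ignoring case. */
  static List<ReadinessError> filterIgnored(List<ReadinessError> errors, Set<String> ignored) {
    return errors.stream()
        .filter(error -> ignored.stream()
            .noneMatch(prop -> prop.equalsIgnoreCase(error.offendingProperty())))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<ReadinessError> errors = List.of(
        new ReadinessError(
            "metrics tag enabled",
            "polaris.metrics.user-principal-tag.enable-in-api-metrics"),
        // Hypothetical second property, used only to show an error that is not suppressed.
        new ReadinessError("hypothetical other issue", "polaris.some.other-property"));

    // The ignore list differs in case from the reported property on purpose.
    Set<String> ignored = Set.of("POLARIS.METRICS.USER-PRINCIPAL-TAG.ENABLE-IN-API-METRICS");

    for (ReadinessError e : filterIgnored(errors, ignored)) {
      System.out.println(e.offendingProperty() + ": " + e.message());
    }
  }
}
```

Only the error for the property that is not on the ignore list survives the filter, regardless of how the ignored name is capitalised.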

@fivetran-kostaszoumpatianos
Contributor Author

@dimas-b @adutra @eric-maynard could you maybe take a look at this PR? thanks!

.filter(
    error ->
        config.ignoreOffendingProperties().stream()
            .noneMatch(prop -> prop.equalsIgnoreCase(error.offendingProperty())))
Contributor

This approach LGTM in general. However, WDYT about adding a new "ID" property to Error, e.g. error.getId()?

The ID could be a static constant for cases where there could only ever be one Error instance per "check" code (e.g. checkUserPrincipalMetricTag) or it could be a deterministic (hash) function of the "type" plus some parameters (e.g. checkInsecureStorageSettings could produce IDs like storage-17af38 and storage-46fq98).

The idea is that admin users should suppress specific error instances, but not "ranges" of errors. This way, if an admin user suppresses one particular check case, new checks will still be visible when Polaris adds them. The value of error.offendingProperty() may still be too broad in some cases.

The "hash" part being deterministic will allow admin users to propagate the same configuration to all their deployment environments. At the same time, it is not easy to guess, which will force the admin user to review what exactly needs to be suppressed. Also, if the meaning of the error changes, we can change the ID, and it will force the admin users to reassess the implications (and re-suppress).

WDYT?
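To make the "deterministic hash" idea concrete, here is a minimal sketch of one possible ID function. It is not part of the PR: the `errorId` helper, the choice of SHA-256, the zero-byte separator, and the 6-hex-digit suffix are all assumptions made for illustration, chosen to produce IDs shaped like the `storage-17af38` example above.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ErrorIdSketch {

  /**
   * Hypothetical helper: derives an ID of the form "type-xxxxxx" from the check type
   * and its parameters. The same inputs always yield the same ID in every environment.
   */
  static String errorId(String type, String... params) {
    try {
      MessageDigest digest = MessageDigest.getInstance("SHA-256");
      digest.update(type.getBytes(StandardCharsets.UTF_8));
      for (String param : params) {
        // Separator byte so that ("ab", "c") and ("a", "bc") hash differently.
        digest.update((byte) 0);
        digest.update(param.getBytes(StandardCharsets.UTF_8));
      }
      byte[] hash = digest.digest();

      // Keep only the first three bytes (six hex digits) to keep the ID concise.
      StringBuilder id = new StringBuilder(type).append('-');
      for (int i = 0; i < 3; i++) {
        id.append(String.format("%02x", hash[i] & 0xff));
      }
      return id.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 must be available on every Java platform implementation.
      throw new IllegalStateException("SHA-256 not available", e);
    }
  }

  public static void main(String[] args) {
    // The parameter values here are illustrative only.
    System.out.println(errorId("storage", "FILE"));
    System.out.println(errorId("storage", "S3", "FILE"));
  }
}
```

Because the ID depends only on the check type and its parameters, it would be identical across deployments, while still being hard to guess without first seeing the reported error.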

Contributor

"The idea is that admin users should suppress specific error instances"

What would this flow look like, though? Is this to support cases like "I want to allow setting config X but not Y, and I want to allow setting config Z to A or B but not to C"? I fear we are at risk of overengineering this a bit. As it is, only admins have access to these configs.

Contributor

From my POV, this is not so much about config A=X or A=Y, but more about "Polaris detected something dangerous about X". Now, if the admin user suppresses this warning, I do not want the suppression to automatically hide future warnings about "dangerous Y".

It may be related to some specific config, but it may not be. I can imagine that running as the root OS user falls under the same category of auto-detectable issues.

Contributor Author

@fivetran-kostaszoumpatianos Sep 11, 2025

Filtering based on error IDs would also fit well with the naming change that you propose, @eric-maynard: we could then call the ignore method ignoreSelectedIssues, since we would have a way of filtering by issue. The parameter hash may be a bit too much, though. For my part, I would be OK deactivating a check altogether if I know that I have a dangerous config there. It depends on how cautious we want to be.

Contributor

I'd be OK without the parameter hash for a start. However, I think having a concise but non-predictable error ID is important. That is to say, a user suppressing a particular error must first observe the error. It should not be easy to suppress something "proactively" :) At the same time, the error ID should not depend on the runtime environment (e.g. it should be the same in all k8s pods). WDYT?
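A minimal sketch of what suppression by exact error ID could look like, under the assumption that each reported issue carries an ID string. The `Issue` class and `visibleIssues` helper are hypothetical and not from the PR; the two IDs are the example values mentioned earlier in this thread.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class SuppressByIdSketch {

  /** Hypothetical issue type carrying a stable, environment-independent ID. */
  static final class Issue {
    private final String id;
    private final String message;

    Issue(String id, String message) {
      this.id = id;
      this.message = message;
    }

    String id() { return id; }

    String message() { return message; }
  }

  /** Hides only issues whose ID is listed exactly; any other issue stays visible. */
  static List<Issue> visibleIssues(List<Issue> detected, Set<String> suppressedIds) {
    return detected.stream()
        .filter(issue -> !suppressedIds.contains(issue.id()))
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // The admin observed and suppressed one specific issue.
    Set<String> suppressed = Set.of("storage-17af38");

    List<Issue> detected = List.of(
        new Issue("storage-17af38", "known and accepted storage setting"),
        new Issue("storage-46fq98", "a different storage setting, reported later"));

    for (Issue issue : visibleIssues(detected, suppressed)) {
      System.out.println(issue.id() + ": " + issue.message());
    }
  }
}
```

With exact-match IDs, suppressing one issue leaves any issue with a different ID visible, which is the property argued for in this thread.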

   * production readiness.
   */
  @WithDefault("{}")
  Set<String> ignoreOffendingProperties();
Contributor

Why the asymmetry between this and ignoreSevereIssues? IIUC this is basically a subset of severe (?) issues that the admin wants to configure the readiness check to ignore. Maybe ignoreSelectIssues?

Contributor Author

Because technically the check is based on the offending property, not the issue type. Maybe ignoreIssuesForSelectedOffendingProperties? Or is that too much?

@github-actions

This PR is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Oct 13, 2025
@github-actions github-actions bot closed this Oct 19, 2025
@github-project-automation github-project-automation bot moved this from PRs In Progress to Done in Basic Kanban Board Oct 19, 2025
Development

Successfully merging this pull request may close these issues.

Ignore user-specified readiness checks

3 participants