Move Security to use auto-managed system indices #67114
Pinging @elastic/es-security (Team:Security)
Nice work! I left a comment about an edge case that I think needs to be addressed.
@@ -350,53 +308,6 @@ public void prepareIndexIfNeededThenExecute(final Consumer<Exception> consumer,
} else if (indexState.indexExists() && indexState.isIndexUpToDate == false) {
throw new IllegalStateException("Index [" + indexState.concreteIndexName + "] is not on the current version."
+ "Security features relying on the index will not be available until the upgrade API is run on the index");
} else if (indexState.indexExists() == false) {
I think we still need to handle the case where the master node doesn't know about the security system index, so for BWC we should maintain this code somehow.
Interesting point.
I think we presently don't handle the cases of index auto-creation and mapping updates too well.
If `.security` does not yet exist, a rolling upgrade is in progress, and an API that creates the index for the first time hits a new node, the index will be created, but the old nodes will subsequently complain that the index is not up to date (because the format number is greater) or that the mapping is not up to date (because the mapping version is greater).
I think this might be a common issue; I vaguely remember something similar for async search. Would it be possible to defer the system index upgrade (of the mapping and of the settings/metadata) until the rolling upgrade is complete, and let the security business logic deal with possibly storing entities in the old format? If that's not possible, could we return a failure response stating that the requested API (with its parameters) is not available in a mixed cluster (similar to how no APIs are available until the cluster state has been recovered)?
I've reinstated the code for creating the security index, while changing the code so that it relies on the auto-create logic. I've also re-added code for the mappings not being up-to-date, where the runnable is ignored and the exception consumer is called. Is that what you had in mind?
I've also changed a number of locations to check for a difference between the minimum and maximum ES versions across the cluster nodes. If there is a difference, the auto-create logic will refuse to run, and the `SystemIndexManager` will not try to update mappings.
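The mixed-cluster check described in this comment could be sketched roughly as follows. This is an illustration only: `MixedClusterGuard` and its methods are hypothetical names, and node versions are modelled as plain integers rather than with Elasticsearch's own version and cluster-state classes.

```java
import java.util.List;

/**
 * Illustrative sketch only. Node versions are modelled as plain ints;
 * this class is hypothetical and is not part of Elasticsearch.
 */
final class MixedClusterGuard {

    private MixedClusterGuard() {}

    /** Returns true when the nodes do not all run the same version. */
    static boolean isMixedVersionCluster(List<Integer> nodeVersions) {
        if (nodeVersions.isEmpty()) {
            throw new IllegalArgumentException("cluster has no nodes");
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int version : nodeVersions) {
            min = Math.min(min, version);
            max = Math.max(max, version);
        }
        return max > min;
    }

    /** Auto-creation and mapping updates are attempted only when every node agrees on the version. */
    static boolean mayManageSystemIndex(List<Integer> nodeVersions) {
        return isMixedVersionCluster(nodeVersions) == false;
    }
}
```

With this shape, both the auto-create path and the mapping-update path can consult the same predicate, which is what makes the mixed-cluster behaviour easy to test in isolation.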
> I've reinstated the code for creating the security index
I'm not sure this is needed; the auto-create should handle it (now that auto-create has a predictable behaviour in a mixed cluster).
> I've also re-added code for the mappings not being up-to-date, where the runnable is ignored and the exception consumer is called. Is that what you had in mind?
This fixes the problem, yes.
> I've also changed a number of locations to check for a difference between the minimum and maximum ES versions across the cluster nodes.
This is very cool. I like that there is a clear, testable behavior in a mixed cluster scenario.
Hmm, if the create call is kept around, the rename from `prepareIndexIfNeededThenExecute` to `checkIndexStateThenExecute` is less fortunate 👿
@pugnascotia I'm also going to take a look at this tomorrow. I hope that's alright 🙂
@albertzaharovits I would be very happy if you could take a look 🙏
@@ -350,53 +308,6 @@ public void prepareIndexIfNeededThenExecute(final Consumer<Exception> consumer,
} else if (indexState.indexExists() && indexState.isIndexUpToDate == false) {
The changes around here are neat given that this is a refactoring! 👍
But I think we should let the runnable through if the index does not exist, and only then.
If the index exists and it has an old mapping version, we should hold off the runnable, because it can race with the service that updates the mapping. Related side note: could the mapping update service hook into the "auto put mapping action", in the same way that the system index creation hooks into the "index auto create action"?
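A minimal sketch of the gating rule proposed in this comment, assuming the index state can be reduced to two booleans. `IndexStateGate` and `checkThenExecute` are hypothetical names for illustration; the real `SecurityIndexManager` tracks considerably more state.

```java
import java.util.function.Consumer;

/** Illustrative sketch only; this class is hypothetical and not part of Elasticsearch. */
final class IndexStateGate {

    private IndexStateGate() {}

    static void checkThenExecute(boolean indexExists, boolean mappingUpToDate,
                                 Consumer<Exception> onFailure, Runnable action) {
        if (indexExists && mappingUpToDate == false) {
            // Existing index with an old mapping: hold off, so that the action
            // cannot race with whatever service is updating the mapping.
            onFailure.accept(new IllegalStateException("index mappings are not up to date"));
        } else {
            // Either the index is missing (auto-create is expected to apply the
            // current mappings) or it is already up to date: let the action run.
            action.run();
        }
    }
}
```

The point of the split is that a missing index is safe to write to, because creation and the write happen in order, whereas an existing stale index is not.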
I changed `TransportPutMappingAction` to enforce that any attempt to change mappings on a system index must supply the same mappings as the system index descriptor contains. I haven't updated `TransportAutoPutMappingAction` because I don't understand when that action is used or why.
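The put-mapping rule described here might look something like the sketch below. It assumes mappings can be compared as plain maps; `SystemIndexMappingCheck` and `validatePutMapping` are hypothetical names, and the actual check in `TransportPutMappingAction` may compare mappings differently.

```java
import java.util.Map;

/** Illustrative sketch only; this class is hypothetical and not part of Elasticsearch. */
final class SystemIndexMappingCheck {

    private SystemIndexMappingCheck() {}

    /**
     * Rejects a put-mapping request on a system index unless the requested
     * mappings are exactly the ones held by the index's descriptor.
     */
    static void validatePutMapping(String indexName,
                                   Map<String, Object> descriptorMappings,
                                   Map<String, Object> requestedMappings) {
        if (descriptorMappings.equals(requestedMappings) == false) {
            throw new IllegalArgumentException(
                "Cannot change mappings of system index [" + indexName
                    + "]: the request does not match the system index descriptor");
        }
    }
}
```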
Not to dwell too much on it, but I think it is worth investing in adapting the auto mapping update action. It was added recently, so I don't expect it to encode any obscure behaviors. The reason I mention it is that moving the mapping update from one ClusterStateListener to another (from `SecurityIndexManager` to `SystemIndexManager`) is not a great step forward.
Just a suggestion, I haven't investigated it carefully, and it's outside security's concern.
I've left two comments about the core of it.
I think we have to think through mapping and metadata upgrades in a mixed cluster scenario, and make sure that index requests don't race with the mapping updates.
Despite that, I think this is looking very promising; we've been eager for a long time to get rid of the update logic in the `SecurityIndexManager`.
LGTM though the important changes are outside Security Area's purview.
Thank you Rory!
private XContentBuilder getIndexMappings() {
try {
final XContentBuilder builder = jsonBuilder();
Personally I would prefer the mapping as a resource file. What's the reason for this change?
I was following the example of other plugins that auto-created their indices. It also makes it harder to change the mappings, compared with opening up a jar file and editing the JSON.
CreateIndexClusterStateUpdateRequest updateRequest = descriptor != null && descriptor.isAutomaticallyManaged()
final boolean isSystemIndex = descriptor != null && descriptor.isAutomaticallyManaged();

if (isSystemIndex && state.nodes().getMaxNodeVersion().after(state.nodes().getMinNodeVersion())) {
It concerns me that we cannot auto-create a system index in a mixed-version cluster even if the descriptor itself does not differ between versions. I think we should still allow creation. I wonder if we should consider an approach that pulls the most up-to-date version of the descriptor and applies it when creating the index; however, there may be cases where that would fail, if a feature was used that couldn't be validated on the master. In those cases, I would consider falling back to the master's version and allowing the master to update the mappings.
I don't think we can pull the most up-to-date version of the descriptor into the master, at least not without building infrastructure to make it pull-able.
How about: in the auto-create case, we always use the master's descriptor. If we release some new feature that needs to create a system index and relies on particular features being present, perhaps it can explicitly create the index using the right settings, either waiting until the cluster is in the right state, or retrying until the creation is successful. Admittedly this is just shifting the problem, but the general system index infra can't know at the moment whether it's OK to attempt creation in a mixed cluster.
I suppose we could extend the descriptor to apply a version range? e.g.
SystemIndexDescriptor.builder().setMinimumVersion(Version.V_8_0_0)
Then the auto-create, explicit create, mappings and settings methods can check the minimum descriptor version against `state.nodes().getMinNodeVersion()`. Code that relied on new features would still need to concern itself with whether it was possible yet to create its system index. If that case becomes common/annoying enough, we can build more support infra.
I think I still want the `SystemIndexManager` to hold off upgrading mappings in a mixed cluster - partly in case of a failed upgrade, and partly to avoid disagreements around mappings versions. What do you think?
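To make the version-range idea concrete, here is a rough sketch of a descriptor that carries a minimum version and is checked against the lowest node version in the cluster. `VersionedDescriptor` is a hypothetical stand-in, the index name in the usage is made up, and versions are plain integers; none of this is the actual `SystemIndexDescriptor` API.

```java
/** Illustrative sketch only; this class is hypothetical and not part of Elasticsearch. */
final class VersionedDescriptor {

    final String indexName;
    final int minimumVersion;

    VersionedDescriptor(String indexName, int minimumVersion) {
        this.indexName = indexName;
        this.minimumVersion = minimumVersion;
    }

    /** True when every node, including the oldest one, meets the descriptor's minimum version. */
    boolean canCreate(int clusterMinNodeVersion) {
        return clusterMinNodeVersion >= minimumVersion;
    }

    /** Fails fast so that callers can wait or retry until the cluster has been upgraded. */
    void ensureCanCreate(int clusterMinNodeVersion) {
        if (canCreate(clusterMinNodeVersion) == false) {
            throw new IllegalStateException(
                "Cannot create system index [" + indexName + "]: it requires all nodes to be on version ["
                    + minimumVersion + "] or later, but the oldest node is on [" + clusterMinNodeVersion + "]");
        }
    }
}
```

Under this scheme a descriptor whose requirements have not changed between releases can still be created during a rolling upgrade, while one that depends on a newer feature is refused until the oldest node catches up.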
> the general system index infra can't know at the moment whether it's OK to attempt creation in a mixed cluster
I think this is the crux, and we have to delegate this responsibility to the caller (e.g. Security(IndexManager)). My suggestion to fail the creation in the mixed cluster scenario was the simplest, in a sense, because the caller doesn't have to guard the system index document call with anything. But I hadn't really considered the practical aspect that many more upgrades are performed than changes to the system index's mapping. Moreover, most mapping changes are focused on a small portion of the total mapping. Preventing the creation of the system index in all these cases is wasteful, as creation with the old mapping would be OK.
So, if the decision is to not reject system index creation (or mapping updates) in a mixed cluster scenario (whatever the strategy ultimately is), then when indexing a document that relies on new mappings (e.g. a new type of API key) we have to look at the cluster state, and probably reject the indexing (there could be other options, depending on the particular case). I think that's OK; I hoped we could avoid it, but refusing to create any index in the other cases is probably not worth it.
...src/main/java/org/elasticsearch/action/admin/indices/settings/put/UpdateSettingsRequest.java
Allow system indices to be created or auto-created so long as their minimum node version requirement is met.
// `TransportCreateIndexAction` will automatically apply the right mappings, settings and aliases, so none
// of that needs to be specified here.
CreateIndexRequest request = new CreateIndexRequest(indexState.concreteIndexName).waitForActiveShards(ActiveShardCount.ALL);
Do we need to validate the mappings are good to go in cases where this node is not the master and there are mixed versions in the cluster?
LGTM
Backport of elastic#67114. Part of elastic#61656. Change the Security plugin so that its system indices are managed automatically by the system indices infrastructure. Also add an `origin` field to `CreateIndexRequest` and `UpdateSettingsRequest`.
Re-apply changes from 0c9b9c1, which migrates the `.tasks` system index to be managed automatically by the system indices infrastructure. Changes went into elastic#67114 that, I hope, will avoid the problems we saw before in the BWC tests in CI.
While backporting elastic#67114 via elastic#68375, I realised that there are existing upgrade scenarios that expect the `SecurityIndexManager` to update index mappings, so in the backport PR, this capability was reinstated. This commit does the same in `master`.
Part of #61656.
Change the Security plugin so that its system indices are managed automatically by the system indices infrastructure.