`.fleet-actions-results` data stream cannot be restored via the `fleet` feature state #89261

romain-chanu · 2022-08-11T04:48:07Z

Elasticsearch Version

8.3.3

Installed Plugins

No response

Java Version

bundled

OS Version

Deployment in ESS

Problem Description

The .fleet-actions-results data stream cannot be restored via the fleet feature state.

Consider the following scenario (observed in the field in ESS):

Due to an unforeseen situation, the cluster turns red with the following red indices:

health status index                                        uuid                   pri rep docs.count docs.deleted store.size pri.store.size sth
green  open   .ds-.fleet-actions-results-2022.05.04-000002 eZO3mXu3RYOZpygHvC2dgQ   1   1          0            0       450b           225b false
red    open   .ds-.fleet-actions-results-2022.06.03-000003 iBbSWmHaQbqJFn_aBVqaYg   1   1                                                   false
red    open   .ds-.fleet-actions-results-2022.07.03-000004 sF3-S4uoQkybpm7ujaZBVg   1   1                                                   false
red    open   .ds-.fleet-actions-results-2022.08.02-000006 t-U-Wrd_RpqZUqSS2a3TqA   1   1                                                   false
red    open   .fleet-actions-7                             8zgOKVzdQIeS_YGq_JX--w   1   1                                                   false
red    open   .fleet-agents-7                              p7sWhvhPRaWQ_unOHIJQTQ   1   1                                                   false
red    open   .fleet-artifacts-7                           iingfeghRJ2bfqLAGFt0Aw   1   1                                                   false
red    open   .fleet-enrollment-api-keys-7                 8J1tyEuJSfyhMxf5HsfU2A   1   1                                                   false
red    open   .fleet-policies-7                            HufDBhgBQraUYlNosY1ysg   1   1                                                   false
red    open   .fleet-policies-leader-7                     jpqhCaF9SL-S0AjlWqa6xg   1   1                                                   false
red    open   .fleet-servers-7                             5xdgNy-kSXSdsWZbM8mRHw   1   1                                                   false
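When a cluster holds many more indices than this, the red ones can be picked out of the listing with a small script. The following is a minimal Python sketch, assuming the plain-text output of a `_cat/indices?v` request in which health, status, and index are the first three columns (as in the listing above). The function name `red_indices` is hypothetical. Only the first three fields are read because red indices leave the count and size columns empty.

```python
from typing import List


def red_indices(cat_output: str) -> List[str]:
    """Return the names of indices whose health is "red" in `_cat/indices?v` text.

    Assumes health, status, and index are the first three columns. Red indices
    usually have empty count/size columns, so later fields are ignored.
    """
    names = []
    for line in cat_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if len(fields) < 3 or fields[0] == "health":
            continue
        if fields[0] == "red":
            names.append(fields[2])
    return names


sample = """\
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size sth
green  open   .ds-.fleet-actions-results-2022.05.04-000002 eZO3mXu3RYOZpygHvC2dgQ 1 1 0 0 450b 225b false
red    open   .ds-.fleet-actions-results-2022.06.03-000003 iBbSWmHaQbqJFn_aBVqaYg 1 1 false
red    open   .fleet-agents-7 p7sWhvhPRaWQ_unOHIJQTQ 1 1 false
"""
print(red_indices(sample))
# ['.ds-.fleet-actions-results-2022.06.03-000003', '.fleet-agents-7']
```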

The user attempts to restore the fleet feature state using the following restore snapshot API request:

POST _snapshot/found-snapshots/cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/_restore?wait_for_completion=false
{
  "indices": "-*",
  "ignore_unavailable": "true",
  "include_global_state": "false",
  "include_aliases": "false",
  "feature_states": [
    "fleet"
  ]
}

The above request fails with the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "snapshot_restore_exception",
        "reason": "[found-snapshots:cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/H3i28HlrSiKyrLaiDCE6uA] cannot restore index [.ds-.fleet-actions-results-2022.06.03-000003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
      }
    ],
    "type": "snapshot_restore_exception",
    "reason": "[found-snapshots:cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/H3i28HlrSiKyrLaiDCE6uA] cannot restore index [.ds-.fleet-actions-results-2022.06.03-000003] because an open index with same name already exists in the cluster. Either close or delete the existing index or restore the index under a different name by providing a rename pattern and replacement name"
  },
  "status": 500
}

Looking at the fleet feature state, the SystemIndexDescriptor (cf. code) does appear to contain the .fleet-actions-results-* pattern. A couple of guesses about the cause:

Does the implementation only consider regular indices and not data streams?
Does the implementation consider the data stream but fail to close the backing indices before restoring them?

Steps to Reproduce

Create a cluster running version 8.3.3 and deploy an Elastic Agent with the Osquery Manager integration.
Run a new live query in Osquery.
Observe that the .fleet-actions-results data stream is created with the respective backing indices.
Restore the fleet feature state using the restore snapshot API and observe the same error as above.

Workaround

Create a fleet_superuser role:

POST _security/role/fleet_superuser
{
  "indices": [
    {
      "names": [
        ".fleet*"
      ],
      "privileges": [
        "all"
      ],
      "allow_restricted_indices": true
    }
  ]
}

Create a temp_user user with the superuser and fleet_superuser roles:

POST _security/user/temp_user
{
  "password": "temp_password",
  "roles": [
    "superuser",
    "fleet_superuser"
  ]
}

Close the .fleet-actions-results backing indices using the cURL command below:

curl -k -XPOST --user temp_user:temp_password -H 'x-elastic-product-origin:fleet' https://$CLUSTER_ADDRESS/.ds-.fleet-actions-results-2022.05.04-000002,.ds-.fleet-actions-results-2022.06.03-000003,.ds-.fleet-actions-results-2022.07.03-000004,.ds-.fleet-actions-results-2022.08.02-000006/_close

Note: when running the cURL command on Windows, use double quotes around the header instead: "x-elastic-product-origin:fleet"
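The same close call can also be scripted. Below is a minimal Python sketch using only the standard library that builds, but does not send, a request equivalent to the cURL command: a POST to `/<index>,<index>,.../_close` with basic auth and the `x-elastic-product-origin: fleet` header. The function name `build_close_request` and the address `https://cluster.example.invalid` are hypothetical placeholders; substitute the real value of `$CLUSTER_ADDRESS`. The credentials are the temporary ones from the workaround.

```python
import base64
import urllib.request
from typing import List


def build_close_request(base_url: str, indices: List[str], user: str, password: str) -> urllib.request.Request:
    """Build (without sending) a POST /<index>,<index>,.../_close request.

    Mirrors the cURL command: basic auth plus the
    x-elastic-product-origin: fleet header.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "x-elastic-product-origin": "fleet",
    }
    # Multiple index names are passed as one comma-separated path segment.
    url = f"{base_url.rstrip('/')}/{','.join(indices)}/_close"
    return urllib.request.Request(url, headers=headers, method="POST")


# Hypothetical cluster address; the index names are the ones from the cURL command.
backing = [
    ".ds-.fleet-actions-results-2022.05.04-000002",
    ".ds-.fleet-actions-results-2022.06.03-000003",
    ".ds-.fleet-actions-results-2022.07.03-000004",
    ".ds-.fleet-actions-results-2022.08.02-000006",
]
req = build_close_request("https://cluster.example.invalid", backing, "temp_user", "temp_password")
print(req.get_method(), req.full_url)
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen`. cURL's `-k` skips certificate verification. In Python that corresponds to an `ssl` context with `check_hostname = False` and `verify_mode = ssl.CERT_NONE`, which should only be used against a cluster you trust.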

Restore the fleet feature state:

POST _snapshot/found-snapshots/cloud-snapshot-2022.08.08-lywsv4teqe-zj3ygvjkria/_restore?wait_for_completion=false
{
  "indices": "-*",
  "ignore_unavailable": "true",
  "include_global_state": "false",
  "include_aliases": "false",
  "feature_states": [
    "fleet"
  ]
}

Delete the temp_user user:

DELETE _security/user/temp_user

Delete the fleet_superuser role:

DELETE _security/role/fleet_superuser

Logs (if relevant)

No response


elasticsearchmachine · 2022-08-12T00:13:34Z

Pinging @elastic/es-distributed (Team:Distributed)

williamrandolph · 2022-08-18T13:11:29Z

When restoring system indices (not data streams) from a snapshot, the user isn't able to close or delete the system index, so we delete the existing system indices as we restore. It looks like we need to do the same thing for system data streams (or, if it's something we're already supposed to do, hunt for a bug or race condition that could be causing the problem). Since core/infra added this logic as part of the system indices project, it's fine with me if this issue is assigned to us.

romain-chanu · 2022-08-30T10:20:07Z

@williamrandolph -

> When restoring system indices (not data streams) from a snapshot, the user isn't able to close or delete the system index, so we delete the existing system indices as we restore. It looks like we need to do the same thing for system data streams (or, if it's something we're already supposed to do, hunt for a bug or race condition that could be causing the problem). Since core/infra added this logic as part of the system indices project, it's fine with me if this issue is assigned to us.

I believe this pull request is related: #75860. We probably missed something in the restore logic.

elasticsearchmachine · 2022-11-07T10:35:20Z

Pinging @elastic/es-core-infra (Team:Core/Infra)

Leaf-Lin · 2022-11-07T10:36:30Z

Based on the comment above, I've relabeled this to Core/Infra.

> Since core/infra added this logic as part of the system indices project, it's fine with me if this issue is assigned to us.

romain-chanu added the >bug, needs:triage, and :Distributed Coordination/Snapshot/Restore labels Aug 11, 2022

elasticsearchmachine added the Team:Distributed (Obsolete) label and removed the needs:triage label Aug 12, 2022

romain-chanu added the needs:triage label Aug 12, 2022

elasticsearchmachine removed the needs:triage label Aug 12, 2022

elasticsearchmachine added the Team:Core/Infra label Aug 17, 2022

arteam added the :Distributed Coordination/Snapshot/Restore label and removed the :Core/Infra/Core label Aug 17, 2022

elastic deleted a comment from elasticsearchmachine Aug 17, 2022

elasticsearchmachine added the Team:Distributed (Obsolete) label and removed the Team:Core/Infra label Aug 17, 2022

williamrandolph self-assigned this Aug 24, 2022

romain-chanu mentioned this issue Jan 3, 2023: Unable to restore the fleet feature state #92272 (Closed)

stefnestor mentioned this issue Dec 29, 2023: Unable to read/delete system data streams #92271 (Open)

arpadkiraly unassigned williamrandolph Apr 29, 2024

romain-chanu mentioned this issue Jul 22, 2024: Get snapshot API returns duplicate information for .fleet-actions-results system data stream #111146 (Open)

romain-chanu mentioned this issue Jul 22, 2024: Missing .fleet-actions-results system data stream in the fleet feature state reported by Get snapshot API #111148 (Open)

stefnestor mentioned this issue Nov 2, 2024: .fleet-action-results cannot be accessed by cloud #116139 (Open)

stefnestor added the Supportability label Nov 2, 2024
