[Snapshot Restore] Fix logic for finding last successful managed snapshot #159324

@ElenaStoeva ElenaStoeva commented Jun 8, 2023

Fixes #158548
Fixes #153107

Summary

This PR changes the logic for finding the last successful managed snapshot by searching amongst all snapshots rather than searching only from the filtered snapshots. This way, the last successful managed snapshot will always be the same, no matter whether the list is filtered or not.
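A minimal sketch of the idea, for illustration only: the lookup runs over the unfiltered list, so the result does not depend on what the search bar currently shows. The `SnapshotSummary` shape and the function name are hypothetical and much simpler than the plugin's real types.

```typescript
// Hypothetical, simplified snapshot shape (not the plugin's real type).
interface SnapshotSummary {
  snapshot: string;
  repository: string;
  state: string; // e.g. 'SUCCESS', 'FAILED', 'PARTIAL', 'IN_PROGRESS'
  startTimeInMillis: number;
}

// Returns the most recent successful snapshot stored in the managed
// repository, or undefined when there is none (or no managed repository).
function findLastSuccessfulManagedSnapshot(
  allSnapshots: SnapshotSummary[],
  managedRepository: string | undefined
): SnapshotSummary | undefined {
  if (!managedRepository) return undefined;
  return allSnapshots
    .filter((s) => s.repository === managedRepository && s.state === 'SUCCESS')
    .reduce<SnapshotSummary | undefined>(
      (latest, s) =>
        !latest || s.startTimeInMillis > latest.startTimeInMillis ? s : latest,
      undefined
    );
}
```

The important part is the input: passing the filtered list here would reproduce the bug, because the "last successful" row would change with the filter.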

The plugin README is updated with instructions on how to imitate a cloud-managed repository in order to create a managed snapshot.

Also, functional tests are added for checking the logic around last successful snapshots and filtering (the test suite successfully catches the incorrect behaviour when the fix from this PR is not present).

How to test:
We need to mock the cloud environment in order to have a managed repository, policy, and snapshots. Also, we add two different repo paths because we will create two repositories - one managed and one unmanaged so that we test a scenario with a list of managed and unmanaged snapshots.

  1. Start ES with `yarn es snapshot --license trial -E path.repo='/tmp/es-backups','/tmp/snap'` and Kibana with `yarn start`
  2. Add the cluster.metadata.managed_repository and cluster.metadata.managed_policies settings via Console:
PUT /_cluster/settings
{
  "persistent": {
    "cluster.metadata.managed_repository": "found-snapshots",
    "cluster.metadata.managed_policies": ["managed-policy"]
  }
}
  3. Go to Stack Management -> Snapshot and Restore
  4. Create a Shared File System repository with the same name as your managed repository setting value (found-snapshots) and with location /tmp/es-backups
  5. Create a second Shared File System repository (this one will be unmanaged) with the name unmanaged-repo and with location /tmp/snap
  6. Create a managed policy with the same name as your managed policy setting value (managed-policy) that uses the managed repository created in step 4.
  7. Create a second, unmanaged policy that uses the unmanaged repository created in step 5.
  8. Run the two policies a couple of times so that multiple snapshots are generated.
  9. Go to the Snapshots tab and verify that the most recent managed snapshot is non-deletable.
  10. Filter the snapshots so that a different snapshot appears in the top row. Verify that this snapshot is deletable.
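For anyone who prefers to script the setup rather than click through the UI, the steps above could roughly be expressed as the following Console-style requests. This is only a sketch: the helper name, the `unmanaged-policy` id, the cron schedule, and the snapshot name patterns are assumptions, not values from this PR. The function only builds request descriptions and does not send anything.

```typescript
// Describes a single Console request; nothing is sent to Elasticsearch here.
interface ConsoleRequest {
  method: 'PUT' | 'POST';
  path: string;
  body: Record<string, unknown>;
}

// Builds the requests that mirror the manual steps. The schedule, snapshot
// names, and the 'unmanaged-policy' id are illustrative assumptions.
function buildManagedSetupRequests(
  managedRepository = 'found-snapshots',
  managedPolicy = 'managed-policy'
): ConsoleRequest[] {
  const fsRepo = (location: string) => ({ type: 'fs', settings: { location } });
  const policy = (name: string, repository: string) => ({
    schedule: '0 30 1 * * ?', // assumed schedule; policies are run manually below
    name,
    repository,
  });
  return [
    {
      method: 'PUT',
      path: '/_cluster/settings',
      body: {
        persistent: {
          'cluster.metadata.managed_repository': managedRepository,
          'cluster.metadata.managed_policies': [managedPolicy],
        },
      },
    },
    { method: 'PUT', path: `/_snapshot/${managedRepository}`, body: fsRepo('/tmp/es-backups') },
    { method: 'PUT', path: '/_snapshot/unmanaged-repo', body: fsRepo('/tmp/snap') },
    {
      method: 'PUT',
      path: `/_slm/policy/${managedPolicy}`,
      body: policy('<managed-snap-{now/d}>', managedRepository),
    },
    {
      method: 'PUT',
      path: '/_slm/policy/unmanaged-policy',
      body: policy('<unmanaged-snap-{now/d}>', 'unmanaged-repo'),
    },
    // Run each policy once; repeat these two to generate more snapshots.
    { method: 'POST', path: `/_slm/policy/${managedPolicy}/_execute`, body: {} },
    { method: 'POST', path: '/_slm/policy/unmanaged-policy/_execute', body: {} },
  ];
}
```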
Screen.Recording.2023-06-29.at.18.19.48.mov

@ElenaStoeva added the Team:Kibana Management, release_note:skip, and Feature:Snapshot and Restore labels Jun 8, 2023
@ElenaStoeva self-assigned this Jun 8, 2023
@ElenaStoeva
Contributor Author

@elasticmachine merge upstream

@ElenaStoeva marked this pull request as ready for review June 9, 2023 10:47
@ElenaStoeva requested a review from a team as a code owner June 9, 2023 10:47
@elasticmachine
Contributor

Pinging @elastic/platform-deployment-management (Team:Deployment Management)

@alisonelizabeth
Contributor

@ElenaStoeva I think you can imitate a managed snapshot following these instructions: #93609 (comment). That said, it's been a while and I didn't test to confirm 😬

If this works, would you mind updating the plugin README as well with these instructions so we have it documented somewhere? Thanks!

@ElenaStoeva
Contributor Author

> @ElenaStoeva I think you can imitate a managed snapshot following these instructions: #93609 (comment). That said, it's been a while and I didn't test to confirm 😬

Thanks @alisonelizabeth, these instructions worked! In this case, I'll update the PR description and will see if we can use this to add some tests.

> If this works, would you mind updating the plugin README as well with these instructions so we have it documented somewhere? Thanks!

Sure, I'll update the README.

@ElenaStoeva marked this pull request as draft June 9, 2023 16:21
@ElenaStoeva marked this pull request as ready for review June 12, 2023 18:12
@alisonelizabeth
Contributor

@elasticmachine merge upstream

@@ -41,6 +41,9 @@ export const SnapshotList: React.FunctionComponent<RouteComponentProps<MatchPara
location: { search },
history,
}) => {
const unfilteredSnapshotsResponse = useLoadSnapshots(DEFAULT_SNAPSHOT_LIST_PARAMS);
Contributor

I believe this bug was introduced when we implemented pagination in #110266.

I'm a little concerned that we are now making 2 requests to get snapshots. Here, and also on L58. Is it possible to avoid this?

Contributor Author

> I believe this bug was introduced when we implemented pagination in #110266.

That PR was merged into v7.16 and so I pulled v7.15 locally to test and it seems the bug is present there as well. 🤔

> I'm a little concerned that we are now making 2 requests to get snapshots. Here, and also on L58. Is it possible to avoid this?

Yes, I agree that having 2 requests to Es is not great in terms of performance. The main reason behind this approach is that, based on my understanding, the current implementation filters the snapshots by sending a search request to Es with a query. However, in this way, we cannot get the list of all existing snapshots and so we can't find the correct last successful snapshot since it might not be amongst the filtered ones. That's why I added the second request - without the query - so that we get all existing snapshots and find the last successful one from this list.

The other approach that I had in mind was to send one request without any query and obtain all existing snapshots. Then we can add a function to filter the list based on the filters typed in the search bar. However, in the end this computation might be slower than making a search request to Es with a query. On the other hand, the computation would be performed only when the user adds a filter in the search bar.
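To make the second option concrete, here is a rough sketch of what client-side filtering could look like. The `SnapshotRow` shape and the plain substring match are assumptions for illustration; the real search bar supports a richer query syntax.

```typescript
// Hypothetical row shape; only the fields a search would plausibly match on.
interface SnapshotRow {
  snapshot: string;
  repository: string;
  policyName?: string;
}

// Filters an already-loaded list of snapshots in memory. This costs one pass
// over every snapshot per filter change and requires loading them all first.
function filterSnapshots(rows: SnapshotRow[], searchTerm: string): SnapshotRow[] {
  const term = searchTerm.trim().toLowerCase();
  if (term === '') return rows;
  return rows.filter((row) =>
    [row.snapshot, row.repository, row.policyName ?? ''].some((value) =>
      value.toLowerCase().includes(term)
    )
  );
}
```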

What do you think @alisonelizabeth? I'm open to any other ideas or suggestions for improvement.

Contributor

Thanks for the explanation @ElenaStoeva!

> The other approach that I had in mind was to send one request without any query and obtain all existing snapshots.

I don't think we should pursue this, as users can potentially have a large number of snapshots (the reason we implemented pagination in #110266).

Did you look into whether it's possible to fetch only the latest managed snapshot? I think you could leverage the size and order parameters (ES docs). However, you would also need the managed repository name and I don't recall if you can specify that you want only successful snapshots returned.

cc @yuliacech as I think she has more familiarity with the pagination work and might have some thoughts too

Contributor Author

> Did you look into whether it's possible to fetch only the latest managed snapshot? I think you could leverage the size and order parameters (ES docs). However, you would also need the managed repository name and I don't recall if you can specify that you want only successful snapshots returned.

No, I haven't thought about this, but I think this is a great idea. I don't see a query parameter for filtering by state, but since the returned snapshots can be sorted by start time we would just need to return the first successful one (worst case scenario would be having many unsuccessful snapshots). I see the repository name can be returned in the response, so for each returned snapshot we would have to check if its repository is managed.
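Roughly, that scan could look like the sketch below. It assumes a `fetchPage` callback (hypothetical, injected so the sketch stays independent of any client) that returns snapshots sorted by start time, newest first, using the API's offset and size parameters. The names and the page size are illustrative.

```typescript
// Hypothetical, simplified snapshot shape.
interface SnapshotInfo {
  snapshot: string;
  repository: string;
  state: string;
}

// Assumed to return one page of snapshots, sorted by start time descending.
type FetchPage = (offset: number, size: number) => Promise<SnapshotInfo[]>;

// Walks the pages newest-first and returns the first successful snapshot
// that lives in the managed repository, or undefined if none exists.
async function findLatestSuccessfulManaged(
  fetchPage: FetchPage,
  managedRepository: string,
  pageSize = 20
): Promise<SnapshotInfo | undefined> {
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage(offset, pageSize);
    const hit = page.find(
      (s) => s.repository === managedRepository && s.state === 'SUCCESS'
    );
    if (hit) return hit;
    // A short page means there is nothing left to scan.
    if (page.length < pageSize) return undefined;
  }
}
```

In the common case this stops after the first page. The worst case, as noted, is a long run of unsuccessful snapshots, which forces the loop through many pages.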

Contributor Author

Hi @alisonelizabeth, I added the implementation that we discussed - fetching the last successful managed snapshot only instead of all unfiltered snapshots. If this implementation makes sense, I will add tests for the new function.

@kibana-ci
Collaborator

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| snapshotRestore | 268.3KB | 268.2KB | -64.0B |

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
| --- | --- | --- | --- |
| snapshotRestore | 27.5KB | 27.6KB | +124.0B |

Unknown metric groups

ESLint disabled line counts

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 14 | 16 | +2 |
| securitySolution | 413 | 417 | +4 |
| total | | | +6 |

Total ESLint disabled count

| id | before | after | diff |
| --- | --- | --- | --- |
| enterpriseSearch | 15 | 17 | +2 |
| securitySolution | 492 | 496 | +4 |
| total | | | +6 |

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @ElenaStoeva

@ElenaStoeva marked this pull request as draft June 30, 2023 14:46
@ElenaStoeva
Contributor Author

ElenaStoeva commented Jun 30, 2023

@yuliacech and I discussed this issue over Zoom and here are some of the conclusions that we reached:

  1. With the query parameters currently available in the Snapshot API, there is no efficient way to fetch the last successful managed snapshot: there is no query parameter for the state, so Es can return a lot of snapshots, given that the default managed pipeline on Cloud runs every 30 minutes. Even if we use pagination, there is still a worst case scenario where the last successful snapshot is one of the oldest ones, and then we would still have to fetch all managed snapshots.

  2. If we ask the Es team to add a query parameter for the state, there is an efficient and straightforward solution: sending a request to Es with this parameter set to SUCCESS, the managed repository name, sorting by date, and setting size to 1:

        const response = await clusterClient.asCurrentUser.snapshot.get({
          repository: managedRepository,
          snapshot: '_all',
          ignore_unavailable: true,
          sort: 'start_time',
          order: 'desc',
          state: 'SUCCESS',
          size: 1,
        });

This will always return either the last successful managed snapshot or nothing, so it will not cause performance issues. Also, we found another Snapshot and Restore enhancement request (#137432 ) that requires adding a state parameter, so I plan to reach out to the team that works on the Snapshot APIs and ask if they could add this parameter.

  3. If adding the state parameter is not feasible, we may also need to rethink this functionality, as we noticed an inconsistency between the constraints in the Snapshots UI and those in Es. For example, the UI doesn't allow deleting the last successful managed snapshot, the managed policy, and the managed snapshot, while all of these can be deleted by sending an Es request in the Console. We discussed that it could be better if this restriction is introduced on the Es side instead, or if we allow deleting them and show a warning when the user attempts to delete them.

@alisonelizabeth
Contributor

Thanks for the investigation and summary, Elena!

> This will always return either the last successful managed snapshot or nothing, so it will not cause performance issues. Also, we found another Snapshot and Restore enhancement request (#137432 ) that requires adding a state parameter, so I plan to reach out to the team that works on the Snapshot APIs and ask if they could add this parameter.

++ let me know if you need help reaching out to the team.

I agree that we should aim for parity between the UI and ES API. I think we had a discussion about this when it was initially implemented and the decision was made to enforce at the UI only at the time, but it's probably worth revisiting. I don't recall the exact reasoning 😅

> or if we allow deleting them and add a warning if the users attempts to delete them.

This might be a good middle approach

@ElenaStoeva deleted the snapshot-restore-last-successful-snapshot branch June 24, 2024 10:42