[meta] Better handling of the most common migration issues #129016

Closed
3 of 4 tasks
pgayvallet opened this issue Mar 31, 2022 · 5 comments
Labels
Feature:Migrations Meta project:ResilientSavedObjectMigrations Reduce Kibana upgrade failures by making saved object migrations more resilient Team:Core Core services & architecture: plugins, logging, config, saved objects, http, ES client, i18n, etc

Comments

@pgayvallet
Contributor

pgayvallet commented Mar 31, 2022

Our most common migration failure causes would benefit greatly from better identification and documentation.

This is a meta issue to list and track the progress of each individual failure.

For each failure, we want to:

In any case:

  • Be able to identify it, and to assign a unique error code to it
  • Add online documentation describing how to fix, or work around, the failure
    • it can either be one page per failure or one page listing all the failures, TBD
  • Surface the error code, and the link to the documentation, in the failure's log

When the failure's cause can be predetermined:

Note: ideally, these changes would be backported to the 7.17 branch to ease the migration experience from 7.last to 8+

Individual issues:

@elasticmachine
Contributor

Pinging @elastic/kibana-core (Team:Core)

@TinaHeiligers
Contributor

TinaHeiligers commented Apr 5, 2022

> assign a unique error code to it

Do we want to build on what we already have for the actions/es_errors or implement something entirely new?
For example, we have a rather nicely-readable error format for the saved objects service.

As for the code ranges, assuming the errors would fall in the 5xx, we could add a generic CODE_SAVED_OBJECTS_MIGRATIONS_ERROR as a 51x and decorate that.

OTOH, we'll probably run out of integers as we start reporting individual elasticsearch API errors.

@pgayvallet @rudolf If there's already a documented strategy for the error codes, please let me know!

@pgayvallet
Contributor Author

pgayvallet commented Apr 6, 2022

I don't think we should be using the SO service errors for SO migrations. SO errors are meant to be surfaced to the client via HTTP transport, and are therefore directly bound to a specific HTTP error code and structure.

For SO migration failures, the only thing we need, IMHO, is a unique error identifier that will be surfaced in the logs and can be used to search for, or link to, the error in our documentation.

I was naively thinking of something like

ERROR_SO_MIGRATION_FAILURE_XXX, with XXX being either a number or an alpha-num value

e.g.

ERROR_SO_MIGRATION_FAILURE_001

or

ERROR_SO_MIGRATION_FAILURE_ROUTING_ALLOCATION_DISABLED

The ERROR_SO_MIGRATION_FAILURE prefix could easily be changed too if someone comes up with a better idea.
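To make the proposal concrete, here is a minimal sketch of how such an identifier could be surfaced in a log message together with a documentation link. Everything in it is an assumption for illustration: `MigrationFailureId`, `formatMigrationFailure`, the `_UNKNOWN` member, the reason text, and the `example.com` docs URL are hypothetical and not existing Kibana code.

```typescript
// Hypothetical sketch: none of these names are existing Kibana APIs.
// A string-literal union keeps the identifiers unique and searchable.
type MigrationFailureId =
  | 'ERROR_SO_MIGRATION_FAILURE_ROUTING_ALLOCATION_DISABLED'
  | 'ERROR_SO_MIGRATION_FAILURE_UNKNOWN';

// Placeholder docs location, assumed for illustration only.
const DOCS_BASE_URL = 'https://example.com/docs/migration-failures';

// Builds the log line: the identifier, the human-readable reason,
// and a link to the matching documentation anchor.
function formatMigrationFailure(id: MigrationFailureId, reason: string): string {
  const docLink = `${DOCS_BASE_URL}#${id.toLowerCase()}`;
  return `[${id}] ${reason} See ${docLink} for how to resolve this failure.`;
}

// Usage: the reason text below is made up for the example.
const message = formatMigrationFailure(
  'ERROR_SO_MIGRATION_FAILURE_ROUTING_ALLOCATION_DISABLED',
  'Shard routing allocation is disabled.'
);
console.log(message);
```

Using a descriptive suffix rather than a number means the identifier doubles as a search term, and the docs anchor can be derived from it instead of being maintained separately.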

@Bamieh
Member

Bamieh commented May 12, 2022

@pgayvallet I just started looking into this. Historically, ES has avoided using error codes, and I think it makes sense for Kibana to adopt that philosophy as well. I understand error codes might be easier to document; however, searching for specific error strings on Google is a lot easier and gives more relevant results. Error codes would also become another thing we need to maintain in the long run, which we can avoid completely.

Update, reflecting our discussion during the team sync:
The intention is to use an "error type" rather than an error code, which should make errors easier to search for. Additionally, we'll be linking to our docs to help people resolve these errors, so using these error types should make it easier for users to troubleshoot and fix migration errors.

@rudolf
Contributor

rudolf commented Sep 6, 2022

Closing, as all the individual issues have been addressed.

@rudolf rudolf closed this as completed Sep 6, 2022
5 participants