[meta] Better handling of the most common migration issues #129016

pgayvallet · 2022-03-31T08:47:35Z

Our most common migration failure causes would benefit greatly from better identification and documentation.

This is a meta issue to list and track the progress on each individual failure.

For each failure, we want to:

In any case:

- Be able to identify it, and to assign a unique error code to it
- Add online documentation describing how to fix, or work around, the failure
  - it can either be one page per failure or one page listing all the failures, TBD
- Surface the error code, and the link to the documentation, in the failure's log

When the failure's cause can be predetermined:

- Fail fast during the migration (e.g. as was done in "[SO migrations] exit early if cluster routing allocation is disabled" #126612)
- Surface the problem in Upgrade Assistant
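
As an illustration of the fail-fast idea, here is a minimal sketch of a pre-migration check for the routing allocation case. It is not the implementation from #126612. The `FlatClusterSettings` shape and the `checkRoutingAllocation` helper are hypothetical; the sketch assumes the settings were fetched with `GET _cluster/settings?flat_settings=true`, and it assumes that any value other than `all` should block the migration.

```typescript
// Hypothetical shape: the relevant part of a `GET _cluster/settings?flat_settings=true` response.
interface FlatClusterSettings {
  persistent: { [key: string]: string | undefined };
  transient: { [key: string]: string | undefined };
}

const ROUTING_SETTING = 'cluster.routing.allocation.enable';

// Returns a failure message when allocation is restricted,
// or undefined when the migration may proceed.
function checkRoutingAllocation(settings: FlatClusterSettings): string | undefined {
  // Transient settings take precedence over persistent ones;
  // when the setting is absent, Elasticsearch defaults to "all".
  const value =
    settings.transient[ROUTING_SETTING] ?? settings.persistent[ROUTING_SETTING] ?? 'all';

  // Assumption: anything other than "all" is treated as blocking.
  if (value === 'all') {
    return undefined;
  }
  return `Cannot start the migration: ${ROUTING_SETTING} is set to "${value}", expected "all".`;
}
```

Running such a check before any index is touched lets the migration stop with a specific message instead of timing out later.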

Note: ideally, these changes would be backported to the 7.17 branch to ease the migration experience from 7.last to 8+

Individual issues:


elasticmachine · 2022-03-31T08:47:38Z

Pinging @elastic/kibana-core (Team:Core)

TinaHeiligers · 2022-04-05T20:32:20Z

> assign a unique error code to it

Do we want to build on what we already have for the actions/es_errors or implement something entirely new?
For example, we have a rather nicely-readable error format for the saved objects service.

As for the code ranges, assuming the errors would fall in the 5xx, we could add a generic CODE_SAVED_OBJECTS_MIGRATIONS_ERROR as a 51x and decorate that.

OTOH, we'll probably run out of integers as we start reporting individual Elasticsearch API errors.

@pgayvallet @rudolf If there's already a documented strategy for the error codes, please let me know!

pgayvallet · 2022-04-06T06:20:45Z

I don't think we should be using the SO service errors for SO migrations. SO errors are meant to be surfaced to the client via HTTP transport, and are therefore directly bound to a specific HTTP error code and structure.

For SO migration failures, the only thing we need IMHO is a unique error identifier that will be surfaced in the logs and can be used to search for, or link to, the error in our documentation.

I was naively thinking of something like

ERROR_SO_MIGRATION_FAILURE_XXX, with XXX being either a number or an alphanumeric value

e.g.

ERROR_SO_MIGRATION_FAILURE_001

or

ERROR_SO_MIGRATION_FAILURE_ROUTING_ALLOCATION_DISABLED

The ERROR_SO_MIGRATION_FAILURE prefix could easily be changed too if someone comes up with a better idea
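
To make the naming scheme above concrete, here is a minimal sketch of the descriptive variant. The `MigrationFailure` class, the `MigrationFailureReason` type, the `toIdentifier` helper, and the `UNKNOWN` reason are hypothetical names introduced for illustration; only the prefix and `ROUTING_ALLOCATION_DISABLED` come from the examples above.

```typescript
// The prefix proposed above; kept in one place so it can easily be changed.
const FAILURE_PREFIX = 'ERROR_SO_MIGRATION_FAILURE';

// Hypothetical list of reasons; only ROUTING_ALLOCATION_DISABLED is taken from the example above.
type MigrationFailureReason = 'ROUTING_ALLOCATION_DISABLED' | 'UNKNOWN';

// Builds the unique identifier that would appear in the logs and in the documentation.
function toIdentifier(reason: MigrationFailureReason): string {
  return `${FAILURE_PREFIX}_${reason}`;
}

// An error that carries its identifier and repeats it in the message,
// so that it shows up wherever the error is logged.
class MigrationFailure extends Error {
  public readonly identifier: string;

  constructor(reason: MigrationFailureReason, detail: string) {
    super(`[${toIdentifier(reason)}] ${detail}`);
    this.name = 'MigrationFailure';
    this.identifier = toIdentifier(reason);
  }
}
```

Because the reasons form a string-literal union, a misspelled reason is rejected at compile time, and the identifier in the log matches the string a user would search for in the documentation.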

Bamieh · 2022-05-12T13:56:02Z

@pgayvallet Just started looking into this. Historically, ES has avoided using error codes, and I think it makes sense for Kibana to adopt the same philosophy. I understand that error codes might be easier to document; however, searching for specific error strings on Google is a lot easier and gives more relevant results. Error codes would also become another thing to maintain in the long run, which we can avoid completely.

Update: Reflecting our discussion during the team sync:
The intention is to use an "error type" rather than an error code, which should make errors easier to search for. Additionally, we'll link to our docs to help people resolve these errors, so using error types should make it easier for users to troubleshoot and fix migration errors.
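
One way to picture the "error type plus docs link" approach is the sketch below. Everything in it is an assumption made for illustration: the error type names, the `formatFailureLog` helper, and the placeholder `example.com` documentation URL do not come from the Kibana codebase.

```typescript
// Placeholder URL; the real documentation location is not specified in this thread.
const DOCS_BASE_URL = 'https://example.com/docs/migration-failures';

// Hypothetical error types, each mapped to the anchor of its documentation section.
const ERROR_TYPE_DOCS = {
  routing_allocation_disabled: 'routing-allocation-disabled',
  unknown_failure: 'unknown-failure',
} as const;

// The set of valid error types is derived from the registry,
// so every type is guaranteed to have a documentation anchor.
type MigrationErrorType = keyof typeof ERROR_TYPE_DOCS;

// Builds a log line containing the searchable error type and the docs link.
function formatFailureLog(type: MigrationErrorType, detail: string): string {
  const docLink = `${DOCS_BASE_URL}#${ERROR_TYPE_DOCS[type]}`;
  return `Migration failed [${type}]: ${detail} See ${docLink} for how to resolve it.`;
}
```

Deriving the type from the registry keeps the two in sync: adding an error type without a documentation anchor does not compile.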

rudolf · 2022-09-06T12:59:29Z

Closing as all the individual issues have been addressed

pgayvallet added the Team:Core, Meta, project:ResilientSavedObjectMigrations, and Feature:Migrations labels on Mar 31, 2022

rudolf closed this as completed Sep 6, 2022
