Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow regular data streams to be migrated to tsdb data streams. #83843

Merged
merged 15 commits into from
Feb 17, 2022

Conversation

martijnvg
Copy link
Member

@martijnvg martijnvg commented Feb 11, 2022

A regular data stream can be migrated to a tsdb data stream if in template that created the data stream, the index_mode field is set to time_series and the data stream's index_mode property is either not specified or set to standard. Then on the next rollover the data stream is migrated to be a tsdb data stream.

When that happens the data stream's index_mode property is set to time_series and the new backing index's index.mode index setting is also set to time_series.

Closes #83520

A regular data stream can be migrated to a tsdb data stream if the `index_mode` field is set to `time_series`.
The `index_mode` field can only be update from null to `time_series`.
Then on the next rollover the data stream's index mode's is changed
from null to time_series and the new backing indices will have `index.mode` index setting set to `time_series` as well.

Closes elastic#83520
@martijnvg
Copy link
Member Author

How to migrate a non tsdb data stream:

// add template that creates regular data streams:
PUT /_index_template/1
{
    "index_patterns": [
        "k8s*"
    ],
    "template": {
        "settings": {
            "index": {
                "number_of_replicas": 0,
                "number_of_shards": 2
            }
        },
        "mappings": {
            "properties": {
                "@timestamp": {
                    "type": "date"
                },
                "metricset": {
                    "type": "keyword"
                },
                "k8s": {
                    "properties": {
                        "pod": {
                            "properties": {
                                "uid": {
                                    "type": "keyword"
                                },
                                "name": {
                                    "type": "keyword"
                                },
                                "ip": {
                                    "type": "ip"
                                },
                                "network": {
                                    "properties": {
                                        "tx": {
                                            "type": "long"
                                        },
                                        "rx": {
                                            "type": "long"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "data_stream": {}
}

// Index a doc, which triggers data stream creation
POST /k8s/_doc
{
    "@timestamp": "2022-02-14T14:00:00Z",
    "metricset": "pod",
    "k8s": {
        "pod": {
            "name": "dog",
            "uid": "df3145b3-0563-4d3b-a0f7-897eb2876ea9",
            "ip": "10.10.55.3",
            "network": {
                "tx": 1434595272,
                "rx": 530605511
            }
        }
    }
}

// Check results:
GET /_data_stream/k8s

GET /.ds-k8s-*

// Update template
PUT /_index_template/1
{
    "index_patterns": [
        "k8s*"
    ],
    "template": {
        "settings": {
            "index": {
                "number_of_replicas": 0,
                "number_of_shards": 2,
                "routing_path": [
                    "metricset",
                    "time_series_dimension"
                ]
            }
        },
        "mappings": {
            "properties": {
                "@timestamp": {
                    "type": "date"
                },
                "metricset": {
                    "type": "keyword",
                    "time_series_dimension": true
                },
                "k8s": {
                    "properties": {
                        "pod": {
                            "properties": {
                                "uid": {
                                    "type": "keyword",
                                    "time_series_dimension": true
                                },
                                "name": {
                                    "type": "keyword"
                                },
                                "ip": {
                                    "type": "ip"
                                },
                                "network": {
                                    "properties": {
                                        "tx": {
                                            "type": "long"
                                        },
                                        "rx": {
                                            "type": "long"
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "data_stream": {
        "index_mode": "time_series"
    }
}

// and then rollover:
POST /k8s/_rollover

// Check results (newest index should be tsdb index):
GET /_data_stream/k8s

GET /.ds-k8s-*

@martijnvg martijnvg marked this pull request as ready for review February 14, 2022 15:19
@martijnvg martijnvg added :Data Management/Data streams Data streams and their lifecycles >non-issue labels Feb 14, 2022
@elasticmachine elasticmachine added the Team:Data Management Meta label for data/management team label Feb 14, 2022
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@martijnvg martijnvg requested review from dakrone and imotov February 14, 2022 16:59
@dakrone
Copy link
Member

dakrone commented Feb 14, 2022

I was playing with this but I wasn't able to get it to work, maybe I'm doing something wrong though?

Steps:

  1. Start in standard mode:
PUT /_index_template/my-template
{
  "index_patterns": [
    "test-*"
  ],
  "data_stream": {
    "index_mode": "standard"
  },
  "template": {
    "mappings": {
      "properties": {
        "uid": {
          "type": "keyword",
          "time_series_dimension": true
        }
      }
    }
  }
}
  1. Index some docs, do a rollover, index more docs
# Do this a few times
POST /test-foo/_doc
{"@timestamp": "2022-01-28T17", "foo": "bar", "uid": "4"}

POST /test-foo/_rollover

# Do this a few times
POST /test-foo/_doc 
{"@timestamp": "2022-01-28T17", "foo": "bar", "uid": "4"}
  1. Update the template to be time_series mode:
PUT /_index_template/my-template
{
  "index_patterns": [
    "test-*"
  ],
  "data_stream": {
    "index_mode": "time_series"
  },
  "template": {
    "settings": {
      "index": {
        "routing_path": "uid"
      }
    },
    "mappings": {
      "properties": {
        "uid": {
          "type": "keyword",
          "time_series_dimension": true
        }
      }
    }
  }
}
  1. Index a few more docs
  2. Attempt to roll over the index with
POST /test-foo/_rollover
  1. Get this error:
{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "[index.routing_path] requires [index.mode=time_series]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "[index.routing_path] requires [index.mode=time_series]"
  },
  "status" : 400
}

Even though the routing path setting is in the template (and retrieving the template shows index_mode: time_series in the data stream config). If I try removing it from the template I get:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "invalid_index_template_exception",
        "reason" : "index_template [my-template] invalid, cause [Validation Failed: 1: [index.mode=time_series] requires a non-empty [index.routing_path];]"
      }
    ],
    "type" : "invalid_index_template_exception",
    "reason" : "index_template [my-template] invalid, cause [Validation Failed: 1: [index.mode=time_series] requires a non-empty [index.routing_path];]"
  },
  "status" : 400
}

@martijnvg
Copy link
Member Author

I made it so, that a data stream can only be upgrade if index_mode hasn't been specified before. In your example it has been set to standard. An existing data stream then won't pick up the change in the template's index_mode property. I did this, because to be strict about the migrate path that is allowed (only from unspecified to time_series). I don't know what future index modes will be supported. However the error upon rollover is hard to understand now. The index mode change isn't picked up but the index.routing_path setting has been specified and this setting can only be used if index.mode=time_series (this isn't data stream validation, but index level validation performed elsewhere).

This is a bad error, since it isn't clear what to do here. What I think that I will do is, if the index_mode has been specified in the template and data streams exist with that index mode then changing the index mode shouldn't be allowed.

Note that, practicality going from standard to time_series and undefined to time_series is the same, but I prefer to be strict about what migrate path is supported.

Copy link
Contributor

@imotov imotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a bit weird that we treat IndexMode.STANDARD and no index mode as two different settings, since IndexMode.STANDARD is default, but I don't have strong feelings about it. Perhaps with a good error message we can avoid some confusion here. Otherwise, looks good to me form the TSDB perspective.

@dakrone
Copy link
Member

dakrone commented Feb 15, 2022

I think it would be best to support null -> time_series and standard -> time_series transitions, otherwise we penalize any user that might have set it accidentally because they were reading docs, and we essentially have 3 states (unset, standard, and time_series) instead of two (standard and time_series)

@martijnvg
Copy link
Member Author

I think it would be best to support null -> time_series and standard -> time_series transitions, otherwise we penalize any user that might have set it accidentally because they were reading docs, and we essentially have 3 states (unset, standard, and time_series) instead of two (standard and time_series)

Ok, I will change this pr, so that migrating from standard to time_series is allowed as well. Although I do think that it is unlikely that user will use the standard value as index mode.

@martijnvg
Copy link
Member Author

martijnvg commented Feb 16, 2022

@dakrone I've made the change to also allow data stream migration if index mode is standard. Do you think we should prevent users from changing the index_mode in the template from time_series to something else?

Making that change won't effect existing tsdb data streams that match with that template, but it might make users think that it is possible (since we support migration of existing data streams to tsdb data streams by changing the index_mode to time_series).

I think it similar to the check we have in MetadataIndexTemplateService#validateDataStreamsStillReferenced(...), here we prevent removing templates and changing templates to not have data_stream snippet to null that match with existing data streams.

@dakrone
Copy link
Member

dakrone commented Feb 16, 2022

I've made the change to also allow data stream migration if index mode is standard.

Thanks! I'll take another look.

Do you think we should prevent users from changing the index_mode in the template from time_series to something else?

I think that for now it would be good to prevent this. It will be much easier to change it later to allow switching away from time_series if we want to, but much harder to prevent it if we allow it at the beginning. (Always easier to go from strict -> lenient than from lenient -> strict).

Copy link
Member

@dakrone dakrone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding all the tests!

… else if a tsdb data stream still matches with that template
…omething else if a tsdb data stream still matches with that template"

This reverts commit 62cc71d.
@martijnvg
Copy link
Member Author

I think that for now it would be good to prevent this. It will be much easier to change it later to allow switching away from time_series if we want to, but much harder to prevent it if we allow it at the beginning. (Always easier to go from strict -> lenient than from lenient -> strict).

👍 I will address this in a followup PR.

@martijnvg martijnvg merged commit ae7defa into elastic:master Feb 17, 2022
weizijun added a commit to weizijun/elasticsearch that referenced this pull request Feb 18, 2022
* upstream/master: (167 commits)
  Mute FrozenSearchableSnapshotsIntegTests#testCreateAndRestorePartialSearchableSnapshot
  Mute LdapSessionFactoryTests#testSslTrustIsReloaded
  Fix spotless violation from last commit
  Mute GeoGridTilerTestCase#testGeoGridSetValuesBoundingBoxes_UnboundedGeoShapeCellValues
  Small formatting clean up (elastic#84144)
  Always re-run Feature migrations which have encountered errors (elastic#83918)
  [DOCS] Clarify `orientation` usage for WKT and GeoJSON polygons (elastic#84025)
  Group field caps response by index mapping hash (elastic#83494)
  Shrink join queries in slow log (elastic#83914)
  TSDB: Reject the nested object fields that are configured time_series_dimension (elastic#83920)
  [DOCS] Remove note about partial response from Bulk API docs (elastic#84053)
  Allow regular data streams to be migrated to tsdb data streams. (elastic#83843)
  [DOCS] Fix `ignore_unavailable` parameter definition (elastic#84071)
  Make Metadata extend AbstractCollection (elastic#83791)
  Add API specs for OpenID Connect APIs
  Revert "Clean up for superuser role name references (elastic#83627)" (elastic#84096)
  Update Lucene analysis base url (elastic#84094)
  Avoid null threadContext in ResultDeduplicator (elastic#84093)
  Use static empty store files metadata (elastic#84034)
  Preserve context in snapshotDeletionListeners (elastic#84089)
  ...

# Conflicts:
#	x-pack/plugin/rollup/build.gradle
probakowski pushed a commit to probakowski/elasticsearch that referenced this pull request Feb 23, 2022
…tic#83843)

A regular data stream can be migrated to a tsdb data stream if in template that created the data stream, the `index_mode` field is set to `time_series` and the data stream's `index_mode` property is either not specified or set to `standard`. Then on the next rollover the data stream is migrated to be a tsdb data stream.

When that happens the data stream's `index_mode` property is set to `time_series` and the new backing index's `index.mode` index setting is also set to `time_series`.

Closes elastic#83520
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Data streams Data streams and their lifecycles >non-issue Team:Data Management Meta label for data/management team v8.2.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support migrating regular data streams to tsdb data streams
5 participants