Remove support for types? #15613
Seems like removing support for types would be blocked on #11432. It would be lovely if this accelerated parent-child relations across indexes, though; that would get rid of a lot of the aforementioned sparsity.
What about divorcing mappings from types? Make mappings an index-level feature and types just part of the id? Would that make them light enough not to get in the way?
@zygfryd Indeed. @nik9000 That is an option too. At least it would make clear that there is a single mapping per index, and we could simplify the internals. With this option, I guess types would remain as first-class filters only (e.g. we could do index sorting on _type so that filtering on them would be faster)?
I'd be fine with that.
I think divorcing mappings from types is a good idea. In my opinion, removing types is too radical. In our use case (logs management) we have 80+ types of logs and we use a (daily, weekly or monthly) index per project/tenant to be able to handle the load.
If I read you correctly, we should instead use one index per log type? Could you please explain your proposal a bit more for this use case, with types removed?
👍
Yes. Types are trappy: at first sight they look like an efficient way to have multiple tenants in a single index, but in practice this usually makes things worse than having multiple indices, because Lucene handles dense data better than sparse data, especially for norms and doc values. If some tenants have lower indexing rates, they could get fewer shards and/or longer time frames (e.g. weekly indices instead of daily).
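To make the sparsity argument concrete, here is a back-of-the-envelope sketch (plain arithmetic, not Elasticsearch or Lucene code): when several types share an index, each field only has values for the documents of the types that use it. The type names, field names and document counts below are made up for illustration.

```python
from typing import Dict, Set


def field_density(doc_counts: Dict[str, int], fields: Dict[str, Set[str]]) -> Dict[str, float]:
    """Fraction of all documents in a shared index that have a value for each field.

    doc_counts: number of documents per (hypothetical) type
    fields: the set of field names each type uses
    """
    total_docs = sum(doc_counts.values())
    density: Dict[str, float] = {}
    for type_name, type_fields in fields.items():
        for field in type_fields:
            # Each type contributes its share of documents to the fields it uses.
            density[field] = density.get(field, 0.0) + doc_counts[type_name] / total_docs
    return density


# Two log types in one index: the "error" fields are empty for 90% of documents.
shared = field_density(
    {"access": 900, "error": 100},
    {"access": {"status", "bytes"}, "error": {"message", "stack"}},
)
print(shared)
```

With 80 equally sized log types that each use their own fields, every field would be populated for only 1/80 of the documents, whereas in one index per type every field would be fully dense.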
I first thought divorcing types from mappings would be a good compromise, but types have another issue: they force us to fold the type into the uid, which either makes the _uid less efficient (slower indexing and slower gets) if we prepend the type (as today), or more space-intensive if we append it. So I think we should consider getting rid of types entirely. For instance, maybe we could enforce a single type per index in version X, with APIs still working with either
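To illustrate the space side of this trade-off, here is a simplified sketch assuming a `type#id` uid layout for the prepended case, and using a front-coded (prefix-compressed) sorted term list as a stand-in for Lucene's terms dictionary. The real on-disk format differs, and the indexing and get-speed cost of prepending is not modeled here.

```python
import os
from typing import Iterable


def front_coded_size(terms: Iterable[str]) -> int:
    """Characters stored when sorted terms are front-coded: each term keeps only
    the suffix it does not share with the previous term. Prefix-length headers
    that a real format would also store are ignored in this sketch."""
    total = 0
    previous = ""
    for term in sorted(terms):
        shared = len(os.path.commonprefix([previous, term]))
        total += len(term) - shared
        previous = term
    return total


ids = ["1", "2", "3", "4"]
prepended = [f"tweet#{i}" for i in ids]  # type folded in front of the id
appended = [f"{i}#tweet" for i in ids]   # type folded after the id
typeless = ids                           # no type in the uid at all

print(front_coded_size(prepended), front_coded_size(appended), front_coded_size(typeless))
# prints: 10 28 4
```

Prepending lets neighbouring uids share the type prefix, so it compresses well; appending puts the type in every stored suffix. Dropping the type from the uid avoids paying either cost.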
I think we should deprecate types in 5.0 and start moving towards index-level mappings, a _uid per index rather than per type, etc. If somebody really needs the type in the _uid, they can still do that, I guess. Types can be built on top of Elasticsearch without native support; nothing today prevents you from doing this. Native types complicate things internally on all ends without real benefit to the outside, except for the first-level API support that someone might find useful but that is only syntactic sugar with a potentially high price to pay. I am also +1 to removing them entirely in 6.0 and guiding folks on how to do it correctly.
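Here is a minimal sketch of building types on top of Elasticsearch without native support: store the former type in an ordinary field and filter on it. The field name `type` is a hypothetical choice (assumed to be mapped as a `keyword` field), and the functions only build request bodies; they do not talk to a cluster.

```python
from typing import Any, Dict, List


def typed_doc(doc_type: str, source: Dict[str, Any]) -> Dict[str, Any]:
    """Store the former type as a regular field ("type" is a hypothetical name)."""
    return {"type": doc_type, **source}


def search_body(doc_type: str, must: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Query DSL body restricting a search to one former type via a
    non-scoring term filter on the custom field."""
    return {
        "query": {
            "bool": {
                "must": must,
                "filter": [{"term": {"type": doc_type}}],
            }
        }
    }


doc = typed_doc("tweet", {"user": "kimchy", "message": "hello"})
body = search_body("tweet", [{"match": {"message": "hello"}}])
print(doc)
print(body)
```

A `bool` filter clause does not contribute to scoring, which matches how types were used: as a restriction on which documents are searched rather than as a relevance signal.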
The main concern I have for now is support for the parent/child feature. Removing types will only allow parent/child between documents of the same "kind". Not super terrible, as at the end of the day this is what happens behind the scenes. So if we had:
We will basically have to rewrite this as for example:
It means that parent/child will have to be self-referencing, as proposed in #11432. BTW, maybe we should already start educating people to use only one type per index and data structures similar to what I proposed in my example?
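The original examples of this rewrite did not survive in this thread. As a rough sketch of the idea only, assuming hypothetical type names (`question`, `answer`) and hypothetical field names (`kind`, `parent`), converting documents from a two-type index into a single kind of document might look like this. It shows the document layout, not how the parent/child join is executed.

```python
from typing import Any, Dict, Optional


def to_single_type(doc_type: str, doc_id: str, source: Dict[str, Any],
                   parent: Optional[str] = None,
                   parent_type: Optional[str] = None) -> Dict[str, Any]:
    """Rewrite a document from a multi-type index into a single-type document.

    - "kind" records the former type (hypothetical field name)
    - ids are prefixed with the former type, because ids used to be unique
      only within a type and could collide once types are merged
    - "parent" points at the parent's new id, so the relation becomes
      self-referencing within one kind of document
    """
    new_doc: Dict[str, Any] = {
        "_id": f"{doc_type}-{doc_id}",
        "_source": {"kind": doc_type, **source},
    }
    if parent is not None:
        if parent_type is None:
            raise ValueError("parent_type is required when parent is set")
        new_doc["_source"]["parent"] = f"{parent_type}-{parent}"
    return new_doc


question = to_single_type("question", "1", {"title": "Why remove types?"})
answer = to_single_type("answer", "1", {"body": "Sparsity."}, parent="1", parent_type="question")
print(question)
print(answer)
```

Both original documents had id `1`; prefixing with the former type keeps them distinct in the merged index.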
What about the following plan:
If we are not ready to drop parent/child right now, one trade-off I could consent to would be a setting that allows indices to have multiple types so that parent/child can be used, but these indices could not be upgraded to 6.0. For the record, we have some evidence that removing types could improve indexing speed quite significantly, since we would not have to fold the type name into the uid: #18154 (comment) Thoughts?
@jpountz I think we should do this, but it seems your proposal has gone unnoticed given the lack of reaction (positive or negative). Can we get some other thoughts on this?
@rjernst I'm staring at it as you type :)
We discussed this in Fix it Friday. Where we want to get to: we want to remove the concept of types from Elasticsearch, while still supporting parent/child.
It's very important to me that we don't leave users behind: we need to give them a smooth upgrade and transition path. Proposed path:
In 5.0:
In 5.x:
In 6.0:
In 6.7:
In 7.0:
In 8.0:
In 6.0, all existing types from 5.x indices will have identical mappings. We will still have indices with the old parent/child implementation. If we can migrate existing parent/child settings to the new settings, then we could move the "return fields at top level" issue into 6.0. Alternatively, we could return fields at the top level in 6.0 regardless, and still show types (for old indices with types enabled, or with old parent/child) as a separate section in GET mapping.
UPDATED TO REFLECT CHANGES IN #15613 (comment)
UPDATED TO REFLECT CHANGES IN #15613 (comment)
UPDATED TO REFLECT CHANGES IN #15613 (comment)
UPDATED TO REFLECT CHANGES DISCUSSED IN #35190
Currently both `PUT` and `POST` can be used to create indices. This commit removes support for `POST index_name` so that we can use it to index documents with auto-generated ids once types are removed. Relates elastic#15613
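A toy sketch of the routing change this commit message describes: once `POST /{index}` no longer creates indices, that method/path pair is free to mean "index a document with an auto-generated id" after types are removed. The action names are hypothetical labels, not Elasticsearch's REST handlers, and only single-segment paths are modeled.

```python
def resolve(method: str, path: str) -> str:
    """Map a request on a single-segment path such as /logs to an action label."""
    segments = [s for s in path.split("/") if s]
    if len(segments) != 1:
        raise ValueError(f"only single-segment paths are modeled: {path}")
    if method == "PUT":
        return "create_index"  # index creation stays on PUT
    if method == "POST":
        # Before this commit, POST also created the index; now the route is free.
        return "index_document_with_auto_id"
    raise ValueError(f"unsupported method: {method}")


print(resolve("PUT", "/logs"))
print(resolve("POST", "/logs"))
```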
I think the only other endpoint we need to check is
Perhaps the existing form should be deprecated in favour of I think that's the lot
Now that I have started working on it, this is more challenging than I initially thought. However, I think this might not be needed: since we will require at most one type in 6.0+ anyway, we will not have to merge mappings across types in 7.0, so this step is not required for the type removal?
…/{type}`. elastic#20055 This will help remove types as we will need `{index}/{id}` to tell whether a document exists. Relates elastic#15613
@clintongormley I like the proposed plan! If we can find a way to support the new parent/child format in 5.x with `enabled` set to false on `_type`, it would mean a simpler migration down the road. We could start to push new users to use this setting.
We just discussed how we want to update REST tests with this change. The issue is that index creation, index, update, put mapping and some other APIs are going to complain with version 7.x if the
This commit duplicates REST tests for the - `indices.create` - `indices.put_mapping` - `indices.get_mapping` - `index` - `get` - `delete` - `update` - `bulk` APIs, so that we both test them when used without types (include_type_name=false) and with types, mostly for mixed-version cluster tests. Given a suite called `X_test_name.yml`, I first copied it to `(X+1)_test_name_legacy.yml` and then changed `X_test_name.yml` to set `include_type_name=false` on every API that supports it. Relates elastic#15613
A number of APIs (index creation, put mapping, index, etc.) will soon trigger deprecation warnings unless users opt in for typeless APIs by passing `include_type_name=false`. Relates elastic#15613
I updated the plan to add a new item to the 7.0 tasks list: "remove references to types from the high-level rest client API".
I'm starting with delete since it is a bit simpler than other APIs, but we should eventually do the same with all other APIs that are being replaced with a typeless version (get, put_mapping, search, etc.). Relates elastic#15613
After seeing #33953, @jtibshirani raised the question of whether we want to do something so that users don't have to pass It's true that for someone who would quickly resolve deprecation warnings and use the new typeless APIs, having to keep passing I don't think we should default I have been considering adding a node setting, e.g. Opinions / other suggestions? /cc @clintongormley @rjernst
I like this idea.
This commit duplicates REST tests for the - `indices.create` - `indices.put_mapping` - `indices.get_mapping` - `index` - `get` - `delete` - `update` - `bulk` APIs, so that we both test them when used without types (include_type_name=false) and with types, mostly for mixed-version cluster tests. Given a suite called `X_test_name.yml`, I first copied it to `(X+1)_test_name_with_types.yml` and then changed `X_test_name.yml` to set `include_type_name=false` on every API that supports it. Relates #15613
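The naming scheme for the duplicated suites can be sketched as a small helper. This is an illustration of the rule stated in the commit message, not the script that was actually used; the example suite name `10_basic.yml` is hypothetical.

```python
import re


def with_types_name(filename: str) -> str:
    """Given a REST test suite named like `X_test_name.yml`, return the name of
    the copy that keeps exercising the typed APIs: `(X+1)_test_name_with_types.yml`."""
    match = re.fullmatch(r"(\d+)_(.+)\.yml", filename)
    if match is None:
        raise ValueError(f"unexpected suite name: {filename}")
    number, name = int(match.group(1)), match.group(2)
    return f"{number + 1}_{name}_with_types.yml"


print(with_types_name("10_basic.yml"))  # 11_basic_with_types.yml
```

The original file is then edited in place to pass `include_type_name=false`, so the lower-numbered suite covers the typeless form and the copy covers the typed form.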
@jpountz to clarify, when you refer to Would it be possible to default the setting to
An update: we met offline to talk through a revised plan for 7.0, which is now documented in #35190. The core of the plan is set, but there are still some open questions to sort out.
This adds an `include_type_name` option to the `indices.create`, `indices.get_mapping` and `indices.put_mapping` APIs, which defaults to `true`. When set to `false`, then mappings will be returned directly in the body of the `indices.get_mapping` API, without keying them by the type name, the `indices.create` will expect mappings directly under the `mappings` key, and the `indices.put_mapping` will use `_doc` as a type name and fail if a `type` is provided explicitly. On 5.x indices, get-mapping will fail if the index has multiple mappings, and put-mapping will update or introduce mappings for the `_doc` type instead of updating existing mappings. This oddity is required so that we don't have to introduce a new flag to put-mapping requests to know whether they are actually updating the `_doc` type or performing a typeless call. Relates elastic#15613
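A minimal sketch of the body shapes described above, written as plain Python dicts and a pure function rather than calls to Elasticsearch. The index name `logs` and the field mapping are illustrative, and the error message is not the one Elasticsearch produces.

```python
from typing import Any, Dict

# indices.create request bodies: typed (default) vs include_type_name=false.
typed_create = {"mappings": {"_doc": {"properties": {"message": {"type": "text"}}}}}
typeless_create = {"mappings": {"properties": {"message": {"type": "text"}}}}


def typeless_get_mapping(response: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a typed get-mapping response, {index: {"mappings": {type: mapping}}},
    into the typeless shape, {index: {"mappings": mapping}}. Like the behavior
    described for 5.x indices, it fails if an index has several mappings."""
    result: Dict[str, Any] = {}
    for index, body in response.items():
        mappings = body["mappings"]
        if len(mappings) > 1:
            raise ValueError(f"index [{index}] has multiple mappings: {sorted(mappings)}")
        # An index without any mapping yet gets an empty typeless mapping.
        result[index] = {"mappings": next(iter(mappings.values()), {})}
    return result


typed_response = {"logs": {"mappings": {"_doc": {"properties": {"message": {"type": "text"}}}}}}
print(typeless_get_mapping(typed_response))
```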
@jpountz Can we close this issue now that types have been deprecated?
We're tracking the remaining work for the types removal in another meta issue now so I am closing this one. Long live the types...
@jpountz @jimczi I am confused about this issue; could you please help explain it to me? If I have one index with 3 types, like:
then ES (before version 5.x) will have 5 fields for every document in the index, which leads to a sparse index. I'm thinking about a solution that applies a transformation when creating the mapping, like:
so the index will only have 3 fields:
When querying the index, just apply the same transformation to convert the fields in the query to the real fields stored in the index. Can this solve the sparse index problem? That way we could keep multiple types to facilitate data modeling.
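The example mappings in this comment were lost, so here is a rough sketch of the proposed transformation under made-up types and fields: each type's logical fields are assigned to shared physical fields (hypothetical names `f0`, `f1`, ...) by position, so the index only needs as many fields as the widest type. The sketch ignores field data types; in a real index, every logical field sharing a slot would need a compatible mapping.

```python
from typing import Dict, List


def assign_slots(types: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Map each type's logical fields onto shared physical fields by position."""
    return {
        type_name: {field: f"f{i}" for i, field in enumerate(fields)}
        for type_name, fields in types.items()
    }


def physical_field(slots: Dict[str, Dict[str, str]], doc_type: str, field: str) -> str:
    """Rewrite a logical field used in a query to the physical field in the index.
    The query must still filter on the type, since f1 means different things
    for different types."""
    return slots[doc_type][field]


types = {
    "patient": ["name", "age"],
    "visit": ["date", "diagnosis"],
    "lab_report": ["date", "item", "value"],
}
slots = assign_slots(types)
logical = {f for fields in types.values() for f in fields}
physical = {p for mapping in slots.values() for p in mapping.values()}
print(len(logical), len(physical))  # 6 3
print(physical_field(slots, "visit", "diagnosis"))  # f1
```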
@penfree This would work, but wouldn't it be even easier to have 2 indices?
Multiple doc types make it much easier to design a data model. An example:
We need to query on diagnosis or lab report to find clinic visits or patients, which is difficult to do across multiple indices. Not to mention that we create an index per day and put them all behind an alias.
The ability to have several types on the same index is causing problems:
Migrating existing users is certainly going to be complicated, but this would also make the system more honest to new users about the fact that we can't do index-level multi-tenancy efficiently. Also, I suspect that the restrictions we added in 2.0 (e.g. that two fields with the same name in different types must have the same mapping) already made lots of users migrate to a single index per data type instead of folding them into different types of the same index.
See also https://www.elastic.co/blog/index-vs-type.
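A simplified sketch of checking the 2.0 restriction mentioned above (fields with the same name in different types of one index must share a mapping). It compares flat per-field definitions only; real mappings have nested properties and many more parameters, and the type and field names here are made up.

```python
from typing import Any, Dict, List


def conflicting_fields(mappings: Dict[str, Dict[str, Any]]) -> List[str]:
    """Given per-type field definitions of one index, {type: {field: definition}},
    return the names of fields defined differently in different types."""
    first_seen: Dict[str, Any] = {}
    conflicts = set()
    for _type_name, fields in mappings.items():
        for field, definition in fields.items():
            if field in first_seen and first_seen[field] != definition:
                conflicts.add(field)
            first_seen.setdefault(field, definition)
    return sorted(conflicts)


index_mappings = {
    "tweet": {"date": {"type": "date"}, "user": {"type": "keyword"}},
    "user": {"date": {"type": "text"}, "user": {"type": "keyword"}},
}
print(conflicting_fields(index_mappings))  # ['date']
```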