[DISCUSS] API for selecting data sources, index aliases, and indices #64858
Pinging @elastic/kibana-app-arch (Team:AppArch) |
I agree something like this might be desired. If we implement this on the Kibana side, we should use async requests and stream the data back as we get it, so that cross cluster requests don't slow things down. If an index pattern has no documents, it can still have a mapping defined, which should allow us to populate the fields, right? So we would only need to show an error when there is no mapping defined. |
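The "async request and stream the data back" idea above could be sketched roughly as follows. This is only an illustration under stated assumptions: `fetch_matches` is a hypothetical stand-in for a real Elasticsearch call, and the delays and the simulated timeout are invented to show that a slow or failing cross cluster request does not hold up the local result.

```python
import asyncio


async def fetch_matches(pattern, delay, fail=False):
    # Hypothetical stand-in for a real Elasticsearch request; the delay
    # simulates latency and `fail` simulates a cross cluster timeout.
    await asyncio.sleep(delay)
    if fail:
        raise TimeoutError(f"{pattern} timed out")
    return {"pattern": pattern, "matches": ["logs-001"]}


async def stream_matches(requests):
    """Yield (pattern, result, error) tuples in completion order."""

    async def guarded(pattern, coro):
        # Catch errors per request so one failure does not cancel the rest.
        try:
            return pattern, await coro, None
        except Exception as exc:
            return pattern, None, exc

    tasks = [asyncio.ensure_future(guarded(p, c)) for p, c in requests.items()]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect():
    requests = {
        "*": fetch_matches("*", delay=0.01),
        "*:*": fetch_matches("*:*", delay=0.05, fail=True),
    }
    return [item async for item in stream_matches(requests)]


if __name__ == "__main__":
    for pattern, result, error in asyncio.run(collect()):
        print(pattern, result, error)
```

In this sketch the local `*` result is available as soon as it completes, and the cross cluster error is reported separately for `*:*`.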
This proposal sounds great. I think this should be implemented in ES so that it can be updated with other "data sources" that get added in the future, as well as made to support all nuances of the ES search syntax (e.g. I'd also suggest we expand the API to allow you to check if an index pattern matches a specific name or names. This will be useful when trying to figure out if an existing index or data stream is captured by an index pattern. For example:
|
@mattkime Thanks for raising this issue. I have a few questions for you.
Would you see data streams following a similar pattern? In my mind, I am thinking that we would want to abstract the idea of Indices behind the data stream. @martijnvg @danhermann Not sure if you agree or not. At least I can't immediately think of a use case where someone would want to include only a subset of the indices behind a data stream in the index pattern, which doesn't mean there aren't any. @mattkime I just want to make sure I understand the workflow here for a dedicated API call.
I think this makes sense to have its own call. I think it's something that we could use in the ES UI, and the Kibana UI. @cjcenizal what do you think? Is this something we could use in the SLM UI? For example, #65132 needs to be aware of data streams, so we could either add another API call to make snapshot policy creation data streams aware, or we could use this same API call. There are a few differences though, as the SLM UI doesn't allow for wildcard searches, and includes indices with no data. |
Yes, that's exactly what we want - hidden indices will not be seen (at least by default) with this api.
We would default to hiding hidden indices of any type. We could allow querying them via an optional query param but I have the same understanding of the need for this feature that you do.
I'm not 100% certain I understand the question but I'll take a swing anyway. I don't see a need to return a list of indices but I'm happy to entertain the idea if someone sees a need. Seems like getting that list could be a different API call.
I don't, my starting point is querying data sources and their fields should be separate apis. _fields_for_wildcard wraps the FieldCaps API which will support data streams. IMO the SLM use case looks very similar and we should design an api that is suitable for both. We could provide a way to specify what type of data sources you'd like to query but it would be trivial to ignore unwanted results. |
I had a quick meeting with Matt to cover this. Here is the summary. In the Index Patterns UI today, when you search for a top-level item like an alias, the UI shows you the indices that match that top-level item. For example, say I have an alias called log which has the test-001 and test-002 indices behind it. If I search for lo*, I see test-001 and test-002 in my results as indices that the pattern can match. With the changes to make this UI work with data streams, we are proposing to add a new _data_source API that would return three kinds of objects: indices, aliases, and data streams. The UI would then show the top-level items instead of the indices behind them. So using my previous example, if you searched for lo* you would see the alias log as an item that matches the pattern, and you wouldn't see test-001 or test-002 unless you changed the pattern to te*. @mattkime let me know if I missed anything. |
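The proposed behavior in the summary above (match the wildcard against top-level names only, not against the indices behind an alias) can be sketched like this. It is a minimal illustration, not the actual implementation: `match_top_level` is a hypothetical helper, and Python's `fnmatchcase` is used as an approximation of wildcard matching (it also accepts `?` and `[...]`, which goes beyond the simple `*` wildcards discussed here).

```python
from fnmatch import fnmatchcase


def match_top_level(pattern, indices, aliases, data_streams):
    """Return the names of each kind that the wildcard matches directly.

    Indices behind an alias or data stream are NOT expanded; they only
    appear if their own name matches the pattern.
    """
    return {
        "indices": sorted(n for n in indices if fnmatchcase(n, pattern)),
        "aliases": sorted(n for n in aliases if fnmatchcase(n, pattern)),
        "data_streams": sorted(n for n in data_streams if fnmatchcase(n, pattern)),
    }


# The example from the summary: alias "log" points at test-001 and test-002.
indices = ["test-001", "test-002"]
aliases = {"log": ["test-001", "test-002"]}
data_streams = {}

by_lo = match_top_level("lo*", indices, aliases, data_streams)
by_te = match_top_level("te*", indices, aliases, data_streams)
```

With this data, `lo*` matches only the alias `log`, while `te*` matches the two indices and no alias.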
Two comments -- Given that aliases are commonly named similarly to the indices to which they point (e.g., Second, we may want to iterate on the proposed name of |
To address @danhermann's points, I propose two changes:
|
@cjcenizal I think it makes sense to include the indices under the category they fall in. So an index that is part of an alias only shows up in the alias section with its alias, and the same with data streams. So +1 for that. For #2, I also find _data_sources to be a little misleading. I picture data sources as things like Nginx_logs or filebeat. @cjcenizal I like your idea of explaining a bit more about what it does. I don't like capture as that feels like it's taking some action on the indices. Index pattern is close. I think it aligns with the nomenclature that Kibana uses. Aliases and data streams are a way to group indices, so I think that does fit. However, I don't know if it will collide or be confusing given what index patterns mean in Kibana. For me just using that I might want to do something like |
fwiw I have no opinion on whether Supporting pagination in the api would be nice. |
I read @cjcenizal's suggestion as including enough metadata about each index, alias, and data stream (we call them collectively "index abstractions" on the ES side which does not necessarily mean that's the best name for the endpoint) to eliminate any ambiguities about how a particular index abstraction was matched with a given wildcard. I would prefer to avoid trying to determine which section an index should go in based on how it was resolved. I would amend the original API proposal so that each alias includes all its referenced indices, each index includes all its aliases and parent data stream (if any), and each data stream includes all its backing indices. It would look something like this: Request - Result -
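The cross-referencing described in the amended proposal (each alias lists its indices, each index lists its aliases and parent data stream, each data stream lists its backing indices) could be modeled as below. This is a sketch only: the function name, the input shape, and the field names (`aliases`, `data_stream`, `backing_indices`) are assumptions chosen for illustration and are not taken from the request and result shown in the original comment.

```python
def build_abstractions(all_indices, alias_to_indices, stream_to_backing):
    """Build cross-referenced entries for indices, aliases and data streams.

    All inputs are plain in-memory mappings (a hypothetical stand-in for
    cluster metadata); nothing here talks to Elasticsearch.
    """
    index_entries = {
        name: {"name": name, "aliases": [], "data_stream": None}
        for name in all_indices
    }
    # Record, on each index, which aliases point at it.
    for alias, members in alias_to_indices.items():
        for index_name in members:
            index_entries[index_name]["aliases"].append(alias)
    # Record, on each backing index, its parent data stream.
    for stream, backing in stream_to_backing.items():
        for index_name in backing:
            index_entries[index_name]["data_stream"] = stream
    for entry in index_entries.values():
        entry["aliases"].sort()
    return {
        "indices": [index_entries[n] for n in sorted(index_entries)],
        "aliases": [
            {"name": a, "indices": sorted(m)}
            for a, m in sorted(alias_to_indices.items())
        ],
        "data_streams": [
            {"name": s, "backing_indices": sorted(b)}
            for s, b in sorted(stream_to_backing.items())
        ],
    }


# Hypothetical sample data; the backing index name is invented.
result = build_abstractions(
    all_indices=["test-001", "test-002", "metrics-000001"],
    alias_to_indices={"log": ["test-001", "test-002"]},
    stream_to_backing={"metrics": ["metrics-000001"]},
)
```

Because every entry carries its own references, a client can tell how a name relates to the others without the response having to sort indices into sections based on how they were resolved.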
|
@danhermann That looks great, a couple of details / requests I'd like to nail down -
I need to do some research on frozen indices to determine how they might relate to this api. |
I'd like to defer that, if possible. It adds a lot of complexity and we already have a pretty aggressive schedule on the ES side to deliver data streams.
Could you give me an example of what you mean by this?
I'll check on that.
Yes, that's doable. |
Frozen index support - could frozen indices be returned marked Data streams - can we get the |
These two can be added pretty easily. |
Adding this to the new api would make it much more complex. All the other attributes can be retrieved from the cluster state, which exists in memory on all nodes; whether an index is empty cannot. I chatted with @mattkime and at least for now we will leave this out. This can be added in a later iteration.
Yes, we can make this new api cross cluster aware (similar to _field_caps api). |
@martijnvg Thanks for the comments. @mattkime How much trouble is it going to cause for the user if we can't send back whether the data stream is empty or not? I know we can't really bank on it, but at least at first I think the main way data streams will be created will be to send data to an index pattern that matches an index template. So I would think in almost all cases there would be data in a data stream. It is possible to create one via the API, but it's not exposed in the UI anywhere and I don't really anticipate users doing that. |
At most it will mean an additional API call and a message to the user stating that the index pattern can't be created since none of the matching indices have a document. That said, we're currently examining whether we can create kibana index pattern objects without having a document. It certainly would be helpful to beats and ingest teams. |
@martijnvg I'm curious if you can provide a very rough guess as to the performance of this call vs the aggregation query we currently use -
|
@mattkime, as spec'ed here, this API will definitely be faster because it returns data only from cluster metadata which can be retrieved from any node in the cluster (excluding cross-cluster info, of course). It will not have to query every shard as the existing one does. |
@mattkime or @danhermann Are we at a place now where we can create an issue in the ES repo to track this? |
@danhermann Any ideas on how we might come up with a name for the set - data streams, indices, and index aliases? |
I've given it a little thought, but have not come up with any great ideas, yet. It's temporarily named |
Sounds good - can you kick that off? I'm happy to help but this seems more like an ES thing than Kibana thing. |
I think this kind of endpoint will prove super valuable to ES UI and am super excited to see how it is coming along! Is there an open issue for this ES side? Or somewhere we can contribute? |
I'm planning to have a draft PR open for it within a day. I'm just working on its cross-cluster capabilities now. The request and response look like I proposed in this comment above. |
Ok, thanks for pointing that out @danhermann 👍 - glad to know that version is up-to-date. |
The draft PR is up. Resolving names against a local cluster works, but there's a bug with remote clusters that needs to be fixed. The request and response format and all parameters and options should be stable. |
This API has been completed on the ES side: elastic/elasticsearch#57626 |
In support of Data streams - elastic/elasticsearch#53100
tldr; We need a way to select data streams, index aliases, and indices in such a way that we show the user which entities their wildcard matches.
Initial display of indices -
Display of matched indices -
Currently we only display indices with at least one document. You can match an index alias but we don't indicate you match it, we just show the indices it references. Finding a document is important as we store a list of fields with the index pattern saved object. We could display an error if a wildcard matches an index without documents.
We may want to add metadata to the entities returned but currently have no defined needs. Let's make sure it's easy to add in the future.
This needs to be cross cluster aware. Currently we make two requests when listing indices in the index pattern creation UI: * and *:*. We do this because the cross cluster request is more likely to be slow or fail, so it's nice to independently error on the cross cluster request.
API proposal:
Request -
GET _data_source/{wildcard}
Result -
I'm unsure if this should be implemented in elasticsearch or kibana. You could duplicate the result with GET * (for indices), GET /_alias and GET /_data_streams/, although the individual APIs might be doing more work than necessary. Speed should be taken into consideration as it affects flexibility of usage. We would prefer that index patterns are quick and easy to create so as to facilitate data exploration, unlike now where it's treated as a kibana configuration step. It's notable that GET * frequently returns large payloads detailing fields and their capabilities.