[Logs UI] [R&D] Determine scope of work for migrating to async search in the Logs UI stream view #76677
Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)
Queries

This is an overview of the queries in use on the log stream page (this isn't exhaustive, as queries will differ slightly based on the source configuration). Please note that where properties are concerned with listing many fields I've cut these down to a small sample for brevity.

`getLogEntries` (Without highlights)
`getLogEntries` (With highlights)
`getContainedLogSummaryBuckets` (Without highlights)
`getContainedLogSummaryBuckets` (With highlights)
`getLogItem`
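For context, the shape of a `getLogEntries`-style request can be sketched as a plain search body builder. This is a hedged illustration only: the `@timestamp` field, the `_doc` tiebreaker, and the cursor shape are assumptions for the sketch, not the exact production query.

```typescript
// Hypothetical sketch of a getLogEntries-style search body.
// Field names and the cursor shape are illustrative assumptions.
interface LogEntriesCursor {
  time: number; // epoch millis of the last seen entry
  tiebreaker: number; // tiebreaker for stable paging
}

const buildLogEntriesBody = (
  startTime: number,
  endTime: number,
  size: number,
  after?: LogEntriesCursor
) => ({
  size,
  sort: [{ '@timestamp': 'asc' }, { _doc: 'asc' }],
  query: {
    bool: {
      filter: [{ range: { '@timestamp': { gte: startTime, lte: endTime } } }],
    },
  },
  // search_after enables cursor-based paging when scrolling the stream
  ...(after ? { search_after: [after.time, after.tiebreaker] } : {}),
});
```

Whatever transport we end up with, each of the queries above reduces to a body like this being submitted and a response being parsed, which is why they should all be expressible through the data plugin.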
To move over to async search we will need to be able to fulfill all of these queries.

Terminology / acronyms
Q&A

❓ Does the Kibana data plugin provide the functions we need to make the ES queries we are currently making?

📝 Yes, as far as I can tell we can make the queries listed above using the data plugin, facilitated by the

❓ How can we split this work up into multiple steps to keep the scope small?

📝 There are a few ways we could split this up; I believe the most optimal would be:
❓ What are the benefits/costs of using a server-side "custom search strategy" vs not?

📝 I don't think there are any costs per se to using a CSS, other than maintenance on our part. A few benefits are:
In my opinion we should just skip straight to using the async search strategy; the sync search strategy is using the async strategy under the hood anyway. This doesn't mean we need to immediately buy into things like partial results; these can be incremental improvements. As a first step we can (via our CSS) utilise async search, but just wait for all of the results (if we need to). If some features are easier to adapt immediately to fully async (with partial results etc.) we can also very easily have two methods on our CSS (or an option / parameter): one would return partial results, the other could just be "async as sync" and gather them all before sending them back. Either way, the client-side hook should be preemptively built with a fully async nature in mind, e.g. using observables, which can incrementally emit new data. From my perspective the hardest part of supporting partial results will be the UI / UX, as opposed to the direct technical work. I've added a dedicated section on how this CSS could work below.

❓ Do we foresee problems with the existing Log Stream (auto refresh, scrolling refreshes, highlights, etc.) that will cause problems when using the data plugin?

📝 I don't see any problems arising from using the data plugin here; realistically it's just the mechanism for how we are given data. However, how we use the data isn't necessarily easy. Partial results and cancellation raise UI / UX questions, and I think we should get some design input there, especially for cancellation (more below). We could very easily (famous last words) keep the loading spinner functionality we have now, but immediately render the partial results we receive. The loading spinner stays in place until we have received everything, but users enjoy immediate rendering of incoming results. This might create janky behaviour with the scroll area; I'm not 100% sure.

❓ Do we plan to eventually make use of "partial query results" anywhere in the Logs UI?
📝 I think this is a tentative "yes". The UI / UX might need some input, but it would be easy for us to render items to the stream as we receive them. We should definitely make our CSS and hook partial-results aware, even if what that means at first is that we wait until we have everything before rendering. Where partial results seem to shine (right now) is for huge queries that take minutes to execute, where it's imperative to start showing something to the user whilst the query carries on running in the background. The stream page isn't really built around this paradigm. However, I still think showing things quickly is important. It's important to note that right now this option means we'd not really get anything out of partial results; they'd be reported every 64 shards. We could, of course, always change this upstream ourselves (as a configurable option).

❓ Do we plan to eventually make use of long-running async queries (notifying a user anywhere in the app when a query has finished) anywhere in the Logs UI?

📝 I can't think of an immediate use case for this. However, if we implement our CSS and hook to be partial-results and cancellation aware (which we should be anyway) then this doesn't become hard to add later. Maybe there becomes a use case for years' worth of stream viewing?

❓ Do we plan to make use of async query cancellation in the Logs UI, and how? What should the UX be for this, roughly?

📝 We should definitely add first-class cancellation support to our CSS and hook. However, how this immediately manifests in the UI is a different question. In the stream, the scroll to top / bottom scenario again becomes interesting: if a user scrolls to an extreme, and thus triggers a request, we could allow this to be cancelled; however, this request will probably just trigger again after cancellation due to the scroll boundary. For the UX, it probably makes sense to allow cancellation wherever we currently show loading states.
So it would become a loading state with an option to cancel. Some sort of "banner" could also be used, but it becomes tricky when the page is dispatching multiple requests. Would a banner approach always cancel all requests in progress? On the stream page, this approach would mean hitting cancel and effectively cancelling queries for entries, minimap buckets, and highlights. I think we need to carefully think about what cancellation means. E.g. I change the time range and a query is dispatched; I then choose to cancel. Does the time range then revert back? Regardless of whether we need to surface this in the UI, we need cancellation support internally: if props change and we need to dispatch a new request, we need to be able to cancel the old one.

❓ Are there any other places in the Logs UI that we make ES queries that we should consider migrating to async search? (Assuming we will not worry about things like ML and Alerting APIs, since those will need to be migrated by their associated teams.)

📝 As we'd be ignoring alerting and ML, I don't think there are many more locations. We could possibly use async search for the log entry examples shown on the anomalies / categories pages, and for chart preview data on the alerting flyout.

Custom search strategy

Our CSS approach could work a few different ways. We could have a CSS for each query type we want supported through the data plugin (a

Instead, a better approach seems to be creating one CSS (the Infra CSS, if you will), and that singular CSS would know how to deal with all of our various query types. This mechanism (dealing with different query types) would be completely up to us. However, the security solution has already implemented something similar in a factory fashion and I really like the approach they've taken. Ours wouldn't look exactly like this, but it'd be similar. Security solution PR: #75439

They have two CSS, which handle all of their various query types across the entire app.
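To make the single-strategy, factory-per-query-type idea concrete, here is a rough self-contained sketch. All names (`QueryType`, `QueryFactory`, the request shape) are assumptions for illustration; the real Kibana `ISearchStrategy` interface and the security solution's factories differ in detail.

```typescript
// Rough sketch of one multiplexing "Infra" search strategy that
// dispatches to per-query-type factories. Names are illustrative.
type QueryType = 'logEntries' | 'logItem';

interface FactoryRequest {
  queryType: QueryType;
  params: Record<string, unknown>;
}

interface QueryFactory {
  buildDsl: (request: FactoryRequest) => object;
  parseResponse: (rawResponse: unknown) => unknown;
}

const logEntriesFactory: QueryFactory = {
  buildDsl: ({ params }) => ({ query: { match_all: {} }, size: params.size ?? 100 }),
  parseResponse: (raw) => ({ entries: raw }),
};

const logItemFactory: QueryFactory = {
  buildDsl: ({ params }) => ({ query: { ids: { values: [params.id] } } }),
  parseResponse: (raw) => ({ item: raw }),
};

// The single strategy looks a factory up by queryType and delegates.
const factories: Record<QueryType, QueryFactory> = {
  logEntries: logEntriesFactory,
  logItem: logItemFactory,
};

const infraSearchStrategy = {
  search: (request: FactoryRequest) => {
    const factory = factories[request.queryType];
    const dsl = factory.buildDsl(request);
    // ...submit `dsl` via the data plugin's ES search here...
    return factory.parseResponse(dsl); // placeholder: echoes the DSL back
  },
};
```

The appeal is that registering a new query type is just adding an entry to `factories`; the cost, as discussed below, is the typing needed to relate each `queryType` to its request/response shapes.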
The provider lives here, providing a

If we take a look at their
And

On the client side we can see they're taking a "wait for everything" approach, by waiting until the response is not partial and is not running. If they hit a partial and not running state, they error out (as opposed to just showing the partial results so far). I found this all very easy to follow as someone who has never seen the Security Solution before. I think our ideal solution will look somewhat similar to this. It means we ultimately register one CSS and can handle all query types in there. If another app wants to make use of this, all they need to do is take the types we set up for the query type and call our CSS. We also have things (like the source configuration) that many query types are all going to want to access; we can ensure these are passed in at the top level, so that all factories can access them.

Real (but rough) example

Some rough code exists here. This isn't perfect; it's just to show the ideas mentioned here.

Useful resources
Notes
That's a great analysis, thank you. 👍 I concur with most of your conclusions. I'd like to add some arguments for the "multiple search strategies" architecture over the "multiplexing with a single search strategy" approach, if I may.
About complexity: I'd question whether the request/response types from the security solution or your example would scale well for our number of request types. I imagine we'd end up with quite deeply nested conditional types if a binary branch already looks like this:

```typescript
type StrategyRequestType<T extends FactoryQueryTypes> = T extends LogEntriesQueries.entries
  ? LogEntriesEntriesRequestOptions
  : T extends LogEntriesQueries.item
  ? LogEntriesItemRequestOptions
  : never;

export type StrategyResponseType<T extends FactoryQueryTypes> = T extends LogEntriesQueries.entries
  ? LogEntriesEntriesStrategyResponse
  : T extends LogEntriesQueries.item
  ? LogEntriesItemStrategyResponse
  : never;

export const infraSearchStrategyProvider = <T extends FactoryQueryTypes>(
  data: PluginStart
): ISearchStrategy<StrategyRequestType<T>, StrategyResponseType<T>> => {
  // ...
};
```

Not that I'm against conditional types in general, but these seem needlessly complex compared to straightforward, separate search strategy providers.

About aesthetics: On the aesthetic side, I would argue that the data plugin's routes already perform a multiplexing to various search strategies. Nesting another multiplexing operation within that causes the code path to contain deeper, more nested branches, which usually doesn't benefit clarity and maintainability. Instead I suspect it furthers the monolithic aspects of our plugin, which could reduce the flexibility for further refactorings.

About suitability: More importantly, I wonder if the multiplexing approach is actually suited to the internal workings of our to-be-written search strategies. The two-stage "create dsl" and "parse response" structure might be appealingly clear, but some of our routes actually perform more complex operations that can't be easily mapped to that. The log entries query, for example, actually consists of two queries with particular filters,

Let me know if these arguments make sense. There might be upsides to grouping some queries together if they are always requested together and require identical parameters. Overall, though, I don't really see the downside of separate search strategies. Could you elaborate where you see the complexity in that approach?
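For contrast, the separate-strategies alternative being argued for here could look roughly like the following. The provider names, the simplified `ISearchStrategy` interface, and the registration map are illustrative assumptions, not the actual data plugin API.

```typescript
// Sketch of registering one search strategy per query type, avoiding
// conditional request/response types entirely. Names are illustrative.
interface ISearchStrategy<Request, Response> {
  search: (request: Request) => Promise<Response>;
}

interface LogEntriesRequest { startTime: number; endTime: number; }
interface LogEntriesResponse { entries: string[]; }

interface LogItemRequest { id: string; }
interface LogItemResponse { item: string | null; }

// Each provider returns a plainly-typed strategy: no conditional types.
const logEntriesSearchStrategyProvider = (): ISearchStrategy<
  LogEntriesRequest,
  LogEntriesResponse
> => ({
  search: async ({ startTime, endTime }) => ({
    entries: [`entries between ${startTime} and ${endTime}`], // stub
  }),
});

const logItemSearchStrategyProvider = (): ISearchStrategy<
  LogItemRequest,
  LogItemResponse
> => ({
  search: async ({ id }) => ({ item: `item:${id}` }), // stub
});

// Hypothetical registration, mirroring how the data plugin keys
// strategies by name:
const strategies = new Map<string, ISearchStrategy<any, any>>();
strategies.set('logEntries', logEntriesSearchStrategyProvider());
strategies.set('logItem', logItemSearchStrategyProvider());
```

Each strategy's request and response types are stated directly at its definition, so adding a query type never touches a shared conditional type.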
Thanks, @weltenwort -- I think I'm okay with either approach but the idea of simplifying those conditional types is pretty appealing. I'll leave this decision up to you and @Kerry350 :D Thanks to both of you for the detailed write-ups!
@weltenwort Thanks for the feedback 👌
100%, I can't argue against this. The complex types were the big downside to this approach.
Yes, that's a fair comment. Although I guess there's the counter-argument that we shouldn't necessarily consider how other plugins implement their functionality under the hood, only the exposed interface.
The point here that we have much more complex routes that perform multiple queries is certainly true. Specifically I want to zone in on this bit though:
I believe we'll need to handle this regardless? Granted a flat

I just want to confirm we're on the same page there with having multiple custom search strategies: that we'd still need to handle these multiple queries and their relationship within
Complexity is probably the wrong word; I don't think it introduces complexity, but I just didn't see the upside to registering many search strategies.

Quick aside question: if we use multiple custom search strategies, would we namespace these with

I do want to point out, though, that I'm not super passionate in either direction. I went with this for the super rough POC as it had parity with another plugin, but multiple custom search strategies will work just as well. The implementation only shifts subtly from building one of these factory query type handlers to instead adding a separate custom search strategy with a
Absolutely, the bodies of the
A predictable naming scheme is always a good idea. Would a
Agreed, the implementation difference is minor. I didn't want to diminish your work. You answered all the research questions thoroughly, so I figured it's time to start discussing the color of the shed. 🚲 😉
Perfect, then we're on the same page 👌
It would probably be sufficient. I'm just thinking if some other solution does something with logs they might also use
Haha, love that 😂 Great, well I'm happy for us to adjust to multiple custom search strategies. @jasonrhodes has said he is. So as long as @afgomez is happy I think we're all aligned. I can tweak the two implementation tickets that exist at some point.
I like the idea of some kind of prefix distinction, whether it's

I'm liking this shed paint though. Well done! :D
💭 AFAIK the search strategy name is used in a URL, so a
R&D work is done.
We plan to migrate the Logs UI stream view to use the data plugin for ES queries and to make use of ES async search. We need to understand more about what this work is going to entail.
Questions we should answer in this R&D:
Timebox: 1 week