-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize Canvas REST calls via batching #29847
Optimize Canvas REST calls via batching #29847
Conversation
behaves consistently across environments
💚 Build Succeeded |
Pinging @elastic/kibana-canvas |
💚 Build Succeeded |
💔 Build Failed |
retest |
💚 Build Succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic work Chris!!!
To lock in performance, it'd be good to precede this PR with a standalone test-only PR that attempts to measure performance on some expression chains, eg. one straightforward chain; one chain with back & forth communication. Best to instantiate a good number of them eg. a few dozen of each, to simulate a workbook that has several dozens of elements accessing data (it adds up quickly, 5 data driven elements on 5 pages is already 25 expressions). This way the effects of batching can be visible.
The reason for suggesting that it be a preceding PR rather than a commit in this PR is that it can then be used to measure the earlier non-batching version (and backported even for the WS-based implementation, a useful watermark for further optimizations, though my understanding is that we're moving to HTTP period, whether it makes things slower or not).
Good idea, @monfera. Do you know of any perf tests in our current test suite? I'm not aware of any, so this may be a first. The challenge with automated perf tests is that their speed varies across environments, so it's hard to know how to write them robustly. I'm open to suggestions / ideas here, though. I'm not sure if it should be a blocker on this. I think we're moving forward with HTTP regardless, simply because of the HA Kibana cluster scenario, and the fact that WS support is not high among our enterprise environments. |
How about instead of blocking on automated tests (agree, nothing like this exists currently), can you run some manual tests then post the results and how you did the test? We can then have someone repeat them on their local environment just to verify that the results seem to be accurate. Then we can at least generally say, with batching it improves performance by X given a Y expression. Maybe there are tools we can use too for this kind of thing. @jimgoodwin mentioned one of them but I forget the name of it. |
It'd be interesting to figure out what we even measure (repeating that it should be outside this PR). Example: when loading the eCommerce workbook with 5x the number of pages (ie. 10 pages total) the randomized vertical line chart updates within two seconds of a refresh (it has @w33ble spearheads another attack vector aiming to prioritize completion of the current page (now the requests for the current page are issued first, and as seen from the 2s vs 10s measurement, it already does a good job, but if there's ping-pong between client and server, then the subsequent requests will be filled randomly so there is room for manually testing the best possible case, which is, not even loading the other pages). To me, the performance requirements for GA aren't clear, if they exist at all (the decision I heard was for unconditional removal of WS), even with WebSockets we have users who break up a midsize workpad to a few one-pagers. See also #30095 and its Description links. |
P.S. I just used the Network time reporting in Chrome Dev Tools, seemed to be accurate for the purpose, as we're not measuring page junk or other 10ms level details, instead it's in the order of many seconds |
We could probably through in some test expression functions that are sleeps and simply execute on the client vs the server to fake the time it would take to execute something in es, and then maybe return some amount of data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving this nice optimization as it's functioning well, the code is clean and compact for what it does, and performance is almost exactly the same as the original refresh-to-current-page before deactivating WS. In addition we don't have a perf target, the WS switch is binary and this PR improves perf nicely over the baseline.
Optimize the expression interpreter by introducing batching of remote function calls.
Optimize the expression interpreter by introducing batching of remote function calls.
Optimize the expression interpreter by introducing batching of remote function calls.
Optimize the expression interpreter by introducing batching of remote function calls.
And adds batching to Canvas' remote function calls.
This builds on this PR: #29792 so reviewers, you may want to hold off on reviewing this until 29792 is merged.