tracking bug for agent performance impact on Kibana #2028
Yesterday my thoughts on a plan for this issue were:
Timebox to another 2 weeks max, then time to move on. Future work:
That flamegraph is here: https://gist.github.com/trentm/c70154cc6674a000f6b4c0827e3f0183
a look at
Some updates on my perf look at async_hooks usage in the agent:
Refactor agent.captureError(...) and stacktrace collection for better performance. This moves the stackman dep into lib/stacktraces.js and lib/errors.js to allow breaking its compatibility for perf improvements.

Perf issues:

- `{error,span}.stacktrace` logic was largely in the "stackman" module. It had a few perf issues:
  1. It monkey-patched (or wrapped?) v8 native CallSite objects with a bunch of properties for convenience/sugar. Using these in stacktrace generation was very slow (as identified by flame graphs).
  2. It had a bug in its caching of sourcemap file loading for each stackframe: when a file had no separate sourcemap file (the common case), this wasn't cached. That means we would be stat'ing for a sourcemap file for almost every frame.
  3. A stack frame's `filename` is a relative path to cwd. The code ended up calling `process.cwd()` for *every* frame, which is expensive.

  Dealing with the first issue can't be done without either completely breaking "stackman" backwards compat, or just no longer using "stackman" and vendoring some of its code. I've done the latter.
- This adds caching to the conversion of a CallSite object to an APM error stacktrace frame (frameFromCallSite), which had a significant impact on CPU usage in the perf test mentioned above (it was about 6-7% of the diff).

Other issues:

- Fix sourcemap "sourcesContent" usage.

Test case: test/stacktraces/stacktraces.test.js

Refs: #2028
Fixes: #2095
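As a rough illustration of the first and third stackman issues and of the frameFromCallSite caching described above, here is a minimal sketch, assuming V8's `Error.prepareStackTrace` / `Error.captureStackTrace` stack trace API. It reads the native CallSite getters directly instead of wrapping the CallSite objects, calls `process.cwd()` once at load time instead of per frame, and memoizes the CallSite-to-frame conversion keyed on file, line, column and function name. The `captureCallSites` helper, the cache bound, and the frame field names are hypothetical; this `frameFromCallSite` is a stand-in, not the agent's actual lib/stacktraces.js code.

```javascript
'use strict'
const path = require('path')

// Snapshot cwd once. Assumption: the process never calls process.chdir(),
// so every frame's relative path can reuse this value.
const CWD = process.cwd()

// Hypothetical cache: location key -> frozen frame object.
const frameCache = new Map()
const MAX_CACHED_FRAMES = 1000 // arbitrary bound for this sketch

// Return the raw V8 CallSite objects for the current stack, omitting
// `skipFn` and every frame above it. The CallSites are not wrapped.
function captureCallSites (skipFn = captureCallSites) {
  const origPrepare = Error.prepareStackTrace
  try {
    // With this override, reading `.stack` yields the CallSite array.
    Error.prepareStackTrace = (_err, callsites) => callsites
    const holder = {}
    Error.captureStackTrace(holder, skipFn)
    return holder.stack // evaluated before `finally` restores the hook
  } finally {
    Error.prepareStackTrace = origPrepare
  }
}

// Convert one CallSite to a plain frame object, memoized by source location.
function frameFromCallSite (callsite) {
  const file = callsite.getFileName() || ''
  const line = callsite.getLineNumber()
  const col = callsite.getColumnNumber()
  const fn = callsite.getFunctionName() || ''
  const key = `${file}:${line}:${col}:${fn}`

  let frame = frameCache.get(key)
  if (frame === undefined) {
    frame = Object.freeze({
      abs_path: file,
      filename: file ? path.relative(CWD, file) : '', // uses the snapshot, not process.cwd()
      lineno: line,
      colno: col,
      function: fn
    })
    if (frameCache.size >= MAX_CACHED_FRAMES) frameCache.clear() // crude bound
    frameCache.set(key, frame)
  }
  return frame
}
```

Calling `captureCallSites().map(frameFromCallSite)` inside a function yields frames whose first entry names that function, and repeated captures from the same location return the same cached frame objects. The frames are frozen because one object is shared by every caller that hits the cache.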
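The second stackman issue (the "no separate sourcemap" result not being cached) is a missing negative cache. A minimal sketch, assuming for simplicity that a file's sourcemap lives at a sibling `<file>.map` path (real lookups typically follow a `sourceMappingURL` comment instead): the cache stores `null` for files without a sourcemap, so the filesystem is probed at most once per file. `findSourceMapPath` and its injectable `exists` parameter are hypothetical.

```javascript
'use strict'
const fs = require('fs')

// Hypothetical cache: source file path -> sourcemap path, or null when the
// file has no sourcemap. Storing null is what makes the common case cheap.
const sourceMapPathCache = new Map()

// `exists` is injectable for testing; it defaults to a synchronous fs probe.
function findSourceMapPath (filename, exists = fs.existsSync) {
  if (sourceMapPathCache.has(filename)) {
    return sourceMapPathCache.get(filename) // a hit, possibly a cached "no sourcemap"
  }
  // Assumption: sourcemaps sit next to the file as `<file>.map`.
  const candidate = filename + '.map'
  const result = exists(candidate) ? candidate : null
  sourceMapPathCache.set(filename, result)
  return result
}
```

Checking with `has()` rather than a truthiness test matters here: a check like `if (cache.get(filename))` treats the cached `null` as absent and re-probes on every frame, which is one easy way to reintroduce this kind of bug.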
This issue was intended to track a timeboxed perf effort on the Node.js APM agent, mainly motivated by the Kibana team's usage of the Node.js agent. I'm closing this issue with a summary of the improvements we've made lately and some thoughts on future perf work. Improvements:
A before vs. after run of a simulated workload (80% transactions only, 17% transactions with some spans, 3% errors) against an Express.js app loaded to 500 req/s shows a significant reduction in CPU usage and in latency percentiles.
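To make the benchmark mix concrete: at 500 req/s, an 80/17/3 split works out to 400 transaction-only requests, 85 requests with spans, and 15 error requests per second. Below is a minimal sketch of how a load generator might pick a request kind for each request; the kind names and both helpers are hypothetical, not the harness that produced these results.

```javascript
'use strict'

// Request mix from the benchmark description, as integer percentages.
// The kind names are hypothetical labels.
const MIX = [
  { kind: 'transaction', pct: 80 },
  { kind: 'transaction+spans', pct: 17 },
  { kind: 'error', pct: 3 }
]

// Map r, uniform in [0, 1), to a request kind by cumulative percentage.
function pickKind (r, mix = MIX) {
  let cumulative = 0
  for (const { kind, pct } of mix) {
    cumulative += pct
    if (r * 100 < cumulative) return kind
  }
  return mix[mix.length - 1].kind // fallback for round-off or a mix summing below 100
}

// Expected requests per second of each kind at a given total rate.
function perSecond (rate, mix = MIX) {
  return Object.fromEntries(mix.map(({ kind, pct }) => [kind, (rate * pct) / 100]))
}
```

A generator would call `pickKind(Math.random())` once per request; `perSecond(500)` reproduces the 400 / 85 / 15 breakdown above.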
Future work
This is a tracking bug for a timeboxed effort to look at improving the Node.js APM agent's perf impact in Kibana.
See elastic/kibana#78792 for most of the details. This ticket will help track the agent-side work.