Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

require("elastic-apm-node") influences stacktrace generated with "node --enable-source-maps ..." #2833

Closed
3 tasks
dtoch opened this issue Jul 18, 2022 · 6 comments · Fixed by #2906
Closed
3 tasks
Assignees
Labels
8.5-candidate agent-nodejs Make available for APM Agents project planning. bug community
Milestone

Comments

@dtoch
Copy link

dtoch commented Jul 18, 2022

Describe the bug

To Reproduce

A. package.json

{
  "name": "testbug-stacktrace",
  "version": "0.0.0",
  "private": false,
  "dependencies": {
    "elastic-apm-node": "3.36.0"
  },
  "devDependencies": {
    "@types/node": "18.0.6",
    "typescript": "4.7.4"
  }
}

B. tsconfig.json

{
  "include": ["src/**/*.ts"],
  "compilerOptions": {
    "allowJs": true,
    "checkJs": false,
    "declaration": true,
    "emitDeclarationOnly": false,
    "sourceMap": true,
    // Generate JS compatible with Node >= 12 - see https://github.com/microsoft/TypeScript/wiki/Node-Target-Mapping
    "lib": ["ES2019"],
    "module": "commonjs",
    "target": "ES2019"
  }
}

C. src/main.js

const apm = require('elastic-apm-node');

const mod1 = require("./mod1");
mod1();

const { test } = require("./mod2");
test();

D. src/mod1.ts (CJS module)

module.exports = function test() {
  try {
    throw Error();
  } catch (err) {
    console.error(err);
  }
}

E. src/mod2.ts (ES6 module)

export async function test() {
  try {
    throw Error();
  } catch (err) {
    console.error(err);
  }
}

Then using node 16 :

npm install
node_modules\.bin\tsc
node --enable-source-maps src\main.js

Running above script results in :

Error
    at test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod1.js:3:15)
    at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:4:1)
    at Module._compile (node:internal/modules/cjs/loader:1103:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
    at node:internal/main/run_main_module:17:47
Error
    at test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod2.js:6:15)
    at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:7:1)
    at Module._compile (node:internal/modules/cjs/loader:1103:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
    at node:internal/main/run_main_module:17:47

However when in src\main.js the first line (const apm = require('elastic-apm-node')) is put in comment, you will get the right stack trace:

C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod1.ts:3
    throw Error();
          ^

Error
    at null.test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod1.ts:3:11)
    at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:4:1)
    at Module._compile (node:internal/modules/cjs/loader:1103:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
    at node:internal/main/run_main_module:17:47
C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod2.ts:3
    throw Error();
          ^

Error
    at null.test (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\mod2.ts:3:11)
    at Object.<anonymous> (C:\Users\dtoch\work\pga\testbug-correct-stacktrace\src\main.js:7:1)
    at Module._compile (node:internal/modules/cjs/loader:1103:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1157:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
    at node:internal/main/run_main_module:17:47

Expected behavior

Doing require('elastic-apm-node') should not influence the stacktrace generated by "node --enable-source-maps ...".

Environment (please complete the following information)

  • OS: [e.g. Linux] Windows
  • Node.js version: 16.14.2
  • APM Server version: Not important (don't need server to simulate problem)
  • Agent version: 3.36.0

How are you starting the agent? (please tick one of the boxes)

  • Calling agent.start() directly (e.g. require('elastic-apm-node').start(...))
  • Requiring elastic-apm-node/start from within the source code
  • Starting node with -r elastic-apm-node/start

Additional context

  • Agent config options

    Click to expand
    replace this line with your agent config options
    
  • package.json dependencies:

    Click to expand
    replace this line with your dependencies section from package.json
    
@github-actions github-actions bot added agent-nodejs Make available for APM Agents project planning. community triage labels Jul 18, 2022
@dtoch dtoch changed the title requirng elastic-apm-node in code influences node --enable-source-maps option require"elastic-apm-node") in code influences node --enable-source-maps option Jul 18, 2022
@dtoch dtoch changed the title require"elastic-apm-node") in code influences node --enable-source-maps option require"elastic-apm-node") in code influences stacktrace generated with "node --enable-source-maps ..." Jul 18, 2022
@dtoch dtoch changed the title require"elastic-apm-node") in code influences stacktrace generated with "node --enable-source-maps ..." require("elastic-apm-node") influences stacktrace generated with "node --enable-source-maps ..." Jul 18, 2022
@trentm
Copy link
Member

trentm commented Jul 22, 2022

TIL that node --enable-source-maps ... is a thing. I had not seen it before.

Doing require('elastic-apm-node') should not influence the stacktrace generated by "node --enable-source-maps ...".

I agree with that statement, in general: that just require'ing the agent should not impact behaviour of the app, at least as much as possible. Starting the agent, ideally, should not impact behaviour of the app either, but I suppose there is more latitude for some impact when the agent is active.

Why require('elastic-apm-node') has an impact

The reason this is having an impact is because as part of the APM agent's stacktrace collection handling (mainly for serializing a stacktrace when an Error is captured), it imports error-callsites, which sets a Error.prepareStackTrace function (per https://v8.dev/docs/stack-trace-api#customizing-stack-traces). However, according to https://nodejs.org/api/all.html#all_cli_--enable-source-maps

Overriding Error.prepareStackTrace prevents --enable-source-maps from modifying the stack trace.

An aside on subtlties in --enable-source-maps

There is also some more subtlety going on here. If one uses node --enable-source-maps bar.js on a script that has no source map, then the leading context on the printed stacktrace is impacted. This is without the APM agent involved at all:

% cat bar.js
function aFunc() {
    throw new Error('boom');
}
aFunc();

% node bar.js
/Users/trentm/tmp/asdf/bar.js:2
    throw new Error('boom');
    ^

Error: boom
    at aFunc (/Users/trentm/tmp/asdf/bar.js:2:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf/bar.js:4:1)
    at Module._compile (node:internal/modules/cjs/loader:1105:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
...

% node --enable-source-maps bar.js
Error: boom
    at aFunc (/Users/trentm/tmp/asdf/bar.js:2:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf/bar.js:4:1)
    at Module._compile (node:internal/modules/cjs/loader:1105:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
...

My (limited) understanding of Node's implementation of --enable-source-maps is that the actual Error.stack string value is changed to include that leading "/Users/trentm/tmp/asdf/bar.js:2\n throw new Error('boom');\n ^" context (nodejs/node#43186). And then (I'm guessing) the default uncaughtException handler in node will no longer add the leading context if in --enable-source-maps-mode. It looks like this behaviour of adding the leading source context info to Error.stack will change in the next node major version: nodejs/node#43875

Also FWIW, from nodejs/node#41541 it sounds like --enable-source-maps can be slow with large source maps.

back to the APM agent's impact

I'm not sure exactly what the best things are that the APM agent can do here:

  1. Perhaps we could delay overriding Error.prepareStackTrace until the APM agent is started (i.e. in Agent.start()). This is also where Error.stackTraceLimit is set. This would mean this impact would only occur if the APM agent was enabled. That is slightly better.

@dtoch Would that handle your main concern? Or is this also an issue for you when you have the APM agent active?

What to do when the agent is enabled? We want to override Error.prepareStackTrace to get access to the v8 CallSites data structures. Those are used to report a structured stacktrace to APM server, that you can see in the Error details in the Kibana APM app. Note that the APM agent does support sourcemaps for those reported stacktraces. For example:

% cat foo.ts
interface Foo {
}

function aFunc() {
    throw new Error('boom')
}
aFunc()

% ./node_modules/.bin/tsc --sourceMap foo.ts

% cat foo.js
function aFunc() {
    throw new Error('boom');
}
aFunc();
//# sourceMappingURL=foo.js.map

% ELASTIC_APM_LOG_UNCAUGHT_EXCEPTIONS=true node --enable-source-maps -r elastic-apm-node/start foo.js
Error: boom
    at aFunc (/Users/trentm/tmp/asdf/foo.js:2:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf/foo.js:4:1)
    at Module._compile (node:internal/modules/cjs/loader:1105:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1159:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:77:12)
    at node:internal/main/run_main_module:17:47
{"log.level":"info","@timestamp":"2022-07-22T22:18:07.856Z","log":{"logger":"elastic-apm-node"},"ecs":{"version":"1.6.0"},"message":"Sending error to Elastic APM: {\"id\":\"e3863ca4f871527aacec1622fe424151\"}"}

and the structured stacktrace sent to APM server looks like this (it has applied the sourcemap):

            "exception": {
                "message": "boom",
                "type": "Error",
                "handled": false,
                "stacktrace": [
                    {
                        "filename": "foo.ts",
                        "lineno": 5,
                        "function": "aFunc",
                        "library_frame": false,
                        "abs_path": "/Users/trentm/tmp/asdf/foo.ts",
                        "pre_context": [
                            "",
                            "function aFunc() {"
                        ],
                        "context_line": "    throw new Error('boom')",
                        "post_context": [
                            "}",
                            "aFunc()"
                        ]
                    },
                    {
                        "filename": "foo.ts",
                        "lineno": 7,
                        "function": "Object.<anonymous>",
                        "library_frame": false,
                        "abs_path": "/Users/trentm/tmp/asdf/foo.ts",
                        "pre_context": [
                            "    throw new Error('boom')",
                            "}"
                        ],
                        "context_line": "aFunc()",
                        "post_context": [
                            ""
                        ]
                    },

Given that the APM agent is already calculating the source-mapped call sites, it could do the same thing that node's --enable-source-maps is doing for printing that uncaught exception. Then it could have the printed exception match what you see with node --enable-source-maps foo.js.

This is only a partial solution, however. It would handle the case of printing an uncaught exception with the source-mapped locations. But it wouldn't handle changing the value of Error.stack as I believe node --enable-source-maps ... is doing. Is node doing that synchronously when Error.stack is first accessed? Probably. If so, there could be a significant performance impact there. Currently the sourcemap handling in the APM agent is only done asynchronously when reporting the Error, so that the event loop is minimally affected.

More investigation would be required to dig into this.

@trentm trentm removed the triage label Jul 22, 2022
@dtoch
Copy link
Author

dtoch commented Jul 23, 2022

Hi thanks for the quick and detailed reply :-).

  • We only run node with --enable-source-maps when developing locally to be able to more easily pinpoint an error thrown in transpiled code generated from the source code.

  • When running the application in a target production environment, we never enable --enable-source-maps (because as documented it also impacts performance).

However in our source code we have code to start the APM agent and then to interact with the APM Agent API for example to customize the transaction name:

...
const apm = require('elastic-apm-node');
apm.start();
...
apm.setTransactionName(...);
...

When developing locally, the agent is never active because we always set the standard APM agent environment variable ELASTIC_APM_ACTIVE=false but in a target production environment the agent is active.

As a workaround we could wrap above in an if:

...
if (process.env.ELASTIC_APM_ACTIVE == "true") {
  const apm = require('elastic-apm-node');
  apm.start();
  ...
  apm.setTransactionName(...);
}
...

but ideally we could like to avoid this because if in the rest of the code we also use the APM Agent API, then these should be wrapped as well.

Or could this condition on ELASTIC_APM_ACTIVE be considered within the require('elastic-apm-node') call itself?

@trentm
Copy link
Member

trentm commented Jul 25, 2022

Or could this condition on ELASTIC_APM_ACTIVE be considered within the require('elastic-apm-node') call itself?

I think your use case would be handled by doing this:

  1. Perhaps we could delay overriding Error.prepareStackTrace until the APM agent is started (i.e. in Agent.start()). This is also where Error.stackTraceLimit is set. This would mean this impact would only occur if the APM agent was enabled. That is slightly better.

More specifically we should avoid overriding Error.prepareStackTrace if the APM agent is disabled via ELASTIC_APM_ACTIVE=true / active:true.

Thanks for clarifying the use case; that helps. I'll see about getting this scheduled for our next planning session.

@trentm trentm added this to the 8.5 milestone Jul 28, 2022
@astorm astorm self-assigned this Sep 1, 2022
@astorm astorm closed this as completed Sep 12, 2022
astorm added a commit that referenced this issue Sep 12, 2022
* feat: Delay loading error-callsites module to avoid side effects.

#2833
@trentm
Copy link
Member

trentm commented Oct 19, 2022

elastic-apm-node@3.39.0 is release with this now

@Marsup
Copy link
Contributor

Marsup commented Nov 3, 2023

Hi @trentm, sorry to bump such an old issue, but I'm wondering if it still holds true to this day with a node@18 and apm@4.1.0 for example.

I've been experimenting around this since we have this problem in production, and with the following minimal example in typescript:

import apm from 'elastic-apm-node';

apm.start();

function foo() {
  throw new Error('A');
}

try {
  foo();
} catch (e) {
  console.info((e as any).stack);
}

Building this file with inlineSourceMap: true to guarantee the sourcemap to be accessible, I get the following stacktraces.

With apm.start():

Error: A
    at foo (./file.js:9:11)
    at Object.<anonymous> (./file.js:12:5)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47

Without apm.start():

Error: A
    at foo (./file.ts:6:9)
    at Object.<anonymous> (./file.ts:10:3)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47

Passing --enable-source-maps doesn't seem to have any impact whatsoever.

@trentm
Copy link
Member

trentm commented Nov 7, 2023

Hi @Marsup. I believe what you are seeing is what is currently expected from the above discussion, though obviously it isn't ideal.

The change in elastic-apm-node@3.39.0 made it so node's --enable-source-maps handling was not affected by just importing the APM agent code via require('elastic-apm-node'). However, starting the APM agent will prevent --enable-source-maps from doing its thing.

my repro

% npm ls
asdf.20231103t165932@1.0.0 /Users/trentm/tmp/asdf.20231103T165932
├── @types/node@20.8.10
├── elastic-apm-node@4.1.0
└── typescript@5.2.2

foo.ts

import apm from 'elastic-apm-node'; apm

// apm.start()

interface Foo {
}

function aFunc() {
    throw new Error('boom')
}
aFunc()

foo-start-apm.ts

import apm from 'elastic-apm-node'; apm

apm.start()

interface Foo {
}

function aFunc() {
    throw new Error('boom')
}
aFunc()

Showing with and without --enable-source-maps:

% ./node_modules/.bin/tsc --sourceMap --esModuleInterop foo.ts
% ./node_modules/.bin/tsc --sourceMap --esModuleInterop foo-start-apm.ts
% node foo.js
/Users/trentm/tmp/asdf.20231103T165932/foo.js:9
    throw new Error('boom');
    ^

Error: boom
    at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo.js:9:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo.js:11:1)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47

Node.js v18.18.2
[11:12:25 trentm@pink:~/tmp/asdf (rv:1)]
% node --enable-source-maps foo.js
/Users/trentm/tmp/asdf.20231103T165932/foo.ts:9
    throw new Error('boom')
    ^


Error: boom
    at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo.ts:9:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo.ts:11:1)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47

Node.js v18.18.2

This shows that starting the APM agent results in different handling of <Error instance>.stack and the output to stderr for uncaughtException events:

% node --enable-source-maps foo-start-apm.js
{"log.level":"info","@timestamp":"2023-11-07T19:12:51.314Z","log.logger":"elastic-apm-node","ecs.version":"8.10.0","agentVersion":"4.1.0","env":{"pid":24860,"proctitle":"node","os":"darwin 22.6.0","arch":"x64","host":"pink.local","timezone":"UTC-0800","runtime":"Node.js v18.18.2"},"config":{"serviceName":{"source":"default","value":"asdf_20231103t165932","commonName":"service_name"},"serviceVersion":{"source":"default","value":"1.0.0","commonName":"service_version"},"serverUrl":{"source":"default","value":"http://127.0.0.1:8200/","commonName":"server_url"},"logLevel":{"source":"default","value":"info","commonName":"log_level"}},"activationMethod":"require","message":"Elastic APM Node.js Agent v4.1.0"}
Error: boom
    at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo-start-apm.js:10:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo-start-apm.js:12:1)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47
{"log.level":"info","@timestamp":"2023-11-07T19:12:51.361Z","log.logger":"elastic-apm-node","ecs.version":"8.10.0","message":"Sending error to Elastic APM: {\"id\":\"633a2f196e2837075dded6a48e52e982\"}"}

possible improvements

To improve on this situation will take new work on the APM agent.

  1. To handle just the case of the output printed to stderr on uncaughtException events, the APM agent code could get smarter. It could detect if the --enable-source-maps option was given (either in process.execArgv or in NODE_OPTIONS) and then apply sourcemaps itself.

However, this only handles the uncaughtException case. With node --enable-source-maps ... the sourcemap translation is done for all <Error instance>.stack usages. For example:

% cat foo-internal-error.ts
interface Foo {
}
function aFunc() {
    throw new Error('boom')
}
try {
    aFunc()
} catch (err) {
    console.log('caught an err:', err.stack)
}

% ./node_modules/.bin/tsc --sourceMap --esModuleInterop foo-internal-error.ts

% node foo-internal-error.js
caught an err: Error: boom
    at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.js:2:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.js:5:5)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47

% node --enable-source-maps foo-internal-error.js
caught an err: Error: boom
    at aFunc (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.ts:4:11)
    at Object.<anonymous> (/Users/trentm/tmp/asdf.20231103T165932/foo-internal-error.ts:7:5)
    at Module._compile (node:internal/modules/cjs/loader:1256:14)
    at Module._extensions..js (node:internal/modules/cjs/loader:1310:10)
    at Module.load (node:internal/modules/cjs/loader:1119:32)
    at Module._load (node:internal/modules/cjs/loader:960:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:86:12)
    at node:internal/main/run_main_module:23:47
  1. The only way for an active elastic-apm-node to stay out of the way of --enable-source-maps would be for it to stop using Error.prepareStackTrace at all. Using this mechanism is the canonical API for getting rich stack trace information: https://v8.dev/docs/stack-trace-api#customizing-stack-traces Perhaps the Node.js --enable-source-maps handling makes those richer CallSite object available for other code, like elastic-apm-node, to use or could be made to do so. Or perhaps elastic-apm-node could fallback to doing a best-effort parse of the err.stack string.

Would you be willing to open a separate feature request issue for this? It is very helpful to have real user use cases on issues to motivate development.

fpm-peter pushed a commit to fpm-git/apm-agent-nodejs that referenced this issue Aug 20, 2024
* feat: Delay loading error-callsites module to avoid side effects.

elastic#2833
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
8.5-candidate agent-nodejs Make available for APM Agents project planning. bug community
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants