Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Debugging a server client holding process open #460

Closed
chadxz opened this issue May 15, 2024 · 10 comments
Closed

Debugging a server client holding process open #460

chadxz opened this issue May 15, 2024 · 10 comments
Labels
bug Something isn't working package: sdk/server-node Label for issues affecting the sdk/server-node package. Stale

Comments

@chadxz
Copy link

chadxz commented May 15, 2024

Looking for guidance and/or best-practices for trying to understand how best to make sure the persistent connection to Launchdarkly closes cleanly, as we're having an issue with our process being held open.

Describe the bug

When using node-server-sdk in a cron job, the persistent connection is holding the process open even after we call .close()

To reproduce

I don't know what specifically triggers this, as it doesn't appear to be consistent. When it happens, our logging shows that the close call has been made, but the process doesn't exit as it should.

Expected behavior

After calling close, the process exits instead of hanging open.

SDK version
node-server-sdk@9.4.1

Language version, developer tools
Docker image node:20.12.2-bookworm

Additional context
I was looking at the code and noticed the types that my app has installed indicate the close method is synchronous, but the code appears to be async. Not sure what is going on there.

@chadxz chadxz added bug Something isn't working package: sdk/server-node Label for issues affecting the sdk/server-node package. labels May 15, 2024
@chadxz
Copy link
Author

chadxz commented May 17, 2024

Also been seeing this in our test suite. Maybe related?:

Jest has detected the following 1 open handle potentially keeping Jest from exiting:

  ●  TLSWRAP

      24 |     this.client =
      25 |       LDClient ??
    > 26 |       LaunchDarkly.init(process.env.LAUNCH_DARKLY_SDK_KEY as string, {
         |                    ^
      27 |         timeout: TIMEOUT_IN_SECONDS,
      28 |         logger: TraceLogger,
      29 |         updateProcessor: process.env.NODE_ENV === "test" ? launchDarklyTestData.getFactory() : undefined,

      at ../node_modules/@opentelemetry/instrumentation-http/src/http.ts:690:26
      at safeExecuteInTheMiddle (../node_modules/@opentelemetry/instrumentation/src/utils.ts:32:14)
      at ../node_modules/@opentelemetry/instrumentation-http/src/http.ts:689:67
      at AsyncHooksContextManager.with (../node_modules/@opentelemetry/context-async-hooks/src/AsyncHooksContextManager.ts:49:17)
      at ContextAPI.with (../node_modules/@opentelemetry/api/src/api/context.ts:77:42)
      at outgoingRequest (../node_modules/@opentelemetry/instrumentation-http/src/http.ts:679:26)
      at Object.httpsOutgoingRequest [as request] (../node_modules/@opentelemetry/instrumentation-http/src/http.ts:263:10)
      at ../node_modules/@launchdarkly/node-server-sdk/src/platform/NodeRequests.ts:124:24
      at NodeRequests.fetch (../node_modules/@launchdarkly/node-server-sdk/src/platform/NodeRequests.ts:123:12)
      at EventSender.tryPostingEvents (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventSender.ts:67:67)
      at EventSender.sendEventData (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventSender.ts:120:17)
      at EventProcessor.postDiagnosticEvent (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventProcessor.ts:165:22)
      at EventProcessor.start (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventProcessor.ts:145:12)
      at new EventProcessor (../node_modules/@launchdarkly/js-sdk-common/src/internal/events/EventProcessor.ts:123:12)
      at new LDClientImpl (../node_modules/@launchdarkly/js-server-sdk-common/src/LDClientImpl.ts:177:29)
      at new LDClientNode (../node_modules/@launchdarkly/node-server-sdk/src/LDClientNode.ts:38:5)
      at new WithEvents (../node_modules/@launchdarkly/node-server-sdk/src/Emits.ts:13:10)
      at Object.init (../node_modules/@launchdarkly/node-server-sdk/src/index.ts:49:10)
      at new LaunchDarklyClient (src/integrations/launchDarkly/client.ts:26:20)
      at Object.<anonymous> (src/integrations/launchDarkly/index.ts:3:23)
      at Object.<anonymous> (test/setupAfterEnv.ts:1:1)

@kinyoklion
Copy link
Member

Hello @chadxz,

Another user was encountering a similar situation recently, but as of yet we have been unable to reproduce. Could you provide details on the platform that you are experiencing the problem on?

Closing a client is synchronous, but any outstanding asynchronous tasks will keep the application running. There are a few such tasks, but all their handles are retained and should be cancelled when the client is closed. For instance there is a timer on the event processor and that timer gets cleared when closing the client. If it was not, then that timer could keep the process running.

So, in some condition, some tasks are potentially unaccounted for and then node doesn't shutdown cleanly.

We will look into it.

Filed internally as 245016

Thank you,
Ryan

@kinyoklion
Copy link
Member

In regards to the handle open in the unit tests I would make sure you are closing the client. Otherwise it will have recurring timers. Generally though we would suggest using an offline client, or a client configured using TestData, for testing purposes.

Thanks,
Ryan

@kinyoklion
Copy link
Member

@chadxz Is this happening frequently enough that you would be able to capture some additional logging. Node has a function that can show the resources that are keeping it open: https://nodejs.org/api/process.html#processgetactiveresourcesinfo

Logging the result of that function would at least provide the type of resource. So it could be determined if it was a timer or a network request, for example.

For much more specific information there are libraries like: https://www.npmjs.com/package/why-is-node-running

Which would provide callstacks of the specific resources.

Thanks,
Ryan

@chadxz
Copy link
Author

chadxz commented May 23, 2024

Another user was encountering a similar situation recently, but as of yet we have been unable to reproduce. Could you provide details on the platform that you are experiencing the problem on?

Debian docker container on Linux. (base image is node:20.12.2-bookworm)

In regards to the handle open in the unit tests I would make sure you are closing the client.

Yeah we are closing it in our afterAll() callback in Jest like this:

await client.flush();
client.close();

Is this happening frequently enough that you would be able to capture some additional logging?

It was, but one of our devs stuck a process.exit(0) call at the end of the cronjob that was hanging and as far as I know the hang stopped. Obviously this is a hack, but the fire is out at least. If it happens again i'll instrument it with the why-is-node-running call and see if we can get some better leads.

Thanks for your thoughts!

@kinyoklion
Copy link
Member

Thank you @chadxz.

We did find a couple issues that could have left some handles open with streaming error conditions and have released fixes for those.

We will continue investigation. We currently do not cancel pending network requests when the client is closed, so it is possible that those requests prevent graceful shutdown in some situations. (For instance posting a batch of analytic events.)

As you mention process.exit will exit regardless of pending work.

Thank you,
Ryan

@hysmio
Copy link

hysmio commented Jul 13, 2024

Just ran into this when using https://bun.sh instead of Node. More than likely this is on the Bun side of things, but Bun also exits normally with other libraries.

Tested using:
Bun: 1.0.30
Node: v20.12.2

This is the following code I was using to test the LD Relay Proxy:

const ld = require("@launchdarkly/node-server-sdk");

const relayUri = "http://localhost:8030";
const sdkKey = ``;
let launchDarklyCleanedUp = false;

(async () => {
  const client = await ld
    .init(sdkKey, {
      baseUri: relayUri,
      streamUri: relayUri,
      eventsUri: relayUri,
    })
    .waitForInitialization({
      timeout: 5,
    });

  console.log("Initialised LD client");

  const testFlag = await client.boolVariation(
    "test-flag",
    { key: "user1" },
    true,
  );
  console.log(testFlag);

  const cleanup = async () => {
    if (!launchDarklyCleanedUp) {
      launchDarklyCleanedUp = true;
      await client.flush();
      await client.close();
      console.log("Closed LD client");
    }
  };

  await cleanup();

  process.on("SIGINT", async () => {
    await cleanup();
    process.exit(1);
  });
})();
image

@kinyoklion
Copy link
Member

Hello @hysmio,

Thank you for the report.

In node you can use process.__getActiveRequests and process._getActiveHandles to determine what is holding the event loop open. (Which I can no longer find any open handles in normal usage in node.)

I do not see any comments or docs about it in bun, but it appears that those functions are implemented as stubs that always return an empty array. So unfortunately I do not see a good starting point for debugging this issue in bun. Unless there is some way to inspect the event loop and I am having a hard time discovering it.

Thank you,
Ryan

Copy link
Contributor

This issue is stale because it has been open for 90 days without activity. Remove the stale label or comment, or this will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 14, 2024
@kinyoklion
Copy link
Member

It is likely the original underlying reason here has been addressed. I am going to close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working package: sdk/server-node Label for issues affecting the sdk/server-node package. Stale
Projects
None yet
Development

No branches or pull requests

3 participants