Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible unhandleable crash in kad-dht #2216

Closed
saul-jb opened this issue Nov 7, 2023 · 1 comment · Fixed by #2225
Closed

Possible unhandleable crash in kad-dht #2216

saul-jb opened this issue Nov 7, 2023 · 1 comment · Fixed by #2225
Labels
need/triage Needs initial labeling and prioritization

Comments

@saul-jb
Copy link
Contributor

saul-jb commented Nov 7, 2023

  • Version:
    • libp2p: 0.46.16
    • @libp2p/kad-dht: 10.0.11
  • Platform: Linux 5.15.0-88-generic Ubuntu x86_64 GNU/Linux
  • Subsystem: kad-dht

Severity: ?

Description:

I have been downloading many blocks in parallel via Helia and on (rare) occasions it would crash without much of a trace:

file://node_modules/@libp2p/kad-dht/src/query/query-path.ts:227
      deferred.reject(new CodeError('Query aborted', 'ERR_QUERY_ABORTED'))
                      ^
CodeError: Query aborted
    at EventTarget.<anonymous> (file://node_modules/@libp2p/kad-dht/src/query/query-path.ts:227:23)
    at EventTarget.[nodejs.internal.kHybridDispatch] (node:internal/event_target:757:20)
    at EventTarget.dispatchEvent (node:internal/event_target:692:26)
    at abortSignal (node:internal/abort_controller:369:10)
    at AbortController.abort (node:internal/abort_controller:403:5)
    at EventTarget.onAbort (file://node_modules/any-signal/src/index.ts:14:16)
    at EventTarget.[nodejs.internal.kHybridDispatch] (node:internal/event_target:757:20)
    at EventTarget.dispatchEvent (node:internal/event_target:692:26)
    at abortSignal (node:internal/abort_controller:369:10)
    at AbortController.abort (node:internal/abort_controller:403:5) {
  code: 'ERR_QUERY_ABORTED',
  props: {}
}

Further debugging shows that the abort signal triggering this can be called from bitswap

Steps to reproduce the error:

Unfortunately I can't seem to trigger this reliably. The error is thrown in query-path.ts#L227. I believe this error is thrown and is uncatchable if the following steps happen:

In case that is not clear enough here is some code illustrating the error when everything aligns:

import defer from 'p-defer';

async function * toGenerator () {
	let defered = defer();

	// Resolve soon.
	setTimeout(() => defered.resolve(), 10);

	for (;;) {
		// Wait for the deferred to be resolved
		await defered.promise;
		defered = defer();

		// Reject later.
		setTimeout(() => defered.reject("Query aborted"), 10);

		yield;
	}
}

const itr = toGenerator();

try {
	// First iteration should resolve.
	await itr.next();

	// A timer allows the promise to reject before the .next() call.
	await new Promise(r => setTimeout(r, 30));

	// The toGenerator iterator will throw an uncatchable error before it gets here.
	await itr.next();
} catch (error) {
	// Error will not be catchable.
}

It's possible that I am barking up the wrong tree here and my crash is caused by something else but after many hours of debugging it is the only possibility that I can think of that causes this issue.

@saul-jb saul-jb added the need/triage Needs initial labeling and prioritization label Nov 7, 2023
achingbrain added a commit that referenced this issue Nov 9, 2023
If a DHT query is aborted during reading, the deferred promise can
become rejected while nothing is `await`ing it.

Switch the implementation to use a `pushable` queue instead.

Fixes #2216
achingbrain added a commit that referenced this issue Nov 9, 2023
If a DHT query is aborted during reading, the deferred promise can
become rejected while nothing is `await`ing it.

Switch the implementation to use a `pushable` queue instead.

Fixes #2216
@achingbrain
Copy link
Member

No, I think this is a possible crash though it's hard to trigger. I managed to get a repro in a test for #2225 but it was complicated slightly as mocha seems to do something funny to unhandledRejection events on the process object.

Registering a listener directly surfaces the error.

Switching the internal implementation to use a pushable queue instead of the deferred promise makes the test pass.

achingbrain added a commit that referenced this issue Nov 10, 2023
If a DHT query is aborted during reading, the deferred promise can become rejected while nothing is `await`ing it.

Switch the implementation to use a `pushable` queue instead.

Fixes #2216
maschad pushed a commit to maschad/js-libp2p that referenced this issue Nov 10, 2023
If a DHT query is aborted during reading, the deferred promise can become rejected while nothing is `await`ing it.

Switch the implementation to use a `pushable` queue instead.

Fixes libp2p#2216
This was referenced Jan 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need/triage Needs initial labeling and prioritization
Projects
None yet
2 participants