Skip to content

feat: improve pg write queue #1878

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

bestmike007
Copy link
Contributor

@bestmike007 bestmike007 commented Mar 1, 2024

Description

This PR is to fix the potential issue described in https://medium.com/@alkor_shikyaro/transactions-and-promises-in-node-js-ca5a3aeb6b74

Type of Change

  • New feature
  • Bug fix
  • API reference/documentation update
  • Other

Does this introduce a breaking change?

No.

Are documentation updates required?

No.

Testing information

Current tests covered this.

Checklist

  • Code is commented where needed
  • Unit test coverage for new or modified code paths
  • npm run test passes
  • Changelog is updated
  • Tag 1 of @rafaelcr or @zone117x for review

Sorry, something went wrong.

Unverified

This user has not yet uploaded their public signing key.
Signed-off-by: bestmike007 <i@bestmike007.com>
@bestmike007
Copy link
Contributor Author

@zone117x

Copy link

codecov bot commented Mar 1, 2024

Codecov Report

Attention: Patch coverage is 90.00000% with 1 line in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/datastore/helpers.ts 90.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

done(): Promise<void> {
return this.queue.onIdle();
async done(): Promise<void> {
// https://medium.com/@alkor_shikyaro/transactions-and-promises-in-node-js-ca5a3aeb6b74
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't believe the problem described in this post applies to the tasks handled by this queue for a couple of reasons:

  1. PQueue's onIdle already behaves like allSettled

    Returns a promise that settles when the queue becomes empty, and all promises have completed

  2. The API only enqueues work to this queue inside of a pre-existing SQL transaction (all the block processing code is contained in a single sqlWriteTransaction). If there's ever an exception thrown inside, the transaction aborts and rolls back all the pending writes. We've done extensive tests on this and it's worked correctly for a long time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. That's what I thought, but it was wrong; here's a demonstration: https://replit.com/@bestmike007/TroubledEarnestSandboxes#index.ts
  2. onIdle does not throw an exception, it simply waits all promises to complete.

What I'm seeing is, there was an unhandledRejection and then the process restarted.

@rafaelcr
Copy link
Collaborator

rafaelcr commented Mar 1, 2024

@bestmike007 I'm curious to now if this fix was submitted because of a problem you're experiencing? We'd be happy to test some edge case if you've found one we haven't covered yet

@bestmike007
Copy link
Contributor Author

Yes I was trying to resolve a problem, our nodes run into unrecoverable state, not sure if it's related to this PR.

Here's one of the situations I saw, which should not be happening:

telegram-cloud-photo-size-5-6273870458726235191-y

Another one, I couldn't found a screenshot, was reporting something like table tid from new index tuple (13374,63) overlaps with invalid duplicate tuple at offset 4 of block 6211 in inde, and can be fixed by 3 steps: remove the nft_custody_unique constraints, remove duplicates with lower block heights, and then rebuild the unique constraint.

Unverified

This user has not yet uploaded their public signing key.
Signed-off-by: bestmike007 <i@bestmike007.com>
@rafaelcr
Copy link
Collaborator

rafaelcr commented Mar 6, 2024

Hmm I don't believe that error is related to this change. It seems strange, though, because the ON CONFLICT ON CONSTRAINT is there precisely to avoid this error. Can you tell me more about your setup? Is this error occurring in an event replay, or a normal Stacks node sync?

}
enqueue(task: Parameters<PQueue['add']>[0]): void {
void this.queue.add(task);
const p = this.queue.add(task);
p.catch(e => logger.error(e, 'PgWriteQueue task failed'));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this catch needed, or would it be handled later in the throw firstRejected.reason code?

If the error is logged here, it would have no meaningful stack trace compared to the below code which preserves the async call stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's no catch, the process will be terminated by an unhandledRejection.

The error is logged here because throw firstRejected.reason only throws the first one, we don't have any log for the other errors.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah okay, would it make sense to use the p-queue error event instead? https://www.npmjs.com/package/p-queue#error

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what we saw using a node built from this PR.

image

Copy link
Contributor

@zone117x zone117x Mar 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, we can manually capture the errors ourself with .catch to prevent a terminated by an unhandledRejection. But could we instead use the build-in p-queue errror handler, e.g.

queue.on('error', error => {
	console.error(error);
});

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bestmike007
Copy link
Contributor Author

Hmm I don't believe that error is related to this change. It seems strange, though, because the ON CONFLICT ON CONSTRAINT is there precisely to avoid this error. Can you tell me more about your setup? Is this error occurring in an event replay, or a normal Stacks node sync?

It it not related, it's another issue I saw when the node keeps crashing because of unhandledRejection when it fails to process a new_block event.

Our nodes are set up using the script in this repo: https://github.com/alexgo-io/stacks-node-mainnet, and nodes restoring from both hiro archive and the cold backup saw this issue.

@saralab saralab moved this from 🏗 In Progress to ✅ Done in API Board Mar 24, 2025
@rafaelcr rafaelcr closed this May 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

3 participants