Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: catch all exceptions in auction wakers #8438

Merged
merged 4 commits into from
Oct 8, 2023
Merged

Conversation

Chris-Hibbert
Copy link
Contributor

@Chris-Hibbert Chris-Hibbert commented Oct 5, 2023

closes: #8307

Description

#8296 demonstrated an issue in the Timer wakers used by the Auctioneer. #8301 addressed the particular problem, but not the general case. This handles the general case for all wakers in the Auctioneer.

Security Considerations

This is about robustness rather than security.

Scaling Considerations

No impact.

Documentation Considerations

N/A

Testing Considerations

Existing tests pass. No new tests added, as inducing appropriate errors wasn't cost effective in code that has already been fixed for all the cases we know of.

There is a test that serves as a regression test for the known case. If the fix from #8301 were lost, this new try-catch would prevent the damage caused by the situation in that test.

Upgrade Considerations

This doesn't make it any more or less urgent to get upgrades to the chain.

Copy link
Member

@turadg turadg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For solving the "general case" of timer exceptions, this covers Auctioneer but doesn't help other contracts that will run into this.

I think we should consider a helper that makes requires a handler for wake exceptions. E.g.

    uponWakeup(
      timer,
      start,
      time => {
        setTimeMonotonically(time);
        auctionDriver.capturePrices();
        // eslint-disable-next-line no-use-before-define
        return startAuction();
      },
      e => console.error(`🚨 Auction threw ${e}. Caught in SchedulerWaker.`),
    );

If a handler (fourth position) isn't provided there'd be a type error.

trace('wake step', now);
clockTick(liveSchedule);
} catch (e) {
console.warn(`🧯Auction threw ${e}. Caught in PriceStepWaker.`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a new emoji to log. what does it mean differently than 🚨 and ⚠️ ?

Also shouldn't this log as an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are both warnings that something is wrong and danger may persist. This was an attempt to say that the emergency has been handled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol I get it now.

Is it a full recovery? No further investigation needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Investigation would be warranted. I'd like to know that something happened and figure out why, but I don't expect the chain to be in imminent danger.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then it sounds like the same attention needed as

console.error(
`⚠️ Excess collateral remaining sent to reserve. Expected ${q(
plan.collatRemaining,
)}, sent ${q(collateralInLiqSeat)}`,
);

I think it needs to a) go to the error logs (so console.error) and b) indicate that investigation is warranted. ⚠️ is what we've been using for that.

I see how 🧯 might be a better pictograph for that because it suggests an exception was handled, but a counter-argument is that every exception that doesn't halt the vat is handled so it's not an informative distinction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Chris-Hibbert
Copy link
Contributor Author

I think we should consider a helper that makes requires a handler for wake exceptions. E.g.

If we think that should always be required, then we should lobby for a change to the timerService. Not rescheduling when a waker throws was a design choice. If that's sometimes the right behavior, then each caller of setWakeup or repeatAfter should make a conscious decision about whether to unconditionally catch.

existing tests pass. No new tests added, as inducing appropriate errors was too hard.
@Chris-Hibbert
Copy link
Contributor Author

Rebased to master, and added one tiny commit.

@Chris-Hibbert Chris-Hibbert added the automerge:squash Automatically squash merge label Oct 6, 2023
@github-actions
Copy link

github-actions bot commented Oct 8, 2023

@Chris-Hibbert - This PR appears to be stuck. It's had a merge label for > 24 hours

@turadg
Copy link
Member

turadg commented Oct 8, 2023

@Mergifyio requeue

@mergify
Copy link
Contributor

mergify bot commented Oct 8, 2023

requeue

✅ The queue state of this pull request has been cleaned. It can be re-embarked automatically

@mergify mergify bot merged commit d4b32f1 into master Oct 8, 2023
71 checks passed
@mergify mergify bot deleted the 8307-noWakerThrows branch October 8, 2023 17:31
@Chris-Hibbert Chris-Hibbert mentioned this pull request Feb 7, 2024
14 tasks
anilhelvaci pushed a commit to Jorge-Lopes/agoric-sdk that referenced this pull request Feb 16, 2024
* feat: catch all exceptions in auction wakers

existing tests pass. No new tests added, as inducing appropriate errors was too hard.

* chore: canonicalize error logs

---------

Co-authored-by: mergify[bot] <37929162+mergify[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auction automerge:squash Automatically squash merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

All Auction timer repeated actions should catch exceptions or have a way to recover
2 participants