
Allow for having a reassign backoff. #1219

Open · wants to merge 4 commits into main

@NikeNano commented Jan 24, 2025

Description

On our Hatchet deployment, we have seen that Hatchet steps are reassigned very quickly. During a redeploy of the k8s pods, this seems to exhaust the maximum number of internal retries (SERVER_MAX_INTERNAL_RETRY_COUNT), and the workflow ends up failing. Our workflows run for hours, but we redeploy the workers about 20 times a day.

I made this PR as a potential fix, but mostly I hope to learn how we can run Hatchet better. Please also let me know how best to test this.

Fixes # (issue)

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Documentation change (pure documentation change)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactor (non-breaking changes to code which doesn't change any behaviour)
  • CI (any automation pipeline changes)
  • Chore (changes which are not directly related to any business logic)
  • Test changes (add, refactor, improve or change a test)
  • This change requires a documentation update

What's Changed

  • Add a list of tasks or features here...


vercel bot commented Jan 24, 2025

@NikeNano is attempting to deploy a commit to the Hatchet Team on Vercel.

A member of the Team first needs to authorize it.

@abelanger5 (Contributor) left a comment


Hey @NikeNano, thanks for the PR! It looks like a few files are missing from the commit. Could you also push the relevant changes to the step_runs.sql file? I only see the generated Go file.

@NikeNano (Author) commented Jan 29, 2025


I actually did not understand how it worked, so I changed the Go files directly. If you are fine with this as the high-level approach, @abelanger5, I will put in some effort tonight to get the tests running locally and try it out. I put this together as a quick example and have not yet had time to dive deep into Hatchet.

@NikeNano requested a review from @abelanger5 on January 30, 2025 11:09