Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent breaking scheduler on task_instance.priority_weight column has overflow value #22784

Closed
wants to merge 1 commit into from

Conversation

kosteev
Copy link
Contributor

@kosteev kosteev commented Apr 6, 2022

closes: #22781

@eladkal
Copy link
Contributor

eladkal commented Apr 6, 2022

The DAG shared on the issue runs just fine with Sqlite:
Screen Shot 2022-04-06 at 17 55 52

Screen Shot 2022-04-06 at 17 55 41

I'm not sure if this is the right fix. Why should the model be concerned with issues originated from a specific DB?
I suspect that this could be also reported on retries, pool_slots etc.
I think the fix should be in the scheduler level (at least by not crashing) / raising broken DAG?

@uranusjr
Copy link
Member

uranusjr commented Apr 6, 2022

The fix is specific to Postgres I believe. Every database has different integer ranges so this will need to contain some if-else branches to cover each of them. Although honestly maybe it’s even better to just document this is a database limitation and you just can’t go over the limit.

@eladkal
Copy link
Contributor

eladkal commented Apr 6, 2022

The fix is specific to Postgres I believe. Every database has different integer ranges so this will need to contain some if-else branches to cover each of them. Although honestly maybe it’s even better to just document this is a database limitation and you just can’t go over the limit.

I agree but the point of concern is that this cause the scheduler to crash which is not good.
It means that one dag can cause cluster wide problems. I think we need to find a way to locallize the effect so only the problematic dag will be effected.

@uranusjr
Copy link
Member

uranusjr commented Apr 6, 2022

Yeah the scheduler should not crash, but perhaps we could catch the exception or something like that. I seem to recall there’s a similar situation for something else a while ago (datetime out of range?)

@eladkal
Copy link
Contributor

eladkal commented Apr 6, 2022

I recall only #17003 which also caused scheduler crash with divide by zero

@ashb
Copy link
Member

ashb commented Apr 6, 2022

Is silently clamping it right, or should this be a parse time DAG import error?

@uranusjr
Copy link
Member

uranusjr commented Apr 6, 2022

Ah found it, 2213635

So I guess the better fix here is to do something similar, check for Postgres, and prevent an out-of-bounds value to be assigned for Postgres specifically.

+1 to an import error.

@kosteev
Copy link
Contributor Author

kosteev commented Apr 7, 2022

All right, validating it during DAG import and throwing error makes sense.
What about different databases? Should we handle all the databases taking into account different constraints of each?

@uranusjr
Copy link
Member

uranusjr commented Apr 7, 2022

Let’s start with adding just a check for Postgres now, we can add more later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error "NumericValueOutOfRange integer out of range" in scheduler, task_insance.priority_weight column overflow
4 participants