Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mark as failed stuck async jobs [MARXAN-1554] #1168

Merged
merged 5 commits into from
Jul 14, 2022

Conversation

angelhigueraacid
Copy link
Contributor

This PR adds async jobs garbage collector. So in case an async job gets stuck , a failed api events for the given async job will be send. We will check wether an async jobs is stuck or not when users logs in for every project and scenario where the user has any role.

For every kind of async job, a default 8 hours interval is set to define if the async job is stuck or not, this behavior can be easily updated for each individual async job kind changing the implementation of getMaxHoursForAsyncJob for the corresponding async job.

Feature relevant tickets

If an async job gets "stuck", it should be marked as failed automatically

@angelhigueraacid angelhigueraacid added the API Everything related the api label Jul 12, 2022
@angelhigueraacid angelhigueraacid self-assigned this Jul 12, 2022
@vercel
Copy link

vercel bot commented Jul 12, 2022

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated
marxan ✅ Ready (Inspect) Visit Preview Jul 14, 2022 at 6:36AM (UTC)
marxan-storybook ✅ Ready (Inspect) Visit Preview Jul 14, 2022 at 6:36AM (UTC)

@angelhigueraacid
Copy link
Contributor Author

Right now, for every project 6 queries and will be performed (and 11 queries for every scenario), in case we detect this could be a problem, the current implementation could be modified to perform each of the project queries in one query (the same goes for scenario queries) and then call each of the async jobs implementation with the result of the query

Copy link
Member

@hotzevzl hotzevzl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @angelhigueraacid - great PR 🙌🏼

my only concern is about trying to do too much at once - see my inline notes on whether we should consider processing each project and scenario check sequentially, and each job kind check within each of them sequentially too, since we shouldn't be too concerned about the checks completing as soon as possible (and conversely if we unleash potentially hundreds of checks at very short interval on a large api_events table, we may actually overload the db 🤔

@angelhigueraacid
Copy link
Contributor Author

thanks @angelhigueraacid - great PR 🙌🏼

my only concern is about trying to do too much at once - see my inline notes on whether we should consider processing each project and scenario check sequentially, and each job kind check within each of them sequentially too, since we shouldn't be too concerned about the checks completing as soon as possible (and conversely if we unleash potentially hundreds of checks at very short interval on a large api_events table, we may actually overload the db 🤔

I will gladly change the code to process project and scenarios one by one and each async job kind within each of the sequentially. I had this same concern and i believe this change will make it safer. Thanks for the suggestion

@hotzevzl hotzevzl changed the title Feat/marxan 1554 mark as failed stuck async jobs mark as failed stuck async jobs [MARXAN-1554 ] Jul 14, 2022
@hotzevzl hotzevzl changed the title mark as failed stuck async jobs [MARXAN-1554 ] mark as failed stuck async jobs [MARXAN-1554] Jul 14, 2022
@hotzevzl hotzevzl merged commit 3d39d27 into develop Jul 14, 2022
@hotzevzl hotzevzl deleted the feat/MARXAN-1554-mark-as-failed-stuck-async-jobs branch July 14, 2022 12:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Everything related the api
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants