AWS RDS Failover hangs hangfire for 3 days and it cannot start again 10K+ pending jobs #81
Comments
After the DB upgrade to Aurora 5.7.mysql_aurora.2.09.2, I also started getting the following frequently:
@arnoldasgudas any idea? I am having serious issues, and this is blocking my server too.
I think something was wrong with the tables: one record could not be deleted, so I dumped and rebuilt the tables with the right data. The issues above no longer happen; however, I keep getting the following:
Any idea? There is no pressure on the DB and only one Hangfire instance is running, so why is this happening?
Figured out the issue completely.
Hi,
I recently upgraded my Aurora DB (MySQL) to the latest version, and during the process the DB rebooted, failed over, etc. One thing I failed to notice was that the Hangfire process got into a partially working state on all 4 servers.
It was able to queue jobs but not able to process anything at all.
As a result, I restarted the instances to get processing going again.
Here are the problems that occurred, as far as I understand:
To handle the situation, I created another database so the systems would use a fresh, empty DB, and that worked fine. The problem is that I still want to process the 10K+ locked-up jobs, so I pointed one of the servers back to the old DB connection. Now this single server locks up and gets stuck just like in the screenshot above. I think it uses so much memory that I cannot even connect to the server via SSH. The only option I have is to reboot the server and, once it is up, stop the service so I can modify the connection string; after that the server responds and processes properly.
Looking for the following:
For #1, a DB failover should not stop processing completely (as described above: still accepting new jobs into the queue but not processing any); see below:
Again for #1: once the DB is available again, do not bombard it with whatever is causing the systems to lock up completely and be unable to process any DB operations, as can be seen below:
Environment information:
Ubuntu servers running .NET Core 3.1 and Hangfire.MySqlStorage 2.0.2 with Hangfire.Core 1.7.20.
I cannot use the latest version, 2.0.3, because it requires MySqlConnector >= 1.0.0, and we use Pomelo, which has not yet released a version that works with it.
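To make the "do not bombard the DB once it is back" request in #1 more concrete, here is a rough, language-neutral sketch (in Python, not C#) of capped exponential backoff with jitter around a DB operation. This is only an illustration of the general technique and not how Hangfire or Hangfire.MySqlStorage behaves. The names `backoff_delays`, `call_with_backoff`, and `is_transient` are hypothetical, and which errors count as transient during a failover is an assumption left to the caller.

```python
import random
import time


def backoff_delays(base=1.0, cap=60.0, attempts=8, rng=random.random):
    """Yield one delay (seconds) per attempt: random in [0, min(cap, base * 2**n))."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: spreads retries out so workers do not all hit the DB at once.
        yield rng() * ceiling


def call_with_backoff(operation, is_transient, sleep=time.sleep, **backoff_args):
    """Call operation(); on a transient error, wait a jittered delay and retry.

    is_transient(exc) is a caller-supplied (hypothetical) predicate, e.g. one
    that recognises connection errors raised while the DB is failing over.
    """
    last_error = None
    for delay in backoff_delays(**backoff_args):
        try:
            return operation()
        except Exception as exc:
            if not is_transient(exc):
                raise  # Non-transient errors are not retried.
            last_error = exc
            sleep(delay)
    if last_error is None:
        raise RuntimeError("no attempts were made")
    raise last_error  # All attempts failed with transient errors.
```

With something along these lines, each worker would slow its retries down while the DB is unavailable and come back gradually afterwards, instead of every worker retrying at full speed the moment the failover completes.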