Jobs Stuck in Enqueued State #2355

Open
mwasson74 opened this issue Jan 24, 2024 · 8 comments
Comments

@mwasson74

I realize I'm not using the latest versions of things, but these were the latest versions when I started having this issue in production. Also, because of the stdump issue "IndexOutOfRangeException - What am I doing wrong?", I could not get a stack trace dump when I had the latest versions of the packages.

ASP.NET Core .NET 6
Hangfire.AspNetCore Version="1.8.6"
Hangfire.Console Version="1.4.2"
Hangfire.Core Version="1.8.6"
Hangfire.Dashboard.BasicAuthorization Version="1.0.2"
Hangfire.Mongo Version="1.9.12"

stdump_hangfire.txt

Classes have this attribute applied: SkipWhenPreviousJobIsRunningAttribute.txt

Execute Methods have [DisableConcurrentExecution("{0}", 3)] applied
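
For readers unfamiliar with the attribute, here is a minimal sketch of how that looks on a job class. The class name `ReportJob` and the `tenantId` parameter are hypothetical; only the attribute arguments come from the description above. As far as I understand Hangfire 1.8, the first argument names the resource for a distributed lock (with `{0}` filled in from the job's first argument) and the second is the number of seconds to wait for that lock, but treat both readings as assumptions.

```csharp
using Hangfire;

// Hypothetical job class: the name and the method parameter are illustrative only.
public class ReportJob
{
    // Attribute arguments as quoted in this issue: resource "{0}", timeout 3.
    // Assumption: "{0}" is formatted with the job's first argument (tenantId here),
    // and 3 is the number of seconds to wait for the distributed lock.
    [DisableConcurrentExecution("{0}", 3)]
    public void Execute(string tenantId)
    {
        // Job body goes here.
    }
}
```

Under that reading, two executions with the same argument value would contend for the same lock, and the second would fail to acquire it if the first ran longer than three seconds.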

image

image

HangfireDashboard

@odinserj
Member

Thanks for the dump file! I believe the following thread is the most interesting one. It holds a semaphore, so other worker threads are waiting for it to complete. If this thread is stuck, new background jobs will not be processed, and it is likely that it is stuck.

I found dotnet/runtime#70656 on GitHub, which reports a similar stack trace in .NET 6.x and states that the problem was fixed in .NET 7.0. You are using an affected version, so the best recommendation I can give is to upgrade to a newer .NET version. Unfortunately, I also see dotnet/runtime#83455, but it looks like that one was fixed in .NET 7.0.7 and 8.0.

Thread #41
  OS Thread ID:      81092
  AppDomain Address: 1776550875936
  State:             176672

  Managed stack trace:
   - [InlinedCallFrame] (Interop+Winsock.recv) at System.Net.Sockets.dll
   - [InlinedCallFrame] (Interop+Winsock.recv) at System.Net.Sockets.dll
   -  at 
   - System.Net.Sockets.Socket.Receive(System.Span`1<Byte>, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef) at System.Net.Sockets.dll
   - System.Net.Sockets.NetworkStream.Read(System.Span`1<Byte>) at System.Net.Sockets.dll
   - System.Net.Security.SslStream+<EnsureFullTlsFrameAsync>d__186`1[[System.Net.Security.SyncReadWriteAdapter, System.Net.Security]].MoveNext() at System.Net.Security.dll
   -  at 
   -  at 
   - System.Net.Security.SslStream+<ReadAsyncInternal>d__188`1[[System.Net.Security.SyncReadWriteAdapter, System.Net.Security]].MoveNext() at System.Net.Security.dll
   -  at 
   - System.Net.Security.SslStream.Read(Byte[], Int32, Int32) at System.Net.Security.dll
   - MongoDB.Driver.Core.Misc.StreamExtensionMethods.ReadBytes(System.IO.Stream, Byte[], Int32, Int32, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
   - MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBuffer(System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
   - MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveBuffer(Int32, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
   - MongoDB.Driver.Core.Connections.BinaryConnection.ReceiveMessage(Int32, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.IMessageEncoderSelector, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.MessageEncoderSettings, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
   - MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool+PooledConnection.ReceiveMessage(Int32, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.IMessageEncoderSelector, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.MessageEncoderSettings, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
   - MongoDB.Driver.Core.ConnectionPools.ExclusiveConnectionPool+AcquiredConnection.ReceiveMessage(Int32, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.IMessageEncoderSelector, MongoDB.Driver.Core.WireProtocol.Messages.Encoders.MessageEncoderSettings, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
   - MongoDB.Driver.Core.WireProtocol.CommandUsingCommandMessageWireProtocol`1[[System.__Canon, System.Private.CoreLib]].Execute(MongoDB.Driver.Core.Connections.IConnection, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
   - MongoDB.Driver.Core.WireProtocol.CommandWireProtocol`1[[System.__Canon, System.Private.CoreLib]].Execute(MongoDB.Driver.Core.Connections.IConnection, System.Threading.CancellationToken) at MongoDB.Driver.Core.dll
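
The semaphore explanation above can be illustrated with a small, self-contained sketch. It does not reproduce Hangfire or MongoDB driver internals; the names `gate`, `neverCompletes`, and `holderReady` are made up for this example, and the "socket read" is simulated with an event that is never signaled. It only shows why one thread blocked inside a semaphore can starve every other thread that needs the same semaphore.

```csharp
using System;
using System.Threading;

// Hypothetical shared semaphore that allows one holder at a time.
var gate = new SemaphoreSlim(1, 1);
// Never signaled: stands in for a synchronous socket read that never returns.
var neverCompletes = new ManualResetEventSlim(false);
// Signals that the first thread has entered the semaphore.
var holderReady = new ManualResetEventSlim(false);

// Plays the role of the stuck thread: enters the semaphore, then blocks forever.
var stuck = new Thread(() =>
{
    gate.Wait();
    holderReady.Set();
    try { neverCompletes.Wait(); }   // blocked while still holding the semaphore
    finally { gate.Release(); }
}) { IsBackground = true };          // background thread, so the process can exit
stuck.Start();

holderReady.Wait();

// Plays the role of another worker: tries to enter, gives up after one second.
bool acquired = gate.Wait(TimeSpan.FromSeconds(1));
Console.WriteLine($"Second worker acquired semaphore: {acquired}");
// prints "Second worker acquired semaphore: False"
```

A wait without a timeout in the second worker would hang indefinitely instead of returning `False`, which matches the symptom of workers that stop picking up new jobs.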

@mwasson74
Author

@odinserj, thank you so much for getting back to me on this so quickly!! I have upgraded to .NET 8 just now and am about to deploy to see how it goes!! 🤞

@mwasson74
Author

It did not go well. Here is the stack trace from when it happened again:

stdump_hangfire2.txt

ASP.NET Core .NET 8
Hangfire.AspNetCore Version="1.8.9"
Hangfire.Console Version="1.4.2"
Hangfire.Core Version="1.8.9"
Hangfire.Dashboard.BasicAuthorization Version="1.0.2"
Hangfire.Mongo Version="1.9.16"

@odinserj
Member

Hm, so the main issue is that the enqueued counters are inconsistent with the records themselves, e.g. the dashboard shows there are some jobs, but you don't see them?

image

@mwasson74
Author

That is, I assume, the symptom of the underlying issue. When this happens, the system thinks those jobs are still running and won't enqueue them again. So in the instance from the screenshot, we now have 63 unique recurring jobs that never get enqueued again. The only way I can find to get them running again is to stop the app pool, drop all hangfire.* collections from Mongo, and then start the app pool again. (We add the recurring jobs on startup.)
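
For anyone who needs the same workaround, the "drop all hangfire.* collections" step could be scripted roughly as follows. This is a sketch under assumptions, not a recommended procedure: the connection string and the database name `jobs` are placeholders, the `hangfire.` prefix is taken from the pattern mentioned above, and it permanently deletes all Hangfire state, so it should only run while the application is stopped.

```csharp
using System;
using System.Linq;
using MongoDB.Driver;

// Placeholders: replace with the real connection string and database name.
var client = new MongoClient("mongodb://localhost:27017");
var database = client.GetDatabase("jobs");

// Assumed prefix, based on the "hangfire.*" pattern mentioned in this thread.
const string prefix = "hangfire.";

// Materialize the names first, then drop every collection with the prefix.
var names = database.ListCollectionNames().ToList();
foreach (var name in names.Where(n => n.StartsWith(prefix, StringComparison.Ordinal)))
{
    Console.WriteLine($"Dropping {name}");
    database.DropCollection(name); // destructive: removes all job and counter data
}
```

Because the recurring jobs are registered again on startup, they reappear after the restart, but any queued, scheduled, or historical job data is lost.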

@odinserj
Member

In that case, I may have pointed you in the wrong direction with that method and the .NET upgrade, sorry about that.

I think it's better to raise an issue in the Hangfire.Mongo repository and describe the situation, because counters and actual contents should be consistent with each other.

@jonathancounihan

I have the same issue with the SQL Server storage: the counter always shows 10 jobs, but nothing is enqueued.

.NET 4.6.1
Hangfire 1.8.6
Hangfire.Core 1.8.6
Hangfire.SqlServer 1.8.6

image

@mwasson74
Author

@jonathancounihan

I am using Hangfire.Mongo, and its owner said he has found a bug in Hangfire.Mongo and is pretty sure the same would happen with SQL storage, too: gottscj/Hangfire.Mongo#380 (comment)


3 participants