When our application starts up and tries to connect to an external Orleans cluster (which is completely offline), the host that contains the cluster client crashes during its .StartAsync method.
This is what we see in the logs:
Orleans.Runtime.SiloUnavailableException: Could not find any gateway in Orleans.Runtime.Membership.AdoNetGatewayListProvider. Orleans client cannot initialize.
at async Task Orleans.Messaging.GatewayManager.StartAsync(CancellationToken cancellationToken) in /_/src/Orleans.Core/Messaging/GatewayManager.cs:line 75
at async Task Orleans.OutsideRuntimeClient.StartInternal(CancellationToken cancellationToken)+(?) => { } in /_/src/Orleans.Core/Runtime/OutsideRuntimeClient.cs:line 156
at async Task Orleans.OutsideRuntimeClient.StartInternal(CancellationToken cancellationToken)+ExecuteWithRetries(?) in /_/src/Orleans.Core/Runtime/OutsideRuntimeClient.cs:line 183
at async Task Orleans.OutsideRuntimeClient.StartInternal(CancellationToken cancellationToken) in /_/src/Orleans.Core/Runtime/OutsideRuntimeClient.cs:line 155
at async Task Orleans.OutsideRuntimeClient.Start(CancellationToken cancellationToken) in /_/src/Orleans.Core/Runtime/OutsideRuntimeClient.cs:line 144
at async Task Orleans.ClusterClient.StartAsync(CancellationToken cancellationToken) in /_/src/Orleans.Core/Core/ClusterClient.cs:line 72
at async Task Microsoft.Extensions.Hosting.Internal.Host.StartAsync(CancellationToken cancellationToken)
at async Task Dobco.POW4.DicomProcessor.Client.Orleans.DpClusterClientBackgroundService.ExecuteAsync(CancellationToken stoppingToken) in C:/git/pow4/Dobco.POW4.DicomProcessor.Client/Orleans/DpClusterClientBackgroundService.cs:line 35
Technically it's fine that an exception is logged (the cluster is offline, after all), but because the host crashes during startup, no reconnection attempt is ever made. If the cluster comes up a little later, the client never connects.
Today, we're working around this issue with the following background service:
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

public interface IClusterClientProvider
{
    IHost Host { get; }
    IClusterClient ClusterClient { get; }
}

public sealed class ClusterClientBackgroundService : BackgroundService
{
    private readonly ILogger<ClusterClientBackgroundService> _logger;
    private readonly IClusterClientProvider _clusterClientProvider;
    private bool _isHostStarted;

    public ClusterClientBackgroundService(ILogger<ClusterClientBackgroundService> logger, IClusterClientProvider clusterClientProvider)
    {
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));
        _clusterClientProvider = clusterClientProvider ?? throw new ArgumentNullException(nameof(clusterClientProvider));
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        // Setting up the host and the cluster client is omitted for brevity
        var host = _clusterClientProvider.Host;
        var retryDelay = TimeSpan.FromSeconds(30);
        while (!stoppingToken.IsCancellationRequested && !_isHostStarted)
        {
            try
            {
                try
                {
                    _logger.LogInformation("Starting cluster client host");
                    await host.StartAsync(stoppingToken);
                    _isHostStarted = true;
                    _logger.LogInformation("Cluster client host started successfully");
                }
                catch (OrleansException e)
                {
                    _isHostStarted = false;
                    _logger.LogWarning(e, "Failed to start cluster client host, will retry in {RetryDelay}", retryDelay);
                    await Task.Delay(retryDelay, stoppingToken);
                }
            }
            catch (OperationCanceledException)
            {
                // Ignored
            }
        }
    }

    public override async Task StopAsync(CancellationToken cancellationToken)
    {
        _logger.LogInformation("Shutting down cluster client");
        await base.StopAsync(cancellationToken);
        var host = _clusterClientProvider.Host;
        if (_isHostStarted)
        {
            try
            {
                await host.StopAsync(CancellationToken.None);
            }
            catch (Exception e)
            {
                _logger.LogWarning(e, "An error occurred while stopping the cluster client host");
            }
        }

        switch (host)
        {
            case IAsyncDisposable asyncDisposable:
                await asyncDisposable.DisposeAsync();
                break;
            case IDisposable disposable:
                disposable.Dispose();
                break;
        }

        _logger.LogInformation("Successfully shut down cluster client");
    }
}
It would be better if the GatewayManager gracefully handled all silos being offline and simply retried the initial connection from time to time.
amoerie changed the title from "External cluster client host fails to start up when all silos are dead initially" to "External cluster client host fails to start up when all gateways are dead initially" on Nov 10, 2023
I would understand if this issue were closed, but can I still vote for maybe adding such a retry filter as the default behavior, or at least documenting this clearly somewhere? If it already is, my apologies, but I failed to find it the first time around...
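For anyone else landing here, this is a minimal sketch of what such a retry filter might look like. It assumes the `IClientConnectionRetryFilter` interface from the `Orleans` namespace, with a single `ShouldRetryConnectionAttempt(Exception, CancellationToken)` method. The class name `RetryWhileClusterOfflineFilter` and the configurable delay are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Orleans;
using Orleans.Runtime;

// Hypothetical filter: keeps retrying the initial connection for as long as
// no gateway can be found, waiting a fixed delay between attempts.
public sealed class RetryWhileClusterOfflineFilter : IClientConnectionRetryFilter
{
    private readonly TimeSpan _retryDelay;

    public RetryWhileClusterOfflineFilter(TimeSpan retryDelay) => _retryDelay = retryDelay;

    public async Task<bool> ShouldRetryConnectionAttempt(Exception exception, CancellationToken cancellationToken)
    {
        // Only retry the "could not find any gateway" failure seen in the stack trace;
        // any other exception is allowed to fail startup as before.
        if (exception is not SiloUnavailableException)
        {
            return false;
        }

        try
        {
            // Wait before telling the client to try again.
            await Task.Delay(_retryDelay, cancellationToken);
            return true;
        }
        catch (OperationCanceledException)
        {
            // Startup was cancelled while waiting: stop retrying.
            return false;
        }
    }
}
```

As far as I can tell, the client resolves the filter from its service provider, so registering an instance as a singleton `IClientConnectionRetryFilter` on the client host's services (for example with a 30 second delay, matching the workaround above) should be enough. I believe a `UseConnectionRetryFilter` extension on the client builder also exists, but please check that against the Orleans version you are on. Note that `StartAsync` will then not complete until a gateway becomes available or the token is cancelled.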
Orleans: 7.2.1
Relates to #7436