What is the preferred way to reconnect a client to a silo host? #7436

SeppPenner · 2021-12-10T14:07:31Z

I have the following debugging setup:
Runtime: .NET 6.0
Database: PostgreSQL 13 with TimescaleDB enabled
OS: Windows 10

The original setup is a bit different (no idea if this is relevant):
Service: .NET 6.0 in Docker as a background service (Worker template), Debian.
SiloHost: Runs as a Linux service under CentOS Linux release 8.2.2004, but plain, not in Docker.
Database: PostgreSQL 12 with TimescaleDB enabled on CentOS Linux release 8.2.2004.
Database and SiloHost on the same machine.

Hints for the database setup:

CREATE DATABASE example;

Afterwards, run the Orleans database scripts from:

The problem I face (see the example project at https://github.com/SeppPenner/OrleansReconnection) is that the Orleans client doesn't reconnect properly. I will elaborate on this a bit more:

We didn't use any reconnection mechanism before, and everything went fine for some time.
Then, from time to time, the service calling the grain methods started getting strange errors like GrainTypeResolver for the grain xy not found.
Afterwards, the service wasn't able to reconnect to the silo host anymore.
Redeploying or restarting the service solved the issue, of course.
We added a method to try to catch the grain type resolver issues and reconnect the client manually (see the ExecuteAsync method in the ExampleService class in the example project).
That works whenever the grain type resolver issues occur. However, it doesn't work if the silo goes down and comes back up.
I have played around with several scenarios (like OrleansClient.Close() before OrleansClient.Connect(), or OrleansClient.Dispose() before OrleansClient.Connect()). Nothing worked; some error was always thrown (either that the client is already initialized, or that the grain couldn't be reached because the client was re-initialized).

As far as I understand the docs under https://dotnet.github.io/orleans/docs/host/client.html, reconnection shouldn't be added manually as it should work by default. Is that correct?
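
For reference, a minimal sketch of how the initial connection can be retried, assuming Orleans 3.x (where IClusterClient.Connect accepts an optional retry filter). Localhost clustering and the 5 second delay are placeholder assumptions, not values from the example project. As far as I know, the retry filter only governs the initial Connect call, not what happens after an established connection is lost.

```csharp
using System;
using System.Threading.Tasks;
using Orleans;
using Orleans.Configuration;
using Orleans.Hosting;

// Assumption: Orleans 3.x client API; localhost clustering stands in for the real setup.
var client = new ClientBuilder()
    .UseLocalhostClustering()
    .Build();

// The retry filter is invoked when a connection attempt fails.
// Returning true asks the client to try again; returning false lets the exception propagate.
await client.Connect(async exception =>
{
    Console.WriteLine($"Connect failed: {exception.Message}. Retrying in 5 seconds.");
    await Task.Delay(TimeSpan.FromSeconds(5));
    return true;
});
```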

Back to the example project:

Start the silo host project as a service without debugging from Visual Studio (IIS won't work, I guess) after the database is initialized properly.
Start the example service project as a service with debugging from Visual Studio (IIS won't work, I guess).
Kill the silo host service manually via Task Manager or the program's close button.
Wait for the example service to notice the disconnection.
Start the silo host project as a service again without debugging from Visual Studio (IIS won't work, I guess).
The example service won't be able to reconnect to the silo host.

So, the final questions:

How should reconnection from the client side work in general?
What's wrong in the example if I want to avoid the GrainTypeResolver error?


benjaminpetit · 2021-12-13T09:40:52Z

Are you using static clustering like in your example? Do you have only one silo in your cluster?

If so, that could explain why you see issues when the silo goes down and then comes back up.

SeppPenner · 2021-12-14T15:12:19Z

> Are you using static clustering like in your example? Do you have only one silo in your cluster?

Yes and yes. Currently, I only have one silo (In the future, there are 2 or 3 planned).

Is there anything I can change in the current setup, apart from adding a second silo, for now?

benjaminpetit · 2021-12-14T15:47:43Z

Static clustering isn't very robust when you have a silo restarting. I strongly suggest using a clustering provider like Azure Table (Maybe ADO since you are using a SQL server for grain persistence ?).

Even with one silo, your client will reconnect much faster once the silo is back up. And of course, two silos would be better :)
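
A minimal sketch of what switching from static clustering to ADO.NET clustering might look like, assuming Orleans 3.x, the Microsoft.Orleans.Clustering.AdoNet and Npgsql packages, and that the Orleans PostgreSQL clustering scripts have been run against the database. The class name, cluster ID, service ID, ports, and connection string are placeholders, not taken from the example project.

```csharp
using Microsoft.Extensions.Hosting;
using Orleans;
using Orleans.Configuration;
using Orleans.Hosting;

public static class ClusteringSketch
{
    // Placeholder values; "Npgsql" is the ADO.NET invariant name for PostgreSQL.
    private const string Invariant = "Npgsql";
    private const string ConnectionString =
        "Host=localhost;Database=example;Username=postgres;Password=postgres";

    // Silo side: membership is stored in the database instead of a static endpoint list.
    public static IHostBuilder ConfigureSilo(IHostBuilder hostBuilder) =>
        hostBuilder.UseOrleans(silo => silo
            .Configure<ClusterOptions>(o => { o.ClusterId = "dev"; o.ServiceId = "example"; })
            .UseAdoNetClustering(o =>
            {
                o.Invariant = Invariant;
                o.ConnectionString = ConnectionString;
            })
            .ConfigureEndpoints(siloPort: 11111, gatewayPort: 30000));

    // Client side: gateways are discovered from the same membership table.
    public static IClusterClient BuildClient() =>
        new ClientBuilder()
            .Configure<ClusterOptions>(o => { o.ClusterId = "dev"; o.ServiceId = "example"; })
            .UseAdoNetClustering(o =>
            {
                o.Invariant = Invariant;
                o.ConnectionString = ConnectionString;
            })
            .Build();
}
```

The ClusterId and ServiceId need to match on both sides, otherwise the client will not find the silo's gateway in the membership table.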

SeppPenner · 2021-12-15T12:55:48Z

> Static clustering isn't very robust when you have a silo restarting. I strongly suggest using a clustering provider like Azure Table (Maybe ADO since you are using a SQL server for grain persistence ?).

Ok, I will try that. We use PostgreSQL, not SQL server, but the ADO provider is basically "the same", I guess.

> Even with one silo, your client will reconnect much faster once the silo is back up. And of course, two silos would be better :)

Ok, thanks. I will close the issue now and reopen it if more questions come up. Thank you for the quick answer :)

ghost added the Needs: triage 🔍 label Dec 10, 2021

ReubenBond assigned benjaminpetit Dec 14, 2021

SeppPenner closed this as completed Dec 15, 2021

ghost locked as resolved and limited conversation to collaborators Jan 14, 2022
