What is the preferred way to reconnect a client to a silo host? #7436

Closed
SeppPenner opened this issue Dec 10, 2021 · 4 comments

@SeppPenner
Contributor

I have the following debugging setup:
Runtime: .NET 6.0
Database: PostgreSQL 13 with TimescaleDB enabled
OS: Windows 10

The original setup is a bit different (No idea if relevant):
Service: .NET 6.0 in Docker as a background service (Worker template), Debian.
SiloHost: Runs as a Linux service under CentOS Linux release 8.2.2004, but plain, not in Docker.
Database: PostgreSQL 12 with TimescaleDB enabled on CentOS Linux release 8.2.2004.
Database and SiloHost on the same machine.

Hints for the database setup:

CREATE DATABASE example;

Afterwards, run the Orleans database scripts from:

The problem I face (see the example project at https://github.com/SeppPenner/OrleansReconnection) is that the Orleans client doesn't reconnect properly. I will elaborate on this a bit more:

  1. We didn't use any reconnection mechanism before, and everything worked fine (for some time).
  2. After a while, the service calling the grain methods started getting strange errors from time to time, like GrainTypeResolver for the grain xy not found.
  3. Afterwards, the service was no longer able to reconnect to the silo host.
  4. Redeploying / restarting the service solved the issue, of course.
  5. We added a method that tries to catch the grain type resolver issues and reconnect the client manually (see the ExecuteAsync method in the ExampleService class in the example project).
  6. That works whenever the grain type resolver issues occur. However, it doesn't work if the silo goes down and comes back up.
  7. I have played around with several scenarios (like calling OrleansClient.Close() before OrleansClient.Connect(), or OrleansClient.Dispose() before OrleansClient.Connect()). Nothing worked; some error was always thrown (either that the client is already initialized, or that the grain couldn't be reached because the client was re-initialized).

As far as I understand the docs at https://dotnet.github.io/orleans/docs/host/client.html, reconnection shouldn't have to be added manually because it should work by default. Is that correct?
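
For illustration only (this is not taken from the example project): in Orleans 3.x, `IClusterClient.Connect` accepts an optional retry filter of type `Func<Exception, Task<bool>>` that is consulted when a connection attempt fails. The helper below is a hypothetical sketch of such a filter; the class name `ConnectRetry`, the method `CreateFilter`, and its parameters are assumptions made up for this example. It only covers establishing the connection, not failures of individual grain calls afterwards.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper (not part of Orleans): builds a retry filter that can be
// passed to IClusterClient.Connect(Func<Exception, Task<bool>>).
public static class ConnectRetry
{
    // Returns a filter that allows up to maxRetries retries and waits
    // `delay` before each one. Returning false tells the caller to give up.
    public static Func<Exception, Task<bool>> CreateFilter(int maxRetries, TimeSpan delay)
    {
        var failures = 0;

        return async exception =>
        {
            failures++;

            if (failures > maxRetries)
            {
                // Stop retrying; the connect call is expected to surface the error.
                return false;
            }

            Console.WriteLine($"Connect attempt {failures} failed: {exception.Message}");
            await Task.Delay(delay);
            return true;
        };
    }
}
```

Assuming an already built `IClusterClient` named `client`, this could be used as `await client.Connect(ConnectRetry.CreateFilter(5, TimeSpan.FromSeconds(3)));`. The retry count and delay are arbitrary values.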

Back to the example project:

  1. After the database is initialized properly, start the silo host project as a service without debugging from Visual Studio (IIS won't work, I guess).
  2. Start the example service project as a service with debugging from Visual Studio (IIS won't work, I guess).
  3. Kill the silo host service manually via Task Manager or the program's close button.
  4. Wait for the example service to notice the disconnection.
  5. Start the silo host project as a service again without debugging from Visual Studio (IIS won't work, I guess).
  6. The example service won't be able to reconnect to the silo host.

So, the final questions:

  • How should reconnection from the client side work in general?
  • What's wrong in the example if I want to avoid the GrainTypeResolver error?
@ghost ghost added the Needs: triage 🔍 label Dec 10, 2021
@benjaminpetit
Member

Are you using static clustering like in your example? Do you have only one silo in your cluster?

If yes, it could explain why you see issues when the silo goes down then up again

@SeppPenner
Contributor Author

SeppPenner commented Dec 14, 2021

> Are you using static clustering like in your example? Do you have only one silo in your cluster?

Yes and yes. Currently, I only have one silo (In the future, there are 2 or 3 planned).

Is there anything I can change in the current setup for now, besides adding a second silo?

@benjaminpetit
Member

Static clustering isn't very robust when you have a silo restarting. I strongly suggest using a clustering provider like Azure Table (maybe ADO.NET, since you are using a SQL server for grain persistence?).

Even with one silo, your client will reconnect much faster once the silo is back up. And of course, two silos would be better :)
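
As a minimal sketch of that suggestion, assuming Orleans 3.x with the `Microsoft.Orleans.Clustering.AdoNet` and `Npgsql` packages installed and the Orleans clustering tables already created in the database: both the silo and the client point at the same membership table instead of a fixed endpoint list. The class name, cluster ID, service ID, and connection string below are placeholders invented for this example, not values from the example project.

```csharp
using Orleans;
using Orleans.Configuration;
using Orleans.Hosting;

// Hypothetical configuration sketch; all names and values are assumptions.
public static class AdoNetClusteringSketch
{
    // Placeholder connection string for a local PostgreSQL database named "example".
    private const string ConnectionString =
        "Host=localhost;Database=example;Username=postgres;Password=postgres";

    // Silo side: register membership in the database instead of a static list.
    public static ISiloBuilder ConfigureSilo(ISiloBuilder siloBuilder) =>
        siloBuilder
            .Configure<ClusterOptions>(options =>
            {
                options.ClusterId = "dev";             // must match the client
                options.ServiceId = "ExampleService";  // must match the client
            })
            .UseAdoNetClustering(options =>
            {
                options.Invariant = "Npgsql";          // ADO.NET invariant for PostgreSQL
                options.ConnectionString = ConnectionString;
            });

    // Client side: discover gateways from the same membership table.
    public static IClusterClient BuildClient() =>
        new ClientBuilder()
            .Configure<ClusterOptions>(options =>
            {
                options.ClusterId = "dev";
                options.ServiceId = "ExampleService";
            })
            .UseAdoNetClustering(options =>
            {
                options.Invariant = "Npgsql";
                options.ConnectionString = ConnectionString;
            })
            .Build();
}
```

With this kind of setup, a restarted silo writes a fresh membership entry and the client picks it up from the table on its next gateway refresh, rather than relying on a hard-coded endpoint.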

@SeppPenner
Contributor Author

> Static clustering isn't very robust when you have a silo restarting. I strongly suggest using a clustering provider like Azure Table (maybe ADO.NET, since you are using a SQL server for grain persistence?).

Ok, I will try that. We use PostgreSQL, not SQL Server, but the ADO.NET provider is basically "the same", I guess.

> Even with one silo, your client will reconnect much faster once the silo is back up. And of course, two silos would be better :)

Ok, thanks. I will close the issue now and reopen it if more questions come up. Thank you for the quick answer :)

@ghost ghost locked as resolved and limited conversation to collaborators Jan 14, 2022
2 participants