When I turn on SemiSyncEnforced, should it be automatically configured when the instance is discovered? #1360
Comments
This is actually a good point. Orchestrator also does not react when SemiSyncEnforced changes through the DetectSemiSync... query. We're just thinking about implementing something that does the toggling, but it'd be nice to have it happen within Orchestrator. |
I think this is a dup of #1270 possibly. |
@shlomi-noach if you think this is a good idea, I'd be more than happy to implement it. |
Sorry for the delay. History: Later on I began work on enforcing a state. I went as far as analyzing the topology and generating a structure analysis: Let's first discuss how we might want to go about it. Will |
I just wanted to let you know that I am very much thinking about a solution for this, and that my team has collectively spent hours and hours on a solution outside of orchestrator, so I am sure we can come up with something elegant within it. I didn't have a ton of time today but I should be able to respond tomorrow. I want to make sure that everything I post is concise and doesn't waste your time. |
Awesome. Please take your time. |
Ok so I don't have a solution yet, but I feel like we can start the conversation (I'm happy to chat on Slack/whatever or even to Zoom if that makes it easier for you). My focus is obviously on our own topology, but we can generalize it later. It's just much easier to think and talk about a concrete example.
Example topology
We have dozens (soon hundreds) of 3-host clusters that look like this:
1 is the source, 2 and 3 are replicas. We use 3 exclusively as a backup host, so we don't ever want to fail over to it, even when 1 and 2 are down.
A few scenarios
Requirements:
MySQL settings (in all scenarios):
a. Failure of 1 (source)
b. Re-adding 1 as a replica
c. Failure of 2 (current semi-sync replica)
d. Recovery of 2
(... There are more, but I'll leave it at that for now ...)
Current implementation of
|
Okay after looking at the code for a little bit I think that the "background process" thing is not how Orchestrator currently works, and it seems unnecessary too. It looks like I could "simply" check for "too many" or "too few" semi sync replicas in I believe the "too few semi sync replicas" analysis already exists as I will give this a go I think. Possibly this weekend, though very likely early next week. Any input appreciated. |
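The "too many" / "too few" check described above can be sketched roughly as follows. This is only an illustration, not orchestrator's code: the type `SemiSyncState`, its constants, and the function `classifySemiSync` are hypothetical names, and the sketch assumes that "too many" means more semi-sync replicas than the primary's wait count.

```go
package main

import "fmt"

// SemiSyncState is a hypothetical label used only in this sketch;
// it is not a type from orchestrator.
type SemiSyncState string

const (
	TooFew  SemiSyncState = "too few semi-sync replicas"
	TooMany SemiSyncState = "too many semi-sync replicas"
	Enough  SemiSyncState = "expected number of semi-sync replicas"
)

// classifySemiSync compares the number of replicas that currently act as
// semi-sync replicas with the number of acknowledgements the primary waits for.
// Assumption: anything above the wait count is treated as "too many".
func classifySemiSync(semiSyncReplicas, waitCount uint) SemiSyncState {
	switch {
	case semiSyncReplicas < waitCount:
		return TooFew
	case semiSyncReplicas > waitCount:
		return TooMany
	default:
		return Enough
	}
}

func main() {
	// Wait count 1, as in a 3-host cluster with one intended semi-sync replica.
	fmt.Println(classifySemiSync(0, 1))
	fmt.Println(classifySemiSync(2, 1))
	fmt.Println(classifySemiSync(1, 1))
}
```

In the 3-host example, a result of `TooMany` would correspond to the backup host (3) having semi-sync enabled alongside host 2.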
Here is a WIP: #1373 |
Still thinking this over. Just one quick note:
Not exactly. The primary can be unlocked and still have too few semi-sync replicas. The primary is only locked if the semi sync wait timeout is long, otherwise it eventually reverts to async replication and becomes unlocked. |
Take your time. I will hold off on any work for this issue (as well as #1369, since it's so closely related) for now until you've had time to think on this. I also joined the Vitess Slack if you want to talk less asynchronously.
Ah yes of course. We've actually been bitten pretty bad by this fallback, so I'm not sure why I didn't think about that. Good catch.
I am not quite sure how I didn't see this part of your comment before, and that |
I know it's only been a couple of days, but did you have some time to think about this? Just wanted to mention that I am more than happy to implement the solution that we come up with. However, I'm sure you understand that I don't really want to put a ton of work in unless I can be somewhat sure that the code will end up getting merged in. That's why the PR is only a proof-of-concept, and not really doing everything the right way. I don't want to be stuck maintaining a fork at my company. That's a recipe for disaster :-D At my company, we currently have automatic failovers turned off entirely on all our clusters because we "lost" a ton of transactions during a catastrophic failover in which semi-sync had failed, so we're really eager to get something within or outside of Orchestrator going soon. I realize that you are doing this in your spare time and I'd like to thank you again for your fantastic work on this project. It's really awesome!! |
Hi @binwiederhier and thank you again. Sorry for the delays as I jump in and out of context between various tasks. I too appreciate your time. It looks like you're on a good path and I'll be happy to incorporate such changes into
Correct
Again, correct. Let's use this path. Basically Let's completely ignore the existing More thoughts/comments on the PR. |
This is great. Thank you for responding. I will take a few days this week to work on it. I'm actually looking forward to it. :-) |
Can I assume the issue has been resolved and close it? |
When I turn on semi-sync enforced replication and import the database nodes via discovery, it doesn't bother to configure the semi-sync option for me.
This makes me need additional operations when I initialize.