Replica forests: Suboptimal initial layout + incorrect layout on rebootstrap after scaleout #616
Comments
Looking forward very much to the PR for this! I agree that it is not the best algorithm, but it is smarter than you might think (unless it is broken). There have been a few tweaks not so long ago, so make sure you check out the latest version. The general idea with forest replication was:
I ran various checks, and it should spread replicas evenly across all hosts. If you are talking about replicas for Modules and such, yes, that is likely sub-optimal, but you can target hosts with forests manually as well in ml-config. PS: scaling out is indeed a bit tricky, but we came to the conclusion that as long as you scale up and down in FILO style (first in, last out), you should be relatively safe.
This sounds like different views of the current functionality. @tdiepenbrock, please make sure your analysis is based on the most recent code.
Have these changes been merged to the master branch? We are working off of td
Looks like all my work should be in master, and that would indeed be 1.7.3. The round-robin formula should be in this line: https://github.com/marklogic/roxy/blob/master/deploy/lib/xquery/setup.xqy#L1179
The fragile part of this is that $hosts and $hostnr can be unpredictable if the cluster size changes.
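For readers following the thread, here is a minimal standalone sketch of the kind of host-number-only round-robin being discussed. It is not the actual code at setup.xqy#L1179; the function name and host names are made up for illustration.

```xquery
xquery version "1.0-ml";

(: Illustrative sketch only -- not the real setup.xqy code. A replica
   target that depends on the host number alone: every forest on host
   number $hostnr gets its replica on the next host in $hosts, wrapping
   around at the end of the list. :)
declare function local:replica-host(
  $hosts as xs:string*,  (: ordered host names, as bootstrapped :)
  $hostnr as xs:integer  (: 1-based position of the master forest's host :)
) as xs:string
{
  $hosts[($hostnr mod fn:count($hosts)) + 1]
};

(: Three-host cluster: host1 -> host2, host2 -> host3, host3 -> host1 :)
for $h in 1 to 3
return fn:concat("host", $h, " replicas -> ",
  local:replica-host(("host1", "host2", "host3"), $h))
```

Because the result depends on fn:count($hosts) and each host's position in the list, re-bootstrapping with a different number of hosts shifts the computed targets, which is the fragility mentioned above.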
By the way, I wasn't saying everything should already be working as described by Thomas, but it does spread across hosts in the cluster. Maybe it helps to take a literal example, look at what Roxy makes out of it, and then pinpoint flaws in that? I can imagine that if N drops out, N+1 can get more extra load than any of the other nodes in the cluster, but an example would help visualize it and think about improvements.
Hm, Geert, using 1.7.3 we see all forests from host N being replicated to host N+1.

For a three-node cluster, the layout Roxy 1.7.3 creates looks like this:

Host 1:
Host 2:
Host 3:

All of Host 1's forests are replicated to Host 2, all of Host 2's forests to Host 3, and all of Host 3's forests to Host 1:

Host 1:
Host 2:
Host 3:

In the event of a failure of one node, this results in each remaining node carrying a very uneven share of the extra load: the next node in line takes all of the failed node's forests, while the others take none.

In Roxy 1.7.3, scaling this cluster by adding one node and re-bootstrapping leaves the forests laid out like this:

Host 1:
Host 2:
Host 3:
Host 4:

Under even the 1.7.3 forest layout scheme, re-bootstrapping really should produce this instead:

Host 1:
Host 2:
Host 3:
Host 4:

I believe the line of code you are pointing to essentially says that if we scale out to five hosts, we end up with this:

Host 1:
Host 2:
Host 3:
Host 4:
Host 5:

I think what we would really like to see in this case is this:

Host 1:
Host 2:
Host 3:
Host 4:
Host 5:

This layout still lets us lose two nodes, but in the failure case it evenly spreads the failed node's forests across the surviving hosts instead of piling them onto one neighbor.

The code that Joe M. is writing lays out the forests this way and also handles re-laying out the replicas when the cluster is scaled out and re-bootstrapped.
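As a rough illustration of the failure argument above (host names and forest counts are invented), the snippet below counts how many of a failed host's forests each surviving host would have to serve under the "everything to host N+1" chain layout versus a layout that spreads replicas per forest:

```xquery
xquery version "1.0-ml";

(: Made-up example: three hosts, four forests per host, host1 fails.
   Count how many of host1's forest replicas each surviving host serves
   under the two layouts discussed above. :)
let $hosts := ("host1", "host2", "host3")
let $forests-per-host := 4
let $failed := 1

(: chain layout: every replica of a forest on host N lives on host N+1 :)
let $chain :=
  for $f in 1 to $forests-per-host
  return $hosts[($failed mod fn:count($hosts)) + 1]

(: spread layout: the replica target rotates per forest, skipping the
   forest's own host :)
let $spread :=
  for $f in 1 to $forests-per-host
  let $offset := (($f - 1) mod (fn:count($hosts) - 1)) + 1
  return $hosts[(($failed - 1 + $offset) mod fn:count($hosts)) + 1]

for $h in $hosts[. ne $hosts[$failed]]
return fn:concat($h, ": chain=", fn:count($chain[. eq $h]),
                 " spread=", fn:count($spread[. eq $h]))
```

In this toy case the chain layout leaves host2 serving all four of host1's forests, while the spread layout splits them two and two between host2 and host3, which is the even failover behavior described above.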
Yes, the issue with scaling out is that you would have to literally migrate forests to different hosts, at least the replicated ones. Currently Roxy does calculate the layout correctly, but most of the rep forests already exist, and Roxy won't reassign them to a different host. In fact that is not allowed, other than deleting and recreating them. I was kinda hesitant to go that way. Re your example, yeah, I think you are right: if you have multiple forests on one host, the first rep of each goes to N+1. My formula doesn't include forestnr, but that should be a relatively small change, I think. Anyhow, still looking forward to a PR!
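The "relatively small change" mentioned here could look something like the sketch below, which folds the forest number into the round-robin offset. It is illustrative only, not necessarily the actual fix that landed (#684); the function and parameter names are assumptions.

```xquery
xquery version "1.0-ml";

(: Sketch only, not the Roxy implementation: include the forest number in
   the round-robin so that one host's replicas rotate over all the other
   hosts instead of all landing on the next host. :)
declare function local:replica-host(
  $hosts as xs:string*,    (: ordered host names :)
  $hostnr as xs:integer,   (: 1-based index of the master forest's host :)
  $forestnr as xs:integer  (: 1-based index of the forest on that host :)
) as xs:string
{
  (: offset cycles through 1 .. count($hosts) - 1, skipping the host itself :)
  let $offset := (($forestnr - 1) mod (fn:count($hosts) - 1)) + 1
  return $hosts[(($hostnr - 1 + $offset) mod fn:count($hosts)) + 1]
};

(: 3 hosts, 2 forests per host :)
for $h in 1 to 3, $f in 1 to 2
return fn:concat("host", $h, " forest", $f, " -> ",
  local:replica-host(("host1", "host2", "host3"), $h, $f))
```

With three hosts and two forests per host this sends host1's replicas to host2 and host3, host2's to host3 and host1, and host3's to host1 and host2, so a single failure spreads its load over both survivors.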
PR coming soon. Joe M. is working part time on this, so he will fork when he has something ready to submit.

Yes, ML does not allow you to re-assign forests to other hosts. The plan is to delete and re-create the replica forests that need to move during scale-out.
Keep in mind you probably only need to delete/recreate rep forests on nodes 1 and 2: just those that wrap around from the last host, because the new host becomes the new last one.
I think it depends on how many forests there are per host--but I'll mention the idea to Joe M.
Yeah, true. Still, it probably only affects a limited number of forests.
Fixed in #684
Replica forests have some issues:

- The initial layout is suboptimal: all of host N's forests are replicated to host N+1, so the failure of one host pushes its entire load onto a single neighbor instead of spreading it across the cluster.
- Re-bootstrapping after scaling out the cluster produces an incorrect replica layout.
We have been implementing a fix for this on our project in Roxy. The updated code does the following:

- Spreads each host's replica forests across the other hosts in the cluster, rather than placing them all on host N+1.
- Recomputes and applies the replica layout when the cluster is scaled out and re-bootstrapped.
I believe the scaleout process involves multiple Roxy commands that need to be performed manually when inspection of the admin console/forest status shows the cluster is ready for the next step--I will verify with Joe M., who is working on this.
We will fork and do a pull request to commit the code back to the project for review. In the meantime please feel free to comment on the forest layout strategy.