-
Notifications
You must be signed in to change notification settings - Fork 216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement server-side role management #480
Conversation
Codecov Report
@@ Coverage Diff @@
## master #480 +/- ##
==========================================
+ Coverage 59.01% 60.97% +1.96%
==========================================
Files 33 34 +1
Lines 5929 6340 +411
Branches 1764 1884 +120
==========================================
+ Hits 3499 3866 +367
+ Misses 1363 1293 -70
- Partials 1067 1181 +114
... and 2 files with indirect coverage changes Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
5a3b910
to
191ca5a
Compare
a21b849
to
f0d2b17
Compare
6c609fa
to
6449a4c
Compare
I'm not entirely satisfied with this branch yet, but I think it's ready for some other eyes on it. Jepsen seems mostly happy, including the roles checker -- there is an "extra online spare" that I'm looking into, but it doesn't occur very often (about once per CI run) and I've also seen it sporadically with the go-dqlite client-side role management. I've written some unit tests for the role assignment algorithm, but still want to add more of those, plus a couple of integration tests once I figure out how to set that up. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good on first pass
src/server.c
Outdated
} | ||
|
||
if (d->role_management) { | ||
/* TODO how should the server respond to client requests while in this state? */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we let raft handle it transparently? raft normally rejects raft_apply
requests when it's transferring leadership.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That still leaves the window after the transfer completes where we're transferring voting rights, yeah?
src/role_management.c
Outdated
polling->i = j; | ||
work = &work_objs[j]; | ||
work->data = polling; | ||
rv = uv_queue_work(&d->loop, work, pollClusterWorkCb, pollClusterAfterWorkCb); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want to increase the libuv thread pool size, see https://docs.libuv.org/en/v1.x/threadpool.html something like 128 by default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
128 seems large to me, but I don't have experience tuning these things. For polling the cluster we run one task on the thread pool for each node in the configuration, so typically 3-5 tasks total.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if the capacity of the thread pool gets exhausted, then further calls to uv_queue_work
will not fail, but rather queued and executed when threads become available. In general the size of the threadpool should ideally match the hardware capacity (e.g. the number of cores of the host), in order to reduce contention and context switches.
I think we can safely leave libuv's default.
|
It's our code that is calling |
f37333c
to
03e97b2
Compare
e0f827a
to
e48bf70
Compare
Broken up into commits for easier review :) |
I think there is still a pesky leak of allocations that should be cleaned up by RoleManagementDrain, but when I went to look at the GHA failure for the latest commit, it had turned into a success, hmm. |
Much easier to review indeed! |
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. I left some other comments.
It'd be nice to try to enable role management in the dqlite server, disable it in the dqlite app
code and then try to run the test/roles.sh
suite and see how it behaves.
It'd be also nice to add to the dqlite suite itself a test similar to go-dqlite
's test/roles.sh
, either as shell script or C test. That is going to require some work though and could be done separately.
@@ -114,7 +110,6 @@ int dqlite__init(struct dqlite_node *d, | |||
rv = DQLITE_ERROR; | |||
goto err_after_ready_init; | |||
} | |||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand that this PR somehow touches this area, and that it probably feels awkward to add similar MacOS-specific calls for the new semaphore being add by this PR.
However it seems that @rabits has already put some meaningful effort to try to support MacOS in dqlite and raft, and it seems there's some community interest in MacOS support.
What about leaving this code in place for now? We don't need to add anything MacOS-specific for the new semaphore right now, but perhaps we'll find folks that can start from the initial work of @rabits and move it forwards. Actually we might have not given @rabits enough support, considering that his raft PR is still open.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, yeah, forgot about this PR. So raft become a big problem for me at a time, and since I don't use dqlite anymore - I can't continue to work on the macos support.
* automatic role management is enabled, servers in a dqlite cluster will | ||
* autonomously (without client intervention) promote and demote each other | ||
* to maintain a specified number of voters and standbys, taking into account | ||
* the health, failure domain, and weight of each server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be useful to move or copy this part of the comment (from "When automatic role management is enabled" onward) to the public docstring of dqlite_node_enable_role_management()
.
src/roles.c
Outdated
|
||
struct change_record | ||
{ | ||
uint64_t id; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use the raft_id
type here. Fixed-size fields like uint64_t
should only be used when serializing or deserializing in-memory structures. This way the compiler can choose the best implementation depending on the hardware platform.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is our code actually prepared to deal with a hypothetical target where unsigned long long
and uint64_t
are not the same type? It seems to me that if the IDs, failure domains, etc. are required by the wire protocol to be 64 bits, we should just be using uint64_t
or a typedef of it everywhere. Using the "wobbly-width" C integer types in that kind of situation is (IMO) either pointless, if the width of the type is effectively fixed (as for unsigned long long
), or buggy, if the width can vary across targets.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I don't find the argument from optimization persuasive because I can't think of a target where unsigned long long
is 128 bits, and even on such a target the gains from using a 128-bit type over a 64-bit type seem to me very unlikely to justify the complexity incurred by mixing unsigned long long
and uint64_t
when these are not the same width.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we had this conversation before, the principle is that model/language types are independent from protocol/serialization concerns.
At some point we might change the serialization type of a given model/language type without changing the model/language type (thus without breaking the API/ABI). For example we might change the protocol to allow for greater values, without requiring a language/model change.
One recent example of that was the allowed number of statement parameters. The protocol version X only allowed for 256 parameters. But the associated language type was not uint8_t
(or whatever Go equivalent). By separating the two concerns we were able to bump the protocol version to version X+1 with a different serialization format/type for the number of the statement parameters without needing to change the language. In the implementation there was a missing check for the 256 limit, but that's fine, it was just a bug, and the fix was not to change the language type from int
to uint8_t
(or whatever exact Go types), but rather to add the check.
In this specific case the current situation is that the failure domain must be a value between 0
and 2^64
, but we don't want to enforce that with a uint64_t
type, we want to enforce that with a check in the code (and add documentation if missing). This way we can evolve the wire protocol in the future, if required. We're most probably not going to need that for the failure domain values specifically, but this the general idea.
Is our code actually prepared to deal with a hypothetical target where
unsigned long long
anduint64_t
are not the same type?
That I didn't check. If it's not, it should be fixed with a check on the value, and documented.
Note that boundary checks are just one type of model value checks. In general any expectation we have on a value (e.g. the value should be "odd", or it should be a multiple of 3 or whatever) should be explicitly coded.
It seems to me that if the IDs, failure domains, etc. are required by the wire protocol to be 64 bits, we should just be using
uint64_t
or a typedef of it everywhere.
As said, the wire protocol is opaque, and can change. What should be more stable is the language/model API.
Using the "wobbly-width" C integer types in that kind of situation is (IMO) either pointless, if the width of the type is effectively fixed (as for
unsigned long long
), or buggy, if the width can vary across targets.
This is indeed the point, the width of the type is not effectively fixed by design. The current boundary values are [0..2^64] (and as said they should be documented and checked), but those could change in the future without changing the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I don't find the argument from optimization persuasive because I can't think of a target where
unsigned long long
is 128 bits, and even on such a target the gains from using a 128-bit type over a 64-bit type seem to me very unlikely to justify the complexity incurred by mixingunsigned long long
anduint64_t
when these are not the same width.)
The optimization argument is ancillary and definitely not the main one, the point is more to separate concerns conceptually. As said, failure domains don't need to be 64 bits, and should not be modeled as such. As said, this doesn't matter much in the specific case of failure domains, but it might matter in other cases, and for consistency we apply the same approach everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a 32-bit world, it would be good practice to use int
for your APIs, not int32_t
. At that point you might have checks in place to ensure that your values were at most 2^32 (because for example you were serializing them using int32_t
). Then you start supporting new 64-bit architectures, and you don't need to change the API (int
to int64_t
), but you might change the serialization and relax your boundary checks to allow for greater values. User code does not need to change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm really not convinced by this argument, but I will change the PR to match the status quo.
src/roles.c
Outdated
|
||
if (status != 0) { | ||
/* XXX */ | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure because I still have to familiarize with the new code here, but if status
is non-zero, wouldn't we end up in a "stuck" state? Because *polling->count
never reaches polling->n_cluster
and adjustClusterCb
is never called.
In practice, I think the only way for status
to be non-zero would be to call uv_cancel()
on the work handle, which we currently don't do. However it's something we might eventually want to do (e.g. to improve RolesCancelPendingChanges()
or to implement some sort of "hard" dqlite_node_stop()
functionality).
It doesn't need to be fixed here, but if what I'm saying is accurate, I would suggest to turn if (status != 0)
into assert(status != 0)
(assuming that we'd break otherwise), and to add a comment to the assertion describing what I said above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, an assert sounds good.
Review comments addressed. I've left the commit removing
|
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
Signed-off-by: Cole Miller <cole.miller@canonical.com>
This is still rough and most error handling is stubbed out, but I think it represents a workable approach to moving role management onto the server.
Signed-off-by: Cole Miller cole.miller@canonical.com