Implement server-side role management #480

cole-miller · 2023-02-28T04:17:19Z

This is still rough and most error handling is stubbed out, but I think it represents a workable approach to moving role management onto the server.

Signed-off-by: Cole Miller cole.miller@canonical.com

codecov · 2023-02-28T04:21:51Z

Codecov Report

Merging #480 (ef1599a) into master (19a7924) will increase coverage by 1.96%.
The diff coverage is 67.14%.

@@            Coverage Diff             @@
##           master     #480      +/-   ##
==========================================
+ Coverage   59.01%   60.97%   +1.96%     
==========================================
  Files          33       34       +1     
  Lines        5929     6340     +411     
  Branches     1764     1884     +120     
==========================================
+ Hits         3499     3866     +367     
+ Misses       1363     1293      -70     
- Partials     1067     1181     +114

Impacted Files	Coverage Δ
src/transport.c	`51.29% <50.00%> (+5.19%)`	⬆️
src/roles.c	`63.58% <63.58%> (ø)`
src/server.c	`55.55% <84.84%> (+3.84%)`	⬆️
src/client/protocol.c	`42.66% <100.00%> (+9.16%)`	⬆️
src/config.c	`92.00% <100.00%> (+0.69%)`	⬆️

... and 2 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

cole-miller · 2023-03-21T03:20:25Z

I'm not entirely satisfied with this branch yet, but I think it's ready for some other eyes on it. Jepsen seems mostly happy, including the roles checker -- there is an "extra online spare" that I'm looking into, but it doesn't occur very often (about once per CI run) and I've also seen it sporadically with the go-dqlite client-side role management.

I've written some unit tests for the role assignment algorithm, but still want to add more of those, plus a couple of integration tests once I figure out how to set that up.

MathieuBordere

Looks good on first pass

MathieuBordere · 2023-03-21T14:49:47Z

src/server.c

+ }
+
+ if (d->role_management) {
+ /* TODO how should the server respond to client requests while in this state? */


Can we let raft handle it transparently? raft normally rejects raft_apply requests when it's transferring leadership.

That still leaves the window after the transfer completes where we're transferring voting rights, yeah?

test/unit/test_role_management.c

MathieuBordere · 2023-03-22T14:08:38Z

src/role_management.c

+ polling->i = j;
+ work = &work_objs[j];
+ work->data = polling;
+ rv = uv_queue_work(&d->loop, work, pollClusterWorkCb, pollClusterAfterWorkCb);


I think we might want to increase the libuv thread pool size, see https://docs.libuv.org/en/v1.x/threadpool.html something like 128 by default?

128 seems large to me, but I don't have experience tuning these things. For polling the cluster we run one task on the thread pool for each node in the configuration, so typically 3-5 tasks total.

Note that if the capacity of the thread pool gets exhausted, then further calls to uv_queue_work will not fail, but rather queued and executed when threads become available. In general the size of the threadpool should ideally match the hardware capacity (e.g. the number of cores of the host), in order to reduce contention and context switches.

I think we can safely leave libuv's default.

cole-miller · 2023-04-17T19:14:20Z

I'm wondering whether the assertion failure inside uv_close (resulting from an uninitialized handle) is a problem that only surfaces with specific versions of libuv. When I run the new integration test locally against libuv v1.44.3, it's not triggered. nope, it's my fault, failing to call uv_timer_init in all cases

freeekanayaka · 2023-04-17T20:10:34Z

I'm wondering whether the assertion failure inside uv_close (resulting from an uninitialized handle) is a problem that only surfaces with specific versions of libuv. When I run the new integration test locally against libuv v1.44.3, it's not triggered.

It's our code that is calling uv_close(), and that assertion seems to imply that the handle that we're passing to uv_close() is somehow screwed up (either uninitialized, as you point, or maybe with corrupted memory that was overwritten somewhere), so even if for some reason the problem doesn't surface in v1.44.3 it looks like a bug on our side.

src/role_management.c

include/dqlite.h

cole-miller · 2023-04-18T21:31:38Z

Broken up into commits for easier review :)

cole-miller · 2023-04-18T22:24:30Z

I think there is still a pesky leak of allocations that should be cleaned up by RoleManagementDrain, but when I went to look at the GHA failure for the latest commit, it had turned into a success, hmm.

src/role_management.c

src/server.h

src/server.c

freeekanayaka · 2023-04-19T07:19:16Z

Broken up into commits for easier review :)

Much easier to review indeed!

src/role_management.c

src/role_management.h

Signed-off-by: Cole Miller <cole.miller@canonical.com>

freeekanayaka

Looks good overall. I left some other comments.

It'd be nice to try to enable role management in the dqlite server, disable it in the dqlite app code and then try to run the test/roles.sh suite and see how it behaves.

It'd be also nice to add to the dqlite suite itself a test similar to go-dqlite's test/roles.sh, either as shell script or C test. That is going to require some work though and could be done separately.

freeekanayaka · 2023-04-20T16:12:56Z

src/server.c

@@ -114,7 +110,6 @@ int dqlite__init(struct dqlite_node *d,
 rv = DQLITE_ERROR;
 goto err_after_ready_init;
 }
-#endif


I understand that this PR somehow touches this area, and that it probably feels awkward to add similar MacOS-specific calls for the new semaphore being add by this PR.

However it seems that @rabits has already put some meaningful effort to try to support MacOS in dqlite and raft, and it seems there's some community interest in MacOS support.

What about leaving this code in place for now? We don't need to add anything MacOS-specific for the new semaphore right now, but perhaps we'll find folks that can start from the initial work of @rabits and move it forwards. Actually we might have not given @rabits enough support, considering that his raft PR is still open.

Oh, yeah, forgot about this PR. So raft become a big problem for me at a time, and since I don't use dqlite anymore - I can't continue to work on the macos support.

freeekanayaka · 2023-04-20T16:18:23Z

src/roles.c

+ * automatic role management is enabled, servers in a dqlite cluster will
+ * autonomously (without client intervention) promote and demote each other
+ * to maintain a specified number of voters and standbys, taking into account
+ * the health, failure domain, and weight of each server.


It'd be useful to move or copy this part of the comment (from "When automatic role management is enabled" onward) to the public docstring of dqlite_node_enable_role_management().

src/roles.c

freeekanayaka · 2023-04-20T16:37:54Z

src/roles.c

+
+struct change_record
+{
+ uint64_t id;


Please use the raft_id type here. Fixed-size fields like uint64_t should only be used when serializing or deserializing in-memory structures. This way the compiler can choose the best implementation depending on the hardware platform.

Is our code actually prepared to deal with a hypothetical target where unsigned long long and uint64_t are not the same type? It seems to me that if the IDs, failure domains, etc. are required by the wire protocol to be 64 bits, we should just be using uint64_t or a typedef of it everywhere. Using the "wobbly-width" C integer types in that kind of situation is (IMO) either pointless, if the width of the type is effectively fixed (as for unsigned long long), or buggy, if the width can vary across targets.

(I don't find the argument from optimization persuasive because I can't think of a target where unsigned long long is 128 bits, and even on such a target the gains from using a 128-bit type over a 64-bit type seem to me very unlikely to justify the complexity incurred by mixing unsigned long long and uint64_t when these are not the same width.)

I think we had this conversation before, the principle is that model/language types are independent from protocol/serialization concerns.

At some point we might change the serialization type of a given model/language type without changing the model/language type (thus without breaking the API/ABI). For example we might change the protocol to allow for greater values, without requiring a language/model change.

One recent example of that was the allowed number of statement parameters. The protocol version X only allowed for 256 parameters. But the associated language type was not uint8_t (or whatever Go equivalent). By separating the two concerns we were able to bump the protocol version to version X+1 with a different serialization format/type for the number of the statement parameters without needing to change the language. In the implementation there was a missing check for the 256 limit, but that's fine, it was just a bug, and the fix was not to change the language type from int to uint8_t (or whatever exact Go types), but rather to add the check.

In this specific case the current situation is that the failure domain must be a value between 0 and 2^64, but we don't want to enforce that with a uint64_t type, we want to enforce that with a check in the code (and add documentation if missing). This way we can evolve the wire protocol in the future, if required. We're most probably not going to need that for the failure domain values specifically, but this the general idea.

Is our code actually prepared to deal with a hypothetical target where unsigned long long and uint64_t are not the same type?

That I didn't check. If it's not, it should be fixed with a check on the value, and documented.

Note that boundary checks are just one type of model value checks. In general any expectation we have on a value (e.g. the value should be "odd", or it should be a multiple of 3 or whatever) should be explicitly coded.

It seems to me that if the IDs, failure domains, etc. are required by the wire protocol to be 64 bits, we should just be using uint64_t or a typedef of it everywhere.

As said, the wire protocol is opaque, and can change. What should be more stable is the language/model API.

Using the "wobbly-width" C integer types in that kind of situation is (IMO) either pointless, if the width of the type is effectively fixed (as for unsigned long long), or buggy, if the width can vary across targets.

This is indeed the point, the width of the type is not effectively fixed by design. The current boundary values are [0..2^64] (and as said they should be documented and checked), but those could change in the future without changing the type.

(I don't find the argument from optimization persuasive because I can't think of a target where unsigned long long is 128 bits, and even on such a target the gains from using a 128-bit type over a 64-bit type seem to me very unlikely to justify the complexity incurred by mixing unsigned long long and uint64_t when these are not the same width.)

The optimization argument is ancillary and definitely not the main one, the point is more to separate concerns conceptually. As said, failure domains don't need to be 64 bits, and should not be modeled as such. As said, this doesn't matter much in the specific case of failure domains, but it might matter in other cases, and for consistency we apply the same approach everywhere.

In a 32-bit world, it would be good practice to use int for your APIs, not int32_t. At that point you might have checks in place to ensure that your values were at most 2^32 (because for example you were serializing them using int32_t). Then you start supporting new 64-bit architectures, and you don't need to change the API (int to int64_t), but you might change the serialization and relax your boundary checks to allow for greater values. User code does not need to change.

I'm really not convinced by this argument, but I will change the PR to match the status quo.

src/roles.c

freeekanayaka · 2023-04-20T16:57:35Z

src/roles.c

+
+ if (status != 0) {
+ /* XXX */
+ return;


I'm not entirely sure because I still have to familiarize with the new code here, but if status is non-zero, wouldn't we end up in a "stuck" state? Because *polling->count never reaches polling->n_cluster and adjustClusterCb is never called.

In practice, I think the only way for status to be non-zero would be to call uv_cancel() on the work handle, which we currently don't do. However it's something we might eventually want to do (e.g. to improve RolesCancelPendingChanges() or to implement some sort of "hard" dqlite_node_stop() functionality).

It doesn't need to be fixed here, but if what I'm saying is accurate, I would suggest to turn if (status != 0) into assert(status != 0) (assuming that we'd break otherwise), and to add a comment to the assertion describing what I said above.

Yes, an assert sounds good.

src/roles.c

cole-miller · 2023-04-21T15:23:04Z

Review comments addressed. I've left the commit removing #ifdef __APPLE__ in because:

the dqlite team is prioritizing code simplification over cross-platform support for the upcoming cycle, and our backlog of work is such that we don't really have the resources to solicit + review external contributions adding support for macOS or other platforms
~~we are not set up to properly maintain macOS support (no CI)~~ (ignore this one, I somehow forgot that GHA supports macOS, so CI support is not an issue)
the removed code is quite straightforward and won't be any trouble to add back if someone takes a stab at adding macOS support in the future

Signed-off-by: Cole Miller <cole.miller@canonical.com>

cole-miller force-pushed the roles branch 2 times, most recently from 5a3b910 to 191ca5a Compare March 1, 2023 21:29

cole-miller force-pushed the roles branch 4 times, most recently from a21b849 to f0d2b17 Compare March 15, 2023 04:52

cole-miller force-pushed the roles branch 6 times, most recently from 6c609fa to 6449a4c Compare March 21, 2023 00:48

cole-miller requested review from MathieuBordere and freeekanayaka March 21, 2023 03:16

cole-miller marked this pull request as ready for review March 21, 2023 03:16

MathieuBordere reviewed Mar 21, 2023

View reviewed changes

MathieuBordere reviewed Mar 22, 2023

View reviewed changes

cole-miller force-pushed the roles branch from 6449a4c to 67380d9 Compare March 28, 2023 16:43

This was referenced Mar 31, 2023

client/protocol: Switch context from budget to deadline #490

Merged

Potential use-after-free in handling of raft_transfer canonical/raft#396

Closed

cole-miller force-pushed the roles branch from 67380d9 to 02f1806 Compare April 6, 2023 19:53

cole-miller force-pushed the roles branch from 02f1806 to a3d25ff Compare April 17, 2023 17:50

freeekanayaka reviewed Apr 17, 2023

View reviewed changes

src/role_management.c Outdated Show resolved Hide resolved

cole-miller force-pushed the roles branch from a3d25ff to c44510a Compare April 17, 2023 20:38

freeekanayaka reviewed Apr 17, 2023

View reviewed changes

src/role_management.c Outdated Show resolved Hide resolved

cole-miller force-pushed the roles branch from c44510a to 49d8cd1 Compare April 17, 2023 20:52

cole-miller force-pushed the roles branch 3 times, most recently from f37333c to 03e97b2 Compare April 17, 2023 22:22

freeekanayaka reviewed Apr 18, 2023

View reviewed changes

include/dqlite.h Show resolved Hide resolved

cole-miller force-pushed the roles branch 2 times, most recently from e0f827a to e48bf70 Compare April 18, 2023 21:31