-
Notifications
You must be signed in to change notification settings - Fork 285
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(validator): elections powered by etcd leases #31
feat(validator): elections powered by etcd leases #31
Conversation
How does architecture change here? I see you add 3 raft nodes to 4 quorum-validator nodes. How does the resulting network work in brief? How does validator nodes map to raft nodes? |
fc52e6b
to
c99da7d
Compare
@denis-yu-glotov Here's the link to the docs: https://github.com/petermetz/blockchain-integration-framework/blob/feat/etcd-leader-election/docs/architecture/leader-election/etcd-leases.md |
02f6344
to
00ee082
Compare
64b1fa4
to
76a97af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, looks my comments were not published, sorry
@hugo-borne to run regression tests and let @petermetz know if any issues to address before merge |
4bda8ca
to
b7bcb1f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've got an error running fabric federation
To reproduce:
npm run fabric
npm run fed:fabric
npm run fed:fabric:log
Error (see attached screenshot):
fabric_validator4_1 | [2019-11-19T17:59:51.883] [DEBUG] Validator - Starting as Follower...
fabric_validator4_1 | [2019-11-19T17:59:51.900] [DEBUG] Validator - My Leader is :
fabric_validator4_1 | (undefined,
fabric_validator4_1 | undefined,
fabric_validator4_1 | undefined)
fabric_validator4_1 | (node:7) UnhandledPromiseRejectionWarning: TypeError: Address must be a string!
fabric_validator4_1 | at exports.Socket.Socket.connect (/federation/node_modules/zeromq/lib/index.js:519:13)
fabric_validator4_1 | at Validator.startAsFollower (/federation/node_modules/@hyperledger/blockchain-integration-framework/src/Validator.js:143:24)
fabric_validator4_1 | at Validator.start (/federation/node_modules/@hyperledger/blockchain-integration-framework/src/Validator.js:87:12)
fabric_validator4_1 | at /federation/app.js:46:13
fabric_validator4_1 | at <anonymous>
fabric_validator4_1 | (node:7) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
fabric_validator4_1 | (node:7) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.
Warning: the package.json in your fork is pointing to another branch, which means you may be testing your code against another version than the one in feat/etcd-leader-election
Ah, must be one of the stylistic changes I did for review. Will check, thank you! |
8857fa0
to
da42d1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous error persists on my side:
Non-leader Validators are not able to parse the leader's address
TypeError: Address must be a string!
Hmm, is it possible that you have the docker images, node_modules folders hanging around from the previous test? |
d3da0ab
to
a325049
Compare
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
It was copy-pasted from the fabric docker-compose and so the network names were incorrectly referencing the fabric network, not the quorum. Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
The getter that was changed: isCurrentNodeLeader Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
TODO(peter.somogyvari): Once leader election has enough test coverage, get rid of these property assignments and just use the this.leaderNodeInfo.networkInfo object directly everywhere. Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
The examples were hardcoding GitHub branch references which is a moving target and makes it error prone in certain situations to have the intended code running for local tests/CI. Solved it with a pre-install npm script that runs npm pack in the root directory of the project creating an installable tarball of the latest source code that's checked out at the current moment. This strongly ties the examples of any given revision to the exact source code of BIF from that revision as well, guaranteeing that whoever/wherever does an npm install inside the examples will get the current source code of that revision packaged up as an npm dependency. that they are committed with Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Reverting an attempted fix that was breaking the CI. The attempted fix was to introduce resource constraints on the quorum containers (CPU, RAM usage). This worked in the sense that on a smaller dev laptop launching the quorum network didn't cause OOM, but on the other hand containers got stuck in 'unhealthy' status on CI so overall the fix didn't pan out. Also: Performance optimization is added by reducing logging verbosity within the quorum containers and the geth syncmode is being changed from 'full' to 'fast' (the default) which does not verify each block one by one upon initialization. Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
This was forgotten from the original commit which only updated fabrid and quorum Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
a325049
to
0f9b886
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested on my VM by running tools/ci.sh
and it went fine. Is this enough or you have some other manual tests to try?
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
I think we are good, trying to get this in as soon as possible now, been open too long and the follow-up PR fixing test:bc is already in the queue as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, provided that the validator code is work in progress: having separate etcd3 nodes is very questionable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
* feat(validator): elections powered by etcd leases Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * docs: etcd leases, state and deployment diagram Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * fix(quorum): docker-compose network name typo It was copy-pasted from the fabric docker-compose and so the network names were incorrectly referencing the fabric network, not the quorum. Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * refactor(validator): change getter to method The getter that was changed: isCurrentNodeLeader Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * refactor(validator): Object.assign -> line by line TODO(peter.somogyvari): Once leader election has enough test coverage, get rid of these property assignments and just use the this.leaderNodeInfo.networkInfo object directly everywhere. Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * fix(test/validator): isCurrentNodeLeader syntax Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * fix(examples,tools): use local npm package The examples were hardcoding GitHub branch references which is a moving target and makes it error prone in certain situations to have the intended code running for local tests/CI. Solved it with a pre-install npm script that runs npm pack in the root directory of the project creating an installable tarball of the latest source code that's checked out at the current moment. This strongly ties the examples of any given revision to the exact source code of BIF from that revision as well, guaranteeing that whoever/wherever does an npm install inside the examples will get the current source code of that revision packaged up as an npm dependency. that they are committed with Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * fix(example/quorum): container resource usage Reverting an attempted fix that was breaking the CI. The attempted fix was to introduce resource constraints on the quorum containers (CPU, RAM usage). This worked in the sense that on a smaller dev laptop launching the quorum network didn't cause OOM, but on the other hand containers got stuck in 'unhealthy' status on CI so overall the fix didn't pan out. Also: Performance optimization is added by reducing logging verbosity within the quorum containers and the geth syncmode is being changed from 'full' to 'fast' (the default) which does not verify each block one by one upon initialization. Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * fix(examples/corda): adds etcd to docker-compose This was forgotten from the original commit which only updated fabrid and quorum Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com> * docs(tools): add docs for create local npm packge Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Fixes #19
Could only run the unit tests, not a proper end to end test on account of the resource leakage issue.
Once the CI environment is up and running the change can be properly verified.