Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(validator): elections powered by etcd leases #31

Merged

Conversation

petermetz
Copy link
Contributor

Fixes #19

Could only run the unit tests, not a proper end to end test on account of the resource leakage issue.
Once the CI environment is up and running the change can be properly verified.

@petermetz petermetz added the enhancement New feature or request label Nov 7, 2019
@petermetz petermetz requested a review from tkuhrt as a code owner November 7, 2019 05:45
@denis-yu-glotov
Copy link
Contributor

How does architecture change here? I see you add 3 raft nodes to 4 quorum-validator nodes. How does the resulting network work in brief? How does validator nodes map to raft nodes?

@petermetz petermetz force-pushed the feat/etcd-leader-election branch 6 times, most recently from fc52e6b to c99da7d Compare November 14, 2019 18:46
@petermetz
Copy link
Contributor Author

How does architecture change here? I see you add 3 raft nodes to 4 quorum-validator nodes. How does the resulting network work in brief? How does validator nodes map to raft nodes?

@denis-yu-glotov
Added some documentation that I'm hoping will answer all those questions. If anything is still spotty please let me know, I want to make sure the documentation explains everything properly!

Here's the link to the docs: https://github.com/petermetz/blockchain-integration-framework/blob/feat/etcd-leader-election/docs/architecture/leader-election/etcd-leases.md

@petermetz petermetz force-pushed the feat/etcd-leader-election branch from 02f6344 to 00ee082 Compare November 14, 2019 22:59
@jonathan-m-hamilton jonathan-m-hamilton added this to the v0.2.0 milestone Nov 15, 2019
@petermetz petermetz force-pushed the feat/etcd-leader-election branch 3 times, most recently from 64b1fa4 to 76a97af Compare November 19, 2019 02:31
Copy link
Contributor

@denis-yu-glotov denis-yu-glotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, looks my comments were not published, sorry

src/Validator.js Show resolved Hide resolved
src/Validator.js Outdated Show resolved Hide resolved
src/Validator.js Outdated Show resolved Hide resolved
@jonathan-m-hamilton
Copy link
Contributor

@hugo-borne to run regression tests and let @petermetz know if any issues to address before merge

@petermetz petermetz force-pushed the feat/etcd-leader-election branch 2 times, most recently from 4bda8ca to b7bcb1f Compare November 19, 2019 17:24
Copy link

@hugo-borne hugo-borne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got an error running fabric federation

To reproduce:

npm run fabric
npm run fed:fabric
npm run fed:fabric:log

Error (see attached screenshot):

fabric_validator4_1  | [2019-11-19T17:59:51.883] [DEBUG] Validator - Starting as Follower... 
fabric_validator4_1  | [2019-11-19T17:59:51.900] [DEBUG] Validator - My Leader is : 
fabric_validator4_1  |       (undefined,
fabric_validator4_1  |       undefined,
fabric_validator4_1  |       undefined)
fabric_validator4_1  | (node:7) UnhandledPromiseRejectionWarning: TypeError: Address must be a string!
fabric_validator4_1  |     at exports.Socket.Socket.connect (/federation/node_modules/zeromq/lib/index.js:519:13)
fabric_validator4_1  |     at Validator.startAsFollower (/federation/node_modules/@hyperledger/blockchain-integration-framework/src/Validator.js:143:24)
fabric_validator4_1  |     at Validator.start (/federation/node_modules/@hyperledger/blockchain-integration-framework/src/Validator.js:87:12)
fabric_validator4_1  |     at /federation/app.js:46:13
fabric_validator4_1  |     at <anonymous>
fabric_validator4_1  | (node:7) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch block, or by rejecting a promise which was not handled with .catch(). (rejection id: 1)
fabric_validator4_1  | (node:7) [DEP0018] DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Screenshot 2019-11-19 at 19 01 25

Warning: the package.json in your fork is pointing to another branch, which means you may be testing your code against another version than the one in feat/etcd-leader-election

@petermetz
Copy link
Contributor Author

Ah, must be one of the stylistic changes I did for review. Will check, thank you!

Copy link

@hugo-borne hugo-borne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous error persists on my side:

Non-leader Validators are not able to parse the leader's address
TypeError: Address must be a string!

@petermetz
Copy link
Contributor Author

Previous error persists on my side:

Non-leader Validators are not able to parse the leader's address
TypeError: Address must be a string!

Hmm, is it possible that you have the docker images, node_modules folders hanging around from the previous test?
Just asking because I had the same issue while testing the fix and only after cleaning up my environment was I able to see that the bug was fixed.
The ./tools/ci.sh script is an easy to way to run all the cleanups necessary (if you already used that and the result is the same then something is definitely wrong though).

@petermetz petermetz force-pushed the feat/etcd-leader-election branch from d3da0ab to a325049 Compare November 26, 2019 20:52
@hugo-borne hugo-borne self-assigned this Nov 28, 2019
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
It was copy-pasted from the fabric docker-compose and so the network names
were incorrectly referencing the fabric network, not the quorum.

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
The getter that was changed: isCurrentNodeLeader

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
TODO(peter.somogyvari): Once leader election
has enough test coverage, get rid of these property assignments
and just use the this.leaderNodeInfo.networkInfo
object directly everywhere.

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
The examples were hardcoding GitHub branch references
which is a moving target and makes it error prone in certain
situations to have the intended code running for local tests/CI.
Solved it with a pre-install npm script that runs npm pack in the
root directory of the project creating an installable tarball of the
latest source code that's checked out at the current moment.
This strongly ties the examples of any given revision to the
exact source code of BIF from that revision as well, guaranteeing
that whoever/wherever does an npm install inside the examples
will get the current source code of that revision packaged up as
an npm dependency.
that they are committed with

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
Reverting an attempted fix that was breaking the CI.
The attempted fix was to introduce resource constraints
on the quorum containers (CPU, RAM usage). This worked
in the sense that on a smaller dev laptop launching the
quorum network didn't cause OOM, but on the other hand
containers got stuck in 'unhealthy' status on CI so
overall the fix didn't pan out.

Also: Performance optimization is added by reducing logging
verbosity within the quorum containers and the geth syncmode
is being changed from 'full' to 'fast' (the default) which
does not verify each block one by one upon initialization.

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
This was forgotten from the original commit which only
updated fabrid and quorum

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
@petermetz petermetz force-pushed the feat/etcd-leader-election branch from a325049 to 0f9b886 Compare November 28, 2019 19:33
Copy link
Contributor

@denis-yu-glotov denis-yu-glotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested on my VM by running tools/ci.sh and it went fine. Is this enough or you have some other manual tests to try?

src/Validator.js Show resolved Hide resolved
examples/simple-asset-transfer/package.json Show resolved Hide resolved
tools/create-local-npm-package.js Outdated Show resolved Hide resolved
tools/create-local-npm-package.js Show resolved Hide resolved
tools/create-local-npm-package.js Show resolved Hide resolved
Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
@petermetz
Copy link
Contributor Author

I tested on my VM by running tools/ci.sh and it went fine. Is this enough or you have some other manual tests to try?

I think we are good, trying to get this in as soon as possible now, been open too long and the follow-up PR fixing test:bc is already in the queue as well.

Copy link
Contributor

@denis-yu-glotov denis-yu-glotov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, provided that the validator code is work in progress: having separate etcd3 nodes is very questionable to me.

Copy link
Contributor

@jonathan-m-hamilton jonathan-m-hamilton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@petermetz petermetz merged commit 61aab4a into hyperledger-cacti:master Dec 2, 2019
@petermetz petermetz deleted the feat/etcd-leader-election branch December 2, 2019 23:20
blefevre pushed a commit to blefevre/blockchain-integration-framework that referenced this pull request Feb 4, 2020
* feat(validator): elections powered by etcd leases

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* docs: etcd leases, state and deployment diagram

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* fix(quorum): docker-compose network name typo

It was copy-pasted from the fabric docker-compose and so the network names
were incorrectly referencing the fabric network, not the quorum.

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* refactor(validator): change getter to method

The getter that was changed: isCurrentNodeLeader

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* refactor(validator): Object.assign -> line by line

TODO(peter.somogyvari): Once leader election
has enough test coverage, get rid of these property assignments
and just use the this.leaderNodeInfo.networkInfo
object directly everywhere.

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* fix(test/validator): isCurrentNodeLeader syntax

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* fix(examples,tools): use local npm package

The examples were hardcoding GitHub branch references
which is a moving target and makes it error prone in certain
situations to have the intended code running for local tests/CI.
Solved it with a pre-install npm script that runs npm pack in the
root directory of the project creating an installable tarball of the
latest source code that's checked out at the current moment.
This strongly ties the examples of any given revision to the
exact source code of BIF from that revision as well, guaranteeing
that whoever/wherever does an npm install inside the examples
will get the current source code of that revision packaged up as
an npm dependency.
that they are committed with

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* fix(example/quorum): container resource usage

Reverting an attempted fix that was breaking the CI.
The attempted fix was to introduce resource constraints
on the quorum containers (CPU, RAM usage). This worked
in the sense that on a smaller dev laptop launching the
quorum network didn't cause OOM, but on the other hand
containers got stuck in 'unhealthy' status on CI so
overall the fix didn't pan out.

Also: Performance optimization is added by reducing logging
verbosity within the quorum containers and the geth syncmode
is being changed from 'full' to 'fast' (the default) which
does not verify each block one by one upon initialization.

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* fix(examples/corda): adds etcd to docker-compose

This was forgotten from the original commit which only
updated fabrid and quorum

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>

* docs(tools): add docs for create local npm packge

Signed-off-by: Peter Somogyvari <peter.somogyvari@accenture.com>
ryjones pushed a commit that referenced this pull request Feb 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rework mechanism to achieve quorum on signatures
4 participants