-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend peer shard info #3726
Extend peer shard info #3726
Conversation
6b28633
to
6520192
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the first pass and got a couple questions about the protobuf message changes.
Let me confirm my understanding. Three protobuf message pairs were changed.
TMGetShardInfo
andTMShardInfo
that were marked as deprecated previously are removed now.TMGetPeerShardInfo
andTMPeerShardInfo
that were used previously are marked as deprecated now. Their handler functions are empty.TMGetPeerShardInfoV2
andTMPeerShardInfoV2
are the new messages.
TMGetPeerShardInfoV2
is only used to answer crawl_shards
RPC requests when the relays > 0
.
TMPeerShardInfoV2
is used to reply to TMGetPeerShardInfoV2
. It is also used in DatabaseShardImp::updatePeers()
to update peers when the local node's shard information changes.
It seems nodes with different versions of rippled won't be able to exchange shard information via these messages. Maybe not a big concern if the messages were only for RPCs, and I guess the message version update happened once already when updating from TMGetShardInfo
to TMGetPeerShardInfo
.
To reduce unnecessary messages, maybe only send TMGetPeerShardInfoV2
and TMPeerShardInfoV2
to peers that understand them? I am not sure if the optimization is needed since I don't know how often these messages are sent. If we need the optimization, then ProtocolFeature
might be helpful. https://github.com/ripple/rippled/blob/c11037fd2764f6fea8e0d184b0b7aa0daa81136f/src/ripple/overlay/Peer.h#L38
6520192
to
b136b23
Compare
@pwang200 Yes, the information is used to answer the crawl RPC but its primary objective is to automatically share the status through the peer network via the protocol message. It's a building block and it allows us to expand the use cases. For instance, clusters could potentially reduce shard redundancy when selecting new shards to acquire. The info can also be utilized to know which peers have the shards you are acquiring, allowing a direct download. Because the messages are signed, one could potentially build a verified state of shard history on the network. The older protocol messages were deprecated some time ago and deal with an older shard version. You are correct that some older rippled versions will not be able to understand but at this point, I am not sure it warrants optimization. |
158ac4a
to
2a0b097
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I enjoyed reviewing this PR, learned a bit more about shard, and got to think about signatures and p2p information propagation. Thanks for the explanation of the potential use cases. Regarding p2p information propagation, I think "pull + last hop hush" make sense for our usages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly found nits, but there are a couple of issues that I might be confused about or else really need to be addressed. Happy to talk through any of the stuff I've pointed out.
Regarding unit test coverage, you might consider running codecov locally (if you can) to see what your coverage looks like. On macOS to run coverage on all non-manual unit tests I use...
cmake -DOPENSSL_ROOT_DIR=$(brew --prefix openssl@1.1) -DCMAKE_BUILD_TYPE=Debug -Dunity=OFF -Dassert=ON -Dwerr=ON -Dcoverage=ON -Dcoverage_core_only =OFF -Ucoverage_test ../../..
To run a specific test ($1 on the command line) I use...
cmake -DOPENSSL_ROOT_DIR=$(brew --prefix openssl@1.1) -DCMAKE_BUILD_TYPE=Debug -Dunity=OFF -Dassert=ON -Dwerr=ON -Dcoverage_core_only=OFF -Dcoverage_test=$1 ../../..
Either of those is followed by...
make -j3 coverage_report
I have no idea whether that works on other platforms. But I'm happy to try and help if I can.
Get me out of this list !!!b |
ae3a4e3
to
30a21be
Compare
57415a7
to
8167c59
Compare
FWIW, the latest commit is not building for me on macOS with clang. The problem appears to result from One fix would be to make the two data members non-
I'm going to assume you make a change similar to that and continue reviewing the code.. |
Oh! interesting. I'm unsure why clang was okay with the version of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think once the problem with Incomplete
is patched up this looks good to go. I spotted a couple of other things I think might be minor improvements, but they are not required.
6c6f7a9
to
1ec44a2
Compare
1ec44a2
to
d1cfdaa
Compare
d1cfdaa
to
d134990
Compare
FWIW, the current version of this branch usually (but not always) fails the DatabaseShard manual unit test on macOS. Sometimes the test passes. I usually get this:
|
The |
8f6f47c
to
4ed1a55
Compare
Hi @miguelportilla, thanks for the work on the unit tests. I think things are better, but we're not quite there, at least for me on macOS. I ran the
So I think that needs to be looked at and addressed. |
58bffc7
to
7e4e26f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Looks good. Left one minor comment.
Also, the "crawling" in p2p is interesting. No action needed now. But if we have more use cases later, we could abstract the crawling logic out.
src/ripple/overlay/impl/PeerImp.cpp
Outdated
auto const timestamp{ | ||
NetClock::time_point{std::chrono::seconds{m->timestamp()}}}; | ||
auto const now{app_.timeKeeper().now()}; | ||
if (timestamp > now) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (timestamp > (now + 5s))
to allow some peer clocks that are not synchronized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwang200 Good idea, thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me! I ran the DatabaseShard
test 35 times in a loop with no failures, so that's looking reliable now. And you got 100% code coverage on ShardInfo
! Nice work! 👍
a989eae
to
05b0098
Compare
05b0098
to
e1fedd4
Compare
e1fedd4
to
e97c54a
Compare
This pull request is a refactoring and improvement of the shard information shared between peers with messages. Notably, the data shared now includes shard states, their progress, and a digital signature of the message data. The information is also visible via the crawl_shards command. Please note that the protocol message for the shard info has changed and the new message will only be visible to rippled nodes running 1.7.0-b10 or higher.
The introduction of the ShardInfo class disrupts levelization. The class resides in /nodestore, and is included from /overlay/impl/Peerimp.h and /nodestore/DataBaseShard.h. Since this cycle already exists, the levelization files will be updated with the change.