-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add node-specific suffix to DX endpoint "id" #916
Conversation
d569e32
to
bb48e46
Compare
New versions of ffdx allow a `sender` in the message body, to help with the mapping of remote -> local namespace. Signed-off-by: Alex Shorsher <alex.shorsher@kaleido.io>
Signed-off-by: Alex Shorsher <alex.shorsher@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
internal/dataexchange/ffdx/ffdx.go
Outdated
if handler, ok := cb.handlers[namespace]; ok { | ||
handler.DXEvent(cb.plugin, event) | ||
func (cb *callbacks) DXEvent(ctx context.Context, namespace, recipient string, event dataexchange.DXEvent) { | ||
if node, ok := cb.plugin.nodes[recipient]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for a DX message to arrive before the node identity has been confirmed? Here plugin.nodes
is populated with any nodes that exist on startup or any nodes that are confirmed by an identity claim broadcast. I don't see any way to pick the correct handler unless the node has been claimed and confirmed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically this is an implicit validation on the new "recipient" field provided by DX, to see that it matches a known node/verifier in the database. Previously this field didn't exist, so any message received was just put into the database.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the question is if you were to:
- As
B
broadcast my node to the network - As
A
triggers an event off of the broadcast ofB
and immediately send a message to a group containingB
- As
B
receive the message fromA
before processing the broadcast from (1) - even though it was successfully processed by the remote node.
I believe it is (at least theoretically) possible to engineer this scenario.
It doesn't seem like one to worry about in practice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I think it would mainly be if the blockchain listener on B
was disconnected or behind when (3) is received. It would ignore the off-chain data since there's no orchestrator lined up to handle it until (1) is confirmed (on- and off-chain portions) on my local node. It feels like an edge case though.
Perhaps the more important angle to this is that there's no way to communicate back to A
whether the message was processed. DX will indicate success as long as the message was delivered to FireFly - but if FireFly decides to drop it on the floor, A
will never know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the more general case, you could send a message to peer0/jack
and DX would report success as long as peer0
is valid - regardless of whether /jack
exists at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
spoke to @gabriel-indik about this, and we agreed the MTLS implementation of DX should reject a HTTP POST to an unregistered node ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - can table it in the context of this PR then, and come back after scoping what it would take to push that responsibility into DX
Codecov Report
@@ Coverage Diff @@
## main #916 +/- ##
==========================================
+ Coverage 99.97% 99.98% +0.01%
==========================================
Files 299 299
Lines 19475 19477 +2
==========================================
+ Hits 19470 19474 +4
+ Misses 4 3 -1
+ Partials 1 0 -1
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
The "id" of a DX node will now have a suffix indicating a local node name. This will allow multiple nodes to co-exist within a single instance of FireFly, as long as they are stored on different local namespaces. Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few questions going through @awrichar that I'd like to clarify before this merges.
One more general one - I was expecting to find something around this check:
firefly/internal/privatemessaging/privatemessaging.go
Lines 275 to 278 in d764406
if node.Parent.Equals(localOrg.ID) { | |
l.Debugf("Skipping send of batch for local node %s for group=%s node=%s (%d/%d)", batch.ID, batch.Group, node.ID, i+1, len(nodes)) | |
continue | |
} |
Either a complete removal of it, or a tweak to it. Could you help me understand why it's still there?
@@ -201,13 +201,6 @@ func (e *Ethereum) AddFireflySubscription(ctx context.Context, namespace core.Na | |||
return "", err | |||
} | |||
|
|||
switch firstEvent { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking out for why this change is in this PR - hoping I answer this later in the review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... I didn't (or I just missed it) - mind adding an explanation of this one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this is just a general cleanup item that crept into this branch. Nothing to do with the PR really, but this same check is performed again in the createSubscription()
method ultimately called from this flow. I can keep it here or pull it out to a separate PR if it seems extraneous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving it here is fine - thanks for adding the clarification.
@@ -210,8 +235,14 @@ func (h *FFDX) beforeConnect(ctx context.Context) error { | |||
if h.needsInit { | |||
h.initialized = false | |||
var status dxStatus | |||
var body []fftypes.JSONObject | |||
h.nodesMutex.Lock() | |||
for _, node := range h.nodes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about where the namespace is in this, as where the APIs have fallen with FFDX I'm not sure it's there.
Going to work through it here, and hopefully understand why it's not needed...
Specifically the scenario I'm thinking of is:
- Namespace1: Joins remote namespace
A
- node namebob
- so DX IDpeer1/bob
- Namespace2: Joins remote namespace
B
- node namebob
- so DX IDpeer1/bob
This means that two broadcasts go out, two two different namespaces, with the same fully qualified DX ID.
Or have I missed something?
If I've got this right, then any other party that is also in NS A
and B
on the multi-party system, will end up with two node
objects - with different UUIDs - but the same DX identifier.
But the data-structure here is a map
(not an array) so I guess DX will arbitrarily get one of them.
I guess the point is that in practice they have to be the same signing key and other details at the DX level, for it to work (and scratching my head I couldn't think of a way to exploit the overlap between namespaces - only ways to break my own node).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The peers advertised to DX are actually just peer1
(not peer1/bob
). So my DX ends up knowing about all other DX peers across all multiparty namespaces that it participates in. It doesn't care about or store /bob
(and doesn't accept it when posting to /peers
... only accepts it and forwards it along when received as part of a message delivery). Now, that said - I'm not sure we've actually tested the re-init flow here, so I should take that as an action.
When FireFly receives a message, it decides to route it based on 1) mapping the qualified DX ID like peer1/bob
to a local node name like bob
and 2) looking up the correct orchestrator based on that node name and the remote namespace. You're correct that if two nodes on different namespaces share a DX ID like peer1/bob
, then h.nodes
will arbitrarily contain one of them. I think it all just happens to work because the mapping in (1) should be the same regardless of which node we happen to be looking at, and (2) will ensure it ultimately goes to the correct namespace. But it's worth another pass to see if this can be any cleaner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I updated to fully separate the node lists per remote namespace - hopefully that makes it a bit more deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per https://discord.com/channels/905194001349627914/971132225355657267/1004818976876007445 there's going to be follow-on work between @awrichar and @gabriel-indik to work this one through. We are moving forwards with this update to FF Core for now as it's holding up the line, and this is well understood.
Maintain separate node maps per remote namespace. Use a single mutex instead of two slightly different ones. Combine InitPeer and AddPeer into new AddNode method. Remove one unused DX plugin method. Explicitly provide a GetPeerID method instead of doing GetString("id"). Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Regarding the changes to the "Skipping send of batch for local node" logic - that's in the next PR (#918). Didn't discover the need for it until I added the E2E test. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @awrichar - sorry for taking a while with the review 🙇
In a chain with #915Depends on hyperledger/firefly-dataexchange-https#62
UPDATE: found a number of bugs while adding E2E tests - so #918 contains everything from this PR, some additional fixes, and an E2E test. The comments below are still relevant though.
FireFly can now support many local namespaces that map to a single remote namespace (for multi-tenancy purposes). Each of these local namespaces may have their own "node" within FireFly, but they may all be sharing a single DX plugin. This change adds a "nodeName" qualifier to many of the dataexchange plugin calls, along with a specific implementation in the FFDX plugin code, to allow many nodes to exist behind a single DX.
Implementation outline:
/
as a separator.Should remain fully backwards compatible for nodes that were already broadcast/stored in the database with a non-suffixed peer ID.