Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add node-specific suffix to DX endpoint "id" #916

Merged
merged 9 commits into from
Aug 4, 2022

Conversation

awrichar
Copy link
Contributor

@awrichar awrichar commented Aug 1, 2022

In a chain with #915
Depends on hyperledger/firefly-dataexchange-https#62

UPDATE: found a number of bugs while adding E2E tests - so #918 contains everything from this PR, some additional fixes, and an E2E test. The comments below are still relevant though.

FireFly can now support many local namespaces that map to a single remote namespace (for multi-tenancy purposes). Each of these local namespaces may have their own "node" within FireFly, but they may all be sharing a single DX plugin. This change adds a "nodeName" qualifier to many of the dataexchange plugin calls, along with a specific implementation in the FFDX plugin code, to allow many nodes to exist behind a single DX.

Implementation outline:

  • For newly created nodes, the FFDX connector will append a suffix to the member ID advertised by the plugin before storing or broadcasting it. This means each node will have a unique verifier ID even if they are sharing a DX member ID and certificate. The FFDX connector and FFDX plugin have agreed on / as a separator.
  • The FFDX connector will ignore this suffix when routing messages (paying attention only to the member ID before the separator) - but will pass the full recipient string for further routing by FireFly on the far end.
  • FireFly will use the combination of the received message's remote namespace and recipient ID to route it uniquely to a single event manager.

Should remain fully backwards compatible for nodes that were already broadcast/stored in the database with a non-suffixed peer ID.

@awrichar awrichar marked this pull request as draft August 1, 2022 17:44
@awrichar awrichar force-pushed the dxfiltering branch 4 times, most recently from d569e32 to bb48e46 Compare August 1, 2022 18:44
shorsher and others added 3 commits August 1, 2022 15:22
New versions of ffdx allow a `sender` in the message body, to help with
the mapping of remote -> local namespace.

Signed-off-by: Alex Shorsher <alex.shorsher@kaleido.io>
Signed-off-by: Alex Shorsher <alex.shorsher@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
@awrichar awrichar changed the title Add supplementary "nodeID" field to DX node profile Add node-specific suffix to DX endpoint "id" Aug 1, 2022
if handler, ok := cb.handlers[namespace]; ok {
handler.DXEvent(cb.plugin, event)
func (cb *callbacks) DXEvent(ctx context.Context, namespace, recipient string, event dataexchange.DXEvent) {
if node, ok := cb.plugin.nodes[recipient]; ok {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for a DX message to arrive before the node identity has been confirmed? Here plugin.nodes is populated with any nodes that exist on startup or any nodes that are confirmed by an identity claim broadcast. I don't see any way to pick the correct handler unless the node has been claimed and confirmed.

Copy link
Contributor Author

@awrichar awrichar Aug 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically this is an implicit validation on the new "recipient" field provided by DX, to see that it matches a known node/verifier in the database. Previously this field didn't exist, so any message received was just put into the database.

Copy link
Contributor

@peterbroadhurst peterbroadhurst Aug 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the question is if you were to:

  1. As B broadcast my node to the network
  2. As A triggers an event off of the broadcast of B and immediately send a message to a group containing B
  3. As B receive the message from A before processing the broadcast from (1) - even though it was successfully processed by the remote node.

I believe it is (at least theoretically) possible to engineer this scenario.
It doesn't seem like one to worry about in practice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, I think it would mainly be if the blockchain listener on B was disconnected or behind when (3) is received. It would ignore the off-chain data since there's no orchestrator lined up to handle it until (1) is confirmed (on- and off-chain portions) on my local node. It feels like an edge case though.

Perhaps the more important angle to this is that there's no way to communicate back to A whether the message was processed. DX will indicate success as long as the message was delivered to FireFly - but if FireFly decides to drop it on the floor, A will never know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the more general case, you could send a message to peer0/jack and DX would report success as long as peer0 is valid - regardless of whether /jack exists at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

spoke to @gabriel-indik about this, and we agreed the MTLS implementation of DX should reject a HTTP POST to an unregistered node ID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - can table it in the context of this PR then, and come back after scoping what it would take to push that responsibility into DX

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2022

Codecov Report

Merging #916 (dc55d52) into main (3ae7ba8) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #916      +/-   ##
==========================================
+ Coverage   99.97%   99.98%   +0.01%     
==========================================
  Files         299      299              
  Lines       19475    19477       +2     
==========================================
+ Hits        19470    19474       +4     
+ Misses          4        3       -1     
+ Partials        1        0       -1     
Impacted Files Coverage Δ
internal/blockchain/ethereum/ethereum.go 100.00% <ø> (ø)
internal/data/data_manager.go 100.00% <100.00%> (ø)
internal/dataexchange/ffdx/dxevent.go 100.00% <100.00%> (ø)
internal/dataexchange/ffdx/ffdx.go 100.00% <100.00%> (ø)
internal/definitions/handler.go 100.00% <100.00%> (ø)
internal/definitions/handler_identity_claim.go 100.00% <100.00%> (ø)
internal/definitions/sender.go 100.00% <100.00%> (ø)
internal/identity/identitymanager.go 100.00% <100.00%> (ø)
internal/networkmap/register_node.go 100.00% <100.00%> (ø)
internal/orchestrator/orchestrator.go 100.00% <100.00%> (ø)
... and 2 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@awrichar awrichar marked this pull request as ready for review August 1, 2022 19:49
pkg/dataexchange/plugin.go Outdated Show resolved Hide resolved
The "id" of a DX node will now have a suffix indicating a local node name.
This will allow multiple nodes to co-exist within a single instance of
FireFly, as long as they are stored on different local namespaces.

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
Copy link
Contributor

@peterbroadhurst peterbroadhurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions going through @awrichar that I'd like to clarify before this merges.

One more general one - I was expecting to find something around this check:

if node.Parent.Equals(localOrg.ID) {
l.Debugf("Skipping send of batch for local node %s for group=%s node=%s (%d/%d)", batch.ID, batch.Group, node.ID, i+1, len(nodes))
continue
}

Either a complete removal of it, or a tweak to it. Could you help me understand why it's still there?

@@ -201,13 +201,6 @@ func (e *Ethereum) AddFireflySubscription(ctx context.Context, namespace core.Na
return "", err
}

switch firstEvent {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking out for why this change is in this PR - hoping I answer this later in the review

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... I didn't (or I just missed it) - mind adding an explanation of this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this is just a general cleanup item that crept into this branch. Nothing to do with the PR really, but this same check is performed again in the createSubscription() method ultimately called from this flow. I can keep it here or pull it out to a separate PR if it seems extraneous.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving it here is fine - thanks for adding the clarification.

internal/dataexchange/ffdx/ffdx.go Outdated Show resolved Hide resolved
@@ -210,8 +235,14 @@ func (h *FFDX) beforeConnect(ctx context.Context) error {
if h.needsInit {
h.initialized = false
var status dxStatus
var body []fftypes.JSONObject
h.nodesMutex.Lock()
for _, node := range h.nodes {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about where the namespace is in this, as where the APIs have fallen with FFDX I'm not sure it's there.
Going to work through it here, and hopefully understand why it's not needed...

Specifically the scenario I'm thinking of is:

  • Namespace1: Joins remote namespace A - node name bob - so DX ID peer1/bob
  • Namespace2: Joins remote namespace B - node name bob - so DX ID peer1/bob

This means that two broadcasts go out, two two different namespaces, with the same fully qualified DX ID.
Or have I missed something?

If I've got this right, then any other party that is also in NS A and B on the multi-party system, will end up with two node objects - with different UUIDs - but the same DX identifier.

But the data-structure here is a map (not an array) so I guess DX will arbitrarily get one of them.

I guess the point is that in practice they have to be the same signing key and other details at the DX level, for it to work (and scratching my head I couldn't think of a way to exploit the overlap between namespaces - only ways to break my own node).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The peers advertised to DX are actually just peer1 (not peer1/bob). So my DX ends up knowing about all other DX peers across all multiparty namespaces that it participates in. It doesn't care about or store /bob (and doesn't accept it when posting to /peers... only accepts it and forwards it along when received as part of a message delivery). Now, that said - I'm not sure we've actually tested the re-init flow here, so I should take that as an action.

When FireFly receives a message, it decides to route it based on 1) mapping the qualified DX ID like peer1/bob to a local node name like bob and 2) looking up the correct orchestrator based on that node name and the remote namespace. You're correct that if two nodes on different namespaces share a DX ID like peer1/bob, then h.nodes will arbitrarily contain one of them. I think it all just happens to work because the mapping in (1) should be the same regardless of which node we happen to be looking at, and (2) will ensure it ultimately goes to the correct namespace. But it's worth another pass to see if this can be any cleaner.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I updated to fully separate the node lists per remote namespace - hopefully that makes it a bit more deterministic.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per https://discord.com/channels/905194001349627914/971132225355657267/1004818976876007445 there's going to be follow-on work between @awrichar and @gabriel-indik to work this one through. We are moving forwards with this update to FF Core for now as it's holding up the line, and this is well understood.

Maintain separate node maps per remote namespace.
Use a single mutex instead of two slightly different ones.
Combine InitPeer and AddPeer into new AddNode method.
Remove one unused DX plugin method.
Explicitly provide a GetPeerID method instead of doing GetString("id").

Signed-off-by: Andrew Richardson <andrew.richardson@kaleido.io>
@awrichar
Copy link
Contributor Author

awrichar commented Aug 3, 2022

Regarding the changes to the "Skipping send of batch for local node" logic - that's in the next PR (#918). Didn't discover the need for it until I added the E2E test.

Copy link
Contributor

@peterbroadhurst peterbroadhurst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @awrichar - sorry for taking a while with the review 🙇

@peterbroadhurst peterbroadhurst merged commit 3873b2c into hyperledger:main Aug 4, 2022
@peterbroadhurst peterbroadhurst deleted the dxfiltering branch August 4, 2022 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants