-
Notifications
You must be signed in to change notification settings - Fork 267
don't get links from network if we are the authority #2183
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should at least make am_i_dht_authority_for_base() do what the name suggest as noted in my comment below.
Just a remark that so far we deliberately chose to not put code into core that knows about the DHT role of the node, and keep that concern confined to the network implementation. Since this is only reading DHT I don‘t see a problem adding it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
.expect("No state present when trying to respond with gossip list"); | ||
|
||
let mut address_map = AspectMap::new(); | ||
address_map.add(&aspect); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Huh, this sends back a GossipList with just this one item in it? Isn't this causing sim2h to be confused about the gossip list of this node?
I really think we shouldn't do this (managing results of holding workflows) here at all and instead have sim2h properly (if it isn't) poll for gossip lists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the gossiplist is additive on the sim2h side so this really acts as an acknowledgement of the hold_aspect request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sim2h's state knowledge is append-only.
We were previously setting "agent-holding" when we sent the hold request, but that could fail - it's much more accurate for core to send a sparse holding list that results in an additional hold record.
r | ||
} else { | ||
let mut store = (*old_store).clone(); | ||
store.mark_hold_aspect_complete(id.clone(), hold_result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Despite my last comment, I think it is a good thing to store the result of the holding workflow in the state as done here. (So we don't [always] rerun an already failed validation - and for debugging). I still think introducing the complexity of getting this result back to sim2h as a response for the holding request is too brittle and having sim2h (re-)poll for the gossip list (instead of marking something as hold because the request was sent) is the real fix here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that can't work in the end. Remember that sim2h is where the intelligence lies of who should hold what. It has the knowledge of who is in the sharding neighborhood. The only reason it asks for a gossip list is to update it's understanding of who is actually holding what data so it can correctly forward queries. But at the same time it should be making the decisions about who SHOULD hold what. These fixes are about getting that right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this affecting the network code's reasoning about who should hold what?
crates/core/src/dht/dht_store.rs
Outdated
@@ -49,6 +50,9 @@ pub struct DhtStore { | |||
/// All the entry aspects that the network has told us to hold | |||
holding_map: AspectMap, | |||
|
|||
/// Hold aspect requests from the network | |||
queued_hold_aspect_requests: HashMap<ProcessUniqueId, Result<(), HolochainError>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confused about the name. Should this be something like holding_attempt_results
?
|
||
// currently sim2h asks for the authoring list and treats it just the same | ||
//as the gossiping list, i.e. as data that you hold. This is a problem because | ||
//it means that gossiping isn't actually working right. So instead of fixing that in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah, if sim2h is doing, that is a problem and I would consider it a defect.
Last time I looked, sim2h was using the authoring list to check if this node has something new that the DHT does not store yet, such that sim2h would request that data from the node and publish it to the according authorities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that was incorrect. We could, in the sim2h case actually treat it that way, and we could add to sim2h the tracking of who authored what so that it does treat authors as authoritative on the DHT. The hack of having the conductor simply republish everything it authored when it receives the authoring list request serves the exact same purpose as what say above as long as sim2h also checks to not send the data again to nodes that it knows it's already sent the publish data too. I will be adding that.
result | ||
); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would expect this hack to cause much more strain on the whole network since sim2h handles publishes different to how it (used to) handle(s) authoring lists. Doesn't publishes always result in store requests send to the according DHT authorities? Which would now happen every time a node comes online...
I don't think this is the right solution. If sim2h does not treat authoring lists as authoring lists anymore, that should get back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We cannot have authors marked as holding the data - because they are not authoritative - the meta info will be incomplete.
Because Sim2h maintains central knowledge of who holds what, a publish should become a no-op if the item has already saturated a neighborhood.
sim2h_handle.state().spawn_agent_holds_aspects( | ||
(&*space_hash).clone(), | ||
(&*agent_id).clone(), | ||
data.entry.entry_address.clone(), | ||
aspect_list.clone(), | ||
); | ||
);*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I regard this as the real fix for handling the case of failed holding requests.
PR summary
This PR adds:
testing/benchmarking notes
( if any manual testing or benchmarking was/should be done, add notes and/or screenshots here )
followups
( any new tickets/concerns that were discovered or created during this work but aren't in scope for review here )
changelog
documentation