Skip to content
This repository has been archived by the owner on Feb 3, 2023. It is now read-only.

don't get links from network if we are the authority #2183

Closed
wants to merge 34 commits into from

Conversation

zippy
Copy link
Member

@zippy zippy commented Apr 24, 2020

PR summary

This PR adds:

  • an optimization of not querying the DHT for get links requests when we know that we are the DHT authority for the base of the query.
  • two conductor fixes during error cases (like base hasn't arrived yet) in hold_aspects request:
    • stop incorrectly recording aspects as held
    • stop locking up the future
  • because sim2h doesn't get error messages from a hold_aspect request it must also no longer records aspects sent in those messages as held, and thus the conductor must explicitly send back a list of aspects held after a hold_aspect request
  • get authoring list doesn't really work because correct items won't be gossiped and the optimization now breaks this. Fix this by not responding with the list, and instead re-publishing.

testing/benchmarking notes

( if any manual testing or benchmarking was/should be done, add notes and/or screenshots here )

followups

( any new tickets/concerns that were discovered or created during this work but aren't in scope for review here )

changelog

  • if this is a code change that effects some consumer (e.g. zome developers) of holochain core, then it has been added to our between-release changelog with the format
- summary of change [PR#1234](https://github.com/holochain/holochain-rust/pull/1234)

documentation

@zippy zippy requested a review from lucksus April 24, 2020 16:43
Copy link
Collaborator

@lucksus lucksus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should at least make am_i_dht_authority_for_base() do what the name suggest as noted in my comment below.

Just a remark that so far we deliberately chose to not put code into core that knows about the DHT role of the node, and keep that concern confined to the network implementation. Since this is only reading DHT I don‘t see a problem adding it.

crates/core/src/workflows/get_link_result.rs Outdated Show resolved Hide resolved
@zippy zippy requested a review from lucksus April 29, 2020 19:41
Copy link
Collaborator

@lucksus lucksus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@zippy zippy requested a review from lucksus May 5, 2020 00:50
.expect("No state present when trying to respond with gossip list");

let mut address_map = AspectMap::new();
address_map.add(&aspect);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, this sends back a GossipList with just this one item in it? Isn't this causing sim2h to be confused about the gossip list of this node?

I really think we shouldn't do this (managing results of holding workflows) here at all and instead have sim2h properly (if it isn't) poll for gossip lists.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the gossiplist is additive on the sim2h side so this really acts as an acknowledgement of the hold_aspect request.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sim2h's state knowledge is append-only.

We were previously setting "agent-holding" when we sent the hold request, but that could fail - it's much more accurate for core to send a sparse holding list that results in an additional hold record.

r
} else {
let mut store = (*old_store).clone();
store.mark_hold_aspect_complete(id.clone(), hold_result);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Despite my last comment, I think it is a good thing to store the result of the holding workflow in the state as done here. (So we don't [always] rerun an already failed validation - and for debugging). I still think introducing the complexity of getting this result back to sim2h as a response for the holding request is too brittle and having sim2h (re-)poll for the gossip list (instead of marking something as hold because the request was sent) is the real fix here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that can't work in the end. Remember that sim2h is where the intelligence lies of who should hold what. It has the knowledge of who is in the sharding neighborhood. The only reason it asks for a gossip list is to update it's understanding of who is actually holding what data so it can correctly forward queries. But at the same time it should be making the decisions about who SHOULD hold what. These fixes are about getting that right.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this affecting the network code's reasoning about who should hold what?

@@ -49,6 +50,9 @@ pub struct DhtStore {
/// All the entry aspects that the network has told us to hold
holding_map: AspectMap,

/// Hold aspect requests from the network
queued_hold_aspect_requests: HashMap<ProcessUniqueId, Result<(), HolochainError>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confused about the name. Should this be something like holding_attempt_results?


// currently sim2h asks for the authoring list and treats it just the same
//as the gossiping list, i.e. as data that you hold. This is a problem because
//it means that gossiping isn't actually working right. So instead of fixing that in
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Woah, if sim2h is doing, that is a problem and I would consider it a defect.

Last time I looked, sim2h was using the authoring list to check if this node has something new that the DHT does not store yet, such that sim2h would request that data from the node and publish it to the according authorities.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that was incorrect. We could, in the sim2h case actually treat it that way, and we could add to sim2h the tracking of who authored what so that it does treat authors as authoritative on the DHT. The hack of having the conductor simply republish everything it authored when it receives the authoring list request serves the exact same purpose as what say above as long as sim2h also checks to not send the data again to nodes that it knows it's already sent the publish data too. I will be adding that.

result
);
}
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect this hack to cause much more strain on the whole network since sim2h handles publishes different to how it (used to) handle(s) authoring lists. Doesn't publishes always result in store requests send to the according DHT authorities? Which would now happen every time a node comes online...

I don't think this is the right solution. If sim2h does not treat authoring lists as authoring lists anymore, that should get back in.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot have authors marked as holding the data - because they are not authoritative - the meta info will be incomplete.

Because Sim2h maintains central knowledge of who holds what, a publish should become a no-op if the item has already saturated a neighborhood.

sim2h_handle.state().spawn_agent_holds_aspects(
(&*space_hash).clone(),
(&*agent_id).clone(),
data.entry.entry_address.clone(),
aspect_list.clone(),
);
);*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I regard this as the real fix for handling the case of failed holding requests.

@zippy
Copy link
Member Author

zippy commented May 11, 2020

This work has been split into multiple PRs: #2184 #2189

@zippy zippy closed this May 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants