Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add per key actor epochs #1040

Closed
wants to merge 17 commits into from
Closed

Add per key actor epochs #1040

wants to merge 17 commits into from

Conversation

russelldb
Copy link
Member

See riak_kv#679 and the associated platform_task RFC and summary.

Requires basho/riak_core#646

This PR addresses the "doomstone", backup-restore, and some byzantine flavours of the kv679 bug.

The RFC explains the mechanism in detail but briefly:

  1. add a persisted to disk vnode counter (persisted with leases, aysnc)
  2. when ever a key is written to for the "first time" by a vnode, create an epoch actor for the key by concatenating vnodeid+counter (and increment the counter)

This ensures that a first time write for a key gets a new actor, this is the epoch for the key. It means we don't mix up deleted+re-created keys {a,1} event with the original {a, 1} event for some key, by ensuring an actor per epoch, without causing a keyspace wide actor explosion.

I hope that is enough details for the review to start. I'll start work on EQC for the status_mgr and the new vnode functions in the mean time.

use the counter to create a per epoch key when local not found.
When it hits that limit create a new vnodeid.
Start a new epoch if the local object has a lower epoch for the
vnode id than the incoming object (no epoch is lower, no epich is zero)
Bad match for re-refactord put_merge return.
Correct return type for highest_actor
@russelldb russelldb force-pushed the bug/rdb/gh679-actor-epochs branch from 194e222 to dc581bd Compare November 11, 2014 15:06
use the bare vnodeid as long as possible, only start a new epoch
when needed. In a way the bare vnodeid is just epoch zero, so use it.

Also, consider the incoming counter when deciding about new epochs.
An incoming clock with the same actor+epoch but greater counter is
a hint that a byzantine failure occurred and a new epoch is needed
NOTE: make dialyzer exits with an error, but no information. Could
do with some help on that.
State;
maybe_lease_counter(State) ->
#state{status_mgr_pid=MgrPid, counter=CS=#counter_state{cnt=Cnt, lease=Lease, lease_size=LeaseSize}} = State,
%% @TODO configurable??
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this should probably be configurable/tunable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It is hard to decide if it should be. Trying to figure out a scenario where you'd want to change this. And adding it adds more chance to shoot yourself in the foot. And more complex validation of the config item. And can it change in the vnode's lifetime? Do we need to add it to state, or check the env var every iteration?

I'm not against making it configurable, I'd just like a better understanding of the benefit vs. risk/complexity.

vnode_pid :: undefined | pid()
}).

-type status() :: [proplists:property()] | [].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The empty list is not strictly necessary in this type. Also, is there any case where you will have bare atoms in the list? If not, orddict:orddict() might be better all around.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, it is inherited.

@russelldb
Copy link
Member Author

I rebased against 2.0 before it moves too far away. I addressed your comments. Sadly I can't push to this branch now. Instead I opened a new PR and ref'd this one from riak_kv#1053

@russelldb russelldb closed this Nov 24, 2014
@seancribbs seancribbs deleted the bug/rdb/gh679-actor-epochs branch April 1, 2015 23:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants