Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle synchronous events for rolling upgrades. #601

Merged
merged 2 commits into from
Jun 12, 2014

Conversation

cmeiklejohn
Copy link
Contributor

Add handling for synchronous events, related to #600. However, I can't get replication_upgrade to get past the following:

19:35:42.272 [info] Wait until ring converged on ['dev4@127.0.0.1','dev5@127.0.0.1','dev6@127.0.0.1']

@kellymclaughlin do you mind running?

@cmeiklejohn cmeiklejohn changed the title Features/csm/sync send event Handle synchronous events instead of switching to async. Jun 11, 2014
@cmeiklejohn cmeiklejohn changed the title Handle synchronous events instead of switching to async. Handle synchronous events for rolling upgrades. Jun 11, 2014
@kellymclaughlin kellymclaughlin added this to the 2.0-RC milestone Jun 12, 2014
@kellymclaughlin kellymclaughlin self-assigned this Jun 12, 2014
@kellymclaughlin
Copy link
Contributor

I have determined the issue plaguing the replication_upgrade test was the cancellation of fullsync when an attempt to start and wait for fullsync completion failed. This step is actually unnecessary because the only reason this condition occurs is that fullsync simply fails to start on the initial call to riak_repl_console:start_fullsync/1 due to a race condition. So it is never the case that fullsync needs to be canceled. This PR is open to remove that cancellation ad using the changes represented the replication_upgrade test passes for me. The PR contains an additional change to replace the wait function used to determine when a replication connection has been established after a node upgrade. It is not necessary for the test to pass, but should decrease the test execution time.

We should investigate why attempting to cancel fullsync when it is not running causes this problem, but in my view that seems like a separate and less severe issue.

%% Unknown event (ignored)
wait_for_partition(Event, State) ->
lager:debug("Full-sync with site ~p; ignoring event ~p",
[State#state.sitename, Event]),
{next_state, wait_for_partition, State}.

build_keylist(Command, State)
when Command == cancel_fullsync ->
build_keylist(Command, State) when Command == cancel_fullsync ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is preexisting cruft, but the guards here could be eliminated in favor an explicit pattern match on the Command atoms and would make the code a bit more clear and concise.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking closer I see that a similar pattern is used throughout this module so perhaps it is not worth worrying about at this time.

@kellymclaughlin
Copy link
Contributor

These changes look good and in concert with the riak_test PR mentioned above it resolves the replication_upgrade test failure.

👍

borshop added a commit that referenced this pull request Jun 12, 2014
Handle synchronous events for rolling upgrades.

Reviewed-by: kellymclaughlin
@cmeiklejohn
Copy link
Contributor Author

@borshop merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants