Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Behavior when tags configured is not available or propagated #28

Closed
mesutcelik opened this issue May 29, 2017 · 13 comments
Closed

Behavior when tags configured is not available or propagated #28

mesutcelik opened this issue May 29, 2017 · 13 comments

Comments

@mesutcelik
Copy link
Contributor

We need to define a behavior if tags can't be retrieved for some period of time and fail fast at the end if tag can't be resolved.

see the issue --> hazelcast/hazelcast#10537

@ghost
Copy link

ghost commented Jul 20, 2017

@mesutcelik we added configurable timeout for queries. I believe we can close this issue.

With PR: #30

@mesutcelik
Copy link
Contributor Author

Can you please reference the PR here?

@emrahkocaman emrahkocaman added this to the 2.0.2 milestone Jul 24, 2017
@emrahkocaman
Copy link
Contributor

Closing this issue, resolved by #30. (This line to be specific.)

@dimas
Copy link

dimas commented Feb 4, 2018

Chaps, are you sure the original issue is resolved by this PR?
This comment hazelcast/hazelcast#10537 (comment) explains how I see it - the important bit is not what timeout you have for AWS HTTP requests but how do you make sure if AWS response is good to use. (If the node cannot see itself in the response, it is probably just too early to trust that response - tags has not propagated yet...)

@mesutcelik
Copy link
Contributor Author

I am reopening the issue. You basically say we need to retry if we see empty list here. It has to return at least itself in case of a single-member or first member case.

@mesutcelik mesutcelik reopened this Feb 6, 2018
@dimas
Copy link

dimas commented Feb 9, 2018

Yes, correct.
Thank you.
In our custom scripts we run before starting Hazelcast we do do not check that node can find itself, we only check if list is empty or not. Even if the node cannot find itself it is probably ok as long as it sees another one - it will try to cluster with it.

@leszko
Copy link

leszko commented May 22, 2018

I think the detection of AWS tags are already propagated shouldn't be part of the responsibilities of the AWS Discovery. Tag is not propagated, so the instance is not discovered, which may result in the Split Brain, but the nodes will rediscover each other after some time.

After merging #64, it will work as follows.

  1. Set tag on instances
  2. Start Hazelcast members
  3. Tags are no propagated yet, so members won't discover each other - each member will create its own cluster (Split Brain)
  4. Tags are propagated
  5. Hazelcast member will rediscover each other and create one cluster

Split Brain is an issue, however, I think it's the correct behavior, nodes don't see each other, so they don't connect. Or am I missing something?

Technically, we could check that the node detects itself and if it doesn't then wait some time and retry, however such behavior would cause other issues, because tags can be propagates with different delays, so we can end up with Split Brain anyway.

@leszko
Copy link

leszko commented May 30, 2018

Moving the discussion started in #64 . Last comment from Mesut: "empty means to me either tags are not propagated as described in #28 or there is no tag configured although they are defined in hazelcast config.
It is good idea to move the discussion to #28"

@leszko
Copy link

leszko commented May 30, 2018

@mesutcelik So you described two cases:

  1. Tags are (yet) not propagated => the nodes will start separately (Split Brain) and rejoin when the tags are propagated (I think it's the correct behavior).
  2. No tag is configured, but they are defined in hazelcast config (misconfiguration) - it would be nice to give a user a meaningful message how to fix it, however I don't think it's possible to distinguish between it and point 1.

So IMO, the current behavior is fine. I'd just test that it's fine and (maybe) document it. Wdyt?

@mesutcelik
Copy link
Contributor Author

The original issues created is hazelcast/hazelcast#10537 (comment) and the delay mechanism that is used by @dimas was probably because they didn't want to see temporary Split Brain.

@dimas Can you please comment on this?

@leszko
Copy link

leszko commented Jul 9, 2018

Closing due to inactivity.

@leszko leszko closed this as completed Jul 9, 2018
@dimas
Copy link

dimas commented Jul 11, 2018

How temporary is Temporary Split Brain? I am not sure cluster ever healed in our case but, again, it was years ago.

@leszko
Copy link

leszko commented Jul 12, 2018

@dimas , I think in your case it didn't retried, but it's fixed now #71.

The possible "Temporary Split Brain" should take up to a few minutes after tags are propagated (I tried and in my case it took 4 min).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants