Number of endpoints zero'd on SC restart #20

Closed
dannycohen opened this issue Oct 7, 2013 · 4 comments
Labels: Type: Bug

Comments

@dannycohen

The number of expected endpoints in SP is zero'd when SC is restarted.

@andreasohlund
Member

@dannycohen is this really an issue? The count will "refill" as soon as the heartbeats come in again, which takes less than 30 s.

@dannycohen
Author

Yes, this is really an issue.

Monitored endpoints rely on expectation: SC expects that the heartbeats for X number of endpoints are received within a timeout period, based on prior registration of the endpoints (see #15).
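To make the expectation model concrete, here is a minimal sketch (not ServiceControl's actual code) that flags every registered endpoint whose last heartbeat is older than a timeout. The function name `find_missing_endpoints`, the constant `TIMEOUT_SECONDS`, and its 30-second value are assumptions made for illustration.

```python
from typing import Dict, List, Optional

TIMEOUT_SECONDS = 30.0  # assumed value, for illustration only


def find_missing_endpoints(
    last_heartbeat: Dict[str, Optional[float]],
    now: float,
    timeout: float = TIMEOUT_SECONDS,
) -> List[str]:
    """Return registered endpoints with no heartbeat inside the timeout window.

    `last_heartbeat` maps each registered endpoint name to the timestamp of
    its most recent heartbeat, or None if none has been received yet.
    """
    missing = []
    for name, seen_at in last_heartbeat.items():
        # An endpoint is missing if it never reported or reported too long ago.
        if seen_at is None or now - seen_at > timeout:
            missing.append(name)
    return sorted(missing)
```

Note that an empty registry yields an empty result: with nothing registered, nothing can be reported as missing, which is why a zero'd list looks healthy.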

If we do not have a persistent list of endpoints we will encounter the following (bad and inconsistent) scenario:

  1. Opie restarts the system, including 10 endpoints, SC and SP.
  2. Opie does some ops work and restarts all the endpoints, SC and SP.
  3. Due to a configuration issue, some of the endpoints fail to start.
  4. SC and SP restart properly.
  5. Since SC's list of endpoints was zero'd, the heartbeat indicator is green and no events indicate that any endpoints failed, so all seems to be well.
  6. The only indication of a major problem is the small number showing the count of active endpoints.
  7. Only if Opie remembers how many endpoints he needs to monitor, and only if he pays attention, will he understand that the green indicator for the active endpoints hides a very big problem.
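The scenario above can be sketched roughly as follows. This is an illustration only, not what ServiceControl does or what the eventual fix implements: the `EndpointRegistry` class, the JSON file, the endpoint names, and the choice of 3 failed endpoints out of 10 are all hypothetical.

```python
import json
import os
import tempfile
from typing import Set


class EndpointRegistry:
    """Hypothetical registry of expected endpoints, persisted as a JSON file."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.expected: Set[str] = set()
        # Reload the expected endpoints if an earlier run saved them.
        if os.path.exists(path):
            with open(path) as f:
                self.expected = set(json.load(f))

    def register(self, name: str) -> None:
        """Record an endpoint as expected and write the list to disk."""
        self.expected.add(name)
        with open(self.path, "w") as f:
            json.dump(sorted(self.expected), f)

    def failed(self, heartbeating: Set[str]) -> Set[str]:
        """Expected endpoints that are not currently sending heartbeats."""
        return self.expected - heartbeating


def simulate_restart() -> Set[str]:
    """Register 10 endpoints, 'restart', and report which ones did not return."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "endpoints.json")
        before = EndpointRegistry(path)
        for i in range(10):
            before.register(f"endpoint-{i}")

        # Simulated restart: a fresh instance reloads the list from disk.
        after = EndpointRegistry(path)

        # Assumed for the example: only 7 of the 10 endpoints come back up.
        alive = {f"endpoint-{i}" for i in range(7)}
        return after.failed(alive)
```

With the persisted list, the three endpoints that failed to start are reported. A registry that starts empty after the restart would report nothing for the same set of heartbeats, which is the misleading green state described in step 5.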

@andreasohlund
Member

Bumping to RC since I don't see this as a blocker for Beta1

@ghost assigned johnsimons Nov 11, 2013

@johnsimons
Member

Fixed in Particular/ServiceControl@cd70011
