This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

Make metrics mgr fail fast when unexpected errors happen. #1473

Merged
maosongfu merged 3 commits into master from mfu/fail_fast_metricsmgr_when_error on Oct 6, 2016


maosongfu
Contributor

@maosongfu maosongfu commented Oct 5, 2016

Make the metrics manager fail fast when unexpected errors happen, to avoid leaving a dangling process.

This addresses a bug reported internally at Twitter.

String sinkId = threadName;
Integer thisSinkRetryAttempts = sinksRetryAttempts.remove(sinkId);
if (exception instanceof Error || thread.getId() == mainThreadId) {
LOG.severe("Would not recover from error. Metrics Manager halts immediately");
Contributor

Instead of logging the exception separately above, can we have an if/else with a different log message in each branch? That way it's clear what's causing what.

Contributor Author

Sorry, but I don't understand. This implementation logs the exception once at the start; what follows is the handling logic, with a different if/then branch for each case. Aren't those handling logic rather than causes?

Contributor

My suggestion was that, instead of doing this:

- log all the details of the exception
if it's the main thread
  - log that we have to die (due to the previously logged exception)

it will be clearer to the log reader if we consolidate those two:

if it's the main thread
  - log all the details of the exception and that we have to die
else
  - log all the details of the exception

Contributor Author

Sounds good. Will do that.
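
A minimal sketch of the consolidated logging suggested above, not the code that was merged. The class name `FailFastSketch`, the helper names, the wording of the non-fatal message, and the use of `Level.WARNING` for the non-fatal branch are all assumptions made for illustration; only the fatal condition, the fatal log text, and the `halt(1)` call come from the diff under review.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only; class and method names are hypothetical.
class FailFastSketch {
  private static final Logger LOG = Logger.getLogger(FailFastSketch.class.getName());

  // Fatal means a JVM Error, or any uncaught exception on the main thread.
  static boolean isFatal(Throwable exception, long threadId, long mainThreadId) {
    return exception instanceof Error || threadId == mainThreadId;
  }

  // Builds one message per branch so the log reader sees cause and consequence together.
  static String describe(Throwable exception, String threadName, boolean fatal) {
    String details = "Uncaught exception in thread " + threadName + ": " + exception;
    if (fatal) {
      return details + ". Would not recover from error. Metrics Manager halts immediately";
    }
    // Assumed wording for the recoverable branch.
    return details + ". Handling it without halting";
  }

  // Logs exactly once, then halts only on the fatal branch.
  static void handle(Thread thread, Throwable exception, long mainThreadId) {
    boolean fatal = isFatal(exception, thread.getId(), mainThreadId);
    LOG.log(fatal ? Level.SEVERE : Level.WARNING,
        describe(exception, thread.getName(), fatal), exception);
    if (fatal) {
      // halt() terminates the JVM immediately, without running shutdown hooks.
      Runtime.getRuntime().halt(1);
    }
  }

  public static void main(String[] args) {
    long mainThreadId = Thread.currentThread().getId();
    // Non-fatal case: a RuntimeException reported for a thread other than main.
    Thread sinkThread = new Thread(() -> { }, "example-sink");
    handle(sinkThread, new RuntimeException("sink failed"), mainThreadId);
  }
}
```

Keeping the decision (`isFatal`) and the message (`describe`) separate from the side effects makes both branches easy to check without actually halting the JVM.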

Runtime.getRuntime().halt(1);
}

String sinkId = thread.getName();
Contributor

sinkThreadName? It's not the thread id, it's the name.

Contributor Author

In the current implementation, we enforce that the thread name is the same as the sink id. Do you think we need to add an extra map to record it?

Contributor

I wasn't suggesting an extra map, just consistency in variable naming. Above we assign getId() to a variable named mainThreadId:

mainThreadId = Thread.currentThread().getId();

but here we assign getName() to a variable named sinkId (instead of sinkName):

sinkId = thread.getName();

This is a minor nit, though, and can be disregarded if you think sinkId is more representative based on its usage.

Contributor Author

@maosongfu maosongfu Oct 6, 2016

Added comments in the source code to avoid confusion. sinkId is more representative, since it is defined in metrics_sink.yaml.
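
A small sketch of the convention discussed in this thread, where the thread name doubles as the sink id so no extra map is needed. The class `SinkNamingSketch` and its method names are hypothetical; only `sinksRetryAttempts`, `sinkId`, and the `thread.getName()` lookup mirror the diff, and the map type is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch only; class and method names are hypothetical.
class SinkNamingSketch {
  // Remaining retry attempts, keyed by sink id (the id a sink has in its config).
  private final Map<String, Integer> sinksRetryAttempts = new ConcurrentHashMap<>();

  // Creates (but does not start) the thread for a sink. The thread is named
  // after the sink id, which is what makes the lookup below possible.
  Thread newSinkThread(String sinkId, int retryAttempts, Runnable sinkBody) {
    sinksRetryAttempts.put(sinkId, retryAttempts);
    return new Thread(sinkBody, sinkId);
  }

  // Recovers the sink id from the failed thread and removes its retry budget.
  // Returns null when no budget is recorded for that sink.
  Integer takeRetryAttempts(Thread thread) {
    // The thread name is the sink id by construction, so no id-to-name map is needed.
    String sinkId = thread.getName();
    return sinksRetryAttempts.remove(sinkId);
  }

  public static void main(String[] args) {
    SinkNamingSketch sketch = new SinkNamingSketch();
    Thread t = sketch.newSinkThread("example-sink", 3, () -> { });
    System.out.println("retry attempts for " + t.getName() + ": " + sketch.takeRetryAttempts(t));
  }
}
```

The trade-off is that the invariant lives in the thread-creation code, which is why a source comment explaining it is useful.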

@kramasamy
Contributor

@maosongfu - the title says stream manager instead of metrics manager. Can you please correct it?

@maosongfu maosongfu changed the title from "Make stream mgr fail fast when unexpected errors happen." to "Make metrics mgr fail fast when unexpected errors happen." Oct 5, 2016
@kramasamy
Contributor

@maosongfu - is this completed?

@maosongfu
Contributor Author

@kramasamy Updated.

@billonahill
Contributor

👍

@maosongfu maosongfu merged commit b859b72 into master Oct 6, 2016
@maosongfu maosongfu deleted the mfu/fail_fast_metricsmgr_when_error branch October 6, 2016 23:25
nicknezis pushed a commit that referenced this pull request Sep 14, 2020
* Make stream mgr fail fast when unexpected errors happen.
3 participants