Skip to content
This repository has been archived by the owner on Feb 1, 2022. It is now read-only.

2831 use metered logging instead rollbar #21

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

akaspin
Copy link
Contributor

@akaspin akaspin commented Sep 7, 2018

@akaspin akaspin force-pushed the 2831-metered-logging branch 9 times, most recently from 0cb2424 to 381b09b Compare September 7, 2018 15:05
@akaspin akaspin force-pushed the 2831-metered-logging branch from 381b09b to d737b37 Compare September 7, 2018 15:15
@gburanov
Copy link
Contributor

This approach completely removes rollar even for critical cases (that should be immediately reported). Why? Original ticket was only about some errors

@akaspin
Copy link
Contributor Author

akaspin commented Sep 11, 2018

@gburanov Do you mean https://github.com/remerge/rex/pull/21/files#diff-39bc10f00ee6210b92b42aae65670f40R60. This rollbar reporting catches panic and not crashes the app and floods rollbar. In comparison with old approach in this PR rollbar replaced with alert-log.

.travis.yml Outdated
@@ -1,7 +1,7 @@
sudo: false
language: go
go:
- 1.7.3
- "1.10"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not 1.11?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. 👍

@gburanov
Copy link
Contributor

gburanov commented Sep 11, 2018

But it make sense to be notified using rollbar when u have panic in your app. And if panic happens so often that is SPAMS, maybe it needs to be fixed.

@gburanov
Copy link
Contributor

Again, the original story IMHO was about error that happen very often (because of network or because of 3rd party problem) - but it will not panic in this case. Or?

@akaspin
Copy link
Contributor Author

akaspin commented Sep 11, 2018

@gburanov It's not a panic actually ;-) It's just lazy way to catch errors during request.

@gburanov
Copy link
Contributor

Ok, so now we catch ALL errors and send NOTHING to rollbar (only to log and to metrics).

Question: Why? We supposed to send ONLY repetitive errors to metrics instead of rollbar IMHO

@akaspin
Copy link
Contributor Author

akaspin commented Sep 11, 2018

@gburanov Rex (and Act) are deprecated. It's pain from first sight. But right now it's a opportunity to find approach. Rollbar is expensive solution. We have a lot of options:

  1. Logging with targeted appenders: namespaces, levels ... Example: all errors goes to log with metrics, all critical errors goes to rollbar synchronously.
  2. Own error tracking solution: Sentry, APM.
  3. ???

In current "catch all" approach divide errors (and panics) is pure pain. But the point is that app remains running after and may flood Rollbar ("your account will be upgraded" ... AAA!!!). This PR fixes issue.

@itszootime
Copy link
Contributor

I agree with @gburanov - we should only use the log+metric approach for these repetitive errors we expect, e.g. "invalid URL escape", "appnexus preview asset not found". We should still report unexpected errors to Rollbar. The Rollbar flooding issue can be mitigated with rate limiting (already set) and disabling the auto plan upgrade.

@akaspin akaspin force-pushed the 2831-metered-logging branch from 22b2bb6 to ebaf8e4 Compare September 11, 2018 18:16
@akaspin
Copy link
Contributor Author

akaspin commented Sep 11, 2018

@gburanov @itszootime Ok. Done. But note very interesting Rollbar behaviour in https://rollbar.com/remerge/act/items/12900/. Actually it's not "invalid url escape" but "read tcp 207.244.122.35:443->52.4.254.244:37530: use of closed network connection". This PR affects only Gin errors. ... Hmm. I was young and stupid. Sec.

UPD: Done again. Now rex reporting to rollbar only on panic inside handler. Panics are our issues - they ours. Other errors are related to other side.

}

if len(c.Errors) == 0 {
return
}

for _, err := range c.Errors {
RequestErrorWithStackSkip(ERR, c.Request, err, 3)
LogMeteredError(err,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skill why do we change it here from rollbar to logs ?

@gburanov
Copy link
Contributor

@akaspin but again, why?

  • @itszootime wrote "we should ONLY use the log+metric approach for these repetitive errors we expect"

  • And u wrote later "Now rex reporting to rollbar only on panic inside handler"

These are 2 different and controversial statements as for me

I think we need to use logs ONLY for the repetative errors specified in the tickect. ONLY. All the other errors should be sent to rolbar as they are now (no change here)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants