Clean up logging #2216
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2216      +/-   ##
==========================================
- Coverage   97.43%   97.42%   -0.02%
==========================================
  Files         102      102
  Lines        3817     3800      -17
==========================================
- Hits         3719     3702      -17
  Misses         98       98
```
```ruby
loggable_event_type = event_type.capitalize
log_error("#{loggable_event_type} sending failed", e, debug: configuration.debug)

event_info = Event.get_log_message(event.to_hash)
```
In the past we relied on this message to look into the potentially problematic event payload. Do we have an alternative for this now?
I want to move away from logging whole payloads for the end user; one can always log the event in before_send if they really want to. Even on our side, we once spent a lot of logging resources in Python by logging huge payloads.
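For reference, a minimal sketch of that workaround. The DSN and log message are placeholders; only Sentry.init, config.before_send, config.logger, and Event#to_hash come from the sentry-ruby API.

```ruby
# Hypothetical user-side setup: log the full event payload yourself
# in before_send instead of relying on SDK-internal logging.
Sentry.init do |config|
  config.dsn = "https://examplePublicKey@o0.ingest.sentry.io/0" # placeholder DSN
  config.before_send = lambda do |event, _hint|
    # Log whatever level of payload detail you need before the event is sent.
    config.logger.debug("Sending event: #{event.to_hash}")
    event # return the event so it is still sent
  end
end
```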
The logging here should be triggered very rarely. Most expected errors should've been captured at HTTPTransport#send_data (network errors, etc.), so they don't even reach here. What's most likely captured here are errors from the before_send* callbacks and event/envelope serialisation errors, which, together with the event data, will make debugging much easier for users and for us.
send_data re-raises here (sentry-ruby/sentry-ruby/lib/sentry/transport/http_transport.rb, lines 65 to 69 in c588979):
```ruby
      raise Sentry::ExternalError, error_info
    end
  rescue SocketError, *HTTP_ERRORS => e
    on_error if respond_to?(:on_error)
    raise Sentry::ExternalError.new(e&.message)
```
so these will end up being logged for network errors too. I can add back some logging, but I'd make it debug at best.
Sorry that I missed it 🤦‍♂️
OK, let's remove it first and find a better place to log them if we later find it necessary.
Yes, but I agree that debugging before_send errors is not ideal. We've also discussed this internally.
```diff
-            handle_rate_limited_response(response)
-          end
+          handle_rate_limited_response(response) if has_rate_limited_header?(response)
+        elsif response.code == "429"
```
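For context, this hunk sits inside the response handling of HTTPTransport#send_data. A rough reconstruction of the branching after this change, pieced together from the snippets quoted in this thread (the 2xx check and the error message are assumptions, not the exact source):

```ruby
# Approximate sketch of the response handling after this PR.
if response.code.match?(/\A2\d{2}/)
  # Success: record rate limits only if the server sent the header.
  handle_rate_limited_response(response) if has_rate_limited_header?(response)
elsif response.code == "429"
  # Rate limited: record the limits, but no longer raise Sentry::ExternalError.
  handle_rate_limited_response(response)
else
  # Any other failure still raises, as in the snippet quoted earlier.
  error_info = "the server responded with status #{response.code}"
  raise Sentry::ExternalError, error_info
end
```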
👋🏻 @sl0thentr0py @st0012 I was reading into the logic for 429s, as we noticed errors weren't being raised anymore after upgrading from 5.14 to 5.16. This change seems to be the reason, since 429s were previously included in the else block, which did call raise Sentry::ExternalError, error_info.
Reading through the issue/PR, the changes seemed to be related only to updating logging, so I'm curious whether no longer raising an error for 429s was intentional. Thanks!
@charligoulet yes, this was deliberate. In general, we shouldn't raise exceptions in userland for SDK-internal things, and if you were using this to see whether you are being rate limited on our SaaS offering, there is visibility in the product (the Stats section).
Thanks for the prompt response!
My concern here is around retrying rate-limited events. We are currently using Sentry in the context of Fluentd (we built a plugin with the Ruby SDK).
Fluentd has a retry mechanism with an exponential back-off strategy. For Fluentd to trigger retries, errors must be escalated from the SDK to Fluentd. For example, we're currently able to retry 503s and 504s rather than dropping events, because the errors for those are raised by the SDK. With this change, there is no way of programmatically knowing that a specific event has been rate-limited and should be retried. As I think you'll agree, digging through the Stats section or logs isn't realistic for retry purposes.
Unless I'm misunderstanding the docs, these events are also not being retried for us on the Sentry side. So without raising an error here, and without Sentry retrying rate-limited events for us, isn't this a breaking change from a behavioural standpoint? It blocks users from retrying events that have been rate-limited and essentially leaves dropping the events as our only option in this scenario.
I see, thanks for the explanation of your use case.
So, a few points:
- we never retry on our side; a dropped event is a dropped event
- the server's rate limiting is merely interpreted as "stop doing stuff for a while"
- technically it is breaking, yes, but this is considered internal to the HTTP transport implementation and not really part of the API contract

That said, there are a couple of options for you now:
- make your own transport deriving from Sentry::HTTPTransport, override send_data there, and do what you wish (see the sketch below)
- I can expose an on_rate_limit hook in the transport class, similar to on_error, to make that easier, so you'd just need to define that one method to retry

Does that work?
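A rough sketch of option 1. The class name, attempt limit, and back-off are arbitrary choices for illustration; only Sentry::HTTPTransport, Sentry::ExternalError, Sentry.init, and config.transport.transport_class come from the SDK. Note that after this PR a 429 no longer raises, so rate-limited sends would still need their own detection inside the override (or the proposed on_rate_limit hook).

```ruby
# Hypothetical custom transport: retry failed sends with exponential back-off.
class RetryingHTTPTransport < Sentry::HTTPTransport
  MAX_ATTEMPTS = 3 # arbitrary limit for this sketch

  def send_data(data)
    attempts = 0
    begin
      super
    rescue Sentry::ExternalError
      attempts += 1
      raise if attempts >= MAX_ATTEMPTS
      sleep(2**attempts) # simple exponential back-off
      retry
    end
  end
end

Sentry.init do |config|
  config.transport.transport_class = RetryingHTTPTransport
end
```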
Option 2 (exposing an on_rate_limit hook) seems like a great way forward.
Appreciate you taking the time to understand our context and coming up with a solution here! Let me know if you need anything further from my end. Looking forward to being able to use the hook.
Cool, I'll make a new issue to track that.
closes #2031