Clean up logging #2216
Conversation
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #2216      +/-   ##
==========================================
- Coverage   97.43%   97.42%   -0.02%
==========================================
  Files         102      102
  Lines        3817     3800      -17
==========================================
- Hits         3719     3702      -17
  Misses         98       98
```
```ruby
loggable_event_type = event_type.capitalize
log_error("#{loggable_event_type} sending failed", e, debug: configuration.debug)

event_info = Event.get_log_message(event.to_hash)
```
In the past we relied on this message to look into the potentially problematic event payload. Do we have an alternative for this now?
I want to move away from logging whole payloads for the end user; one can always log the event in before_send if they really want to. Even on our side, we once spent a lot of logging resources in Python by logging huge payloads.
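For reference, a minimal sketch of that workaround. The DSN and log message are placeholders; only Sentry.init, config.before_send, config.logger, and Event#to_hash come from the sentry-ruby API.

```ruby
# Hypothetical user-side setup: log the full event payload yourself
# in before_send instead of relying on SDK-internal logging.
Sentry.init do |config|
  config.dsn = "https://examplePublicKey@o0.ingest.sentry.io/0" # placeholder DSN
  config.before_send = lambda do |event, _hint|
    # Log whatever level of payload detail you need before the event is sent.
    config.logger.debug("Sending event: #{event.to_hash}")
    event # return the event so it is still sent
  end
end
```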
The logging here should be triggered very rarely. Most expected errors should've been captured at HTTPTransport#send_data (network errors, etc.), so they don't even reach here. What's most likely captured here are errors from the before_send* callbacks and event/envelope serialisation errors, which, together with the event data, will make debugging much easier for users and for us.
send_data re-raises here (sentry-ruby/sentry-ruby/lib/sentry/transport/http_transport.rb, lines 65 to 69 in c588979):
```ruby
      raise Sentry::ExternalError, error_info
    end
  rescue SocketError, *HTTP_ERRORS => e
    on_error if respond_to?(:on_error)
    raise Sentry::ExternalError.new(e&.message)
```
so these will end up being logged for network errors too. I can add back some logging, but I'd make it debug at best.
Sorry that I missed it 🤦‍♂️
OK, let's remove it first and find a better place to log them if we later find it necessary.
Yes, but I agree that debugging before_send errors is not ideal. We've also discussed this internally.
```diff
-            handle_rate_limited_response(response)
-          end
+          handle_rate_limited_response(response) if has_rate_limited_header?(response)
+        elsif response.code == "429"
```
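For context, this hunk sits inside the response handling of HTTPTransport#send_data. A rough reconstruction of the branching after this change, pieced together from the snippets quoted in this thread (the 2xx check and the error message are assumptions, not the exact source):

```ruby
# Approximate sketch of the response handling after this PR.
if response.code.match?(/\A2\d{2}/)
  # Success: record rate limits only if the server sent the header.
  handle_rate_limited_response(response) if has_rate_limited_header?(response)
elsif response.code == "429"
  # Rate limited: record the limits, but no longer raise Sentry::ExternalError.
  handle_rate_limited_response(response)
else
  # Any other failure still raises, as in the snippet quoted earlier.
  error_info = "the server responded with status #{response.code}"
  raise Sentry::ExternalError, error_info
end
```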
👋🏻 @sl0thentr0py @st0012 I was reading into the logic for 429s, as we noticed errors weren't being raised anymore after upgrading from 5.14 to 5.16. This change seems to be the reason, since 429s were previously included in the else block, which did call raise Sentry::ExternalError, error_info.
Reading through the issue/PR, the changes seemed to be related only to updating logging, so I'm curious whether no longer raising an error for 429s was intentional. Thanks!
@charligoulet yes, this was deliberate. In general, we shouldn't raise exceptions in userland for SDK-internal things, and if you were using this to see whether you are being rate limited on our SaaS offering, there is visibility in the product (the Stats section).
Thanks for the prompt response!
My concern here is around retrying rate-limited events. We are currently using Sentry in the context of Fluentd (we built a plugin with the Ruby SDK).
Fluentd has a retry mechanism with an exponential back-off strategy. For Fluentd to trigger retries, errors must be escalated from the SDK to Fluentd. For example, we're currently able to retry 503s and 504s rather than dropping events, because the errors for those are raised by the SDK. With this change, there is no way of programmatically knowing that a specific event has been rate-limited and should be retried. As I think you'll agree, digging through the Stats section or logs isn't realistic for retry purposes.
Unless I'm misunderstanding the docs, these events are also not being retried for us on the Sentry side. So without raising an error here, and without Sentry retrying rate-limited events for us, isn't this a breaking change from a behavioural standpoint? It blocks users from retrying events that have been rate-limited and essentially leaves dropping the events as our only option in this scenario.
I see, thanks for the explanation of your use case.
So, a few points:
- we never retry on our side; a dropped event is a dropped event
- the server's rate limiting is merely interpreted as "stop doing stuff for a while"
- technically it is breaking, yes, but this is considered internal to the HTTP transport implementation and not really part of the API contract

That said, there are a couple of options for you now:
- make your own transport deriving from Sentry::HTTPTransport, override send_data there, and do what you wish (see the sketch below)
- I can expose an on_rate_limit hook in the transport class, similar to on_error, to make that easier, so you'd just need to define that one method to retry

Does that work?
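A rough sketch of option 1. The class name, attempt limit, and back-off are arbitrary choices for illustration; only Sentry::HTTPTransport, Sentry::ExternalError, Sentry.init, and config.transport.transport_class come from the SDK. Note that after this PR a 429 no longer raises, so rate-limited sends would still need their own detection inside the override (or the proposed on_rate_limit hook).

```ruby
# Hypothetical custom transport: retry failed sends with exponential back-off.
class RetryingHTTPTransport < Sentry::HTTPTransport
  MAX_ATTEMPTS = 3 # arbitrary limit for this sketch

  def send_data(data)
    attempts = 0
    begin
      super
    rescue Sentry::ExternalError
      attempts += 1
      raise if attempts >= MAX_ATTEMPTS
      sleep(2**attempts) # simple exponential back-off
      retry
    end
  end
end

Sentry.init do |config|
  config.transport.transport_class = RetryingHTTPTransport
end
```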
Option 2 (exposing an on_rate_limit hook) seems like a great way forward.
Appreciate you taking the time to understand our context and coming up with a solution here! Let me know if you need anything further from my end. Looking forward to being able to use the hook.
Cool, I'll make a new issue to track that.
closes #2031