feat(http): add error handling for exporting #4709
Conversation
Force-pushed from d3d4031 to afb76f2
```python
try:
    resp = self._export(serialized_data, deadline_sec - time())
except Exception as error:
```
Would you mind opening an issue, or pointing to an existing issue, for the behavior you're encountering? We can discuss possible fixes there.
I think there is a bug in _export() on L157, where the code assumes the connection will succeed, so the retry loop here is never executed. However, the raised exception should ultimately be caught here:
opentelemetry-python/opentelemetry-sdk/src/opentelemetry/sdk/_shared_internal/__init__.py
Lines 167 to 170 in d327927
```python
except Exception:  # pylint: disable=broad-exception-caught
    self._logger.exception(
        "Exception while exporting %s.", self._exporting
    )
```
If you just want the logs to go away, you can set a filter on the `opentelemetry.sdk._shared_internal` logger or disable it, for example:
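A minimal sketch of that workaround on the application side (not part of the SDK; the filter class name is made up for illustration):

```python
import logging

internal_logger = logging.getLogger("opentelemetry.sdk._shared_internal")

# Option 1: disable the SDK's internal exporter logger entirely.
internal_logger.disabled = True

# Option 2: keep the one-line error message but drop the attached traceback.
class DropTracebackFilter(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        record.exc_info = None  # strip the exception info before formatting
        return True

internal_logger.addFilter(DropTracebackFilter())
```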
Yeah, so the issue for me was that the logger logged the entire error trace to stdout, which totally makes sense given the code you shared.
My goal was to still know when something fails, rather than just disabling the logger, as I feel the OTLP exporters should handle this internally.
On top of that, I'd like to comment in general that handling the retry with a try/except around _export() feels kind of odd to me.
Thanks for taking the time to look into this! :)
Do you still think I should open an issue for this in general?
@aabmass, from what I understand, this happens when the HTTP endpoint is not available and the urllib call cannot complete the request, which prints the full urllib stack trace with the connection-refused error.
I think an appropriate fix could be to put the _export() call in the retry logic of export() under a try/except, like:
```python
try:
    resp = self._export(serialized_data, remaining_time)
except Exception as exc:
    is_retryable = True
else:
    is_retryable = _is_retryable(resp)
```
and use that variable in the if at L183. Maybe it needs more adjustments. wdyt @aabmass @DylanRussell?
Yes, the issue is that the endpoint is not available in that case.
So you propose that the retry loop runs for up to _MAX_RETRYS attempts when the endpoint is not reachable, yes? And if the loop exits without success, the logger emits an error message anyway.
The only issue with that is that the logging message won't work, because it uses the status code and the text of the response. However, the response doesn't exist when the endpoint is not available 😄
So if the retry logic should also run when the endpoint is not available, I need to add some more logic to the functions, like defaulting resp = None, checking after the retry loop whether it is still None, and then logging a different error. A sketch of that idea follows below.
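For what it's worth, here is a minimal, self-contained sketch of that flow. It is only an illustration of the discussion, not the SDK's actual exporter code; export_once, is_retryable, and the _MAX_RETRYS value are stand-ins for the real helpers:

```python
import logging
import time

_MAX_RETRYS = 6  # stand-in for the SDK constant


def export_with_retry(export_once, is_retryable, serialized_data, deadline_sec,
                      logger=logging.getLogger(__name__)):
    """Default resp to None, treat exceptions (e.g. connection refused) as
    retryable, and log a different message after the loop depending on
    whether a response was ever received."""
    resp = None
    for retry_num in range(_MAX_RETRYS):
        remaining_time = deadline_sec - time.time()
        if remaining_time <= 0:
            break
        try:
            resp = export_once(serialized_data, remaining_time)
        except Exception:  # endpoint unreachable: no response object exists
            retryable = True
        else:
            if resp.ok:
                return resp  # success, nothing to log
            retryable = is_retryable(resp)
        if not retryable:
            break
        time.sleep(min(2 ** retry_num, remaining_time))  # simplified backoff
    if resp is None:
        logger.error("Failed to export: endpoint not reachable after retries.")
    else:
        logger.error("Failed to export, last response: %s %s",
                     resp.status_code, resp.text)
    return resp
```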
yeah makes sense to me @pafi-code
Okay, so I know the pipeline is failing and I assume this PR should be rebased, but for me there is still the question of whether my proposed fix is actually how this should be handled. I would appreciate some more feedback here. Also, I've created an issue related to this PR: #4712
Force-pushed from afb76f2 to f581f22
Description
When using the OTEL HTTP exporters, stdout gets polluted by unhandled error traces. I don't think this is the desired behavior. It happens when the endpoint is not available while exporting telemetry, which can be caused by different events, but in general I think it should be handled better.
Therefore I added error handling with more specific and shorter error messages.
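For illustration, the general shape of the change looks roughly like the sketch below. This is a hedged example rather than the PR's actual diff; post_to_collector, the header, and the endpoint handling are made-up stand-ins:

```python
import logging

import requests

logger = logging.getLogger(__name__)


def post_to_collector(endpoint: str, payload: bytes, timeout_sec: float) -> bool:
    """Send one OTLP/HTTP request and turn connection failures into a single
    short log line instead of an unhandled traceback."""
    try:
        resp = requests.post(
            endpoint,
            data=payload,
            headers={"Content-Type": "application/x-protobuf"},
            timeout=timeout_sec,
        )
    except requests.exceptions.ConnectionError as error:
        logger.error("Export failed: %s is not reachable (%s).", endpoint, error)
        return False
    return resp.ok
```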
Type of change
I'm not sure whether the change type is a fix or a feature, as the service was still running.
How Has This Been Tested?
Does This PR Require a Contrib Repo Change?
Checklist: