SDK throws unhandled exception if OTEL Collector is not available #1447
Hi @yahorsi, are you certain that it crashed the app? The export failure exception has been captured and logged (and you can configure where the logs go). I updated one of our example scripts here to use the batch span processor, and it seems to me to behave correctly:
# php examples/traces/exporters/otlp_http.php
Starting OTLP example
OTLP example complete!
OpenTelemetry: [error] Export failure [exception] Export retry limit exceeded [previous] cURL error 6: Could not resolve host: collector (see https://curl.haxx.se/libcurl/c/libcurl-errors.html) for http://collector:4318/v1/traces
#0 /usr/src/myapp/src/Contrib/Otlp/SpanExporter.php(38): OpenTelemetry\SDK\Common\Export\Http\PsrTransport->send('\n\x98\r\n\x9E\x06\n\e\n\thost....', NULL)
#1 /usr/src/myapp/src/SDK/Trace/SpanProcessor/BatchSpanProcessor.php(247): OpenTelemetry\Contrib\Otlp\SpanExporter->export(Array)
#2 /usr/src/myapp/src/SDK/Trace/SpanProcessor/BatchSpanProcessor.php(190): OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor->flush('shutdown', NULL)
#3 /usr/src/myapp/src/SDK/Trace/TracerSharedState.php(71): OpenTelemetry\SDK\Trace\SpanProcessor\BatchSpanProcessor->shutdown(NULL)
#4 /usr/src/myapp/src/SDK/Trace/TracerProvider.php(98): OpenTelemetry\SDK\Trace\TracerSharedState->shutdown(NULL)
#5 /usr/src/myapp/examples/traces/exporters/otlp_http.php(53): OpenTelemetry\SDK\Trace\TracerProvider->shutdown()
#6 {main}
# echo $?
0
The return code from CLI is 0 (success). If you change the log destination, for example dropping
@yahorsi, do you have a custom error handler that might be turning a NOTICE into an exception?
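For context, a handler along these lines (common in framework bootstrap code; purely a hypothetical illustration, not code from this issue) promotes every notice and warning to an exception, which would turn a logged export failure into an uncaught throw:
// Hypothetical escalating error handler. If something like this is registered,
// a NOTICE emitted while reporting an export failure becomes an uncaught
// ErrorException and takes the process down.
set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
    if (!(error_reporting() & $severity)) {
        return false; // respect the @ operator and error_reporting settings
    }
    throw new \ErrorException($message, 0, $severity, $file, $line);
});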
I am also getting this error, also noted in #1448. My config is fairly simple, with no custom error handler:
$loggerProvider = \OpenTelemetry\SDK\Logs\LoggerProvider::builder()
    ->addLogRecordProcessor(
        new \OpenTelemetry\SDK\Logs\Processor\SimpleLogRecordProcessor(
            new \OpenTelemetry\Contrib\Otlp\LogsExporter(
                (new \OpenTelemetry\Contrib\Otlp\OtlpHttpTransportFactory())
                    ->create(
                        'http://localhost:4318/v1/logs',
                        'application/x-protobuf'
                    )
            )
        )
    )
    ->setResource(
        \OpenTelemetry\SDK\Resource\ResourceInfoFactory::emptyResource()
            ->merge(
                \OpenTelemetry\SDK\Resource\ResourceInfo::create(
                    \OpenTelemetry\SDK\Common\Attribute\Attributes::create([
                        \OpenTelemetry\SemConv\ResourceAttributes::SERVICE_NAME => 'MyService',
                        \OpenTelemetry\SemConv\ResourceAttributes::SERVICE_VERSION => '1.0.0',
                    ])
                )
            )
    )
    ->build();
This is running our PHP app as a daemon via a systemd service file. I can POST via cURL from the shell without issue. As @brettmc mentioned in both issues, I could disable the logging to hide these errors, but from the wording I'm assuming logs are being lost, not that it will keep trying, which is concerning.
PECL extensions:
Composer packages:
More testing: each "export failure" is reported as an error, not a notice. It also blocks our app for up to 40 seconds each time the export error occurs, so a process that normally takes ~6 seconds is taking ~5 minutes, and it is causing other async HTTP clients to time out, so our process is not even completing.
What's going on with your local collector? My understanding is that the collector was designed to be a high-availability, high-throughput solution to the problem of offloading telemetry data. For a shared-nothing PHP setup, it is usually a critical piece of infrastructure because each PHP process has to complete really quickly so as not to exhaust all your web server workers.
Yes, they'll be lost. We don't have a way to configure retries because the spec hasn't defined anything (e.g., see open-telemetry/opentelemetry-specification#3639). In this scenario, you likely don't want it to keep retrying forever. I'd start by looking at your collector setup; in my experience, it's been very reliable. If for some reason you cannot get the collector working reliably, or you just don't trust it and want 100% telemetry collection, you could look at otlp/stdout exporting, or even revert to your preferred logging system for that signal. If you're happy with some manual configuration for the time being, you can also look into otlp/file (which is the same thing as otlp/stdout, but with one file per signal and an actual file instead of stdout).
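As one hypothetical fallback along those lines (this substitutes the SDK's console exporter for the otlp/stdout format mentioned above, and assumes recent opentelemetry-php SDK autoloading honours the standard OTEL_*_EXPORTER variables with a 'console' value), telemetry could be written to stdout instead of a collector:
// Sketch only; the variable names and the 'console' value are assumptions
// about the SDK autoloader, not verified against the version in use.
putenv('OTEL_PHP_AUTOLOAD_ENABLED=true'); // let the SDK build providers from env
putenv('OTEL_TRACES_EXPORTER=console');   // write spans to stdout
putenv('OTEL_LOGS_EXPORTER=console');     // write log records to stdout
putenv('OTEL_METRICS_EXPORTER=none');     // or 'console' if metrics are wanted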
Is the issue that the errors are directly interfering with your application, or just that they exist? If the former, you could send them to the PHP error_log instead, or some other destination that won't trigger an error_handler. How you configure OTel to behave when the collector is unavailable is an important consideration: clearly, waiting for the default 10s timeout and then retrying x3 is very time-consuming. With a shared-nothing PHP webserver, each request is delayed and you'll likely run out of workers very quickly.
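A minimal sketch of that suggestion, assuming the SDK reads OTEL_PHP_LOG_DESTINATION to choose where its own diagnostic messages go (the variable name and accepted values are assumptions about the SDK's configuration, not confirmed in this thread):
// Route the SDK's internal error/warning output to PHP's error_log so it never
// passes through a registered error handler; a value of 'none' would silence it.
putenv('OTEL_PHP_LOG_DESTINATION=error_log');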
Out of scope of this project, but for completeness: our proxy collector is failing when a downstream collector is unreachable, so I filed an issue with the proxy repo to separate those issues (grafana/alloy#2424). For the immediate issue here, even with the timeouts shortened to 500ms and no retries, our app is still timing out, so we may have to go back to logging to disk and scraping that.
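For reference, a sketch of how those shorter timeouts might be configured when building the transport, assuming the factory's create() accepts timeout (in seconds), retryDelay and maxRetries parameters as in recent SDK releases (the parameter names and positions are assumptions, not verified against the version in use):
$transport = (new \OpenTelemetry\Contrib\Otlp\OtlpHttpTransportFactory())->create(
    'http://localhost:4318/v1/logs',
    'application/x-protobuf',
    [],    // headers
    null,  // compression
    0.5,   // timeout in seconds (assumed parameter), i.e. 500ms
    100,   // retryDelay in milliseconds (assumed parameter)
    0      // maxRetries: give up immediately if the collector is unreachable
);
$exporter = new \OpenTelemetry\Contrib\Otlp\LogsExporter($transport);
When the exporter is configured from the environment instead, the spec-defined OTEL_EXPORTER_OTLP_TIMEOUT variable (milliseconds) is another way to shorten the timeout.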
Steps to reproduce
Start an OpenTelemetry-instrumented app while the OTEL Collector is not available; the app cannot start.
What is the expected behavior?
App does not crash
What is the actual behavior?
App crashes