Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OTLPExporterBase reports connection issues to otel sink #4444

Closed
Seikilos opened this issue Jan 27, 2024 · 7 comments
Closed

OTLPExporterBase reports connection issues to otel sink #4444

Seikilos opened this issue Jan 27, 2024 · 7 comments
Assignees
Labels
question User is asking a question not related to a new feature or bug

Comments

@Seikilos
Copy link

Seikilos commented Jan 27, 2024

What happened?

Steps to Reproduce

// Set a OTEL_EXPORTER_OTLP_ENDPOINT to an IP that does not exist
console.log('Starting instrumentation');

Expected Result

Nothing

Actual Result

{
    "stack": "OTLPExporterError: Request Timeout\n\r
                at ClientRequest.<anonymous> (.\\node_modules\\@opentelemetry\\otlp-exporter-base\\build\\src\\platform\\node\\util.js:114:29)\n\r
                at ClientRequest.emit (node:events:513:28)\n\r
                at Socket.socketCloseListener (node:_http_client:475:11)\n\r
                at Socket.emit (node:events:525:35)\n\r
                at TCP.<anonymous> (node:net:322:12)\n\r
                at TCP.callbackTrampoline (node:internal/async_hooks:130:17)",
    "message": "Request Timeout",
    "name": "OTLPExporterError",
    "code": "ECONNRESET"
}

Additional Details

I regard telemetry as being optional, e.g. when telemetry sink is available, it should send data to it. Sometimes the otel receiver is in maintenance, etc. This causes the applications to spam ECONNRESET error messages. Since they are batched, they sometimes come through to the otel receiver when it becomes available and spam the observability platform with errors, that the service could not connect to the destination.

IMHO this is undesired (but I am open for more feedback on my view!) or at least the developers should be able to filter requests on OTLPExporterError. But afaik I have not been able to find a filter behaviour on OTLPExporterError or his parent classes

OpenTelemetry Setup Code

const loggerProvider = new LoggerProvider({
  });

  // Currently no impl of BatchLogRecordProcessorBase exists
  loggerProvider.addLogRecordProcessor(new SimpleLogRecordProcessor(new ConsoleLogRecordExporter()));
  loggerProvider.addLogRecordProcessor(new SimpleLogRecordProcessor(new OTLPLogExporter({
    keepAlive: true
  })));
  logsAPI.logs.setGlobalLoggerProvider(loggerProvider);

package.json

{
  "name": "my_service",
  "version": "1.0.1",
  "description": "",
  "main": "index.js",
  "scripts": {
    "build": "tsc",
  },
  "author": "",
  "license": "ISC",
  "devDependencies": {
    "@types/chai": "^4.3.6",
    "@types/chai-arrays": "^2.0.0",
    "@types/chai-as-promised": "^7.1.6",
    "@types/config": "^3.3.1",
    "@types/express": "^4.17.17",
    "@types/mocha": "^10.0.1",
    "@types/node": "^20.5.9",
    "@types/parse-color": "^1.0.1",
    "@types/sinon": "^10.0.16",
    "@types/sinon-chai": "^3.2.9",
    "@types/supertest": "^2.0.12",
    "chai": "^4.3.8",
    "chai-arrays": "^2.2.0",
    "chai-as-promised": "^7.1.1",
    "mocha": "^10.2.0",
    "sinon": "^15.2.0",
    "sinon-chai": "^3.7.0",
    "supertest": "^6.3.3",
    "ts-node-dev": "^2.0.0",
    "typescript": "^5.2.2"
  },
  "dependencies": {
    "@opentelemetry/api": "^1.7.0",
    "@opentelemetry/exporter-logs-otlp-http": "^0.46.0",
    "@opentelemetry/exporter-metrics-otlp-proto": "^0.46.0",
    "@opentelemetry/exporter-trace-otlp-proto": "^0.46.0",
    "@opentelemetry/instrumentation-express": "^0.34.1",
    "@opentelemetry/instrumentation-http": "^0.46.0",
    "@opentelemetry/resources": "^1.19.0",
    "@opentelemetry/sdk-logs": "^0.46.0",
    "@opentelemetry/sdk-metrics": "^1.19.0",
    "@opentelemetry/sdk-node": "^0.46.0",
    "@opentelemetry/sdk-trace-base": "^1.19.0",
    "@opentelemetry/semantic-conventions": "^1.19.0",
    "config": "^3.3.9",
    "express": "^4.18.2"
  }
}

Relevant log output

No response

@Seikilos Seikilos added bug Something isn't working triage labels Jan 27, 2024
@pichlermarc
Copy link
Member

I think we don't log to the console anywhere except for when the diagnostics logging is turned on.
Do you have a line in the setup code that looks something like this?

diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.WARN);

This is specifically intended for troubleshooting and having that info during troubleshooting is important. To turn it off you can remove the line or set DiagLogLevel.NONE).

The same is true for the OTEL_LOG_LEVEL environment variable when used with @opentelemetry/sdk-node, when set it will turn on diagnostics logging to the console for troubleshooting purposes. It's not recommended to turn this on in production.

@Seikilos
Copy link
Author

@pichlermarc diag is not used anywhere.

I set up the logger the way in my initial post to get log telemetry for my applications. AFAIK the NodeSDK currently has no support for it and simply creating this logger started to send telemetry to my otel sink, so I was very happy with that since I like otel very much!

If this is currently not supported, is there another way how I can send log data to an otel sink?

@pichlermarc
Copy link
Member

I've taken the config provided above but was unable to reproduce this without explicitly turning SDK diagnostics logging on.

Could you try to reproduce it using a fork of the following repo?
https://github.com/pichlermarc/repro-4444

Using that repo I was only able to get the output you described when a SDK diagnostics logging is turned on (see line 8 of index.ts)

@pichlermarc pichlermarc added the needs:reproducer This bug/feature is in need of a minimal reproducer label Jan 30, 2024
@Seikilos
Copy link
Author

Seikilos commented Feb 2, 2024

@pichlermarc I am on it.
Testing it and giving feedback asap.

@Seikilos
Copy link
Author

Seikilos commented Feb 2, 2024

I reproduced it.

Add sdk-node
npm install @opentelemetry/sdk-node

And then instead of diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.ALL);

just add these lines:

import { NodeSDK } from "@opentelemetry/sdk-node";
//....


process.env.OTEL_LOG_LEVEL = "info"; // doesn't matter if info, warn, only FATAL will prevent log spam

const sdk = new NodeSDK({
    resourceDetectors: [], // we are using our own
  });

Constructing empty NodeSDK does nothing. But setting process.env.OTEL_LOG_LEVEL to anything dumps the connection error to the logs.
Can you confirm this?

My scenario involves passing multiple OTEL variables by the orchestrator, one of them controls the log level

@pichlermarc pichlermarc added question User is asking a question not related to a new feature or bug and removed bug Something isn't working triage pkg:otlp-exporter-base needs:reproducer This bug/feature is in need of a minimal reproducer labels Feb 5, 2024
@pichlermarc
Copy link
Member

Hi @Seikilos thanks for trying that out.

OTEL_LOG_LEVEL does not set the log level of the Logs SDK, it sets the log level of the diagnostics logging.

These two are both equivalent:

  • diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.INFO;
  • process.env.OTEL_LOG_LEVEL = "info";

You can set OTEL_LOG_LEVEL to any value, or not set it at all, the Logs SDK will still export all the logs that you pass to it.

What setting OTEL_LOG_LEVEL does is it allows you to check diagnostics logs from the OpenTelemetry SDK internals, such as the Exporter. It logging to the console when OTEL_LOG_LEVEL is set is intended behavior as otherwise there would be no way for users to troubleshoot. (therefore I'm unassiging the bug labels, and adding question)

It's rather confusing, but there is a specification issue to clarify this behavior: open-telemetry/opentelemetry-specification#920

The way forward here is to not use diag.setLogger() or OTEL_LOG_LEVEL in production.

@Seikilos
Copy link
Author

Seikilos commented Feb 8, 2024

@pichlermarc I've seen the mentioned issue with the "clarification", the last comment also states, that it needs to be clarified.

Maybe a simple note on this page https://opentelemetry.io/docs/specs/otel/configuration/sdk-environment-variables/#general-sdk-configuration for OTEL_LOG_LEVEL would be enough. There is even a "note" column, but it is empty.

Probably just add something like

This variable is used for diagnostic purposes inside otel. It does not affect what log level will be sent to OTEL sinks. OTEL_LOG_LEVEL should not be used in production.

?

Regardless of that: Thank you for clarifying this! 👍

@Seikilos Seikilos closed this as completed Feb 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question User is asking a question not related to a new feature or bug
Projects
None yet
Development

No branches or pull requests

2 participants