
Dropped log messages #190

Open
dhruv-nudge opened this issue Sep 30, 2024 · 10 comments

Comments
@dhruv-nudge

I am seeing many dropped log messages when using pino-opentelemetry-transport.

I have these environment variables set:

OTEL_EXPORTER_OTLP_COMPRESSION=gzip
OTEL_EXPORTER_OTLP_LOGS_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://url:4317

And the setup looks like this:

const pino = require('pino'); // assuming CommonJS; use `import pino from 'pino'` for ESM

const pinoLogger = pino(
	{
		level: 'debug',
		timestamp: true,
	},
	pino.transport({
		targets: [
			{
				level: 'debug',
				target: 'pino-opentelemetry-transport',
				options: {
					sync: false,
				},
			},
		],
	}),
);

Is there any way I can debug what the issue might be? Any suggestions are appreciated.

@dhruv-nudge
Author

logRecordProcessorOptions: [{ recordProcessorType: 'batch', exporterOptions: { protocol: 'grpc' } }],

I tried adding these options, but it didn't help.

@Dhruv-Garg79

@Vunovati can you please check this?

@Vunovati
Collaborator

Vunovati commented Oct 8, 2024

Hello @Dhruv-Garg79

The way you have set up your log processor, it will export the records in batches. It keeps the records in a buffer and, at a certain point, sends them to the collector.

Without more context I can only theorize ... for example:

If your application crashes while your log record buffer is full, that buffer may be lost because the batch is never sent.

You can try reducing the size of your buffer, or exporting more frequently, by setting the batch LogRecord processor env vars.

To rule out batch processing, you can also try additionally using a simple exporter, like in this example where we use multiple log processors.

The simple exporter will try to export immediately as each log record is created. Then you can compare the output you get from the batched processor with the output you get from the simple one.
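
For illustration, a minimal sketch of such a multi-processor setup, built from the logRecordProcessorOptions shape quoted earlier in this thread; the 'console' protocol value for the simple exporter is an assumption to verify against the transport's README:

const pino = require('pino');

// Sketch: one batched gRPC exporter plus a simple console exporter, so the
// two outputs can be compared to spot records dropped by batching.
const transport = pino.transport({
	target: 'pino-opentelemetry-transport',
	options: {
		logRecordProcessorOptions: [
			{
				recordProcessorType: 'batch',
				exporterOptions: { protocol: 'grpc' },
			},
			{
				// 'console' protocol assumed from the transport docs; verify before use.
				recordProcessorType: 'simple',
				exporterOptions: { protocol: 'console' },
			},
		],
	},
});

const logger = pino({ level: 'debug' }, transport);
logger.info('hello from both processors');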

@Dhruv-Garg79

Hey @Vunovati, thanks for the reply.

Earlier I wasn't using the batchLogProcessor and still saw that messages were getting dropped. Is batchProcessor the default option if no option is passed?

I don't think application crashes are a problem because I am using Kubernetes for deployments and I don't see any pod restarts or crashes. I also have a grace period set up for the cleanup tasks. Additionally, I have batch processing for traces, and those are not experiencing any issues.

Is there anything else that might be causing the issue? Any suggestions for debugging it?

@Vunovati
Collaborator

Vunovati commented Oct 9, 2024 via email

@Dhruv-Garg79

I checked the collector, and it does not have any dropped messages. We have metrics for anything dropped by the OTEL collector, and I also checked the OTEL collector's logs for the same.

I can try logging to both the console and OTEL, but then I need to send the console logs somewhere for comparison. I can do this as a last option.

I am sending logs to Signoz, which is self-hosted on k8s. The limits are quite high, so a collector-side drop is unlikely, since messages dropped at the collector would show up in its logs/metrics.

For the minimal example, do you want both code and collector, or just code?

@Vunovati
Collaborator

Vunovati commented Oct 9, 2024

> I checked the collector, and it does not have any dropped messages. We have metrics for anything dropped by the OTEL collector, and I also checked the OTEL collector's logs for the same.
>
> I can try logging to both the console and OTEL, but then I need to send the console logs somewhere for comparison. I can do this as a last option.

How do you identify what has been dropped? You are assuming something was logged, but there is no record of it anywhere. Is there a way to be 100% sure?

> I am sending logs to Signoz, which is self-hosted on k8s. The limits are quite high, so a collector-side drop is unlikely, since messages dropped at the collector would show up in its logs/metrics.
>
> For the minimal example, do you want both code and collector, or just code?

Since we don't know if the problem is with the code or the collector, we'll need both, if possible.

The fastest way to do this would be to pick one of the examples here and modify it just enough to reproduce the issue. Then add instructions that make it clear how to configure and run a Signoz instance where the problem is visible.
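
To make drops unambiguous in such a repro, one option (a sketch; the sequence-counter approach is not something already discussed in this thread) is to log a monotonically increasing counter and look for gaps on the Signoz side:

const pino = require('pino');

// Sketch: emit sequence-numbered records so any gap on the collector /
// Signoz side pinpoints exactly which records were dropped.
// Transport options mirror the setup reported at the top of this issue.
const logger = pino(
	{ level: 'debug' },
	pino.transport({
		target: 'pino-opentelemetry-transport',
		options: { sync: false },
	}),
);

let seq = 0;
const timer = setInterval(() => {
	logger.info({ seq: seq++ }, 'sequence probe');
}, 10);

// Stop after 10 seconds, then flush buffered records before exiting.
setTimeout(() => {
	clearInterval(timer);
	logger.flush();
}, 10000);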

@Dhruv-Garg79

(Screenshot attached: 2024-09-30 at 5:32:06 PM.)

There is a record here, which is why I was confident the collector does not have an issue. Also, I am not facing this issue with traces.

Okay, let me try to create a reproducible example.

@Dhruv-Garg79

Hey, I just wanted to share an update: the issue was with exporting on the application side. After tweaking these parameters, fewer logs are dropped than before.

OTEL_BLRP_SCHEDULE_DELAY=500
OTEL_BLRP_MAX_QUEUE_SIZE=2048
OTEL_BLRP_MAX_EXPORT_BATCH_SIZE=256

I also checked whether the flush happens when the app is killed, and I can see that pino.flush() runs without any error. But there still seems to be some drop-off. Do you have any other suggestions for tweaking the parameters?
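
For illustration, a sketch of the kind of shutdown hook this refers to; the SIGTERM handler is an assumption about the setup, pinoLogger refers to the logger created in the setup above, and whether logger.flush() accepts a completion callback depends on the pino version:

// Sketch of a graceful-shutdown hook for Kubernetes: flush buffered log
// records before the process exits. Assumes a pino version whose
// logger.flush() accepts a completion callback; adjust if it does not.
process.on('SIGTERM', () => {
	pinoLogger.flush(() => {
		process.exit(0);
	});
});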

@Vunovati
Collaborator

I do not. You will have to consult the OTel documentation.

Is it possible to get a code example that would enable me to reproduce your issue? It would be a great way to identify whether there is a problem with the library.
