
Dropped log messages #190

Open
dhruv-nudge opened this issue Sep 30, 2024 · 10 comments

Comments
@dhruv-nudge

I am seeing many dropped log messages when using pino-opentelemetry-transport.

I have these environment variables set:

OTEL_EXPORTER_OTLP_COMPRESSION=gzip
OTEL_EXPORTER_OTLP_LOGS_PROTOCOL=grpc
OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://url:4317

And the setup looks like this:

const pino = require('pino'); // assuming CommonJS; use `import pino from 'pino'` for ESM

const pinoLogger = pino(
	{
		level: 'debug',
		timestamp: true,
	},
	pino.transport({
		targets: [
			{
				level: 'debug',
				target: 'pino-opentelemetry-transport',
				options: {
					sync: false,
				},
			},
		],
	}),
);

Is there any way I can debug what the issue might be? Any suggestions are appreciated.

@dhruv-nudge
Author

logRecordProcessorOptions: [{ recordProcessorType: 'batch', exporterOptions: { protocol: 'grpc' } }],

I tried adding these options, but it didn't help.

@Dhruv-Garg79

@Vunovati can you please check this?

@Vunovati
Collaborator

Vunovati commented Oct 8, 2024

Hello @Dhruv-Garg79

The way you have set up your log processor, it will export the records in batches. It keeps the records in a buffer and, at a certain point, sends them to the collector.

Without more context I can only theorize ... for example:

If your application crashes while your log record buffer is full, that buffer may be lost because the batch is never sent.

You can try reducing the size of your buffer, or exporting more frequently, by setting the batch LogRecord processor env vars.

To rule out batch processing, you can also try additionally using a simple exporter, like in this example where we use multiple log processors.

The simple exporter will try to export immediately as each log record is created. Then you can compare the output you get from the batched processor with the output you get from the simple one.
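
For illustration, a minimal sketch of such a multi-processor setup, built from the logRecordProcessorOptions shape quoted earlier in this thread; the 'console' protocol value for the simple exporter is an assumption to verify against the transport's README:

const pino = require('pino');

// Sketch: one batched gRPC exporter plus a simple console exporter, so the
// two outputs can be compared to spot records dropped by batching.
const transport = pino.transport({
	target: 'pino-opentelemetry-transport',
	options: {
		logRecordProcessorOptions: [
			{
				recordProcessorType: 'batch',
				exporterOptions: { protocol: 'grpc' },
			},
			{
				// 'console' protocol assumed from the transport docs; verify before use.
				recordProcessorType: 'simple',
				exporterOptions: { protocol: 'console' },
			},
		],
	},
});

const logger = pino({ level: 'debug' }, transport);
logger.info('hello from both processors');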

@Dhruv-Garg79

Hey @Vunovati, thanks for the reply.

Earlier I wasn't using the batchLogProcessor and still saw that messages were getting dropped. Is batchProcessor the default option if no option is passed?

I don't think application crashes are a problem because I am using Kubernetes for deployments and I don't see any pod restarts or crashes. I also have a grace period set up for the cleanup tasks. Additionally, I have batch processing for traces, and those are not experiencing any issues.

Is there anything else that might be causing the issue? Any suggestions for debugging it?

@Vunovati
Collaborator

Vunovati commented Oct 9, 2024 via email

@Dhruv-Garg79

I checked the collector, and it does not have any dropped messages. We have metrics for anything dropped by the OTEL collector, and I also checked the OTEL collector's logs for the same.

I can try logging to both the console and OTEL, but then I need to send the console logs somewhere for comparison. I can do this as a last option.

I am sending logs to Signoz, which is self-hosted on k8s. The limits are quite high, so a collector-side drop is unlikely, since messages dropped at the collector would show up in its logs/metrics.

For the minimal example, do you want both code and collector, or just code?

@Vunovati
Collaborator

Vunovati commented Oct 9, 2024

> I checked the collector, and it does not have any dropped messages. We have metrics for anything dropped by the OTEL collector, and I also checked the OTEL collector's logs for the same.
>
> I can try logging to both the console and OTEL, but then I need to send the console logs somewhere for comparison. I can do this as a last option.

How do you identify what has been dropped? You are assuming something was logged, but there is no record of it anywhere. Is there a way to be 100% sure?

> I am sending logs to Signoz, which is self-hosted on k8s. The limits are quite high, so a collector-side drop is unlikely, since messages dropped at the collector would show up in its logs/metrics.
>
> For the minimal example, do you want both code and collector, or just code?

Since we don't know if the problem is with the code or the collector, we'll need both, if possible.

The fastest way to do this would be to pick one of the examples here and modify it just enough to reproduce the issue. Then add instructions that make it clear how to configure and run a Signoz instance where the problem is visible.
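
To make drops unambiguous in such a repro, one option (a sketch; the sequence-counter approach is not something already discussed in this thread) is to log a monotonically increasing counter and look for gaps on the Signoz side:

const pino = require('pino');

// Sketch: emit sequence-numbered records so any gap on the collector /
// Signoz side pinpoints exactly which records were dropped.
// Transport options mirror the setup reported at the top of this issue.
const logger = pino(
	{ level: 'debug' },
	pino.transport({
		target: 'pino-opentelemetry-transport',
		options: { sync: false },
	}),
);

let seq = 0;
const timer = setInterval(() => {
	logger.info({ seq: seq++ }, 'sequence probe');
}, 10);

// Stop after 10 seconds, then flush buffered records before exiting.
setTimeout(() => {
	clearInterval(timer);
	logger.flush();
}, 10000);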

@Dhruv-Garg79

(Screenshot attached: 2024-09-30 at 5:32:06 PM.)

There is a record here, which is why I was confident the collector does not have an issue. Also, I am not facing this issue with traces.

Okay, let me try to create a reproducible example.

@Dhruv-Garg79

Hey, I just wanted to share an update: the issue was with exporting on the application side. After tweaking these parameters, fewer logs are dropped than before.

OTEL_BLRP_SCHEDULE_DELAY=500
OTEL_BLRP_MAX_QUEUE_SIZE=2048
OTEL_BLRP_MAX_EXPORT_BATCH_SIZE=256

I also checked whether the flush happens when the app is killed, and I can see that pino.flush() runs without any error. But there still seems to be some drop-off. Do you have any other suggestions for tweaking the parameters?
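
For illustration, a sketch of the kind of shutdown hook this refers to; the SIGTERM handler is an assumption about the setup, pinoLogger refers to the logger created in the setup above, and whether logger.flush() accepts a completion callback depends on the pino version:

// Sketch of a graceful-shutdown hook for Kubernetes: flush buffered log
// records before the process exits. Assumes a pino version whose
// logger.flush() accepts a completion callback; adjust if it does not.
process.on('SIGTERM', () => {
	pinoLogger.flush(() => {
		process.exit(0);
	});
});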

@Vunovati
Collaborator

I do not. You will have to consult the OTel documentation.

Is it possible to get a code example that would enable me to reproduce your issue? It would be a great way to identify whether there is a problem with the library.
