
In the browser, if a page has many requests and the Chrome tab is switched back and forth frequently, sendBeacon requests get stuck in pending status and the console reports a promise error #3489

Open
yuanman0109 opened this issue Dec 15, 2022 · 12 comments
Labels
bug Something isn't working priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies, etc

Comments

@yuanman0109

yuanman0109 commented Dec 15, 2022

What happened?

Steps to Reproduce

Visit a page that issues more than 200 requests, switch to other tabs while data is being sent to the server, then switch back quickly

Expected Result

The data is sent successfully and there are no promise errors in the console.

Actual Result

Many sendBeacon requests remain in pending status and the console reports unhandled promise errors.

Additional Details

[screenshots: network panel showing pending sendBeacon requests and console promise errors]

OpenTelemetry Setup Code

import { WebTracerProvider } from '@opentelemetry/sdk-trace-web'
import { Resource } from '@opentelemetry/resources'
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions'
import { ZoneContextManager } from '@opentelemetry/context-zone'
import { registerInstrumentations } from '@opentelemetry/instrumentation'
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch'
import { XMLHttpRequestInstrumentation } from '@opentelemetry/instrumentation-xml-http-request'
import { BatchSpanProcessor }  from '@opentelemetry/sdk-trace-base';
import { CustomIdGenerator } from '../utils/generator'
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
// ...
provider.register({
  contextManager: new ZoneContextManager(),
})
const collectorOptions = {
  url: 'xxx/api/traces', // url is optional; the exporter falls back to its default if omitted
};
const exporter = new OTLPTraceExporter(collectorOptions);
const spanProcessor = new BatchSpanProcessor(exporter, {
  // The maximum queue size. After the size is reached spans are dropped.
  maxQueueSize: 100,
  // The maximum batch size of every export. It must be smaller or equal to maxQueueSize.
  maxExportBatchSize: 10,
  // The interval between two consecutive exports
  scheduledDelayMillis: 2000,
  // How long the export can run before it is cancelled
  exportTimeoutMillis: 30000,
})
provider.addSpanProcessor(spanProcessor);

package.json

    "@opentelemetry/api": "^1.3.0",
    "@opentelemetry/context-zone": "^1.8.0",
    "@opentelemetry/exporter-trace-otlp-http": "^0.34.0",
    "@opentelemetry/instrumentation": "^0.34.0",
    "@opentelemetry/instrumentation-fetch": "^0.34.0",
    "@opentelemetry/instrumentation-xml-http-request": "^0.34.0",
    "@opentelemetry/otlp-transformer": "^0.34.0",
    "@opentelemetry/resources": "^1.8.0",
    "@opentelemetry/sdk-trace-base": "^1.8.0",
    "@opentelemetry/sdk-trace-web": "^1.8.0",
    "@opentelemetry/semantic-conventions": "^1.8.0",

Relevant log output

No response

@yuanman0109 yuanman0109 added bug Something isn't working triage labels Dec 15, 2022
@dyladan
Member

dyladan commented Dec 15, 2022

I'm sorry it isn't really clear from your report, but is the data actually sent successfully or not? Is the bug just the log statement? Can you include the stacktrace so I can try to track down where this is actually coming from?

@dyladan dyladan added the information-requested Bug is waiting on additional information from the user label Dec 15, 2022
@yuanman0109
Author

> I'm sorry it isn't really clear from your report, but is the data actually sent successfully or not? Is the bug just the log statement? Can you include the stacktrace so I can try to track down where this is actually coming from?

The data is stuck in pending status, and there is a lot of it. I have not verified whether the transmission eventually succeeds, but the console throws promise errors, which skews our exception-collection metrics. Most of the span data comes from the xhr and fetch instrumentations. There is stack information; it currently points here:

const error = new OTLPExporterError(`sendBeacon - cannot send ${body}`);

It may be difficult to troubleshoot this problem from my description alone, because in general users will not switch tabs as aggressively as I did; even if `_finishedSpans` grows very long, the spans are still queued for sending. However, if you switch to another tab while data is being sent, I suspect `visibilitychange` or `pagehide` fires, which triggers the `flushAll` function concurrently. You can simulate the scenario: let the queue get as close to `maxQueueSize` as possible, then switch to another page while data is being sent and quickly switch back. Repeat this several times and watch the console.
[screenshot: stack trace]

@dyladan
Member

dyladan commented Dec 21, 2022

The bug here is most likely the simultaneous execution of the flush, then. I believe the spec says it should not be executed concurrently.
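A guard along these lines (a hypothetical sketch, not the SDK's actual implementation) would coalesce concurrent flush calls so only one export runs at a time:

```javascript
// Hypothetical sketch: make concurrent forceFlush() calls share one
// in-flight promise instead of starting overlapping exports.
class FlushGuard {
  constructor(doFlush) {
    this._doFlush = doFlush;   // async function performing the real export
    this._inflight = null;     // promise of the flush currently running
  }
  forceFlush() {
    if (this._inflight === null) {
      this._inflight = this._doFlush().finally(() => {
        this._inflight = null; // allow the next flush once this one settles
      });
    }
    return this._inflight;     // concurrent callers share the same promise
  }
}
```

With this shape, a `visibilitychange` handler firing while a scheduled flush is already running would simply await the in-flight export rather than start a second one.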

@dyladan dyladan added priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect and removed triage labels Dec 21, 2022
@dyladan
Member

dyladan commented Dec 21, 2022

Labeling as p2 since it doesn't seem to be a crasher or affect your application (please correct me if that's wrong) but may be missing telemetry to the backend.

@isanchen

isanchen commented Jan 5, 2023

We are seeing the same error, and the stacktrace suggests this line navigator.sendBeacon() is returning false here, ultimately causing an unhandled promise rejection to bubble up.

My guess is the data has exceeded the browser limit for queueing the sendBeacon request (https://w3c.github.io/beacon/#return-value):

If the amount of data to be queued exceeds the user agent limit (as defined in HTTP-network-or-cache fetch), this method returns false

BTW we use the BatchSpanProcessor with the following config:

{
  maxQueueSize: 100,
  maxExportBatchSize: 10,
  scheduledDelayMillis: 500,
  exportTimeoutMillis: 30000,
}

If this theory sounds reasonable, we will try reducing the maxQueueSize, maxExportBatchSize and scheduledDelayMillis to see if it helps.

On the other hand, I am not sure whether the SDK emitting unhandled promise rejections is acceptable under the OTel error-handling guidance: https://opentelemetry.io/docs/reference/specification/error-handling/

And finally, is there a way for users to configure error handlers for these errors?

Thanks

@ahayworth

I believe we are also seeing this bug in production. In the interest of not just registering a "me too" comment, we can add the following details:

  • We see this with the metrics SDK, rather than traces
  • We also see stacktraces pointing to the same code

We are seeing around 100 per day, which is a rather small fraction of the metrics requests we are handling; so I do not believe it to be extraordinarily urgent. But, we do see it.

@dyladan
Member

dyladan commented Feb 21, 2023

> We are seeing the same error, and the stacktrace suggests this line navigator.sendBeacon() is returning false here and causing an unhandled promise to bubble up ultimately.
>
> My guess is the data has exceeded the browser limit for queueing the sendBeacon request (https://w3c.github.io/beacon/#return-value):
>
> > If the amount of data to be queued exceeds the user agent limit (as defined in HTTP-network-or-cache fetch), this method returns false
>
> BTW we use the BatchSpanProcessor with the following config:
>
> {
>   maxQueueSize: 100,
>   maxExportBatchSize: 10,
>   scheduledDelayMillis: 500,
>   exportTimeoutMillis: 30000,
> }
>
> If this theory sounds reasonable, we will try reducing the maxQueueSize, maxExportBatchSize and scheduledDelayMillis to see if it helps.

Yes, this seems reasonable. I'm not sure from the wording there whether a single batch is too big or the entire beacon queue has grown too large. If the former, then changing only maxExportBatchSize should suffice.

> On the other hand, I am not sure if the fact that the SDK emitting unhandled promises is acceptable by the OTEL development guidance or not: https://opentelemetry.io/docs/reference/specification/error-handling/

An unhandled promise rejection is equivalent to an unhandled throw and it is not acceptable. This is a bug. Thanks for the additional info and troubleshooting.

> And finally, is there a way for users to configure error handlers for these errors?

There is a global error handler but if we aren't catching the error then we can't send it to the global error handler. You may also be able to use the window unhandledrejection event to log them if you want.
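A sketch of that last suggestion (hypothetical helper, not part of the SDK): listen for `unhandledrejection` and filter on the `sendBeacon` error message quoted earlier in the thread. The `window` parameter is injectable so the logic can be exercised outside a browser.

```javascript
// Hypothetical sketch: log (and optionally swallow) unhandled rejections
// coming from the beacon exporter, using the unhandledrejection event.
function installBeaconRejectionLogger(win = window) {
  win.addEventListener('unhandledrejection', (event) => {
    const reason = event.reason;
    const message = reason && reason.message ? reason.message : String(reason);
    if (message.includes('sendBeacon')) {
      console.warn('OTLP beacon export failed:', message);
      event.preventDefault(); // keep it out of the default console error
    }
  });
}
```

Note this only quiets the console; it does not recover the dropped telemetry.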

@dyladan dyladan added priority:p1 Bugs which cause problems in end-user applications such as crashes, data inconsistencies, etc and removed priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect information-requested Bug is waiting on additional information from the user labels Feb 21, 2023
@dyladan
Member

dyladan commented Feb 21, 2023

I'm changing this to p1 since it is throwing unhandled promise rejections

@MSNev
Contributor

MSNev commented Sep 7, 2023

This is also an issue on the "receiving" server.

When a recent browser (older ones work differently) sends a sendBeacon request, it "expects" the receiving server NOT to return a response body, so the server should return a 204 (with no body); this lets the browser "reuse" the existing connection. If the server sends back a 200 with a response body, the browser (Chrome) closes the current connection, forcing a new connection to be established to the receiving server.

This issue is compounded when the "sender" issues multiple sendBeacon requests in a short amount of time, as this causes the browser to create/establish multiple connections to the same domain, with "many" of them ending up in the pending state and often never getting sent.

As mentioned above, older Chromium versions (Chrome, Edge), and specifically the original non-Chromium Microsoft Edge, handle a "200" response without issues, and returning many 204s can actually cause other problems there. For Microsoft's internal SDK we have a configuration that lets the browser instance "detect" these environments; it works with the backend by supplying a query string to "inform" the receiving server that the runtime expects a 204 for the current request (assuming all is OK, i.e. it would normally have returned a 200).

@myieye

myieye commented Oct 9, 2023

It seems to me that the traces should now be sent using the Fetch API with the keepalive option instead of the Beacon API (at least in browsers that support it, which is pretty much everyone except Firefox):

The keepalive option can be used to allow the request to outlive the page. Fetch with the keepalive flag is a replacement for the Navigator.sendBeacon() API.
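A sketch of what that export path could look like (hypothetical helper; the URL is a placeholder and `fetchFn` is injectable for testing):

```javascript
// Sketch: export a payload with fetch + keepalive instead of sendBeacon.
function exportWithKeepalive(url, body, fetchFn = fetch) {
  return fetchFn(url, {
    method: 'POST',
    keepalive: true, // lets the request outlive the page, like sendBeacon
    headers: { 'Content-Type': 'application/json' },
    body,
  }).catch((err) => {
    // Unlike fire-and-forget sendBeacon, failures surface here and can be
    // handled instead of becoming unhandled rejections.
    console.warn('keepalive export failed:', err);
  });
}
```

One advantage over sendBeacon is exactly the error-handling point raised earlier: the returned promise gives the SDK a place to catch failures.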

@MSNev
Contributor

MSNev commented Oct 25, 2023

> fetch api with the keepalive

Fetch with the keepalive flag also has the same 64 KB limitation, so if this is payload-related it is still going to be an issue (for getting the telemetry out of the environment).
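One way to work within that limit (a sketch only; the 60 KiB threshold and the fallback behavior are assumptions, not exporter behavior) is to check the payload size and the sendBeacon return value, falling back to an ordinary fetch when the beacon is refused:

```javascript
// Sketch: prefer sendBeacon for small payloads; fall back to a plain fetch
// (no keepalive, so no 64 KiB cap) when the beacon is refused or the payload
// is over the assumed limit. beacon/fetchFn are injectable for testing.
const BEACON_LIMIT_BYTES = 60 * 1024; // stay under the typical ~64 KiB quota

function sendPayload(url, json,
                     beacon = (u, b) => navigator.sendBeacon(u, b),
                     fetchFn = fetch) {
  const blob = new Blob([json], { type: 'application/json' });
  if (blob.size <= BEACON_LIMIT_BYTES && beacon(url, blob)) {
    return Promise.resolve(); // queued by the browser, nothing more to do
  }
  // Beacon refused (quota exceeded) or payload too large: ordinary request.
  return fetchFn(url, { method: 'POST', body: blob });
}
```

The trade-off is that the plain-fetch fallback does not survive page unload the way a beacon or keepalive request does.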

@istvan-hevele

A potential workaround is to pass headers: {} to the OTLPTraceExporter's options. Because the Beacon API doesn't support custom headers, this causes the library to use XHR instead.
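A sketch of that workaround, matching the setup code earlier in the thread (the URL placeholder is kept as-is):

```javascript
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

// Passing any headers object, even an empty one, makes the exporter choose
// XHR over sendBeacon, because sendBeacon cannot attach custom headers.
const exporter = new OTLPTraceExporter({
  url: 'xxx/api/traces', // placeholder endpoint, as in the setup above
  headers: {},           // non-undefined headers => XHR transport
});
```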
