[BUG] Retry on sampled out telemetry is looping infinitely #2227

Closed
developpementsweb opened this issue Dec 14, 2023 · 5 comments

@developpementsweb

Description/Screenshot
When we activate sampling in Application Insights, the SDK appears to retry over and over on each log that was sampled out. The ping calls never resolve; they stay pending indefinitely.

[screenshots: the sampled-out responses and the pending appinsights ping requests]

Even when sampling is set to 100%, the track calls return a 200 response status code, which does not trigger the retry logic. We should be able to reduce sampling without this side effect on the client side.
[screenshot]

Steps to Reproduce

  1. Reduce sampling to something below 100% on azure portal
  2. Send telemetry with trackEvent function on client side
  • OS/Browser: Chrome 120.0.6099.109
  • SDK Version: 3.0.6
  • How you initialized the SDK: [screenshot; see the sketch below]
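
The initialization screenshot is not preserved in this copy. As a stand-in, here is a minimal sketch of a typical @microsoft/applicationinsights-web v3 setup that would exercise this path, assuming a placeholder connection string and event name:

```ts
import { ApplicationInsights } from "@microsoft/applicationinsights-web";

// Placeholder connection string; substitute your own resource's value.
const appInsights = new ApplicationInsights({
  config: {
    connectionString: "InstrumentationKey=00000000-0000-0000-0000-000000000000",
  },
});
appInsights.loadAppInsights();

// Step 2 of the repro: send telemetry from the client side.
appInsights.trackEvent({ name: "myEvent", properties: { foo: "bar" } });
```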

Expected behavior
The sampled-out telemetry does not get resent.
Additional context

@MSNev
Collaborator

MSNev commented Dec 14, 2023

This isn't specifically a sampling issue; I (think) it's related to issue #2205, which we are currently publishing a fix for.

The npm package is already published and the CDN will be updated within the next hour.

@MSNev MSNev added this to the 3.0.7 milestone Dec 14, 2023
@MSNev MSNev added the "fixed - waiting release" (PR committed and waiting deployment) and "waiting - CDN deployment" labels Dec 14, 2023
@MSNev
Collaborator

MSNev commented Dec 14, 2023

Sorry, got distracted by performing the release steps for v3.0.7 on this

The ping calls never resolve, they stay pending infinitely.

This is a side effect of the browser "queuing" the requests; according to the specification, the browser is eventually supposed to send the batched events.

What is happening here is that during page unload we go into "send all batched events now" mode (so they are not lost). As part of this process we use either the sendBeacon API or, when available, fetch() with the keepalive flag set. Previously (before these APIs existed) we used to send the events as synchronous XMLHttpRequests (which are now banned); that API blocked page navigation until all of the events were sent and responses received.
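
A rough TypeScript sketch (not the SDK's actual implementation) of the two unload-time send paths described above; `url` and `payload` are stand-ins for the ingestion endpoint and the serialized event batch:

```ts
// Prefer fetch() with keepalive; fall back to sendBeacon when fetch is
// unavailable. Both let the request outlive the unloading page.
function sendOnUnload(url: string, payload: string): void {
  if (typeof fetch === "function") {
    // keepalive asks the browser to queue and deliver the request even after
    // navigation; the promise may never settle before the page is gone.
    fetch(url, { method: "POST", body: payload, keepalive: true })
      .catch(() => { /* page may already be unloading; nothing to do */ });
  } else if (navigator.sendBeacon) {
    // sendBeacon returns false if the browser refuses to queue the payload.
    navigator.sendBeacon(url, payload);
  }
}
```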

Both sendBeacon and fetch (with keepalive) have a maximum payload size limit of 64 KB, so if (when) you have more than 64 KB of serialized (JSON-encoded) events, we need to "split" the payload into multiple requests. And if the APIs tell us that they could not accept a payload (even one we constructed at < 64 KB), we go into a mode where we try to send as many events as possible (one event per request) until we are told that no more can be scheduled.
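
A simplified sketch of that splitting behavior (illustrative, not the SDK's code): batch serialized events while staying under the ~64 KB limit, and degrade to one event per request when the browser rejects a chunk. `trySend` is a hypothetical callback that returns false when the underlying API refuses to queue the payload (e.g. sendBeacon returning false):

```ts
const MAX_PAYLOAD_BYTES = 64 * 1024;

// Returns false once the browser stops accepting requests.
function sendAll(events: string[], trySend: (payload: string) => boolean): boolean {
  let chunk: string[] = [];
  let size = 0;

  const flush = (): boolean => {
    if (chunk.length === 0) {
      return true;
    }
    if (!trySend("[" + chunk.join(",") + "]")) {
      // The chunk was refused even though it was under the limit: fall back
      // to one event per request until no more can be scheduled.
      for (const single of chunk) {
        if (!trySend("[" + single + "]")) {
          return false;
        }
      }
    }
    chunk = [];
    size = 0;
    return true;
  };

  for (const evt of events) {
    // Flush the current chunk before it would exceed the payload limit.
    if (size + evt.length > MAX_PAYLOAD_BYTES && !flush()) {
      return false;
    }
    chunk.push(evt);
    size += evt.length;
  }
  return flush();
}
```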

Why are there so many requests?

  • Part of the answer is the above
  • The other part is that, as part of some recent refactoring (in v3.0.3), we introduced an edge-case bug where, when this sequence occurred, we went into a loop that caused events to be sent more than once 😢

We believe that we have now fixed these introduced edge cases, and we have also added a new configuration option that you can use to disable this behavior if your use case causes frequent splitting / sending of multiple "pings" during page unload / navigation (which is also triggered by switching tabs).
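
The thread does not name the new option, so the following is only a hedged usage sketch: the flag name `disableSendBeaconSplit` is an assumption on our part (check the SDK docs for the actual name), passed through the sender channel's extensionConfig entry:

```ts
import { ApplicationInsights } from "@microsoft/applicationinsights-web";

const appInsights = new ApplicationInsights({
  config: {
    // Placeholder connection string.
    connectionString: "InstrumentationKey=00000000-0000-0000-0000-000000000000",
    extensionConfig: {
      // "AppInsightsChannelPlugin" is the sender channel's identifier in v3.
      AppInsightsChannelPlugin: {
        disableSendBeaconSplit: true, // assumed flag name, not confirmed here
      },
    },
  },
});
appInsights.loadAppInsights();
```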

@MSNev MSNev closed this as completed Dec 14, 2023
@MSNev MSNev added the "released - NPM" label and removed the "waiting - CDN deployment" and "fixed - waiting release" (PR committed and waiting deployment) labels Dec 14, 2023
@MSNev
Collaborator

MSNev commented Dec 14, 2023

This issue has been closed because it was linked to the v3.0.7 release, which is now fully deployed. If this version doesn't resolve your issue (or you just want more information), please feel free to continue commenting; if it's not fixed we can reopen and investigate further.

@developpementsweb
Author

developpementsweb commented Dec 14, 2023

Thanks! Version 3.0.7 definitely fixed this issue. But we are still seeing some ping requests (around 25%) never getting resolved (we waited over 10 minutes). We were stress testing our app with 50,000 logs to be sent to Application Insights using the trackEvent function, sending 1,000 logs every 30 seconds. Like you said, when we switch tabs the ping requests get sent, but some of them never resolve?
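
For concreteness, a sketch of the stress test as described (event name and properties are illustrative, and it assumes an `appInsights` instance initialized as in the earlier sketch):

```ts
import type { ApplicationInsights } from "@microsoft/applicationinsights-web";

// Assumed to be initialized elsewhere, as in the earlier setup sketch.
declare const appInsights: ApplicationInsights;

// 50,000 events total, sent in batches of 1,000 every 30 seconds.
const TOTAL_EVENTS = 50_000;
const BATCH_SIZE = 1_000;
let sent = 0;

const timer = setInterval(() => {
  for (let i = 0; i < BATCH_SIZE && sent < TOTAL_EVENTS; i++, sent++) {
    appInsights.trackEvent({ name: "stressTest", properties: { index: sent } });
  }
  if (sent >= TOTAL_EVENTS) {
    clearInterval(timer);
  }
}, 30_000);
```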

Also, we set our sampling to 50% on our Microsoft Azure Application Insights resource, but every single log sent from the client browser with the trackEvent function is getting a 206 (telemetry sampled out)?

@MSNev
Collaborator

MSNev commented Dec 15, 2023

But we are still seeing some ping requests (around 25%) never getting resolved (we waited over 10 minutes).

This is underlying browser functionality, and I'm not sure exactly when it would send out the requests, just that, based on the specification, once a browser "accepts" a request it is supposed to guarantee that it will deliver it (I don't recall if there is a time limit involved as part of the spec). So either this is a browser bug (where it's not sending the request) or just a UI issue where the dev tools don't get updated ???

Also, we set our sampling to 50% on our Microsoft Azure Application Insights resource, but every single log sent from the client browser with the trackEvent function is getting a 206 (telemetry sampled out)

Hmm, not sure why that would be, as one would expect the sampling to be based on some form of randomness (it's not a direct part of my team); maybe they are using a combination of the source location / time or something. I'll ping some people to see if I can find an answer on why, or whether this was just bad timing / luck.
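
One hedged possibility (an assumption, not something confirmed in this thread): fixed-rate sampling implementations often derive a score from a hash of a stable id (user or operation id) rather than from per-item randomness, so every event sharing that id gets the same keep/drop decision. An illustrative sketch, not the service's actual algorithm:

```ts
// Hash a stable id to a score in [0, 100) using djb2 (illustrative choice).
function samplingScore(id: string): number {
  let hash = 5381;
  for (let i = 0; i < id.length; i++) {
    hash = ((hash << 5) + hash + id.charCodeAt(i)) | 0; // 32-bit djb2
  }
  return (Math.abs(hash) % 10_000) / 100;
}

// Keep the item only if its id's score falls below the sampling percentage.
function isSampledIn(id: string, samplingPercentage: number): boolean {
  return samplingScore(id) < samplingPercentage;
}

// Every event from the same session/user id shares one decision:
console.log(isSampledIn("user-123", 50)); // identical result for all events
```

Under such a scheme, a single browser session whose id hashes above the threshold would see 100% of its telemetry sampled out, which would match the observation above.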
