Skip to content
This repository has been archived by the owner on Jul 5, 2020. It is now read-only.

Adaptive Sampling leads to self-starvation in default configuration when app uses TrackEvent #756

Closed
zakimaksyutov opened this issue Nov 28, 2017 · 2 comments
Assignees
Labels
Milestone

Comments

@zakimaksyutov
Copy link
Member

Current implementation of adaptive sampling counts towards sampled in quota all items which don't satisfy filter criteria. As a result the default configuration leads to self-starvation.

		<Add Type="Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel.AdaptiveSamplingTelemetryProcessor, Microsoft.AI.ServerTelemetryChannel">
			<MaxTelemetryItemsPerSecond>5</MaxTelemetryItemsPerSecond>
			<ExcludedTypes>Event</ExcludedTypes>
		</Add>
		<Add Type="Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel.AdaptiveSamplingTelemetryProcessor, Microsoft.AI.ServerTelemetryChannel">
			<MaxTelemetryItemsPerSecond>5</MaxTelemetryItemsPerSecond>
			<IncludedTypes>Event</IncludedTypes>
		</Add>

Repro Steps

  1. Write a console app which does 100 TrackRequest calls/sec and 10 TrackEvent calls/sec
  2. Observe the behavior that two adaptive processors conflict with each other leading to self-starvation and not sending almost any telemetry

Actual Behavior

In picture below every line represents 10 seconds. First number - actually sent, second number - tracked.

image

Expected Behavior

For both streams - requests and events - AI SDK should have sent 5 documents / sec (so, first column should have been 50 for both types).

Fix

Process function (https://github.com/Microsoft/ApplicationInsights-dotnet/blob/7321d2bf7b9c4733fb43b6c186e381863931eb1a/src/ServerTelemetryChannel/SamplingTelemetryProcessor.cs#L146) should return control to estimator only for "//// Ok, now we can actually sample:" items.

@zakimaksyutov
Copy link
Member Author

After leaving test console app to run for a long time it ended up with these ratios:

image

@d3r3kk d3r3kk self-assigned this Nov 28, 2017
@TimothyMothra TimothyMothra added this to the 2.5-Beta2 milestone Nov 29, 2017
d3r3kk added a commit to microsoft/ApplicationInsights-dotnet that referenced this issue Dec 1, 2017
-Add an internal constructor to `SamplingTelemetryProcessor` that gives it a next processor for sampled, and a next processor for unsampled items.
- In the `AdaptiveSamplingTelemetryProcessor`, give the `SamplingPercentageEstimatorTelemetryProcessor` as well as the actual next `ITelemetryProcessor` in the pipeline to the `SamplingTelemetryProcessor`'s internal constructor.
- Add property `SamplingTelemetryProcessor.UnsampledNext` to send to the next `ITelemetryProcessor` in the pipeline, bypassing the estimator which usually comes immediately next in the chain.
- Enhance constructor for `SamplingTelemetryProcessor` to set the `Next` and `UnsampledNext` fields based on which constructor was called. (`UnsampledNext` == `Next` when calling the public constructor only).
- `SamplingTelemetryProcessor.Process(ITelemetry item)` updated to call the appropriate `Next`/`UnsampledNext` property.
- Remove an unused using.
- Update CHANGELOG.md with the fix info
@SergeyKanzhelev
Copy link
Contributor

closing than. Thanks @d3r3kk

TimothyMothra added a commit to microsoft/ApplicationInsights-dotnet that referenced this issue Dec 4, 2017
* apply patches from source build

* remove legacy file

* Fix null check to use string.IsNullOrEmpty. Even though unlikely, if a string.empty is passed as input it will result in an infinite loop.

* increase the version to 2.5.0-beta2

* The issue with compareNetObject got resolved

* Update Microsoft.ApplicationInsights.netcoreapp11.Tests.csproj

* GH items updated

* dirs.proj to do solution restore.

* Remove sampling score on telemetry items related to Context.User.Id

Issue #625

* Update Changelog

* Add link to CHANGELOG regarding issue

* Update PULL_REQUEST_TEMPLATE.md

* Update PULL_REQUEST_TEMPLATE.md

* Test implementation for Linux vs Windows

* ApplicationFolderProvider modified to work better in non-windows.
API Surface updated.

* Minor renaming, build fixes.

* UnitTest fix and addition of new tests

* Error messages updated

* Enable building on mac

* disabled non-netstandard targets

* clean up projects

* clean up projects

* fix it back

* sln from dirs

* test task

* renamings

* Smaller changes and updating GH templates

* remove public api changes. code change to follow this.

* PR suggestion incorporated - removed flg to override storage behaviour

* Minor

* Correctmessage

* Documentation

* better error messages.

* readyonly IIdentityProvider

* update changelog

* changelog details

* Revert references to MetricManager.
Not ready for public use.

* Proposed fix for issue microsoft/ApplicationInsights-dotnet-server#756.

-Add an internal constructor to `SamplingTelemetryProcessor` that gives it a next processor for sampled, and a next processor for unsampled items.
- In the `AdaptiveSamplingTelemetryProcessor`, give the `SamplingPercentageEstimatorTelemetryProcessor` as well as the actual next `ITelemetryProcessor` in the pipeline to the `SamplingTelemetryProcessor`'s internal constructor.
- Add property `SamplingTelemetryProcessor.UnsampledNext` to send to the next `ITelemetryProcessor` in the pipeline, bypassing the estimator which usually comes immediately next in the chain.
- Enhance constructor for `SamplingTelemetryProcessor` to set the `Next` and `UnsampledNext` fields based on which constructor was called. (`UnsampledNext` == `Next` when calling the public constructor only).
- `SamplingTelemetryProcessor.Process(ITelemetry item)` updated to call the appropriate `Next`/`UnsampledNext` property.
- Remove an unused using.
- Update CHANGELOG.md with the fix info

* PR Suggestions and further fix

- Renamed Next to SampledNext
- Unnecessary constructor logic removed
- Comments fixed up (remove unnecessary commenting)
- Early exit from processing removed when SampledNext != UnsampledNext, leaving original intent intact

* Add a few tests to ensure assumptions during AdaptiveSampling are maintained

- Ensure Process sends telemetry items to the correct '...Next' method while being passed through the SamplingTelemetryProcessor

- Ensure past behaviour on the standalone (not part of Adaptive sampling chain) SamplingTelemetryProcessor is maintained

* Initial PR for HealthHeartbeat (#636)

Add Heartbeat feature to Application Insights SDK

- [x] Changes in public surface reviewed
- [x] Design discussion issue #628  
- [x] CHANGELOG.md updated

- Add all unit tests necessary to comform to the current stated feature spec
- Add API updates to the Unshipped txt file
- Add remark (still with TODO as I have yet to implement) about all default payload members
- Changes in design from polling providers that will extend heartbeat payloads to expecting consumers of the SDK to 'push' state changes as necessary.
- Add/Set methods to handle field collision management
- Expose new property 'IsHeartbeatEnabled' to turn on/off heartbeats
- Changelog updated for beta2
- create HeartbeatProvider with DiagnosticsTelemetryProvider to enable simpler initialization / logic requirements
- Increase default heartbeat interval to 15 minutes
- Documentation on how to extend Heartbeat properties

* Update Changelog, prep for 2.5.0-beta2 (#664)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

No branches or pull requests

4 participants