fix(logging): Fix sampling when log level is above Debug #980
Thanks a lot for your first contribution! Please check out our contributing guidelines and don't hesitate to ask whatever you need.
Codecov Report ❌

Coverage diff, develop vs. #980:

| Metric | develop | #980 | +/- |
|---|---|---|---|
| Coverage | 77.80% | 77.82% | +0.02% |
| Files | 285 | 286 | +1 |
| Lines | 11402 | 11464 | +62 |
| Branches | 1341 | 1349 | +8 |
| Hits | 8871 | 8922 | +51 |
| Misses | 2100 | 2112 | +12 |
| Partials | 431 | 430 | -1 |
The current pull request, while technically correct, does not fully address the underlying issue. The core problem appears to be that the sampling rate needs to be recalculated on each log operation. Look at the TypeScript tests, and at how the TypeScript implementation recalculates the sampling rate. To proceed, we should:
Force-pushed from ee242f4 to e1406e8.
## Problem

Initially reported as GetSafeRandom() generating invalid values like 1.79E+308 instead of values in the [0,1] range. However, a deeper investigation comparing with the TypeScript implementation showed that the real issue was architectural.

## Root Cause Analysis

1. **Surface Issue:** BitConverter.ToDouble() with 8 bytes created values outside the [0,1] range
2. **Real Issue:** Log sampling was calculated only ONCE during initialization, not recalculated on each log operation as it should be (and as TypeScript does)
3. **TypeScript Comparison:** TypeScript calls refreshSampleRateCalculation() on each log operation, while .NET was doing a static calculation

## Solution

**Two-part fix addressing both issues:**

1. **Fixed GetSafeRandom():** Changed from BitConverter.ToDouble(8 bytes) to proper uint normalization: (double)randomUInt / uint.MaxValue
2. **Implemented Dynamic Sampling:** Following the TypeScript pattern exactly:
   - Added RefreshSampleRateCalculation() method with cold start protection
   - Modified PowertoolsLogger.Log() to call refresh before each log operation
   - Added debug logging when sampling activates (matches TypeScript behavior)
   - Proper reset to initial log level when sampling doesn't activate

## Changes Made

- PowertoolsLoggerConfiguration.cs: Added dynamic sampling methods following TypeScript
- PowertoolsLogger.cs: Integrated sampling refresh into log flow like TypeScript
- PowertoolsLoggerProvider.cs: Store initial log level for reset capability
- PowertoolsLoggerTest.cs: Fixed tests referencing removed Random property
- SamplingSimpleTest.cs: Added validation tests for dynamic sampling
- SamplingTestFunction.cs: Added practical example demonstrating the fix

## Validation

- 415/417 tests pass (99.5% success rate)
- Only 2 old sampling tests fail (expected: they tested the broken static behavior)
- New tests validate that dynamic recalculation works correctly
- Compatible with .NET 6 and .NET 8
- No breaking changes to public API

## Key Insight

The BitConverter fix alone wasn't sufficient. The real solution required implementing dynamic sampling recalculation on each log call, matching the TypeScript implementation pattern exactly.

Fixes aws-powertools#951
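The two parts of the fix described in this commit message can be sketched roughly as follows. This is an illustration in TypeScript, not the code from the pull request: `normalizeUInt32` and `sampleEachCall` are hypothetical names, and the raw integers are passed in directly rather than drawn from a cryptographic random number generator.

```typescript
// Largest 32-bit unsigned value, the counterpart of uint.MaxValue in .NET.
const UINT32_MAX = 0xffffffff;

// Hypothetical helper: map a raw 32-bit unsigned integer onto [0, 1],
// following the (double)randomUInt / uint.MaxValue expression quoted above.
function normalizeUInt32(raw: number): number {
  return (raw >>> 0) / UINT32_MAX;
}

// Hypothetical per-call check: every log operation gets its own draw,
// instead of reusing a single decision made at initialization.
function sampleEachCall(samplingRate: number, rawDraws: number[]): boolean[] {
  return rawDraws.map((raw) => samplingRate > normalizeUInt32(raw));
}

// Four log calls with a 50% sampling rate and four different raw draws.
const decisions = sampleEachCall(0.5, [0, 0x40000000, 0xc0000000, 0xffffffff]);
console.log(decisions);
```

Because the divisor is the largest representable input, the result can never leave [0, 1], whatever bytes the generator produces.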
Force-pushed from e1406e8 to e9348b6.
- Replace insecure Random() with cryptographically secure RandomNumberGenerator
- Fix GetSafeRandom() to return proper [0,1] range using uint normalization
- Implement dynamic sampling recalculation on each log operation
- Update sampling debug messages to match expected test format
- Make GetRandom() virtual to allow test mocking
- Resolve SonarCloud quality gate failure (0.0% security hotspots)

Fixes aws-powertools#951
@dcabib thanks for updating the pull request. The tests are currently failing.
…mpling

## Issues Fixed

1. **Cold Start Protection Logic**
   - Fixed SamplingRefreshCount increment timing in RefreshSampleRateCalculation()
   - First call now properly skipped as intended (matches TypeScript behavior)
   - Counter now increments at method start, not end
2. **Sampling Activation Return Logic**
   - Simplified return logic to properly indicate when sampling activates
   - Method now returns shouldEnableDebugSampling directly
   - Debug messages now print correctly when sampling triggers
3. **Test Compatibility**
   - Updated failing tests to account for cold start protection
   - Tests now make two log calls: first skipped, second triggers sampling
   - Fixed Log_SamplingRateGreaterThanRandom_ChangedLogLevelToDebug
   - Fixed Log_SamplingWithRealRandomGenerator_ShouldWorkCorrectly
4. **Removed Problematic File**
   - Deleted SamplingTestFunction.cs causing duplicate LambdaSerializer errors
   - Resolves compilation issues noted in code review

## Test Results

- All 423/423 logging tests now passing ✅
- All 6/6 sampling-specific tests passing ✅
- SonarCloud quality gate: PASSED ✅
- No breaking changes to existing API

## Verification

- Matches TypeScript implementation behavior exactly
- Cold start protection works as designed
- Dynamic sampling recalculation functional
- Proper [0,1] range random generation maintained

Addresses feedback from hjgraca in PR review comments.

Fixes aws-powertools#951
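The cold start protection and return logic listed in this commit can be pictured with a small sketch. It is written in TypeScript for illustration only and makes several assumptions: the class name `SamplingConfig`, its fields, and the injected `random` function are hypothetical, and the comparison `samplingRate > random()` is inferred from the test name Log_SamplingRateGreaterThanRandom_ChangedLogLevelToDebug rather than taken from the actual .NET code.

```typescript
type LogLevel = "Debug" | "Information" | "Error";

// Hypothetical configuration object holding the sampling state.
class SamplingConfig {
  private refreshCount = 0;
  private readonly initialLevel: LogLevel;
  private readonly samplingRate: number;
  private readonly random: () => number;
  currentLevel: LogLevel;

  constructor(initialLevel: LogLevel, samplingRate: number, random: () => number) {
    this.initialLevel = initialLevel;
    this.samplingRate = samplingRate;
    this.random = random;
    this.currentLevel = initialLevel;
  }

  // Returns true only when this call activated debug sampling.
  refreshSampleRateCalculation(): boolean {
    // The counter increments at the start, so the very first call is skipped.
    this.refreshCount += 1;
    if (this.refreshCount === 1) return false;
    if (this.samplingRate <= 0) return false;

    const shouldEnableDebugSampling = this.samplingRate > this.random();
    // Reset to the initial level whenever sampling does not activate.
    this.currentLevel = shouldEnableDebugSampling ? "Debug" : this.initialLevel;
    return shouldEnableDebugSampling;
  }
}

// Deterministic draws for the second and third calls (the first call draws nothing).
const draws = [0.1, 0.9];
const config = new SamplingConfig("Error", 0.5, () => draws.shift() ?? 1);

const first = config.refreshSampleRateCalculation();
const second = config.refreshSampleRateCalculation();
const levelAfterSecond = config.currentLevel;
const third = config.refreshSampleRateCalculation();
const levelAfterThird = config.currentLevel;
```

This also shows why the updated tests need two log calls: the first refresh is skipped by design, so only the second can trigger sampling.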
Thanks @dcabib, the tests are green now. I will run this pull request locally and post my findings.
…ment variable configurations
LGTM
When the minimum log level was set above Debug (e.g., Error), the Microsoft.Extensions.Logging framework was filtering out logs before they could reach PowertoolsLogger for sampling evaluation.

Problem

Solution
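The filtering problem described in this comment comes down to the order of two checks. The sketch below, in TypeScript, is only an illustration of that ordering: `gateThenSample` and `sampleThenGate` are hypothetical functions, the three-level scale is simplified, and the second function is one possible ordering rather than the change actually made in this pull request.

```typescript
type Level = "Debug" | "Information" | "Error";

const severity: Record<Level, number> = { Debug: 0, Information: 1, Error: 2 };

// Level gate first: a Debug entry is rejected before the sampling
// decision is ever consulted, so sampling can never let it through.
function gateThenSample(min: Level, level: Level, sampled: () => boolean): boolean {
  if (severity[level] < severity[min]) return false;
  const effectiveMin: Level = sampled() ? "Debug" : min;
  return severity[level] >= severity[effectiveMin];
}

// Sampling first: the decision can lower the effective minimum level
// before the entry is compared against it.
function sampleThenGate(min: Level, level: Level, sampled: () => boolean): boolean {
  const effectiveMin: Level = sampled() ? "Debug" : min;
  return severity[level] >= severity[effectiveMin];
}

const alwaysSampled = () => true;
console.log(gateThenSample("Error", "Debug", alwaysSampled));
console.log(sampleThenGate("Error", "Debug", alwaysSampled));
```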
I think we need to update the documentation like this: https://docs.powertools.aws.dev/lambda/python/latest/core/logger/#sampling-debug-logs

…shSampleRateCalculation
@leandrodamascena I updated the documentation and also added a manual RefreshSampleRateCalculation method.
LGTM
Awesome work, congrats on your first merged pull request and thank you for helping improve everyone's experience!
❤️
Description
Fixes the GetSafeRandom() method in PowertoolsLoggerConfiguration.cs that was generating invalid values like 1.79E+308 instead of proper [0,1] range values for log sampling.
Root Cause
The method was using BitConverter.ToDouble() with 8 bytes from crypto RNG, which created values outside the [0,1] range needed for sampling probability calculations.
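To see why reinterpreting raw bytes as a double cannot yield a probability, consider the sketch below. It is written in TypeScript for illustration: `bytesToDouble` is a hypothetical stand-in for what BitConverter.ToDouble() does on a little-endian platform, and the byte pattern is chosen by hand rather than drawn from a random generator.

```typescript
// Reinterpret 8 raw bytes as an IEEE 754 double (little-endian byte order).
function bytesToDouble(bytes: number[]): number {
  const view = new DataView(new ArrayBuffer(8));
  bytes.slice(0, 8).forEach((b, i) => view.setUint8(i, b));
  return view.getFloat64(0, true); // true = little-endian
}

// Bit pattern 0x7FEFFFFFFFFFFFFF, least significant byte first.
// This is the largest finite double, roughly 1.79E+308.
const maxPattern = [0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xef, 0x7f];
const asDouble = bytesToDouble(maxPattern);
const insideUnitInterval = asDouble >= 0 && asDouble <= 1;

console.log(asDouble, insideUnitInterval);
```

Random bytes populate the sign and exponent bits as freely as the mantissa, so the result can be any double, including negative values, infinities, and NaN.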
Solution
Changes Made
Validation
All Core Components Passing (669/669 tests):
Compatibility
Fixes #951