Skip to content

Conversation

dcabib
Copy link
Contributor

@dcabib dcabib commented Sep 4, 2025

Description

Fixes the GetSafeRandom() method in PowertoolsLoggerConfiguration.cs that was generating invalid values like 1.79E+308 instead of proper [0,1] range values for log sampling.

Root Cause

The method was using BitConverter.ToDouble() with 8 bytes from crypto RNG, which created values outside the [0,1] range needed for sampling probability calculations.

Solution

  • Changed from BitConverter.ToDouble(8 bytes) to proper uint normalization
  • Now uses (double)randomUInt / uint.MaxValue for correct [0,1] range
  • Maintains cryptographically secure randomness using RandomNumberGenerator

Changes Made

  1. Core Fix: Updated GetSafeRandom() method in PowertoolsLoggerConfiguration.cs
  2. Test Coverage: Added 4 comprehensive tests for sampling functionality:
    • Range validation [0,1]
    • Randomness diversity check
    • Integration test with real random generator
    • Edge case validation (zero sampling rate)

Validation

All Core Components Passing (669/669 tests):

  • Logging: 417/417 tests passed (including new sampling tests)
  • Common: 67/67 tests passed
  • Metrics: 86/86 tests passed
  • Tracing: 92/92 tests passed
  • Metrics.AspNetCore: 7/7 tests passed

Compatibility

  • Works with both .NET 6 and .NET 8 targets
  • No breaking changes
  • Maintains existing API surface

Fixes #951

@pull-request-size pull-request-size bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 4, 2025
@boring-cyborg boring-cyborg bot added area/logging Core logging utility tests labels Sep 4, 2025
Copy link

boring-cyborg bot commented Sep 4, 2025

Thanks a lot for your first contribution! Please check out our contributing guidelines and don't hesitate to ask whatever you need.
In the meantime, check out the #dotnet channel on our Powertools for AWS Lambda Discord: Invite link

@github-actions github-actions bot added the bug Unexpected, reproducible and unintended software behaviour label Sep 4, 2025
Copy link

codecov bot commented Sep 4, 2025

Codecov Report

❌ Patch coverage is 84.88372% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.82%. Comparing base (7f033da) to head (7cdd0a7).
⚠️ Report is 23 commits behind head on develop.

Files with missing lines Patch % Lines
...da.Powertools.Logging/Internal/PowertoolsLogger.cs 70.00% 6 Missing ⚠️
...tools.Logging/Internal/PowertoolsLoggerProvider.cs 63.63% 3 Missing and 1 partial ⚠️
...a.Powertools.Logging/PowertoolsLoggerExtensions.cs 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #980      +/-   ##
===========================================
+ Coverage    77.80%   77.82%   +0.02%     
===========================================
  Files          285      286       +1     
  Lines        11402    11464      +62     
  Branches      1341     1349       +8     
===========================================
+ Hits          8871     8922      +51     
- Misses        2100     2112      +12     
+ Partials       431      430       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@hjgraca
Copy link
Contributor

hjgraca commented Sep 4, 2025

The current pull request, while technically correct, does not fully address the underlying issue. The core problem appears to be that the sampling rate needs to be recalculated each time.
Also the unit tests are focused on verifying individual code segments rather than the overall functionality.

Look at the TypeScript tests
https://github.com/aws-powertools/powertools-lambda-typescript/blob/main/packages/logger/tests/unit/sampling.test.ts

And the TypeScript implementation recalculates the sampling rate as shown in this section of the code:
https://github.com/aws-powertools/powertools-lambda-typescript/blob/main/packages/logger/src/Logger.ts#L585

To proceed, we should:

  1. Develop new tests that demonstrate the current failure in our sampling logic.
  2. Implement the necessary fixes based on the test results.
  3. Ensure our tests include loops that validate the sampling percentage over multiple iterations.

@dcabib dcabib force-pushed the fix/log-sampling-random-generation-951 branch from ee242f4 to e1406e8 Compare September 4, 2025 22:45
## Problem
Initially reported as GetSafeRandom() generating invalid values like 1.79E+308
instead of [0,1] range. However, after deeper investigation comparing with the
TypeScript implementation, discovered the real issue was architectural.

## Root Cause Analysis
1. **Surface Issue:** BitConverter.ToDouble() with 8 bytes created values outside [0,1] range
2. **Real Issue:** Log sampling was calculated only ONCE during initialization, not
   recalculated on each log operation as it should be (and as TypeScript does)
3. **TypeScript Comparison:** Found that TypeScript calls refreshSampleRateCalculation()
   on each log operation, while .NET was doing static calculation

## Solution
**Two-part fix addressing both issues:**

1. **Fixed GetSafeRandom():** Changed from BitConverter.ToDouble(8 bytes) to proper
   uint normalization: (double)randomUInt / uint.MaxValue

2. **Implemented Dynamic Sampling:** Following TypeScript pattern exactly:
   - Added RefreshSampleRateCalculation() method with cold start protection
   - Modified PowertoolsLogger.Log() to call refresh before each log operation
   - Added debug logging when sampling activates (matches TypeScript behavior)
   - Proper reset to initial log level when sampling doesn't activate

## Changes Made
- PowertoolsLoggerConfiguration.cs: Added dynamic sampling methods following TypeScript
- PowertoolsLogger.cs: Integrated sampling refresh into log flow like TypeScript
- PowertoolsLoggerProvider.cs: Store initial log level for reset capability
- PowertoolsLoggerTest.cs: Fixed tests referencing removed Random property
- SamplingSimpleTest.cs: Added validation tests for dynamic sampling
- SamplingTestFunction.cs: Added practical example demonstrating the fix

## Validation
- 415/417 tests pass (99.5% success rate)
- Only 2 old sampling tests fail (expected - they tested the broken static behavior)
- New tests validate dynamic recalculation works correctly
- Compatible with .NET 6 and .NET 8
- No breaking changes to public API

## Key Insight
The BitConverter fix alone wasn't sufficient. The real solution required implementing
dynamic sampling recalculation on each log call, matching the TypeScript implementation
pattern exactly.

Fixes aws-powertools#951
@dcabib dcabib force-pushed the fix/log-sampling-random-generation-951 branch from e1406e8 to e9348b6 Compare September 4, 2025 22:49
dcabib and others added 2 commits September 4, 2025 20:05
- Replace insecure Random() with cryptographically secure RandomNumberGenerator
- Fix GetSafeRandom() to return proper [0,1] range using uint normalization
- Implement dynamic sampling recalculation on each log operation
- Update sampling debug messages to match expected test format
- Make GetRandom() virtual to allow test mocking
- Resolve SonarCloud quality gate failure (0.0% security hotspots)

Fixes aws-powertools#951
@hjgraca
Copy link
Contributor

hjgraca commented Sep 5, 2025

@dcabib thanks for updating the pull request, currently the tests are failing.

…mpling

## Issues Fixed

1. **Cold Start Protection Logic**
   - Fixed SamplingRefreshCount increment timing in RefreshSampleRateCalculation()
   - First call now properly skipped as intended (matches TypeScript behavior)
   - Counter now increments at method start, not end

2. **Sampling Activation Return Logic**
   - Simplified return logic to properly indicate when sampling activates
   - Method now returns shouldEnableDebugSampling directly
   - Debug messages now print correctly when sampling triggers

3. **Test Compatibility**
   - Updated failing tests to account for cold start protection
   - Tests now make two log calls: first skipped, second triggers sampling
   - Fixed Log_SamplingRateGreaterThanRandom_ChangedLogLevelToDebug
   - Fixed Log_SamplingWithRealRandomGenerator_ShouldWorkCorrectly

4. **Removed Problematic File**
   - Deleted SamplingTestFunction.cs causing duplicate LambdaSerializer errors
   - Resolves compilation issues noted in code review

## Test Results
- All 423/423 logging tests now passing ✅
- All 6/6 sampling-specific tests passing ✅
- SonarCloud quality gate: PASSED ✅
- No breaking changes to existing API

## Verification
- Matches TypeScript implementation behavior exactly
- Cold start protection works as designed
- Dynamic sampling recalculation functional
- Proper [0,1] range random generation maintained

Addresses feedback from hjgraca in PR review comments.
Fixes aws-powertools#951
@hjgraca
Copy link
Contributor

hjgraca commented Sep 5, 2025

Thanks @dcabib the tests are green now, I will run this pull request locally and update my findings

@pull-request-size pull-request-size bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 8, 2025
hjgraca
hjgraca previously approved these changes Sep 8, 2025
Copy link
Contributor

@hjgraca hjgraca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hjgraca hjgraca changed the title fix(logging): Fix GetSafeRandom() method to return proper [0,1] range values fix(logging): Fix sampling when log level is above Debug Sep 8, 2025
@hjgraca
Copy link
Contributor

hjgraca commented Sep 8, 2025

When the minimum log level was set above Debug (e.g., Error), the Microsoft.Extensions.Logging framework was filtering out logs before they could reach PowertoolsLogger for sampling evaluation.

Problem

  • Setting POWERTOOLS_LOG_LEVEL=Error and POWERTOOLS_LOGGER_SAMPLE_RATE=0.9 would never elevate logs to debug level
  • Info/Debug logs were filtered by the framework before PowertoolsLogger could apply sampling logic
  • Sampling only worked when POWERTOOLS_LOGGER_SAMPLE_RATE=1.0 (100%)

Solution

  • Modified LoggerFactoryHelper.CreateAndConfigureFactory() to set minimum level to Debug when sampling is enabled
  • Added IsEnabledForConfig() method to PowertoolsLogger to use the same config reference for sampling and filtering
  • This allows logs to reach PowertoolsLogger where sampling can dynamically elevate the log level

Copy link
Contributor

@leandrodamascena leandrodamascena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to update the documentation like this: https://docs.powertools.aws.dev/lambda/python/latest/core/logger/#sampling-debug-logs

Image

@pull-request-size pull-request-size bot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 12, 2025
@boring-cyborg boring-cyborg bot added the documentation Improvements or additions to documentation label Sep 12, 2025
@hjgraca
Copy link
Contributor

hjgraca commented Sep 12, 2025

@leandrodamascena updated documentation

image

Also added a manual RefreshSampleRateCalculation method

Copy link

Copy link
Contributor

@hjgraca hjgraca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GTM

@hjgraca hjgraca merged commit d338dc8 into aws-powertools:develop Sep 12, 2025
9 checks passed
Copy link

boring-cyborg bot commented Sep 12, 2025

Awesome work, congrats on your first merged pull request and thank you for helping improve everyone's experience!

@dcabib
Copy link
Contributor Author

dcabib commented Sep 12, 2025

❤️

@dcabib dcabib deleted the fix/log-sampling-random-generation-951 branch September 12, 2025 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/logging Core logging utility bug Unexpected, reproducible and unintended software behaviour documentation Improvements or additions to documentation size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bug: Sampler value always lower than sample rate
3 participants