Skip to content

processor_sampling: perf improvement in tail sampling reconciliation #11441

Merged
edsiper merged 5 commits intomasterfrom
processor_sampling-trace-perf
Feb 10, 2026
Merged

processor_sampling: perf improvement in tail sampling reconciliation #11441
edsiper merged 5 commits intomasterfrom
processor_sampling-trace-perf

Conversation

@edsiper
Copy link
Member

@edsiper edsiper commented Feb 9, 2026

This PR adds an in-tree benchmarking workflow for processor_sampling and introduces a runtime toggle to compare the legacy and optimized tail reconcile implementations without switching
branches.

What’s included

  • Added new benchmark binary: flb-bench-processor_sampling
    • New file: benchmarks/flb-bench-processor_sampling.c
    • CMake wiring in benchmarks/CMakeLists.txt
  • Extended benchmark modes to support tail reconcile profiling:
    • -m tail-reconcile
    • -L 0|1 to select optimized vs legacy reconcile path
  • Updated tail sampling implementation in plugins/processor_sampling/sampling_tail.c:
    • New config option: legacy_reconcile (default false)
    • Explicit reconcile implementations:
      • reconcile_and_create_ctrace_legacy(...)
      • reconcile_and_create_ctrace_optimized(...)
    • Runtime dispatch between both paths
    • Benchmark-only helper: sampling_tail_bench_reconcile(...) (guarded by FLB_SAMPLING_BENCH)

Why

  • Enables deterministic A/B performance testing and easier perf analysis.
  • Improves developer workflow for validating sampling reconcile changes and catching regressions.

Observed benchmark result (same workload)

  • Params: -m tail-reconcile -i 200000 -w 20000 -s 20 -a 4 -c 4096 -t 50000 -S 12345
Metric Legacy (-L 1) Optimized (-L 0) Delta
Total time 4.132s 3.765s -8.9%
Per span 1033.02ns 941.14ns -8.9%
Throughput 968,038/s 1,062,541/s +9.8%
Perf samples 17,612 16,004 -1,608

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features

    • Added configurable legacy vs. optimized trace reconciliation for sampling.
  • Tests

    • Added a benchmark tool to measure trace-sampling processor performance across modes and configurations.
  • Chores

    • Enabled processor-sampling benchmarks in the build.
    • Tightened commit subject prefix validation and added special mapping for benchmark-related paths.

@coderabbitai
Copy link

coderabbitai bot commented Feb 9, 2026

📝 Walkthrough

Walkthrough

Adds a new processor-sampling benchmark executable and harness, implements an optimized cache-backed reconciliation path with a legacy toggle in processor_sampling tail code, and tightens commit-prefix validation with special handling for the benchmarks path.

Changes

Cohort / File(s) Summary
Benchmark CMake
benchmarks/CMakeLists.txt
Adds new executable target flb-bench-processor_sampling, compiles with FLB_SAMPLING_BENCH=1, and links to fluent-bit-static and thread libs.
Benchmark tool
benchmarks/flb-bench-processor_sampling.c
New benchmark program: CLI parsing, synthetic trace/span generation, deterministic RNG, probabilistic and tail-reconcile modes, warmup/iteration loops, metrics output, and two weak hooks (flb_input_trace_append_skip_processor_stages, sampling_conditions_check).
Sampling tail reconciliation
plugins/processor_sampling/sampling_tail.c
Adds legacy_reconcile setting, reconcile cache types (reconcile_resource_entry, reconcile_scope_entry), cache management helpers, optimized and legacy reconcile flows, cache cleanup, and a bench-facing reconcile entrypoint; reconciler selects path by setting.
Commit prefix check
.github/scripts/commit_prefix_check.py
Tightens subject-prefix regex to require a token after :, maps benchmarks/ paths to benchmarks: prefix inference, and narrows extracted prefix capture group.

Sequence Diagram

sequenceDiagram
    autonumber
    participant CLI as Benchmark CLI
    participant Synth as Trace Synthesizer
    participant Sampling as Sampling Plugin
    participant Cache as Reconcile Cache / Registry
    participant Reporter as Metrics Reporter

    CLI->>CLI: parse args, warmup, iterations
    loop per iteration
        CLI->>Synth: request synthetic trace
        Synth-->>CLI: generated ctrace
        CLI->>Sampling: submit trace for processing
        Sampling->>Cache: consult/update reconciliation cache (optimized) or follow legacy path
        Cache-->>Sampling: mapped/created spans/ctrace
        Sampling-->>CLI: processed trace (kept/dropped)
        CLI->>Reporter: record spans processed
    end
    Reporter-->>CLI: compute elapsed, throughput, per-span time
    CLI->>CLI: output summary
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested reviewers

  • cosmo0920
  • niedbalski
  • patrick-stephens
  • celalettin1286

Poem

🐰 I hopped through traces, spans in line,

Caches nested snug, legacy or fine.
Benchmarks warmed, iterations spun,
Counting hops until the work is done.
🥕✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.04% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'processor_sampling: perf improvement in tail sampling reconciliation' accurately reflects the main change—a performance optimization in tail sampling reconciliation. It is specific, concise, and clearly summarizes the primary objective of the PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch processor_sampling-trace-perf

No actionable comments were generated in the recent review. 🎉


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
…B benchmark

Introduce a reproducible in-tree benchmark workflow to compare tail sampling
reconcile implementations and validate performance changes with perf.

changes:

  - processor_sampling (tail):
    - add `legacy_reconcile` config option (default: `false`)
    - split reconcile implementation into explicit paths:
      - `reconcile_and_create_ctrace_legacy(...)`
      - `reconcile_and_create_ctrace_optimized(...)`
    - select reconcile path at runtime in dispatch logic
    - add benchmark-only helper:
      - `sampling_tail_bench_reconcile(...)`
      - compiled under `FLB_SAMPLING_BENCH`

  - benchmarks:
    - extend `flb-bench-processor_sampling` with:
      - `-m tail-reconcile`
      - `-L 0|1` to toggle optimized vs legacy
    - keep probabilistic sampling benchmark mode
    - compile benchmark with required sampling sources and
      `FLB_SAMPLING_BENCH=1` for direct reconcile profiling

About the benchmark

  - Enable fast A/B testing of reconcile code paths without branch switching.
  - Provide perf-friendly, deterministic comparisons for regression tracking.

Measured result (same workload)

  `-m tail-reconcile -i 200000 -w 20000 -s 20 -a 4 -c 4096 -t 50000 -S 12345`

  +----------------------+----------------+----------------+---------------------+
  | Metric               | Legacy (-L 1)  | Optimized (-L0)| Delta               |
  +----------------------+----------------+----------------+---------------------+
  | Total time           | 4.132 s        | 3.765 s        | -0.367 s (-8.9%)    |
  | Per span             | 1033.02 ns     | 941.14 ns      | -91.88 ns (-8.9%)   |
  | Throughput           | 968,038/s      | 1,062,541/s    | +94,503/s (+9.8%)   |
  | perf samples         | 17,612         | 16,004         | -1,608              |
  +----------------------+----------------+----------------+---------------------+

  Perf hotspot note (legacy sample)

  - `run_tail_reconcile`:
    - `create_trace` ~59.65%
    - `sampling_tail_bench_reconcile` ~31.33%
      - `reconcile_and_create_ctrace_legacy` ~8.86%
      - destroy chain (`ctr_destroy` subtree) dominates inside reconcile path

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 26342e919f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 705 to 707
keymap_cache[keymap_idx] = flb_ra_value_from_object(&val,
rp->key->subkeys,
FLB_TRUE);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Replace undefined flb_ra_value_from_object call

The new cache path in flb_ra_translate_check calls flb_ra_value_from_object, but there is no definition or declaration for this symbol anywhere in the repo (only this call site). That means the build will fail with an undefined symbol (or at minimum an implicit declaration error) whenever this file is compiled. You likely intended to reuse flb_ra_key_to_value_ext or a helper from flb_ra_key.c, but as written this breaks compilation for all targets that include src/flb_record_accessor.c.

Useful? React with 👍 / 👎.

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@benchmarks/flb-bench-processor_sampling.c`:
- Around line 467-468: Guard against division by zero before computing
ns_per_span and spans_per_sec: check if spans_processed is 0 and handle that
case (e.g., set ns_per_span = 0.0 and spans_per_sec = 0.0 or skip reporting)
instead of performing the division; update the code around the existing
expressions referencing ns_per_span, spans_per_sec and spans_processed in
benchmarks/flb-bench-processor_sampling.c so the calculation only occurs when
spans_processed > 0 and the zero-case produces a safe, deterministic value.
🧹 Nitpick comments (3)
plugins/processor_sampling/sampling_tail.c (3)

523-540: Duplicate comment block (copy-paste leftover).

Lines 523–530 and 533–540 contain the same comment verbatim. Remove one.

Proposed fix
-        /*
-         * Detach the span from its previous context completely and
-         * re-attach it to the new one. If we only move the local list
-         * reference (span->_head) the span would still belong to the
-         * original ctrace context which later on might lead to use after
-         * free issues when the new context is destroyed. Make sure to
-         * update all references.
-         */
-
-        /* detach from the original scope span and global list */
         /*
          * Detach the span from its previous context completely and
          * re-attach it to the new one. If we only move the local list
          * reference (span->_head) the span would still belong to the
          * original ctrace context which later on might lead to use after
          * free issues when the new context is destroyed. Make sure to
          * update all references.
          */
+
+        /* detach from the original scope span and global list */
         cfl_list_del(&span->_head);

762-766: Stack settings is partially initialized — uninitialized list_ctraces is a latent risk.

Only span_reg, legacy_reconcile, and decision_wait are assigned. While the current reconcile functions don't access other fields, the uninitialized list_ctraces (a cfl_list with garbage pointers) could cause a crash if any future code path traverses it. A memset(&settings, 0, sizeof(settings)) would be safer.

Proposed fix
     struct sampling_settings settings;
 
+    memset(&settings, 0, sizeof(settings));
     settings.span_reg = span_reg;
     settings.legacy_reconcile = legacy_reconcile;
     settings.decision_wait = decision_wait;

552-554: Typo: "remote" → "remove".

-        /* remote t_span entry */
+        /* remove t_span entry */

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @.github/scripts/commit_prefix_check.py:
- Line 23: The regex PREFIX_RE currently captures the prefix plus trailing
whitespace/char which causes subject_prefix (from the match at line using
.group()) to be wrong; change PREFIX_RE to put a capture group around just the
prefix portion (e.g. capture up through the colon) and then update the code that
extracts the prefix (the place assigning subject_prefix) to use match.group(1)
instead of .group(); verify any other usages (like in detect_bad_squash where
PREFIX_RE.match(l) is only used as a boolean) still behave correctly and adjust
any other .group() calls to group(1) when the prefix string is needed.

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
  Update commit prefix regex handling in `.github/scripts/commit_prefix_check.py`
  so subject prefix extraction returns only the prefix token (e.g. `build:`)
  instead of including trailing content.

  Changes:
  - make `PREFIX_RE` capture just the prefix in group 1:
    - from: `^[a-z0-9_]+:\s+\S`
    - to:   `^([a-z0-9_]+:)\s+\S`
  - update subject prefix extraction to use `match.group(1)`

  Why:
  - previous extraction used `.group()` and could include extra matched text,
    causing incorrect prefix comparisons during validation.

  Notes:
  - `detect_bad_squash()` uses `PREFIX_RE.match(...)` only as a boolean;
    behavior remains unchanged there.

Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
@edsiper edsiper merged commit e253e3f into master Feb 10, 2026
61 checks passed
@edsiper edsiper deleted the processor_sampling-trace-perf branch February 10, 2026 15:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant