processor_sampling: perf improvement in tail sampling reconciliation #11441
processor_sampling: perf improvement in tail sampling reconciliation #11441
Conversation
📝 WalkthroughWalkthroughAdds a new processor-sampling benchmark executable and harness, implements an optimized cache-backed reconciliation path with a legacy toggle in processor_sampling tail code, and tightens commit-prefix validation with special handling for the benchmarks path. Changes
Sequence DiagramsequenceDiagram
autonumber
participant CLI as Benchmark CLI
participant Synth as Trace Synthesizer
participant Sampling as Sampling Plugin
participant Cache as Reconcile Cache / Registry
participant Reporter as Metrics Reporter
CLI->>CLI: parse args, warmup, iterations
loop per iteration
CLI->>Synth: request synthetic trace
Synth-->>CLI: generated ctrace
CLI->>Sampling: submit trace for processing
Sampling->>Cache: consult/update reconciliation cache (optimized) or follow legacy path
Cache-->>Sampling: mapped/created spans/ctrace
Sampling-->>CLI: processed trace (kept/dropped)
CLI->>Reporter: record spans processed
end
Reporter-->>CLI: compute elapsed, throughput, per-span time
CLI->>CLI: output summary
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
…B benchmark
Introduce a reproducible in-tree benchmark workflow to compare tail sampling
reconcile implementations and validate performance changes with perf.
changes:
- processor_sampling (tail):
- add `legacy_reconcile` config option (default: `false`)
- split reconcile implementation into explicit paths:
- `reconcile_and_create_ctrace_legacy(...)`
- `reconcile_and_create_ctrace_optimized(...)`
- select reconcile path at runtime in dispatch logic
- add benchmark-only helper:
- `sampling_tail_bench_reconcile(...)`
- compiled under `FLB_SAMPLING_BENCH`
- benchmarks:
- extend `flb-bench-processor_sampling` with:
- `-m tail-reconcile`
- `-L 0|1` to toggle optimized vs legacy
- keep probabilistic sampling benchmark mode
- compile benchmark with required sampling sources and
`FLB_SAMPLING_BENCH=1` for direct reconcile profiling
About the benchmark
- Enable fast A/B testing of reconcile code paths without branch switching.
- Provide perf-friendly, deterministic comparisons for regression tracking.
Measured result (same workload)
`-m tail-reconcile -i 200000 -w 20000 -s 20 -a 4 -c 4096 -t 50000 -S 12345`
+----------------------+----------------+----------------+---------------------+
| Metric | Legacy (-L 1) | Optimized (-L0)| Delta |
+----------------------+----------------+----------------+---------------------+
| Total time | 4.132 s | 3.765 s | -0.367 s (-8.9%) |
| Per span | 1033.02 ns | 941.14 ns | -91.88 ns (-8.9%) |
| Throughput | 968,038/s | 1,062,541/s | +94,503/s (+9.8%) |
| perf samples | 17,612 | 16,004 | -1,608 |
+----------------------+----------------+----------------+---------------------+
Perf hotspot note (legacy sample)
- `run_tail_reconcile`:
- `create_trace` ~59.65%
- `sampling_tail_bench_reconcile` ~31.33%
- `reconcile_and_create_ctrace_legacy` ~8.86%
- destroy chain (`ctr_destroy` subtree) dominates inside reconcile path
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
26342e9 to
c30d4f0
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 26342e919f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
src/flb_record_accessor.c
Outdated
| keymap_cache[keymap_idx] = flb_ra_value_from_object(&val, | ||
| rp->key->subkeys, | ||
| FLB_TRUE); |
There was a problem hiding this comment.
Replace undefined flb_ra_value_from_object call
The new cache path in flb_ra_translate_check calls flb_ra_value_from_object, but there is no definition or declaration for this symbol anywhere in the repo (only this call site). That means the build will fail with an undefined symbol (or at minimum an implicit declaration error) whenever this file is compiled. You likely intended to reuse flb_ra_key_to_value_ext or a helper from flb_ra_key.c, but as written this breaks compilation for all targets that include src/flb_record_accessor.c.
Useful? React with 👍 / 👎.
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@benchmarks/flb-bench-processor_sampling.c`:
- Around line 467-468: Guard against division by zero before computing
ns_per_span and spans_per_sec: check if spans_processed is 0 and handle that
case (e.g., set ns_per_span = 0.0 and spans_per_sec = 0.0 or skip reporting)
instead of performing the division; update the code around the existing
expressions referencing ns_per_span, spans_per_sec and spans_processed in
benchmarks/flb-bench-processor_sampling.c so the calculation only occurs when
spans_processed > 0 and the zero-case produces a safe, deterministic value.
🧹 Nitpick comments (3)
plugins/processor_sampling/sampling_tail.c (3)
523-540: Duplicate comment block (copy-paste leftover).Lines 523–530 and 533–540 contain the same comment verbatim. Remove one.
Proposed fix
- /* - * Detach the span from its previous context completely and - * re-attach it to the new one. If we only move the local list - * reference (span->_head) the span would still belong to the - * original ctrace context which later on might lead to use after - * free issues when the new context is destroyed. Make sure to - * update all references. - */ - - /* detach from the original scope span and global list */ /* * Detach the span from its previous context completely and * re-attach it to the new one. If we only move the local list * reference (span->_head) the span would still belong to the * original ctrace context which later on might lead to use after * free issues when the new context is destroyed. Make sure to * update all references. */ + + /* detach from the original scope span and global list */ cfl_list_del(&span->_head);
762-766: Stacksettingsis partially initialized — uninitializedlist_ctracesis a latent risk.Only
span_reg,legacy_reconcile, anddecision_waitare assigned. While the current reconcile functions don't access other fields, the uninitializedlist_ctraces(acfl_listwith garbage pointers) could cause a crash if any future code path traverses it. Amemset(&settings, 0, sizeof(settings))would be safer.Proposed fix
struct sampling_settings settings; + memset(&settings, 0, sizeof(settings)); settings.span_reg = span_reg; settings.legacy_reconcile = legacy_reconcile; settings.decision_wait = decision_wait;
552-554: Typo: "remote" → "remove".- /* remote t_span entry */ + /* remove t_span entry */
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @.github/scripts/commit_prefix_check.py:
- Line 23: The regex PREFIX_RE currently captures the prefix plus trailing
whitespace/char which causes subject_prefix (from the match at line using
.group()) to be wrong; change PREFIX_RE to put a capture group around just the
prefix portion (e.g. capture up through the colon) and then update the code that
extracts the prefix (the place assigning subject_prefix) to use match.group(1)
instead of .group(); verify any other usages (like in detect_bad_squash where
PREFIX_RE.match(l) is only used as a boolean) still behave correctly and adjust
any other .group() calls to group(1) when the prefix string is needed.
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
Update commit prefix regex handling in `.github/scripts/commit_prefix_check.py`
so subject prefix extraction returns only the prefix token (e.g. `build:`)
instead of including trailing content.
Changes:
- make `PREFIX_RE` capture just the prefix in group 1:
- from: `^[a-z0-9_]+:\s+\S`
- to: `^([a-z0-9_]+:)\s+\S`
- update subject prefix extraction to use `match.group(1)`
Why:
- previous extraction used `.group()` and could include extra matched text,
causing incorrect prefix comparisons during validation.
Notes:
- `detect_bad_squash()` uses `PREFIX_RE.match(...)` only as a boolean;
behavior remains unchanged there.
Signed-off-by: Eduardo Silva <eduardo@chronosphere.io>
This PR adds an in-tree benchmarking workflow for processor_sampling and introduces a runtime toggle to compare the legacy and optimized tail reconcile implementations without switching
branches.
What’s included
Why
Observed benchmark result (same workload)
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
Tests
Chores