processor_tda: Implement Topological Data Analysis (TDA) plugin for metrics#11250
📝 Walkthrough

Adds Ripser v1.2.1 as an optional bundled library, exposes a C wrapper and C++ integration, introduces a new TDA processor plugin that computes Betti numbers from time series via delay embedding, and wires build, packaging, tests, and header installation to conditionally include Ripser support.
Sequence Diagram(s)

sequenceDiagram
participant Metrics as Metrics Stream
participant Processor as TDA Processor
participant Window as Sliding Window
participant Embed as Delay Embedding
participant DistMat as Dense→Compressed Builder
participant Ripser as Ripser Engine
participant Export as Metrics Export
Metrics->>Processor: incoming metric points
Processor->>Window: append / rotate samples
Window->>Processor: snapshot when window ready
Processor->>Embed: build embedded vectors (m, τ)
Embed->>DistMat: compute dense pairwise distances
DistMat->>Ripser: convert to compressed & run
Ripser-->>Processor: emit intervals / betti counts (via bridge)
Processor->>Export: emit betti gauges (betti0, betti1, betti2)
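The `DistMat->>Ripser` step in the diagram converts a dense distance matrix into Ripser's compressed form. Below is a minimal sketch of that packing, assuming the compressed lower-triangular layout used by upstream Ripser (entries strictly below the diagonal, row by row) and Ripser's default `float` value type. `dense_to_compressed_lower` is a hypothetical helper; the actual `flb_ripser_*` wrapper signatures are not shown in this PR description.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the plugin's actual API).
 * Packs the strict lower triangle of a symmetric n x n distance matrix,
 * stored row-major in dense[n * n], into the compressed lower-triangular
 * layout: entry (i, j) with j < i lands at index i * (i - 1) / 2 + j,
 * giving the order d(1,0), d(2,0), d(2,1), d(3,0), d(3,1), d(3,2), ...
 * Returns a malloc'd array of n * (n - 1) / 2 floats (caller frees),
 * or NULL when the input is invalid or allocation fails.
 */
float *dense_to_compressed_lower(const double *dense, size_t n)
{
    size_t i;
    size_t j;
    size_t count;
    float *packed;

    if (dense == NULL || n < 2) {
        return NULL;
    }

    count = n * (n - 1) / 2;
    packed = malloc(count * sizeof(*packed));
    if (packed == NULL) {
        return NULL;
    }

    for (i = 1; i < n; i++) {
        for (j = 0; j < i; j++) {
            /* Narrow to float, assumed here to match Ripser's value type. */
            packed[i * (i - 1) / 2 + j] = (float) dense[i * n + j];
        }
    }

    return packed;
}
```

Only n(n-1)/2 entries are kept because a distance matrix is symmetric with a zero diagonal, which roughly halves the memory needed for a window of n points.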
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 2 failed (2 warnings)
47dccf4 to 327ad4a
d7c8e49 to 162f01e
7c7cad7 to 9f9d30b
99a04b4 to 78c17a2
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
78c17a2 to 0639950
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
This processor plugin performs Topological Data Analysis (TDA) on metrics using Ripser, which computes persistent homology. The plugin aggregates incoming counters, gauges, and untyped metrics into a 1-D time series, keeps a sliding window, builds a dense distance matrix, and runs Ripser through the new flb_ripser_* wrapper helpers. The resulting Betti numbers (currently betti0 and betti1) are exported as additional gauge metrics.

TDA and persistent homology can help reveal hidden order or phase transitions in complex systems that are not easily visible from raw time series. Similar approaches have already been explored in condensed matter physics, for example:

Donato, I., Gori, M., & Sarti, A. (2016). Persistent homology analysis of phase transitions. Physical Review E, 93, 052138. https://doi.org/10.1103/PhysRevE.93.052138

The TDA metrics processor now supports an optional delay embedding of the aggregated metric vectors before building the dense distance matrix used by Ripser. When `embed_dim > 1`, we reconstruct a Takens-style delay embedding x_t -> (x_t, x_{t-τ}, ..., x_{t-(m-1)τ}) over the sliding window, where `m = embed_dim` and `τ = embed_delay`. Each embedded point is a flattened vector of size feature_dim × m, and we keep using a Euclidean distance on this reconstructed phase space.

This makes the processor more sensitive to occasional cyclic / quasi-periodic regimes in the metric time series: loops in the reconstructed trajectory translate into H1 features in the persistent homology. When `embed_dim = 1`, the behaviour is unchanged and we fall back to the original "no embedding" mode.

This change also adds two configuration options:

- `embed_dim` (int, default: 3): Delay embedding dimension m. Set to 1 to disable delay embedding.
- `embed_delay` (int, default: 1): Lag τ in samples between successive delays.
The design follows the standard delay embedding approach from Takens' theorem, which shows that (under mild conditions) the attractor of an unknown dynamical system can be reconstructed from a single observed time series via delay coordinates.

Reference:
- F. Takens, "Detecting strange attractors in turbulence", in D. Rand and L.-S. Young (eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, vol. 898, Springer, 1981, pp. 366-381.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
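The delay embedding described in this commit message can be illustrated with a small C sketch. `delay_embed` is a hypothetical helper, not the plugin's code; it assumes the window is stored row-major with the oldest sample first, and it builds each point as (x_t, x_{t-τ}, ..., x_{t-(m-1)τ}) flattened to feature_dim × m values, on which Euclidean distances would then be computed.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper illustrating a Takens-style delay embedding.
 * samples: n feature vectors of size dim, row-major, oldest first.
 * m   (embed_dim)   >= 1: number of delayed copies per point.
 * tau (embed_delay) >= 1: lag in samples between successive copies.
 * Each output point is (x_t, x_{t-tau}, ..., x_{t-(m-1)tau}) flattened
 * to dim * m values. Returns the number of embedded points stored in
 * *out (malloc'd, caller frees), or 0 (with *out = NULL) if the window
 * is too short or the arguments are invalid.
 */
size_t delay_embed(const double *samples, size_t n, size_t dim,
                   size_t m, size_t tau, double **out)
{
    size_t span;
    size_t points;
    size_t width;
    size_t p, k, f;
    double *buf;

    *out = NULL;
    if (samples == NULL || m == 0 || tau == 0 || dim == 0) {
        return 0;
    }

    span = (m - 1) * tau;       /* history needed before the first point */
    if (n <= span) {
        return 0;
    }

    points = n - span;
    width = dim * m;
    buf = malloc(points * width * sizeof(*buf));
    if (buf == NULL) {
        return 0;
    }

    for (p = 0; p < points; p++) {
        size_t t = p + span;    /* index of x_t inside the window */
        for (k = 0; k < m; k++) {
            /* k-th block of the point holds x_{t - k * tau} */
            const double *src = &samples[(t - k * tau) * dim];
            for (f = 0; f < dim; f++) {
                buf[p * width + k * dim + f] = src[f];
            }
        }
    }

    *out = buf;
    return points;
}
```

A window of n samples yields n - (m-1)τ embedded points, so larger `embed_dim` or `embed_delay` values shrink the point cloud handed to Ripser; with m = 1 the function simply copies the window, matching the "no embedding" fallback.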
…tions

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
0639950 to cde0ea1
This PR introduces a new processor plugin, `tda`, which performs Topological Data Analysis (TDA) on streaming metrics using persistent homology. The plugin aggregates incoming counters, gauges, and untyped metrics into a unified n-dimensional feature vector, maintains a sliding window, and uses a C-wrapped version of Ripser to compute Betti numbers.
Implementation Details:
Multiple metric streams are mapped to a fixed feature dimension. To handle varying magnitudes and bursty traffic:

- `log1p` (natural logarithm of 1 + magnitude) is used to dampen the dynamic range before the distance calculation.

The plugin keeps a ring buffer of these vectors. Before processing, it optionally applies Delay Embedding (see below) to reconstruct the phase space geometry.
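The sliding window can be pictured as a fixed-capacity ring buffer. The sketch below assumes each pushed vector has already been `log1p`-scaled as described above; the `tda_ring` struct and its functions are hypothetical illustrations, not the plugin's actual data structures. The snapshot returns vectors oldest first so they can be passed straight to a delay embedding, and `tda_ring_ready` mirrors the `min_points` gate.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-capacity ring buffer of feature vectors. */
struct tda_ring {
    double *data;      /* capacity * dim values */
    size_t capacity;   /* e.g. window_size */
    size_t dim;        /* feature dimension */
    size_t head;       /* next slot to overwrite */
    size_t count;      /* number of valid vectors (<= capacity) */
};

int tda_ring_init(struct tda_ring *r, size_t capacity, size_t dim)
{
    if (capacity == 0 || dim == 0) {
        return -1;
    }
    r->data = calloc(capacity * dim, sizeof(double));
    if (r->data == NULL) {
        return -1;
    }
    r->capacity = capacity;
    r->dim = dim;
    r->head = 0;
    r->count = 0;
    return 0;
}

/* Append one (already log1p-scaled) vector; overwrites the oldest when full. */
void tda_ring_push(struct tda_ring *r, const double *vec)
{
    memcpy(&r->data[r->head * r->dim], vec, r->dim * sizeof(double));
    r->head = (r->head + 1) % r->capacity;
    if (r->count < r->capacity) {
        r->count++;
    }
}

/* True once the window holds at least min_points vectors. */
int tda_ring_ready(const struct tda_ring *r, size_t min_points)
{
    return r->count >= min_points;
}

/* Copy the valid vectors into out (count * dim values), oldest first. */
size_t tda_ring_snapshot(const struct tda_ring *r, double *out)
{
    size_t start = (r->head + r->capacity - r->count) % r->capacity;
    size_t i;

    for (i = 0; i < r->count; i++) {
        size_t slot = (start + i) % r->capacity;
        memcpy(&out[i * r->dim], &r->data[slot * r->dim],
               r->dim * sizeof(double));
    }
    return r->count;
}

void tda_ring_destroy(struct tda_ring *r)
{
    free(r->data);
    r->data = NULL;
}
```

Overwriting in place keeps memory bounded at `window_size` × feature dimension regardless of how bursty the incoming traffic is.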
A dense Euclidean distance matrix is computed from the window. Ripser determines the persistence intervals, which are summarized into Betti numbers exported as new gauges:
- `fluentbit.tda.betti0`: Connected components (clusters).
- `fluentbit.tda.betti1`: Loops/Cycles (recurrence).
- `fluentbit.tda.betti2`: Voids (higher-order structures).

Delay Embedding (Takens' Theorem):
This plugin supports an optional delay embedding [2] of the aggregated metric vectors. When `embed_dim > 1`, we reconstruct the state space vectors $x_t$ as:

$$x_t \mapsto \left(x_t,\ x_{t-\tau},\ \dots,\ x_{t-(m-1)\tau}\right)$$

Where:

- $m$ = `embed_dim`
- $\tau$ = `embed_delay`

This transformation allows the processor to detect cyclic or quasi-periodic regimes (loops in the trajectory) even from limited metric dimensions. These loops translate into $H_1$ features in the persistent homology. If `embed_dim = 1`, the behavior falls back to the original "no embedding" mode.

Motivation:
TDA and persistent homology can help reveal hidden order, phase transitions, or subtle cyclic behaviors in complex systems that are not easily visible from raw time series or standard statistical aggregates. Similar approaches have been explored in condensed matter physics [1] for detecting phase transitions.
Configuration Options:
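- `window_size` (int, default: 60): Number of samples to keep in the TDA sliding window.
- `min_points` (int, default: 10): Minimum number of samples required before running Ripser.
- `embed_dim` (int, default: 3): Delay embedding dimension ($m$). Set to 1 to disable delay embedding.
- `embed_delay` (int, default: 1): Lag ($\tau$) in samples between successive delays.
- `threshold` (double, default: 0): Distance scale selector. 0 enables an automatic multi-quantile scan; a value in (0, 1) uses that specific quantile.

References:

[1] Donato, I., Gori, M., & Sarti, A. (2016). Persistent homology analysis of phase transitions. Physical Review E, 93, 052138. https://doi.org/10.1103/PhysRevE.93.052138
[2] Takens, F. (1981). Detecting strange attractors in turbulence. In D. Rand & L.-S. Young (Eds.), Dynamical Systems and Turbulence, Lecture Notes in Mathematics, vol. 898, pp. 366-381. Springer.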
Enter `[N/A]` in the box if an item is not applicable to your change.

Testing
Before we can approve your change, please submit the following in a comment:
Additional Log:
For a single one-time failure, the betti1 and betti2 metrics do not increase.
For intermittent failures like the one above, however, these higher-order metrics rise and detect some "phase transitions", which indicates that there is no stable phase.
This log is from macOS's memory leak detector:
There are no leaks in this plugin.
In addition, even though no rules are configured, the TDA metrics indicate that something is happening: betti1 and betti2 become non-zero:
This metrics-based detector takes a different approach, shedding light on anomalies in more depth than rule-based detection.
If this is a change to packaging of containers or native binaries, then please confirm it works for all targets.
`ok-package-test` label to test for all targets (requires maintainer to do).

Documentation
fluent/fluent-bit-docs#2277
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
New Features
API
Documentation
Tests
Chores