Skip to content

Conversation

@hlein
Copy link

@hlein hlein commented Nov 24, 2025


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change <- the changes are configs
  • Debug log output from testing the change
  • [N/A] Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • [N/A] Run local packaging test showing all targets (including any new ones) build.
  • [N/A] Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • [N/A] Documentation required for this feature

Backporting

  • [N/A] Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • New Features
    • Added comprehensive Mikrotik log parsing for firewall, proxy, DHCP, VPN/OpenVPN, authentication, and miscellaneous system events.
    • Extracts detailed fields (actions, interfaces, protocols, IPs/ports, NAT mappings, MACs, packet info, and event topics) to improve visibility into Mikrotik network performance and security.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link

coderabbitai bot commented Nov 24, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds six Mikrotik-specific log parsers to a configuration file, each using regex-based named capture groups to extract fields (actions, interfaces, IPs/ports, MACs, protocols, NAT, connection state, topics) from firewall, proxy, DHCP, OpenVPN, login, and miscellaneous Mikrotik logs.

Changes

Cohort / File(s) Summary
Mikrotik Log Parsers
conf/parsers_mikrotik.yaml
Adds six new regex-format parsers: mikrotik-firewall, mikrotik-proxy, mikrotik-dhcp, mikrotik-ovpn, mikrotik-logins, and mikrotik-other. Each parser defines detailed named capture groups and optional/alternating sections to extract network and device metadata for downstream processing.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant LogSource as Mikrotik log stream
  participant ParserConf as parsers_mikrotik.yaml
  participant RegexEngine as Regex matcher
  participant Outbound as Structured event

  Note over LogSource,ParserConf: New parser rules added
  LogSource->>ParserConf: Emit raw log line
  ParserConf->>RegexEngine: Select appropriate parser (firewall/proxy/dhcp/ovpn/logins/other)
  RegexEngine->>RegexEngine: Apply named-capture regex (with optional/alt branches)
  alt Match
    RegexEngine->>Outbound: Emit structured fields (action, src/dst IPs, ports, MACs, interfaces, NAT, topic...)
  else No match
    RegexEngine->>Outbound: Forward unparsed or tag as unmatched
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Inspect regex correctness and named-capture group names for each parser branch.
  • Verify optional/alternation branches cover expected log variants without catastrophic backtracking.
  • Check field naming consistency with downstream consumers and edge cases (missing fields, IPv6, quoted/unquoted values).
  • Review mikrotik-firewall and mikrotik-ovpn rules closely — they appear most complex.

Poem

🐰 I sniffed the logs beneath the moon,

Six regex carrots — caught each tune.
Firewall, DHCP, VPN and more,
I hop with fields now parsed and stored.
Hooray—structured hops forevermore! 🥕

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Title check ⚠️ Warning The title mentions 'add Mikrotik firewall parser' but the changeset adds six distinct Mikrotik parsers (firewall, proxy, DHCP, OpenVPN, logins, and other), not just a firewall parser. Update the title to reflect all six parsers being added, e.g., 'conf: parser: add Mikrotik parsers (firewall, proxy, DHCP, VPN, logins, other)' or use a more general title like 'conf: parser: add Mikrotik log parsers'.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@hlein
Copy link
Author

hlein commented Nov 24, 2025

Gathered some sample logs, ran them through fluent-bit w/these parsers, compared number of input lines to number of output lines (i.e. matched lines).

$ for A in firewall proxy dhcp ovpn logins other ; do wc -l mikrotik_${A}.numbered && cat mikrotik_${A}.numbered | fluent-bit -q -R parsers_mikrotik.yaml -i stdin -p parser=mikrotik-${A} -o stdout -p format=json_lines | jq -r '.log_seq' | sort -n > matched_${A}.txt ; wc -l matched_${A}.txt ; done
firewall
620784060 mikrotik_firewall.numbered
620784060 matched_firewall.txt
proxy
132515 mikrotik_proxy.numbered
132515 matched_proxy.txt
dhcp
860 mikrotik_dhcp.numbered
860 matched_dhcp.txt
ovpn
3794 mikrotik_ovpn.numbered
3794 matched_ovpn.txt
logins
8 mikrotik_logins.numbered
8 matched_logins.txt
other
2003 mikrotik_other.numbered
2003 matched_other.txt

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f4108db and 9724468.

📒 Files selected for processing (1)
  • conf/parsers_mikrotik.yaml (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11168
File: conf/parsers_mult.yaml:8-14
Timestamp: 2025-11-16T22:16:26.032Z
Learning: In Fluent Bit parser configurations (both .conf and .yaml formats), the regex engine automatically strips leading and trailing `/` characters from regex patterns, so patterns like `/Processing by .../` and `Processing by ...` are functionally equivalent and both work correctly.
🔇 Additional comments (3)
conf/parsers_mikrotik.yaml (3)

103-123: Well-structured proxy parser.

The mikrotik-proxy parser is concise and well-designed. The optional cache section is handled cleanly, and field extraction is straightforward.


216-254: Catchall parser is appropriately positioned.

The mikrotik-other parser correctly includes an exhaustive list of Mikrotik logging topics and is designed as a fallback. The large alternation is necessary given the variety of log sources, and the comment correctly notes this should be the last parser evaluated.


61-65: Overly loose IP address validation in mikrotik-firewall.

The pattern [0-9.]{7,15} is too permissive and will match invalid IP addresses such as 999.999.999.999 or 1.1.1. (incomplete octet). While not a functional blocker if downstream processing validates, this pattern is repeated across multiple parsers and could lead to false positives.

Consider using a more precise pattern:

(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)

Or, if strict validation is deferred, at minimum ensure this pattern is used consistently across all IP captures.

⛔ Skipped due to learnings
Learnt from: zhihonl
Repo: fluent/fluent-bit PR: 10586
File: plugins/filter_kubernetes/kube_regex.h:29-29
Timestamp: 2025-08-14T18:14:41.453Z
Learning: Kubernetes ReplicaSet suffixes generated from pod-template-hash can vary in length and are not always 10 characters. Examples include "9dbdfc456" (9 chars) from "amazon-cloudwatch-observability-controller-manager-9dbdfc456" and "914055854" (9 chars) from "kairosdb-914055854". The DEPLOYMENT_REGEX pattern using {6,10} range for the ID group correctly accommodates this variable length behavior.

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Signed-off-by: Hank Leininger <hlein@korelogic.com>
@hlein hlein force-pushed the parsers-add-mikrotik branch from 9724468 to 54e1900 Compare November 24, 2025 23:11
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
conf/parsers_mikrotik.yaml (1)

152-187: Replace loose IPv4 pattern with strict validation in mikrotik-ovpn parser.

The loose IP pattern (?:[0-9]{1,3}\.){3}[0-9]{1,3} incorrectly accepts invalid addresses like 999.999.999.999, 256.256.256.256, and 192.168.1.999. The proposed strict pattern correctly validates IPv4 octets (0-255 range) and passes all validation tests.

Update lines 163, 167, and 169:

-       (?<src_ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
+       (?<src_ip>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})
        ,\s port: \s
         (?<src_port>\d+)
         \s to \s
-       (?<dst_ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
+       (?<dst_ip>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})
         |
-       < (?<src_ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) >:\s
+       < (?<src_ip>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}) >:\s
🧹 Nitpick comments (1)
conf/parsers_mikrotik.yaml (1)

1-254: Ensure documentation is updated per "docs-required" label.

The PR is labeled "docs-required". Please ensure that parser documentation is added or updated to describe these six new Mikrotik parsers, their fields, and any prerequisites (e.g., log format examples, configuration instructions).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9724468 and 54e1900.

📒 Files selected for processing (1)
  • conf/parsers_mikrotik.yaml (1 hunks)
🔇 Additional comments (4)
conf/parsers_mikrotik.yaml (4)

1-254: Acknowledge: MAC address patterns are now case-insensitive.

Good catch on the earlier review — all three MAC patterns (lines 36, 146, 207) now correctly use [A-Fa-f0-9]{2} to accept both uppercase and lowercase hex, matching Mikrotik's mixed-case output. Whitespace in the YAML is correctly ignored by the verbose regex mode ((?x)).


72-98: Critical: Backreferences to optional capture groups will cause regex match failures.

Lines 77, 84, 87, and 94 use backreferences (\k<src_port> and \k<dst_port>) that reference optional capture groups defined at lines 62 and 65. When these groups don't match (e.g., a NAT log without port numbers), the backreferences have no value to reference and the entire regex will fail to match valid log lines, despite the NAT block itself being optional.

Example: A log with NAT rewriting but no explicit ports will fail to parse because \k<src_port> cannot reference an unmatched optional group.

Use inline patterns instead of backreferences to avoid depending on whether optional groups matched:

       (?:
        NAT\s
        (?:
         \(
         (?<nat_source_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
-        (?: : \k<src_port> )?
+        (?: : \d+ )?
         ->
         (?<nat_source>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
         (?: : (?<nat_src_port>\d+) )?
         \)
         ->
-        \k<dest>
-        (?: : \k<dst_port> )?
+        (?:[0-9]{1,3}\.){3}[0-9]{1,3}
+        (?: : \d+ )?
        |
-        (?: \k<source> | (?<nat_source_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) )
-        (?: : \k<src_port> )?
+        (?: (?:[0-9]{1,3}\.){3}[0-9]{1,3} | (?<nat_source_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) )
+        (?: : \d+ )?
         ->
         \(
          (?<nat_dest_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
          (?: : (?<nat_dst_port>\d+) )?
          ->
          (?<nat_dest>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
-         (?: : \k<dst_port> )?
+         (?: : \d+ )?
         \)
        )
        , \s
       )?
⛔ Skipped due to learnings
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11199
File: conf/parsers_mikrotik.yaml:72-98
Timestamp: 2025-11-24T23:14:30.311Z
Learning: The mikrotik-firewall parser regex in conf/parsers_mikrotik.yaml has been validated against 620 million lines of real Mikrotik logs with 100% match rate. The backreferences to optional port groups (\k<src_port> and \k<dst_port>) are correct because Mikrotik log format guarantees port consistency: if source has a port, destination always has a port; if source lacks a port, destination never has one.

125-150: Review comment is incorrect—no prior recommendation exists, and firewall parser's loose IP pattern has been validated as correct.

The review references "strict IPv4 validation as recommended above," but no such recommendation appears anywhere in the codebase. More critically, the firewall parser at lines 72-98 uses the identical loose IP pattern (?:[0-9]{1,3}\.){3}[0-9]{1,3}, and according to the provided learnings, this pattern has been validated against 620 million lines of real Mikrotik logs with 100% match rate. This validation confirms the loose pattern correctly handles Mikrotik's actual log format. Applying strict IPv4 validation would contradict this evidence and likely reject valid Mikrotik log entries.

Likely an incorrect or invalid review comment.


189-214: This review comment should be disregarded.

The mikrotik-firewall parser already uses the identical loose IP pattern at lines 72-98, and the learnings document confirms it has been validated against 620 million real Mikrotik logs with 100% match rate. Because Mikrotik devices generate well-formatted logs with consistent structure, the loose pattern is appropriate and necessary for compatibility with actual log output. Applying strict validation would create inconsistency with the already-validated firewall parser and add no practical benefit.

Likely an incorrect or invalid review comment.

Comment on lines +61 to +66
(?<source>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
(?: : (?<src_port>\d+) )?
->
(?<dest>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
(?: : (?<dst_port>\d+) )?
,\s
Copy link

@coderabbitai coderabbitai bot Nov 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

❓ Verification inconclusive

IP address pattern accepts invalid addresses; consider stricter validation.

Lines 61, 64 (and repeated at 76, 79, 90, 93) use the pattern (?:[0-9]{1,3}\.){3}[0-9]{1,3}, which matches any sequence of 1–3 digits separated by dots. This allows invalid IPv4 addresses such as 999.999.999.999, 00.00.00.00, or 192.168.256.1. While this may be acceptable for log parsing if Mikrotik's output is known to be well-formed, consider using proper IPv4 validation for robustness:

(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}

This pattern strictly validates each octet to the range 0–255.

If tighter validation is desired, apply this diff to the firewall parser (and repeat for DHCP, OpenVPN, and login parsers):

-       (?<source>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
+       (?<source>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})
        (?: : (?<src_port>\d+) )?
        ->
-       (?<dest>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
+       (?<dest>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})

I'll verify the regex patterns by testing them against various IPv4 addresses to confirm the claims in the review comment.

#!/usr/bin/env python3
import re

# Current pattern from the parser
current_pattern = r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}'

# Proposed stricter pattern
strict_pattern = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}'

# Test cases: (address, should_be_valid)
test_cases = [
    ("192.168.1.1", True),
    ("10.0.0.0", True),
    ("255.255.255.255", True),
    ("0.0.0.0", True),
    ("999.999.999.999", False),
    ("192.168.256.1", False),
    ("192.168.1.256", False),
    ("256.1.1.1", False),
    ("192.168.1", False),
    ("192.168.1.1.1", False),
    ("00.00.00.00", True),  # Technically valid as octet, but semantically odd
    ("01.02.03.04", True),  # Leading zeros are technically matched
]

print("Testing CURRENT pattern: " + current_pattern)
print("-" * 60)
current_results = []
for addr, should_be_valid in test_cases:
    match = re.fullmatch(current_pattern, addr)
    matched = match is not None
    current_results.append((addr, should_be_valid, matched))
    status = "✓" if matched == should_be_valid else "✗ MISMATCH"
    print(f"{addr:20} | Should be valid: {str(should_be_valid):5} | Matched: {str(matched):5} | {status}")

print("\n\nTesting STRICT pattern: " + strict_pattern)
print("-" * 60)
strict_results = []
for addr, should_be_valid in test_cases:
    match = re.fullmatch(strict_pattern, addr)
    matched = match is not None
    strict_results.append((addr, should_be_valid, matched))
    status = "✓" if matched == should_be_valid else "✗ MISMATCH"
    print(f"{addr:20} | Should be valid: {str(should_be_valid):5} | Matched: {str(matched):5} | {status}")

print("\n\nDIFFERENCES:")
print("-" * 60)
for i, (addr, should_be_valid, _) in enumerate(test_cases):
    current_matched = current_results[i][2]
    strict_matched = strict_results[i][2]
    if current_matched != strict_matched:
        print(f"{addr:20} | Current: {str(current_matched):5} | Strict: {str(strict_matched):5}")

IP pattern is permissive but validates correctly in practice; consider stricter validation if defense-in-depth is desired.

Lines 61, 64, 76, 79, 90, 93 use the pattern (?:[0-9]{1,3}\.){3}[0-9]{1,3}, which theoretically accepts invalid IPv4 addresses such as 999.999.999.999 or 192.168.256.1. However, the parser has been validated against 620 million real Mikrotik logs with a 100% match rate, indicating the current pattern works correctly for the actual log format produced by Mikrotik.

If stricter validation is desired for defense-in-depth, apply the stricter pattern:

(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}

Otherwise, the current approach is sufficient given real-world validation results.

🤖 Prompt for AI Agents
In conf/parsers_mikrotik.yaml around lines 61-66 (and also at lines 64, 76, 79,
90, 93) the IPv4 pattern uses the permissive fragment
(?:[0-9]{1,3}\.){3}[0-9]{1,3} which accepts invalid octets like 999; if you want
stricter defense-in-depth replace each occurrence with a full IPv4 validator
that limits octets to 0-255 (the provided pattern in the review comment)
ensuring you update every listed line consistently; after replacing, run the
parser test suite against sample logs to confirm no false negatives and update
any inline comments to note the stricter validation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See other comment - standardized on (?:[0-9]{1,3}\.){3}[0-9]{1,3} which is used by other, existing parsers like iptables in parsers_extra.conf.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the IPv4 pattern used in the iptables parser in parsers_extra.conf
echo "=== Checking iptables parser in parsers_extra.conf ==="
fd -e conf "parsers_extra.conf" --exec cat {}

echo -e "\n\n=== Searching for IPv4 patterns in existing parsers ==="
rg -n "\\[0-9\\].*\\..*\\[0-9\\]" conf/ --type yaml --type conf -A 2 -B 2 | head -100

Length of output: 7290


@hlein, thank you for the clarification! I've verified that the iptables parser in parsers_extra.conf does indeed use the same (?:[0-9]{1,3}\.){3}[0-9]{1,3} pattern. Standardizing on this pattern across all parsers makes perfect sense for consistency and maintainability.

Given that your firewall parser has been validated against 620 million real log lines with 100% match rate, and you're following established conventions in the codebase, the current pattern is clearly the right choice here.

🐰💭


✏️ Learnings added
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11199
File: conf/parsers_mikrotik.yaml:61-66
Timestamp: 2025-11-24T23:19:02.561Z
Learning: In Fluent Bit parser configurations, the IPv4 pattern `(?:[0-9]{1,3}\.){3}[0-9]{1,3}` is standardized across parsers (e.g., iptables in parsers_extra.conf, mikrotik parsers) for consistency and maintainability, even though it's permissive and could theoretically match invalid addresses like 999.999.999.999. This is acceptable because the parsers are validated against real-world logs where the input format is controlled by the logging source.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11199
File: conf/parsers_mikrotik.yaml:72-98
Timestamp: 2025-11-24T23:14:30.311Z
Learning: The mikrotik-firewall parser regex in conf/parsers_mikrotik.yaml has been validated against 620 million lines of real Mikrotik logs with 100% match rate. The backreferences to optional port groups (\k<src_port> and \k<dst_port>) are correct because Mikrotik log format guarantees port consistency: if source has a port, destination always has a port; if source lacks a port, destination never has one.

Learnt from: hlein
Repo: fluent/fluent-bit PR: 11168
File: conf/parsers_mult.yaml:8-14
Timestamp: 2025-11-16T22:16:26.032Z
Learning: In Fluent Bit parser configurations (both .conf and .yaml formats), the regex engine automatically strips leading and trailing `/` characters from regex patterns, so patterns like `/Processing by .../` and `Processing by ...` are functionally equivalent and both work correctly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant