-
Notifications
You must be signed in to change notification settings - Fork 1.8k
conf: parser: add Mikrotik firewall parser #11199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds six Mikrotik-specific log parsers to a configuration file, each using regex-based named capture groups to extract fields (actions, interfaces, IPs/ports, MACs, protocols, NAT, connection state, topics) from firewall, proxy, DHCP, OpenVPN, login, and miscellaneous Mikrotik logs. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant LogSource as Mikrotik log stream
participant ParserConf as parsers_mikrotik.yaml
participant RegexEngine as Regex matcher
participant Outbound as Structured event
Note over LogSource,ParserConf: New parser rules added
LogSource->>ParserConf: Emit raw log line
ParserConf->>RegexEngine: Select appropriate parser (firewall/proxy/dhcp/ovpn/logins/other)
RegexEngine->>RegexEngine: Apply named-capture regex (with optional/alt branches)
alt Match
RegexEngine->>Outbound: Emit structured fields (action, src/dst IPs, ports, MACs, interfaces, NAT, topic...)
else No match
RegexEngine->>Outbound: Forward unparsed or tag as unmatched
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Gathered some sample logs, ran them through |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
conf/parsers_mikrotik.yaml(1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11168
File: conf/parsers_mult.yaml:8-14
Timestamp: 2025-11-16T22:16:26.032Z
Learning: In Fluent Bit parser configurations (both .conf and .yaml formats), the regex engine automatically strips leading and trailing `/` characters from regex patterns, so patterns like `/Processing by .../` and `Processing by ...` are functionally equivalent and both work correctly.
🔇 Additional comments (3)
conf/parsers_mikrotik.yaml (3)
103-123: Well-structured proxy parser.The
mikrotik-proxyparser is concise and well-designed. The optionalcachesection is handled cleanly, and field extraction is straightforward.
216-254: Catchall parser is appropriately positioned.The
mikrotik-otherparser correctly includes an exhaustive list of Mikrotik logging topics and is designed as a fallback. The large alternation is necessary given the variety of log sources, and the comment correctly notes this should be the last parser evaluated.
61-65: Overly loose IP address validation in mikrotik-firewall.The pattern
[0-9.]{7,15}is too permissive and will match invalid IP addresses such as999.999.999.999or1.1.1.(incomplete octet). While not a functional blocker if downstream processing validates, this pattern is repeated across multiple parsers and could lead to false positives.Consider using a more precise pattern:
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)Or, if strict validation is deferred, at minimum ensure this pattern is used consistently across all IP captures.
⛔ Skipped due to learnings
Learnt from: zhihonl Repo: fluent/fluent-bit PR: 10586 File: plugins/filter_kubernetes/kube_regex.h:29-29 Timestamp: 2025-08-14T18:14:41.453Z Learning: Kubernetes ReplicaSet suffixes generated from pod-template-hash can vary in length and are not always 10 characters. Examples include "9dbdfc456" (9 chars) from "amazon-cloudwatch-observability-controller-manager-9dbdfc456" and "914055854" (9 chars) from "kairosdb-914055854". The DEPLOYMENT_REGEX pattern using {6,10} range for the ID group correctly accommodates this variable length behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Signed-off-by: Hank Leininger <hlein@korelogic.com>
9724468 to
54e1900
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
conf/parsers_mikrotik.yaml (1)
152-187: Replace loose IPv4 pattern with strict validation in mikrotik-ovpn parser.The loose IP pattern
(?:[0-9]{1,3}\.){3}[0-9]{1,3}incorrectly accepts invalid addresses like 999.999.999.999, 256.256.256.256, and 192.168.1.999. The proposed strict pattern correctly validates IPv4 octets (0-255 range) and passes all validation tests.Update lines 163, 167, and 169:
- (?<src_ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) + (?<src_ip>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}) ,\s port: \s (?<src_port>\d+) \s to \s - (?<dst_ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) + (?<dst_ip>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}) | - < (?<src_ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) >:\s + < (?<src_ip>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}) >:\s
🧹 Nitpick comments (1)
conf/parsers_mikrotik.yaml (1)
1-254: Ensure documentation is updated per "docs-required" label.The PR is labeled "docs-required". Please ensure that parser documentation is added or updated to describe these six new Mikrotik parsers, their fields, and any prerequisites (e.g., log format examples, configuration instructions).
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
conf/parsers_mikrotik.yaml(1 hunks)
🔇 Additional comments (4)
conf/parsers_mikrotik.yaml (4)
1-254: Acknowledge: MAC address patterns are now case-insensitive.Good catch on the earlier review — all three MAC patterns (lines 36, 146, 207) now correctly use
[A-Fa-f0-9]{2}to accept both uppercase and lowercase hex, matching Mikrotik's mixed-case output. Whitespace in the YAML is correctly ignored by the verbose regex mode ((?x)).
72-98: Critical: Backreferences to optional capture groups will cause regex match failures.Lines 77, 84, 87, and 94 use backreferences (
\k<src_port>and\k<dst_port>) that reference optional capture groups defined at lines 62 and 65. When these groups don't match (e.g., a NAT log without port numbers), the backreferences have no value to reference and the entire regex will fail to match valid log lines, despite the NAT block itself being optional.Example: A log with NAT rewriting but no explicit ports will fail to parse because
\k<src_port>cannot reference an unmatched optional group.Use inline patterns instead of backreferences to avoid depending on whether optional groups matched:
(?: NAT\s (?: \( (?<nat_source_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) - (?: : \k<src_port> )? + (?: : \d+ )? -> (?<nat_source>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) (?: : (?<nat_src_port>\d+) )? \) -> - \k<dest> - (?: : \k<dst_port> )? + (?:[0-9]{1,3}\.){3}[0-9]{1,3} + (?: : \d+ )? | - (?: \k<source> | (?<nat_source_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) ) - (?: : \k<src_port> )? + (?: (?:[0-9]{1,3}\.){3}[0-9]{1,3} | (?<nat_source_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) ) + (?: : \d+ )? -> \( (?<nat_dest_orig>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) (?: : (?<nat_dst_port>\d+) )? -> (?<nat_dest>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) - (?: : \k<dst_port> )? + (?: : \d+ )? \) ) , \s )?⛔ Skipped due to learnings
Learnt from: hlein Repo: fluent/fluent-bit PR: 11199 File: conf/parsers_mikrotik.yaml:72-98 Timestamp: 2025-11-24T23:14:30.311Z Learning: The mikrotik-firewall parser regex in conf/parsers_mikrotik.yaml has been validated against 620 million lines of real Mikrotik logs with 100% match rate. The backreferences to optional port groups (\k<src_port> and \k<dst_port>) are correct because Mikrotik log format guarantees port consistency: if source has a port, destination always has a port; if source lacks a port, destination never has one.
125-150: Review comment is incorrect—no prior recommendation exists, and firewall parser's loose IP pattern has been validated as correct.The review references "strict IPv4 validation as recommended above," but no such recommendation appears anywhere in the codebase. More critically, the firewall parser at lines 72-98 uses the identical loose IP pattern
(?:[0-9]{1,3}\.){3}[0-9]{1,3}, and according to the provided learnings, this pattern has been validated against 620 million lines of real Mikrotik logs with 100% match rate. This validation confirms the loose pattern correctly handles Mikrotik's actual log format. Applying strict IPv4 validation would contradict this evidence and likely reject valid Mikrotik log entries.Likely an incorrect or invalid review comment.
189-214: This review comment should be disregarded.The mikrotik-firewall parser already uses the identical loose IP pattern at lines 72-98, and the learnings document confirms it has been validated against 620 million real Mikrotik logs with 100% match rate. Because Mikrotik devices generate well-formatted logs with consistent structure, the loose pattern is appropriate and necessary for compatibility with actual log output. Applying strict validation would create inconsistency with the already-validated firewall parser and add no practical benefit.
Likely an incorrect or invalid review comment.
| (?<source>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) | ||
| (?: : (?<src_port>\d+) )? | ||
| -> | ||
| (?<dest>(?:[0-9]{1,3}\.){3}[0-9]{1,3}) | ||
| (?: : (?<dst_port>\d+) )? | ||
| ,\s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
❓ Verification inconclusive
IP address pattern accepts invalid addresses; consider stricter validation.
Lines 61, 64 (and repeated at 76, 79, 90, 93) use the pattern (?:[0-9]{1,3}\.){3}[0-9]{1,3}, which matches any sequence of 1–3 digits separated by dots. This allows invalid IPv4 addresses such as 999.999.999.999, 00.00.00.00, or 192.168.256.1. While this may be acceptable for log parsing if Mikrotik's output is known to be well-formed, consider using proper IPv4 validation for robustness:
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}This pattern strictly validates each octet to the range 0–255.
If tighter validation is desired, apply this diff to the firewall parser (and repeat for DHCP, OpenVPN, and login parsers):
- (?<source>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
+ (?<source>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})
(?: : (?<src_port>\d+) )?
->
- (?<dest>(?:[0-9]{1,3}\.){3}[0-9]{1,3})
+ (?<dest>(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})I'll verify the regex patterns by testing them against various IPv4 addresses to confirm the claims in the review comment.
#!/usr/bin/env python3
import re
# Current pattern from the parser
current_pattern = r'(?:[0-9]{1,3}\.){3}[0-9]{1,3}'
# Proposed stricter pattern
strict_pattern = r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}'
# Test cases: (address, should_be_valid)
test_cases = [
("192.168.1.1", True),
("10.0.0.0", True),
("255.255.255.255", True),
("0.0.0.0", True),
("999.999.999.999", False),
("192.168.256.1", False),
("192.168.1.256", False),
("256.1.1.1", False),
("192.168.1", False),
("192.168.1.1.1", False),
("00.00.00.00", True), # Technically valid as octet, but semantically odd
("01.02.03.04", True), # Leading zeros are technically matched
]
print("Testing CURRENT pattern: " + current_pattern)
print("-" * 60)
current_results = []
for addr, should_be_valid in test_cases:
match = re.fullmatch(current_pattern, addr)
matched = match is not None
current_results.append((addr, should_be_valid, matched))
status = "✓" if matched == should_be_valid else "✗ MISMATCH"
print(f"{addr:20} | Should be valid: {str(should_be_valid):5} | Matched: {str(matched):5} | {status}")
print("\n\nTesting STRICT pattern: " + strict_pattern)
print("-" * 60)
strict_results = []
for addr, should_be_valid in test_cases:
match = re.fullmatch(strict_pattern, addr)
matched = match is not None
strict_results.append((addr, should_be_valid, matched))
status = "✓" if matched == should_be_valid else "✗ MISMATCH"
print(f"{addr:20} | Should be valid: {str(should_be_valid):5} | Matched: {str(matched):5} | {status}")
print("\n\nDIFFERENCES:")
print("-" * 60)
for i, (addr, should_be_valid, _) in enumerate(test_cases):
current_matched = current_results[i][2]
strict_matched = strict_results[i][2]
if current_matched != strict_matched:
print(f"{addr:20} | Current: {str(current_matched):5} | Strict: {str(strict_matched):5}")IP pattern is permissive but validates correctly in practice; consider stricter validation if defense-in-depth is desired.
Lines 61, 64, 76, 79, 90, 93 use the pattern (?:[0-9]{1,3}\.){3}[0-9]{1,3}, which theoretically accepts invalid IPv4 addresses such as 999.999.999.999 or 192.168.256.1. However, the parser has been validated against 620 million real Mikrotik logs with a 100% match rate, indicating the current pattern works correctly for the actual log format produced by Mikrotik.
If stricter validation is desired for defense-in-depth, apply the stricter pattern:
(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(?:\.(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3}Otherwise, the current approach is sufficient given real-world validation results.
🤖 Prompt for AI Agents
In conf/parsers_mikrotik.yaml around lines 61-66 (and also at lines 64, 76, 79,
90, 93) the IPv4 pattern uses the permissive fragment
(?:[0-9]{1,3}\.){3}[0-9]{1,3} which accepts invalid octets like 999; if you want
stricter defense-in-depth replace each occurrence with a full IPv4 validator
that limits octets to 0-255 (the provided pattern in the review comment)
ensuring you update every listed line consistently; after replacing, run the
parser test suite against sample logs to confirm no false negatives and update
any inline comments to note the stricter validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other comment - standardized on (?:[0-9]{1,3}\.){3}[0-9]{1,3} which is used by other, existing parsers like iptables in parsers_extra.conf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the IPv4 pattern used in the iptables parser in parsers_extra.conf
echo "=== Checking iptables parser in parsers_extra.conf ==="
fd -e conf "parsers_extra.conf" --exec cat {}
echo -e "\n\n=== Searching for IPv4 patterns in existing parsers ==="
rg -n "\\[0-9\\].*\\..*\\[0-9\\]" conf/ --type yaml --type conf -A 2 -B 2 | head -100
Length of output: 7290
@hlein, thank you for the clarification! I've verified that the iptables parser in parsers_extra.conf does indeed use the same (?:[0-9]{1,3}\.){3}[0-9]{1,3} pattern. Standardizing on this pattern across all parsers makes perfect sense for consistency and maintainability.
Given that your firewall parser has been validated against 620 million real log lines with 100% match rate, and you're following established conventions in the codebase, the current pattern is clearly the right choice here.
🐰💭
✏️ Learnings added
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11199
File: conf/parsers_mikrotik.yaml:61-66
Timestamp: 2025-11-24T23:19:02.561Z
Learning: In Fluent Bit parser configurations, the IPv4 pattern `(?:[0-9]{1,3}\.){3}[0-9]{1,3}` is standardized across parsers (e.g., iptables in parsers_extra.conf, mikrotik parsers) for consistency and maintainability, even though it's permissive and could theoretically match invalid addresses like 999.999.999.999. This is acceptable because the parsers are validated against real-world logs where the input format is controlled by the logging source.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11199
File: conf/parsers_mikrotik.yaml:72-98
Timestamp: 2025-11-24T23:14:30.311Z
Learning: The mikrotik-firewall parser regex in conf/parsers_mikrotik.yaml has been validated against 620 million lines of real Mikrotik logs with 100% match rate. The backreferences to optional port groups (\k<src_port> and \k<dst_port>) are correct because Mikrotik log format guarantees port consistency: if source has a port, destination always has a port; if source lacks a port, destination never has one.
Learnt from: hlein
Repo: fluent/fluent-bit PR: 11168
File: conf/parsers_mult.yaml:8-14
Timestamp: 2025-11-16T22:16:26.032Z
Learning: In Fluent Bit parser configurations (both .conf and .yaml formats), the regex engine automatically strips leading and trailing `/` characters from regex patterns, so patterns like `/Processing by .../` and `Processing by ...` are functionally equivalent and both work correctly.
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.