enhancement(remap transform): add parse_aws_alb_log function #5489

Merged 11 commits on Dec 18, 2020

Contributor

@fanatid fanatid commented Dec 11, 2020

Closes #5365

I was also curious how much faster nom would be than regex, so I created a simple benchmark:

code diff
diff --git a/Cargo.toml b/Cargo.toml
index 956583b23..63dcccf4e 100644
--- a/Cargo.toml
+++ b/Cargo.toml
@@ -565,5 +565,9 @@ name = "remap"
 harness = false
 required-features = ["remap-benches"]
 
+[[bench]]
+name = "parse_aws_elb"
+harness = false
+
 [patch.'https://github.com/tower-rs/tower']
 tower-layer = "0.3"
diff --git a/src/remap/function.rs b/src/remap/function.rs
index 1121a922c..537b19b35 100644
--- a/src/remap/function.rs
+++ b/src/remap/function.rs
@@ -22,7 +22,7 @@ mod md5;
 mod merge;
 mod now;
 mod only_fields;
-mod parse_aws_elb;
+pub mod parse_aws_elb;
 mod parse_duration;
 mod parse_grok;
 mod parse_json;
diff --git a/src/remap/function/parse_aws_elb.rs b/src/remap/function/parse_aws_elb.rs
index 8107d4eb0..056131d51 100644
--- a/src/remap/function/parse_aws_elb.rs
+++ b/src/remap/function/parse_aws_elb.rs
@@ -59,7 +59,7 @@ impl Expression for ParseAwsElbFn {
     }
 }
 
-fn parse_log(mut input: &str) -> Result<Value> {
+pub fn parse_log(mut input: &str) -> Result<Value> {
     let mut log = BTreeMap::new();
 
     macro_rules! get_value {
@@ -189,6 +189,88 @@ fn take_list(cond: impl Fn(char) -> bool) -> impl FnOnce(&str) -> SResult<Vec<&s
     }
 }
 
+pub fn parse_log_raw(mut input: &str) {
+    macro_rules! get_value {
+        ($name:expr, $parser:expr) => {{
+            let result: IResult<&str, _, (&str, nom::error::ErrorKind)> = $parser(input);
+            match result {
+                Ok((rest, value)) => {
+                    input = rest;
+                    value
+                }
+                Err(_) => return,
+            }
+        }};
+    }
+    macro_rules! field_raw {
+        ($name:expr, $parser:expr) => {
+            get_value!($name, $parser)
+        };
+    }
+    macro_rules! field {
+        ($name:expr, $($pattern:pat)|+) => {
+            field_raw!($name, preceded(char(' '), take_while1(|c| matches!(c, $($pattern)|+))))
+        };
+    }
+
+    field_raw!("type", take_while1(|c| matches!(c, 'a'..='z' | '0'..='9')));
+    field!("timestamp", '0'..='9' | '.' | '-' | ':' | 'T' | 'Z');
+    field_raw!("elb", take_anything);
+    field!("client_host", '0'..='9' | '.' | ':' | '-');
+    field!("target_host", '0'..='9' | '.' | ':' | '-');
+    field!("request_processing_time", '0'..='9' | '.' | '-');
+    field!("target_processing_time", '0'..='9' | '.' | '-');
+    field!("response_processing_time", '0'..='9' | '.' | '-');
+    field!("elb_status_code", '0'..='9' | '-');
+    field!("target_status_code", '0'..='9' | '-');
+    field!("received_bytes", '0'..='9' | '-');
+    field!("sent_bytes", '0'..='9' | '-');
+    let request = get_value!("request", take_quoted1);
+    let mut iter = request.splitn(2, ' ');
+    iter.next().unwrap();
+    match iter.next() {
+        Some(value) => {
+            let mut iter = value.rsplitn(2, ' ');
+            iter.next().unwrap();
+            iter.next().unwrap();
+        }
+        None => return,
+    };
+    field_raw!("user_agent", take_quoted2);
+    field_raw!("ssl_cipher", take_anything);
+    field_raw!("ssl_protocol", take_anything);
+    field_raw!("target_group_arn", take_anything);
+    field_raw!("trace_id", take_quoted2);
+    field_raw!("domain_name", take_quoted2);
+    field_raw!("chosen_cert_arn", take_quoted2);
+    field!("matched_rule_priority", '0'..='9' | '-');
+    field!(
+        "request_creation_time",
+        '0'..='9' | '.' | '-' | ':' | 'T' | 'Z'
+    );
+    field_raw!("actions_executed", take_quoted2);
+    field_raw!("redirect_url", take_quoted2);
+    field_raw!("error_reason", take_quoted2);
+}
+
+fn take_quoted2(input: &str) -> SResult<()> {
+    delimited(tag(" \""), until_quote2, char('"'))(input)
+}
+
+fn until_quote2(input: &str) -> SResult<()> {
+    let mut skip_delimiter = false;
+    for (i, ch) in input.char_indices() {
+        if ch == '\\' && !skip_delimiter {
+            skip_delimiter = true;
+        } else if ch == '"' && !skip_delimiter {
+            return Ok((&input[i..], ()));
+        } else {
+            skip_delimiter = false;
+        }
+    }
+    Err(nom::Err::Incomplete(nom::Needed::Unknown))
+}
+
 #[cfg(test)]
 mod tests {
     use super::*;
diff --git a/src/remap/mod.rs b/src/remap/mod.rs
index 43fb90cf0..fd9addd9a 100644
--- a/src/remap/mod.rs
+++ b/src/remap/mod.rs
@@ -1,4 +1,4 @@
-pub(crate) mod function;
+pub mod function;
 
 pub use function::*;
 use lazy_static::lazy_static;

Results:

parse_aws_elb/nom       time:   [43.395 us 43.907 us 44.529 us]
parse_aws_elb/nom_raw   time:   [7.4265 us 7.5076 us 7.6099 us]
parse_aws_elb/regex     time:   [50.973 us 51.548 us 52.259 us]

parse_aws_elb/nom returns a remap::Value with allocated data.
parse_aws_elb/nom_raw only consumes the same data as the regex, without building a result.
parse_aws_elb/regex is the regex proposed in #5365.

Data allocation should take nearly the same time for both nom and regex, but the parsing itself looks ~7x faster with nom (about 7.5 us for nom_raw versus about 51.5 us for regex).
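
For illustration, the nom_raw variant stays cheap because it only advances through the input and hands back borrowed data. Below is a minimal sketch of the same escape-aware scan as `until_quote2` in the diff above, written against the standard library only. The name `take_quoted_std` and the `Option` return type are assumptions made for this example and are not part of the PR.

```rust
/// Hypothetical std-only counterpart of `take_quoted2` / `until_quote2`.
///
/// Expects a field of the form ` "..."` (one leading space, then double
/// quotes) and returns `(rest, content)` as borrowed slices of `input`,
/// so no allocation takes place. A backslash escapes the next character.
fn take_quoted_std(input: &str) -> Option<(&str, &str)> {
    // The field must start with a space followed by an opening quote.
    let body = input.strip_prefix(" \"")?;

    let mut escaped = false;
    for (i, ch) in body.char_indices() {
        if escaped {
            // The previous character was a backslash: skip this one.
            escaped = false;
        } else if ch == '\\' {
            escaped = true;
        } else if ch == '"' {
            // `"` is a single byte, so `i + 1` is a valid char boundary.
            return Some((&body[i + 1..], &body[..i]));
        }
    }

    // No closing quote was found.
    None
}
```

Calling `take_quoted_std` on a field such as ` "GET / HTTP/1.1"` followed by more input yields the remaining input and the quoted content; an unterminated quote yields `None`, whereas the nom version in the diff reports `Incomplete`.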

Some fields can have "no value", which the log represents as "-". Should I use Value::Null in this case?

Signed-off-by: Kirill Fomichev <fanatid@ya.ru>
@fanatid fanatid added the domain: vrl Anything related to the Vector Remap Language label Dec 11, 2020
@fanatid fanatid self-assigned this Dec 11, 2020
@fanatid fanatid changed the title enhancement(remap transform): add parse_aws_elb function [WIP] enhancement(remap transform): add parse_aws_elb function Dec 11, 2020
@fanatid fanatid marked this pull request as draft December 11, 2020 18:43
@fanatid fanatid changed the title [WIP] enhancement(remap transform): add parse_aws_elb function enhancement(remap transform): add parse_aws_elb function Dec 11, 2020
Member

@jszwedko jszwedko left a comment

Nice work! I appreciate the benchmarking to motivate the use of nom here.

I think mapping - to Null would be appropriate.
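
A minimal sketch of what that mapping could look like, using `Option` as a stand-in for `Value::Null`. The helper names `null_if_dash` and `int_or_null` are hypothetical, and the merged implementation may handle this differently.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: treats the "-" placeholder as "no value".
/// `None` plays the role that `Value::Null` would play in remap.
fn null_if_dash(raw: &str) -> Option<&str> {
    if raw == "-" {
        None
    } else {
        Some(raw)
    }
}

/// Hypothetical helper for integer fields such as a status code:
/// "-" becomes `Ok(None)`, anything else must parse as an integer.
fn int_or_null(raw: &str) -> Result<Option<i64>, ParseIntError> {
    match null_if_dash(raw) {
        None => Ok(None),
        Some(text) => text.parse::<i64>().map(Some),
    }
}
```

Keeping the "-" check separate from the numeric parse means a genuinely malformed field still surfaces as an error instead of being silently turned into a null.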

#[test]
fn parse_aws_elb() {
let logs = vec![
r#"http 2018-07-02T22:23:00.186641Z app/my-loadbalancer/50dc6c495c0c9188
Member

👍 thanks for including all of their examples.

arguments: [
{
name: "value"
description: "Access log of the Application Load Balancer."
Member

I think we may want to call this parse_aws_alb (or even parse_aws_alb_log) instead, to leave room for parsing classic load balancer (ELB) logs (https://docs.aws.amazon.com/elasticloadbalancing/latest/classic/access-log-collection.html) and network load balancer (NLB) logs (https://docs.aws.amazon.com/elasticloadbalancing/latest/network/load-balancer-access-logs.html) in the future.

Signed-off-by: Kirill Fomichev <fanatid@ya.ru>
Signed-off-by: Kirill Fomichev <fanatid@ya.ru>
Signed-off-by: Kirill Fomichev <fanatid@ya.ru>
@fanatid fanatid marked this pull request as ready for review December 16, 2020 18:47
@fanatid fanatid changed the title enhancement(remap transform): add parse_aws_elb function enhancement(remap transform): add parse_aws_alb_log function Dec 17, 2020
Contributor

@StephenWakely StephenWakely left a comment

Nice.

Signed-off-by: Kirill Fomichev <fanatid@ya.ru>
Signed-off-by: Kirill Fomichev <fanatid@ya.ru>
Member

@jszwedko jszwedko left a comment

Nice work!

@fanatid fanatid merged commit 6389e88 into master Dec 18, 2020
@fanatid fanatid deleted the remap-add-parse-aws-elb branch December 18, 2020 05:11
Labels
domain: vrl Anything related to the Vector Remap Language
Development

Successfully merging this pull request may close these issues.

New parse_aws_elb_log Remap function
3 participants