Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

multi-line matching needs documenting in more detail #410

Open
aspiers opened this issue Feb 1, 2017 · 5 comments
Open

multi-line matching needs documenting in more detail #410

aspiers opened this issue Feb 1, 2017 · 5 comments

Comments

@aspiers
Copy link
Contributor

aspiers commented Feb 1, 2017

I'm trying to write a format to parse a certain type of RabbitMQ log entry like the ones in this log file:

=WARNING REPORT==== 2-Jan-2017::21:08:18 ===
closing AMQP connection <0.6215.0> (10.76.4.136:55461 -> 10.76.4.109:5672):
connection_closed_abruptly

=WARNING REPORT==== 2-Jan-2017::21:08:19 ===
closing AMQP connection <0.6215.0> (10.76.4.138:55461 -> 10.76.4.109:5672):
connection_closed_abruptly

I've tried hundreds of different variations on:

{
  "rabbitmq": {
    "title": "RabbitMQ Log",
    "description": "RabbitMQ Log",
    "url": "",
    "multiline": true,
    "regex": {
      "abrupt_close": {
        "pattern" : "^=(?<level>[A-Z]+) REPORT==== (?<timestamp>\\d\\d?-\\w{3}-\\d{4}::\\d{2}:\\d{2}:\\d{2}) ===\\n(?<body>closing AMQP connection (?:.*?\\n)*?)$"
      }
    },
    "timestamp-format" : ["%e-%b-%Y::%H:%M:%S"],
    "level": {
      "error": "ERROR",
      "warning": "WARNING|SUPERVISOR",
      "info": "INFO|PROGRESS",
      "fatal": "CRASH"
    },
    "json": false,
    "sample": [
      {
        "line": "=WARNING REPORT==== 2-Jan-2017::21:08:16 ===\nclosing AMQP connection <0.7593.0> (10.76.4.138:43591 -> 10.76.4.109:5672):\nconnection_closed_abruptly\n"
      }
    ]
  }
}

and whilst some of them pass lnav -C, none of them are capable of parsing the testcase file above.

The sticking point seems to be what happens immediately after the final === of the first line. According to the source, lnav is not using PCRE_DOTALL or PCRE_MULTILINE in this context (which BTW I think would be well worth documenting.) So . does not match a newline, and $ means the same as \z.

OK, so if I put something like (?<body>(?:.*\\n)*?)$ then it will parse the file. But if I add \\n or even (?s:.) before the (?<body>..), or add something concrete like \\n or closing at the start of the <body> then it doesn't match. This makes absolutely no sense, unless lnav wasn't doing any multi-line matching, but then if it wasn't, how is it that the expression inside (?<body>...) has any effect?

I'm totally confused - any help gratefully received. Thanks!

--- Want to back this issue? **[Post a bounty on it!](https://www.bountysource.com/issues/41546939-multi-line-matching-needs-documenting-in-more-detail?utm_campaign=plugin&utm_content=tracker%2F449456&utm_medium=issues&utm_source=github)** We accept bounties via [Bountysource](https://www.bountysource.com/?utm_campaign=plugin&utm_content=tracker%2F449456&utm_medium=issues&utm_source=github).
@tstack
Copy link
Owner

tstack commented Feb 1, 2017

Sorry, multiline logs are not well supported in lnav. The sticking point, as you've kinda discovered, is that the log pattern is used in two different modes. When lnav is initially scanning logs to build the index, it does it a line at a time, so the pattern needs to match just a single line. That's why adding '\\n' causes the log to not be recognized, lnav is matching the pattern against:

=WARNING REPORT==== 2-Jan-2017::21:08:16 ===

It is not matching it against:

=WARNING REPORT==== 2-Jan-2017::21:08:16 ===
closing AMQP connection <0.7593.0> (10.76.4.138:43591 -> 10.76.4.109:5672):
connection_closed_abruptly\n

After the scan is done, the pattern will also be used to parse the full log message, which would include all lines in the message.

It's definitely confusing behavior and worth revisiting these decisions at some point. There are just not many log formats that I deal with that are multi-line, so it's never been a priority.

Are you able to move forward with the pattern set to something like the following?

"pattern" : "^=(?<level>[A-Z]+) REPORT==== (?<timestamp>\\d\\d?-\\w{3}-\\d{4}::\\d{2}:\\d{2}:\\d{2}) ===(?<body>(?:.|\\n)*)$"

@aspiers
Copy link
Contributor Author

aspiers commented Feb 2, 2017

Thanks very much for the reply and helpful explanation! The final pattern is almost exactly what I'm already using. Unfortunately it doesn't help me extract fields from the second line of the multiline log. But if the pattern needs to match both the multiline log and just the first line, maybe I can use some kind of trick like this?

"pattern" : "^=(?<level>[A-Z]+) REPORT==== (?<timestamp>\\d\\d?-\\w{3}-\\d{4}::\\d{2}:\\d{2}:\\d{2}) ===(?<body>...(?<field1>...)...(?<field2>...)...)?$"

where the final ? makes everything in (?<body>...) optional? I'll try that when I have time.

@aspiers
Copy link
Contributor Author

aspiers commented Feb 2, 2017

Also, it would be well worth documenting this if it isn't already (maybe I missed it).

@aspiers
Copy link
Contributor Author

aspiers commented Feb 3, 2017

OK, I got it working based on your hints and the idea in my previous comment! The key thing was to make everything after the first line (including the newline immediately after the first line) an optional part of the regexp. Thanks again!

{
    "rabbitmq": {
      "title": "RabbitMQ Log",
      "description": "RabbitMQ Log",
      "url": "",
      "multiline": true,
      "regex": {
        "connection": {
          "pattern" : "^=(?<level>[A-Z]+) REPORT==== (?<timestamp>\\d\\d?-\\w{3}-\\d{4}::\\d{2}:\\d{2}:\\d{2}) ===(\\n(?<body>(closing AMQP connection <(?<stuff>.+)> \\((?<sourceIP>\\
\d[\\d.]+\\d?):(?<sourcePort>\\d+) -> (?<destIP>\\d[\\d.]+\\d?):(?<destPort>\\d+)\\))?(.*\\n)*))?$"
        }
      },
      "timestamp-format" : ["%e-%b-%Y::%H:%M:%S"],
      "level": {
        "error": "ERROR",
        "warning": "WARNING|SUPERVISOR",
        "info": "INFO|PROGRESS",
        "fatal": "CRASH"
      },
      "value" : {
        "body" : {
          "kind" : "string",
          "identifier" : true
        },
        "sourceIP" : {
          "kind" : "string",
          "identifier" : true,
          "collate" : "ipaddress"
        },
        "sourcePort" : {
          "kind" : "integer",
          "identifier" : true
        },
        "destIP" : {
          "kind" : "string",
          "identifier" : true,
          "collate" : "ipaddress"
        },
        "destPort" : {
          "kind" : "integer",
          "identifier" : true
        }
      },
      "json": false,
      "sample": [
        {
          "line": "=ERROR REPORT==== 21-Apr-2016::21:19:01 ===\n** Node 'rabbit@red-hound-01' not responding **\n** Removing (timedout) connection **\n\n"
        },
        {
          "line": "=INFO REPORT==== 21-Apr-2016::21:19:01 ===\nrabbit on node 'rabbit@red-hound-01' down\n\n"
        },
        {
          "line": "=WARNING REPORT==== 2-Jan-2017::21:08:16 ===\nclosing AMQP connection <0.7593.0> (10.76.4.138:43591 -> 10.76.4.109:5672):\nconnection_closed_abruptly\n"
        }
      ]
    }
}

@aspiers aspiers changed the title bizarre multi-line matching bug? multi-line matching needs documenting in more detail Feb 3, 2017
@piotr-dobrogost
Copy link
Sponsor

piotr-dobrogost commented Oct 25, 2020

When lnav is initially scanning logs to build the index, it does it a line at a time, so the pattern needs to match just a single line.
(...)
After the scan is done, the pattern will also be used to parse the full log message, which would include all lines in the message.

Being able to specify two different patterns for these two different purposes would make so much sense.

Issues raised as a consequence of confusion as to multi-line matching:
#652
#816

nicolasbock added a commit to nicolasbock/lnav that referenced this issue Jun 9, 2021
Taken from tstack#410 (comment)

Co-Authored-By: Adam Spiers <adam@spiers.net>
Co-Authored-By: Nicolas Bock <nicolasbock@gmail.com>
Signed-off-by: Nicolas Bock <nicolasbock@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants